update readme
This commit is contained in: parent 1b864453fa, commit eb875da306
85 README.md
@@ -28,7 +28,10 @@
* This open-source project aims to train a tiny 26M language model, **MiniMind**, completely from scratch, in as little as 3 hours!
* **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT-3, aiming to make fast inference and even training possible on the most ordinary personal GPU.
* **MiniMind** improves on the DeepSeek-V2 and Llama3 architectures. The project covers all stages of data processing, pretrain, sft, and dpo, and includes a Mixture of Experts (MoE) model.
* This is at once an open-source project, an introductory LLM tutorial, and a nascent open-source model, offered in the hope of inspiring further work.
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large language models (LLMs).
* We hope this project gives researchers an introductory example that helps them get started quickly and sparks more exploration and innovation in the LLM field.

> To avoid misreading, "as fast as 3 hours" means you need a machine with hardware specs exceeding my own; detailed specifications are provided below.

---

@@ -53,7 +56,7 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055

Train an extremely lightweight language model directly from scratch.

> [!TIP]
> (As of 2024-09-17) minimind has trained 3 model variants; the smallest, at just 26M (0.02B), is capable of fluent conversation!
> (As of 2024-09-17) The MiniMind series has completed pretraining of 3 model variants; the smallest, at just 26M (0.02B), is capable of fluent conversation!

| Model (size)            | Tokenizer length | Inference memory | Release    | Subjective score (/100) |
|-------------------------|------------------|------------------|------------|-------------------------|
@@ -61,7 +64,7 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055
| minimind-v1-moe (4×26M) | 6400 | 1.0 GB | 2024.09.17 | 55' |
| minimind-v1 (108M)      | 6400 | 1.0 GB | 2024.09.01 | 60' |

> This analysis was run on a single RTX 3090 GPU with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.
> This analysis was run on 2×RTX 3090 GPUs with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.

@@ -77,10 +80,19 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055

### 👉**Recent Updates**

<details close>
<summary> <b>2024-10-05 (newest 🎉)</b> </summary>

- Extended MiniMind with a multimodal capability: vision

- See the twin project [minimind-v](https://github.com/jingyaogong/minimind-v) for details!

</details>

<details close>
<summary> <b>2024-09-27</b> </summary>

- 👉 09-27: Updated the preprocessing of the pretrain dataset; to preserve text integrity, the pre-tokenized .bin training format was dropped (at a slight cost in training speed).
- 09-27: Updated the preprocessing of the pretrain dataset; to preserve text integrity, the pre-tokenized .bin training format was dropped (at a slight cost in training speed).

- The preprocessed pretrain file is now named: pretrain_data.csv.
@@ -119,6 +131,13 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055

This is just my personal software/hardware environment; adjust as appropriate:

```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP single-machine multi-GPU training
```

* Ubuntu == 20.04
* Python == 3.9
* Pytorch == 2.1.2
@@ -182,17 +201,18 @@ streamlit run fast_inference.py
* 2.1 Download the dataset from [数据集下载地址](#数据集下载地址) and place it in the `./dataset` directory

* 2.2 Run `python data_process.py` to process the datasets, e.g. token-encoding the pretrain data in advance and extracting the QA pairs of the sft dataset into a CSV file (see the sketches after these steps)

* 2.3 Adjust the model parameter configuration in `./model/LMConfig.py`
* 2.4 Run `python 1-pretrain.py` to pretrain
* 2.5 Run `python 3-full_sft.py` to run instruction fine-tuning
> Only the dim, n_layers, and use_moe parameters need adjusting here: `(512+8)` or `(768+16)`, corresponding to `minimind-v1-small` and `minimind-v1` respectively
* 2.4 Run `python 1-pretrain.py` to pretrain, producing `pretrain_*.pth` as the pretraining output weights
* 2.5 Run `python 3-full_sft.py` to run instruction fine-tuning, producing `full_sft_*.pth` as the fine-tuning output weights
* 2.6 Run `python 4-lora_sft.py` for LoRA fine-tuning (optional)
* 2.7 Run `python 5-dpo_train.py` for DPO human-preference alignment (optional)
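
To make steps 2.2 and 2.3 concrete, two minimal sketches follow. First, a hypothetical illustration of extracting SFT QA pairs into a CSV file; the file paths and JSON field names are assumptions, not the actual `data_process.py`:

```python
# Hypothetical sketch of step 2.2 (NOT the actual data_process.py):
# pull question/answer pairs out of an SFT jsonl file into a CSV.
import csv
import json

with open("dataset/sft_data.jsonl", "r", encoding="utf-8") as fin, \
     open("dataset/sft_data.csv", "w", encoding="utf-8", newline="") as fout:
    writer = csv.writer(fout)
    writer.writerow(["q", "a"])
    for line in fin:
        sample = json.loads(line)
        writer.writerow([sample["q"], sample["a"]])  # assumed field names
```

Second, a sketch of the configuration tweak in step 2.3; only `dim`, `n_layers`, and `use_moe` are named in this README, and the exact `LMConfig` constructor signature is an assumption:

```python
# Hypothetical sketch of step 2.3 -- adjust ./model/LMConfig.py.
# Only dim/n_layers/use_moe are documented here; other fields may exist.
from model.LMConfig import LMConfig

small_config = LMConfig(dim=512, n_layers=8, use_moe=False)   # minimind-v1-small (512+8)
base_config = LMConfig(dim=768, n_layers=16, use_moe=False)   # minimind-v1 (768+16)
```
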
* 3. Test the model's inference performance
* Make sure the trained weights you want to use are in the `./out/` directory
* You can also download and use the weights I trained from [训练完成的模型权重](#训练完成的模型权重)
* Make sure the trained `*.pth` weight files you want to use are in the `./out/` directory
* You can also download and use my trained `*.pth` weight files from [训练完成的模型权重](#训练完成的模型权重)
```text
out
minimind/out
├── multi_chat
│   ├── full_sft_512.pth
│   ├── full_sft_512_moe.pth
@@ -211,26 +231,26 @@ streamlit run fast_inference.py

🍭 [Tip] Both pretraining (pretrain) and full-parameter fine-tuning (full_sft) support multi-GPU acceleration (a generic DDP skeleton follows the launch commands below)

* Launch single-machine N-GPU training (DDP)
    ```bash
    torchrun --nproc_per_node N 1-pretrain.py
    # and
    torchrun --nproc_per_node N 3-full_sft.py
    ```
* Launch single-machine N-GPU training (DeepSpeed)
    ```bash
    deepspeed --master_port 29500 --num_gpus=N 1-pretrain.py
    # and
    deepspeed --master_port 29500 --num_gpus=N 3-full_sft.py
    ```
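
For context on what these launchers set up, here is a minimal, generic single-machine DDP skeleton; it is an assumption about what `1-pretrain.py`/`3-full_sft.py` do internally, not the project's actual code, and the model is a placeholder:

```python
# Generic DDP skeleton (assumption: the training scripts do roughly this).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    # ... build a DistributedSampler-backed dataloader and train as usual ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
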
* Record the training process
    ```bash
    torchrun --nproc_per_node N 1-pretrain.py --use_wandb
    # and
    python 1-pretrain.py --use_wandb
    ```
    Adding the `--use_wandb` flag records the training run; once training finishes, you can review it on the wandb site. Modify the `wandb_project` and `wandb_run_name` parameters to set the project and run names (a minimal wandb sketch follows).
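
For reference, `--use_wandb` presumably wraps the standard wandb API roughly as below; the exact wiring inside the training scripts is an assumption, and the project/run names are placeholders for `wandb_project`/`wandb_run_name`:

```python
# Minimal wandb logging sketch (assumption: the scripts use the standard API like this).
import wandb

wandb.init(project="MiniMind", name="pretrain-run-1")  # wandb_project / wandb_run_name
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder metric
    wandb.log({"loss": loss, "step": step})
wandb.finish()
```
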
# 📌 Data sources

@@ -345,13 +365,6 @@ The minimind model versions trained so far are listed in the table below:

# 📌 Experiment

```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP multi-GPU training
```

| Model Name        | params | len_vocab | batch_size | pretrain_time     | sft_single_time   | sft_multi_time      |
|-------------------|--------|-----------|------------|-------------------|-------------------|---------------------|
| minimind-v1-small | 26M    | 6400      | 64         | ≈2 hour (1 epoch) | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |

29 README_en.md

@@ -31,8 +31,10 @@
  inference and even training on CPUs.
* **MiniMind** is an improvement on the DeepSeek-V2 and Llama3 architectures. The project includes all stages of data
  processing, pretraining, SFT, and DPO, and features a Mixture of Experts (MoE) model.
* This project is not only an open-source initiative but also a beginner's tutorial for LLMs, and serves as a nascent
  open-source model with the hope of inspiring further development.
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large language models (LLMs).
* We hope that this project serves as a stepping stone for researchers and developers, providing an introductory example to help them quickly get started and foster more exploration and innovation in the LLM field.

> To avoid any misunderstanding, "as fast as 3 hours" means you need a machine with higher hardware specifications than the author's setup; detailed specifications are provided below.

---

@@ -84,6 +86,15 @@ We hope this open-source project helps LLM beginners get started quickly!

### 👉**Recent Updates**

<details close>
<summary> <b>2024-10-05 (newest 🎉)</b> </summary>

- Added visual capabilities to MiniMind-V(ision)

- Check out the twin project [minimind-v](https://github.com/jingyaogong/minimind-v) for more details!

</details>

<details close>
<summary> <b>2024-09-27</b> </summary>

@@ -127,6 +138,14 @@ We hope this open-source project helps LLM beginners get started quickly!

These are my personal software and hardware environment configurations. Please adjust according to your own setup:

```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP multi-GPU training
```

* Ubuntu == 20.04
* Python == 3.9
* Pytorch == 2.1.2
@@ -380,12 +399,6 @@ shown in the table below:

# 📌 Experiment

```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP multi-GPU training
```

| Model Name | params | len_vocab | batch_size | pretrain_time | sft_single_time | sft_multi_time |
|-------------------|--------|-----------|------------|-------------------|-------------------|---------------------|