update readme

gongjy 2024-10-05 00:35:54 +08:00
parent 1b864453fa
commit eb875da306
2 changed files with 70 additions and 44 deletions

View File

@ -28,7 +28,10 @@
* This open-source project aims to train a tiny 26M language model, **MiniMind**, completely from scratch in as little as 3 hours.
* **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT-3, so that even the most ordinary personal GPU can run fast inference and even training.
* **MiniMind** improves on the DeepSeek-V2 and Llama3 architectures. The project covers all stages of data processing, pretraining, SFT, and DPO, and includes a Mixture of Experts (MoE) model.
* This is both an open-source project and a beginner's LLM tutorial, as well as a nascent open-source model, offered in the hope of inspiring further work.
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large language models (LLMs).
* We hope this project gives researchers an introductory example that helps them get started quickly and fosters more exploration and innovation in the LLM field.
> To avoid misunderstanding, "as fast as 3 hours" means you need a machine with hardware specifications matching the author's; the detailed specifications are provided below.
---
@ -53,7 +56,7 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055
Train an extremely lightweight language model directly from scratch.
> [!TIP]
> As of 2024-09-17, minimind has trained 3 model versions; the smallest needs only 26M (0.02B) parameters to hold fluent conversations!
> As of 2024-09-17, the MiniMind series has completed pretraining of 3 model versions; the smallest needs only 26M (0.02B) parameters to hold fluent conversations!
| Model (size) | Tokenizer length | Inference memory | Release | Subjective score (/100) |
|-------------------------|-------------|--------|------------|------------|
@ -61,7 +64,7 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055
| minimind-v1-moe (4×26M) | 6400 | 1.0 GB | 2024.09.17 | 55' |
| minimind-v1 (108M) | 6400 | 1.0 GB | 2024.09.01 | 60' |
> This analysis was run on a single RTX 3090 GPU with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.
> This analysis was conducted on 2×RTX 3090 GPUs with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.
@ -77,10 +80,19 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055
### 👉**Recent Updates**
<details close>
<summary> <b>2024-10-05 (newest 🎉)</b> </summary>
- Extended MiniMind with a multimodal capability: vision
- Head over to the twin project [minimind-v](https://github.com/jingyaogong/minimind-v) for details!
</details>
<details close>
<summary> <b>2024-09-27</b> </summary>
- 👉09-27: Updated the preprocessing of the pretrain dataset; to preserve text integrity, preprocessing into .bin files for training was dropped (at a slight cost in training speed).
- 09-27: Updated the preprocessing of the pretrain dataset; to preserve text integrity, preprocessing into .bin files for training was dropped (at a slight cost in training speed).
- The preprocessed pretrain file is now named pretrain_data.csv.
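For reference, a minimal sketch (not the repo's exact code) of reading pretrain_data.csv and tokenizing text on the fly under this scheme; the column name `text` and the tokenizer path are assumptions, so check `data_process.py` and the dataset code for the actual names:

```python
# Minimal sketch, NOT the repo's exact code: read pretrain_data.csv and
# tokenize each sample on the fly instead of loading a pre-encoded .bin file.
# The column name "text" and the tokenizer path are assumptions.
import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./model/minimind_tokenizer")  # assumed path
df = pd.read_csv("./dataset/pretrain_data.csv")

for text in df["text"].head(3):
    ids = tokenizer(text, truncation=True, max_length=512)["input_ids"]
    print(len(ids), ids[:10])
```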
@ -119,6 +131,13 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055
This is just my personal software and hardware environment configuration; adjust it as appropriate:
```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP single-machine multi-GPU training
```
* Ubuntu == 20.04
* Python == 3.9
* Pytorch == 2.1.2
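A quick, generic way to confirm that the local setup roughly matches the configuration above (this check is not part of the repo):

```python
# Generic environment check, not part of MiniMind: verify the Torch/CUDA setup.
import torch

print("torch:", torch.__version__)              # expect 2.1.2
print("cuda available:", torch.cuda.is_available())
print("gpu count:", torch.cuda.device_count())  # expect 2 for 2x RTX 3090
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))
```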
@ -182,17 +201,18 @@ streamlit run fast_inference.py
* 2.1 Download the datasets from [dataset download links](#数据集下载地址) and place them in the `./dataset` directory
* 2.2 Run `python data_process.py` to process the datasets, e.g. token-encode the pretrain data ahead of time and extract the Q&A pairs of the SFT dataset into a csv file
* 2.3 Adjust the model parameter configuration in `./model/LMConfig.py`
* 2.4 Run `python 1-pretrain.py` to perform pretraining
* 2.5 Run `python 3-full_sft.py` to perform instruction fine-tuning
> Only the dim, n_layers and use_moe parameters need adjusting here, namely `(512+8)` or `(768+16)`, corresponding to `minimind-v1-small` and `minimind-v1` respectively (a configuration sketch follows this list)
* 2.4 Run `python 1-pretrain.py` to perform pretraining, producing `pretrain_*.pth` as the pretrained output weights
* 2.5 Run `python 3-full_sft.py` to perform instruction fine-tuning, producing `full_sft_*.pth` as the fine-tuned output weights
* 2.6 Run `python 4-lora_sft.py` to perform LoRA fine-tuning (optional)
* 2.7 Run `python 5-dpo_train.py` to perform DPO human-preference alignment (optional)
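A hypothetical sketch of the `LMConfig` adjustment mentioned in the tip above; the constructor fields are assumed from the `(dim + n_layers + use_moe)` description and should be verified against `./model/LMConfig.py`:

```python
# Hypothetical sketch of adjusting the model configuration -- field names are
# assumed from the (dim, n_layers, use_moe) description; verify against the repo.
from model.LMConfig import LMConfig

# minimind-v1-small: dim=512, n_layers=8
small_cfg = LMConfig(dim=512, n_layers=8, use_moe=False)

# minimind-v1: dim=768, n_layers=16
base_cfg = LMConfig(dim=768, n_layers=16, use_moe=False)
```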
* 3. Test the model's inference performance
* Make sure the trained weights you want to use are in the `./out/` directory
* You can also download and use my trained weights from [trained model weights](#训练完成的模型权重)
* Make sure the trained `*.pth` weight files you want to use are in the `./out/` directory
* You can also download and use my trained `*.pth` weight files from [trained model weights](#训练完成的模型权重)
```text
out
minimind/out
├── multi_chat
│   ├── full_sft_512.pth
│   ├── full_sft_512_moe.pth
@ -211,26 +231,26 @@ streamlit run fast_inference.py
🍭 [Tip] Both pretraining and full-parameter fine-tuning (pretrain and full_sft) support multi-GPU acceleration
* Launch single-machine N-GPU training (DDP)
```bash
torchrun --nproc_per_node N 1-pretrain.py
# and
torchrun --nproc_per_node N 3-full_sft.py
```
* Launch single-machine N-GPU training (DeepSpeed)
```bash
deepspeed --master_port 29500 --num_gpus=N 1-pretrain.py
# and
deepspeed --master_port 29500 --num_gpus=N 3-full_sft.py
```
* Record the training run
```bash
torchrun --nproc_per_node N 1-pretrain.py --use_wandb
# and
python 1-pretrain.py --use_wandb
```
Adding the `--use_wandb` flag records the training run, which can be viewed on the wandb website once training finishes. Modify the `wandb_project` and `wandb_run_name` parameters to set the project name and run name.
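For reference, a rough sketch of what a `--use_wandb` code path typically does inside a training script; the project and run names below are placeholders standing in for the `wandb_project` and `wandb_run_name` parameters, not the repo's actual defaults:

```python
# Rough sketch of a typical --use_wandb code path; NOT the repo's exact code.
# The project/run names are placeholders for wandb_project / wandb_run_name.
import wandb

wandb.init(project="MiniMind-Pretrain",  # placeholder wandb_project
           name="pretrain-demo-run")     # placeholder wandb_run_name

for step in range(100):
    loss = 1.0 / (step + 1)              # placeholder metric
    wandb.log({"loss": loss, "step": step})

wandb.finish()
```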
# 📌 Data sources
@ -345,13 +365,6 @@ minimind目前训练的模型版本见下表
# 📌 Experiment
```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP multi-GPU training
```
| Model Name | params | len_vocab | batch_size | pretrain_time | sft_single_time | sft_multi_time |
|-------------------|--------|-----------|------------|-------------------|-------------------|---------------------|
| minimind-v1-small | 26M | 6400 | 64 | ≈2 hour (1 epoch) | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |

View File

@ -31,8 +31,10 @@
inference and even training on CPUs.
* **MiniMind** is an improvement on the DeepSeek-V2 and Llama3 architectures. The project includes all stages of data
processing, pretraining, SFT, and DPO, and features a Mixture of Experts (MoE) model.
* This project is not only an open-source initiative but also a beginner's tutorial for LLMs, and serves as a nascent
open-source model with the hope of inspiring further development.
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large language models (LLMs).
* We hope that this project serves as a stepping stone for researchers and developers, providing an introductory example to help them quickly get started and foster more exploration and innovation in the LLM field.
> To avoid any misunderstanding, "fastest 3 hours" means you need a machine with hardware specifications at least matching the author's setup. The detailed specifications will be provided below.
---
@ -84,6 +86,15 @@ We hope this open-source project helps LLM beginners get started quickly!
### 👉**Recent Updates**
<details close>
<summary> <b>2024-10-05 (newest 🎉)</b> </summary>
- Added visual capabilities to MiniMind-V(ision)
- Check out the twin project [minimind-v](https://github.com/jingyaogong/minimind-v) for more details!
</details>
<details close>
<summary> <b>2024-09-27</b> </summary>
@ -127,6 +138,14 @@ We hope this open-source project helps LLM beginners get started quickly!
These are my personal software and hardware environment configurations. Please adjust according to your own setup:
```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP multi-GPU training
```
* Ubuntu == 20.04
* Python == 3.9
* Pytorch == 2.1.2
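Since this setup relies on DDP multi-GPU training launched with `torchrun`, here is a generic sketch of the standard PyTorch initialization pattern such a script usually follows; it is not necessarily the exact code in `1-pretrain.py`:

```python
# Generic DDP initialization for a torchrun-launched script -- a standard
# PyTorch idiom, not necessarily MiniMind's exact implementation.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp() -> int:
    dist.init_process_group(backend="nccl")     # torchrun sets RANK/WORLD_SIZE
    local_rank = int(os.environ["LOCAL_RANK"])  # set per process by torchrun
    torch.cuda.set_device(local_rank)
    return local_rank

local_rank = setup_ddp()
model = torch.nn.Linear(512, 512).cuda(local_rank)   # placeholder model
model = DDP(model, device_ids=[local_rank])
```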
@ -380,12 +399,6 @@ shown in the table below:
# 📌 Experiment
```bash
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Memory: 128 GB
GPU: NVIDIA GeForce RTX 3090 (24GB) * 2
Environment: python 3.9 + Torch 2.1.2 + DDP multi-GPU training
```
| Model Name | params | len_vocab | batch_size | pretrain_time | sft_single_time | sft_multi_time |
|-------------------|--------|-----------|------------|-------------------|-------------------|---------------------|