update readme
commit e4b8789d8c (parent eb875da306)
15
README.md
@ -188,14 +188,16 @@ streamlit run fast_inference.py
|
|||||||
|
|
||||||
# 📌 Quick Start
|
# 📌 Quick Start
|
||||||
|
|
||||||
* 0. Install the environment
|
* 0. Clone the project code
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/jingyaogong/minimind.git && cd minimind
|
||||||
|
```
|
||||||
|
|
||||||
|
* 1. Install the environment
|
||||||
```bash
|
```bash
|
||||||
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
|
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
|
||||||
```
|
```
|
||||||
* 1. Clone the project code
|
|
||||||
```text
|
|
||||||
git clone https://github.com/jingyaogong/minimind.git
|
|
||||||
```
|
|
||||||
* 2. If you need to train the model yourself
|
* 2. If you need to train the model yourself
|
||||||
|
|
||||||
* 2.1 Download the datasets from the [dataset download links](#数据集下载地址) and place them in the `./dataset` directory
|
* 2.1 Download the datasets from the [dataset download links](#数据集下载地址) and place them in the `./dataset` directory
|
||||||
@ -250,7 +252,8 @@ streamlit run fast_inference.py
|
|||||||
# and
|
# and
|
||||||
python 1-pretrain.py --use_wandb
|
python 1-pretrain.py --use_wandb
|
||||||
```
|
```
|
||||||
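By adding the `--use_wandb` parameter, you can record the training process; once training is complete, you can view it on the wandb website. You can specify the project name and run name by modifying the `wandb_project` and `wandb_run_name` parameters.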
|
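By adding the `--use_wandb` parameter, you can record the training process; once training is complete, you can view it on the wandb website. You can specify the project name and run name by modifying the `wandb_project`
|
||||||
|
and `wandb_run_name` parameters.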
|
||||||
|
|
||||||
# 📌 Data sources
|
# 📌 Data sources
|
||||||
|
|
||||||
|
88
README_en.md
@ -31,10 +31,13 @@
|
|||||||
inference and even training on CPUs.
|
inference and even training on CPUs.
|
||||||
* **MiniMind** is an improvement on the DeepSeek-V2 and Llama3 architectures. The project includes all stages of data
|
* **MiniMind** is an improvement on the DeepSeek-V2 and Llama3 architectures. The project includes all stages of data
|
||||||
processing, pretraining, SFT, and DPO, and features a Mixture of Experts (MoE) model.
|
processing, pretraining, SFT, and DPO, and features a Mixture of Experts (MoE) model.
|
||||||
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large language models (LLMs).
|
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large
|
||||||
* We hope that this project serves as a stepping stone for researchers and developers, providing an introductory example to help them quickly get started and foster more exploration and innovation in the LLM field.
|
language models (LLMs).
|
||||||
|
* We hope that this project serves as a stepping stone for researchers and developers, providing an introductory example
|
||||||
|
to help them quickly get started and foster more exploration and innovation in the LLM field.
|
||||||
|
|
||||||
> To avoid any misunderstanding, "fastest 3 hours" refers to the requirement of using hardware with higher specifications than the author's setup. Detailed specifications will be provided below.
|
> To avoid any misunderstanding, "fastest 3 hours" refers to the requirement of using hardware with higher
|
||||||
|
specifications than the author's setup. Detailed specifications will be provided below.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -77,7 +80,8 @@ The project includes:
|
|||||||
- Public MiniMind model code (including Dense and MoE models), code for Pretrain, SFT instruction fine-tuning, LoRA
|
- Public MiniMind model code (including Dense and MoE models), code for Pretrain, SFT instruction fine-tuning, LoRA
|
||||||
fine-tuning, and DPO preference optimization, along with datasets and sources.
|
fine-tuning, and DPO preference optimization, along with datasets and sources.
|
||||||
- Compatibility with popular frameworks such as `transformers`, `accelerate`, `trl`, and `peft`.
|
- Compatibility with popular frameworks such as `transformers`, `accelerate`, `trl`, and `peft`.
|
||||||
- Training support for single-GPU and multi-GPU setups (DDP, DeepSpeed), with wandb visualization of the training process. Training can be stopped and resumed at any point.
|
- Training support for single-GPU and multi-GPU setups (DDP, DeepSpeed), with wandb visualization of the training process.
|
||||||
|
Training can be stopped and resumed at any point.
|
||||||
- Code for testing the model on the Ceval dataset.
|
- Code for testing the model on the Ceval dataset.
|
||||||
- Implementation of a basic chat interface compatible with OpenAI's API, facilitating integration into third-party Chat
|
- Implementation of a basic chat interface compatible with OpenAI's API, facilitating integration into third-party Chat
|
||||||
UIs (such as FastGPT, Open-WebUI, etc.).
|
UIs (such as FastGPT, Open-WebUI, etc.).
|
||||||
@ -98,7 +102,8 @@ We hope this open-source project helps LLM beginners get started quickly!
|
|||||||
<details close>
|
<details close>
|
||||||
<summary> <b>2024-09-27</b> </summary>
|
<summary> <b>2024-09-27</b> </summary>
|
||||||
|
|
||||||
- 👉Updated the preprocessing method for the pretrain dataset on 09-27 to ensure text integrity, opting to abandon the preprocessing into .bin training format (slightly sacrificing training speed).
|
- 👉Updated the preprocessing method for the pretrain dataset on 09-27 to ensure text integrity, opting to abandon the
|
||||||
|
preprocessing into .bin training format (slightly sacrificing training speed).
|
||||||
|
|
||||||
- The current filename for the pretrain data after preprocessing is: pretrain_data.csv.
|
- The current filename for the pretrain data after preprocessing is: pretrain_data.csv.
|
||||||
|
|
||||||
@ -138,7 +143,6 @@ We hope this open-source project helps LLM beginners get started quickly!
|
|||||||
|
|
||||||
These are my personal software and hardware environment configurations. Please adjust according to your own setup:
|
These are my personal software and hardware environment configurations. Please adjust according to your own setup:
|
||||||
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
|
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
|
||||||
Memory: 128 GB
|
Memory: 128 GB
|
||||||
@ -197,22 +201,19 @@ The project has been deployed to ModelScope makerspace, where you can experience
|
|||||||
|
|
||||||
# 📌 Quick Start
|
# 📌 Quick Start
|
||||||
|
|
||||||
*
|
* 0. Clone the project code
|
||||||
0. Install the required dependencies
|
|
||||||
|
```text
|
||||||
|
git clone https://github.com/jingyaogong/minimind.git && cd minimind
|
||||||
|
```
|
||||||
|
|
||||||
|
* 1. Install the required dependencies
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install -r requirements.txt
|
pip install -r requirements.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
*
|
* 2. If you need to train the model yourself
|
||||||
1. Clone the project code
|
|
||||||
|
|
||||||
```text
|
|
||||||
git clone https://github.com/jingyaogong/minimind.git
|
|
||||||
```
|
|
||||||
|
|
||||||
*
|
|
||||||
2. If you need to train the model yourself
|
|
||||||
|
|
||||||
* 2.1 Download the datasets from the [dataset download links](#dataset-download-links) and place them in the `./dataset` directory.
|
* 2.1 Download the datasets from the [dataset download links](#dataset-download-links) and place them in the `./dataset` directory.
|
||||||
|
|
||||||
@ -225,8 +226,7 @@ git clone https://github.com/jingyaogong/minimind.git
|
|||||||
* 2.6 Perform LoRA fine-tuning (optional) with `python 4-lora_sft.py`.
|
* 2.6 Perform LoRA fine-tuning (optional) with `python 4-lora_sft.py`.
|
||||||
* 2.7 Execute DPO human preference reinforcement learning alignment (optional) with `python 5-dpo_train.py`.
|
* 2.7 Execute DPO human preference reinforcement learning alignment (optional) with `python 5-dpo_train.py`.
|
||||||
|
|
||||||
*
|
* 3. Test model inference performance
|
||||||
3. Test model inference performance
|
|
||||||
|
|
||||||
* Ensure that the required trained parameter weights are located in the `./out/` directory.
|
* Ensure that the required trained parameter weights are located in the `./out/` directory.
|
||||||
* You can also directly download and use the trained model weights
|
* You can also directly download and use the trained model weights
|
||||||
@ -270,7 +270,9 @@ git clone https://github.com/jingyaogong/minimind.git
|
|||||||
# and
|
# and
|
||||||
python 1-pretrain.py --use_wandb
|
python 1-pretrain.py --use_wandb
|
||||||
```
|
```
|
||||||
By adding the `--use_wandb` parameter, you can record the training process. After training is complete, you can view the training process on the wandb website. You can specify the project name and run name by modifying the `wandb_project` and `wandb_run_name` parameters.
|
By adding the `--use_wandb` parameter, you can record the training process. After training is complete, you can view
|
||||||
|
the training process on the wandb website. You can specify the project name and run name by modifying
|
||||||
|
the `wandb_project` and `wandb_run_name` parameters.
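
The three options named above can be sketched with `argparse`. This is only an illustration: the README does not say whether `wandb_project` and `wandb_run_name` are command-line flags or variables inside the script, so treating them as flags is an assumption, and the default values, `build_parser`, and `describe` are hypothetical names that do not come from `1-pretrain.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three wandb-related options named in the text."""
    parser = argparse.ArgumentParser(description="Pretraining options (illustrative)")
    # Off by default; passing --use_wandb switches logging on.
    parser.add_argument("--use_wandb", action="store_true")
    # Placeholder defaults, not the project's real defaults.
    parser.add_argument("--wandb_project", type=str, default="example-project")
    parser.add_argument("--wandb_run_name", type=str, default="example-run")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Summarise what the chosen options would do, without contacting wandb."""
    if not args.use_wandb:
        return "wandb logging disabled"
    return f"logging to project={args.wandb_project!r}, run={args.wandb_run_name!r}"


if __name__ == "__main__":
    # Parse an explicit argument list so the demo does not depend on sys.argv.
    demo_args = build_parser().parse_args(["--use_wandb", "--wandb_run_name", "demo"])
    print(describe(demo_args))
```

In a real training script the project and run names would typically be handed to `wandb.init(project=..., name=...)` only when the flag is set, which keeps the dependency optional.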
|
||||||
|
|
||||||
# 📌 Data sources
|
# 📌 Data sources
|
||||||
|
|
||||||
@ -399,7 +401,6 @@ shown in the table below:
|
|||||||
|
|
||||||
# 📌 Experiment
|
# 📌 Experiment
|
||||||
|
|
||||||
|
|
||||||
| Model Name | params | len_vocab | batch_size | pretrain_time | sft_single_time | sft_multi_time |
|
| Model Name | params | len_vocab | batch_size | pretrain_time | sft_single_time | sft_multi_time |
|
||||||
|-------------------|--------|-----------|------------|-------------------|-------------------|---------------------|
|
|-------------------|--------|-----------|------------|-------------------|-------------------|---------------------|
|
||||||
| minimind-v1-small | 26M | 6400 | 64 | ≈2 hour (1 epoch) | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |
|
| minimind-v1-small | 26M | 6400 | 64 | ≈2 hour (1 epoch) | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |
|
||||||
@ -505,7 +506,7 @@ better with the scaling law for small models.
|
|||||||
[baidu](https://pan.baidu.com/s/1KUfSzEkSXYbCCBj0Pw-9fA?pwd=6666)
|
[baidu](https://pan.baidu.com/s/1KUfSzEkSXYbCCBj0Pw-9fA?pwd=6666)
|
||||||
|
|
||||||
| Model Name | params | Config | pretrain_model | single_sft_model | multi_sft_model |
|
| Model Name | params | Config | pretrain_model | single_sft_model | multi_sft_model |
|
||||||
|-------------------|--------|-----------------------------|-----------------------------------------------------------------|----------------------------------------------------------------|----------------------------------------------------------------|
|
|-------------------|--------|-----------------------------|-----------------------------------------------------------------|-----------------------------------------------------------------|-----------------------------------------------------------------|
|
||||||
| minimind-v1-small | 26M | d_model=512<br/>n_layers=8 | [URL](https://pan.baidu.com/s/1wP_cAIc8cgaJ6CxUmR9ECQ?pwd=6666) | [URL](https://pan.baidu.com/s/1_COe0FQRDmeapSsvArahCA?pwd=6666) | [URL](https://pan.baidu.com/s/1GsGsWSL0Dckl0YPRXiBIFQ?pwd=6666) |
|
| minimind-v1-small | 26M | d_model=512<br/>n_layers=8 | [URL](https://pan.baidu.com/s/1wP_cAIc8cgaJ6CxUmR9ECQ?pwd=6666) | [URL](https://pan.baidu.com/s/1_COe0FQRDmeapSsvArahCA?pwd=6666) | [URL](https://pan.baidu.com/s/1GsGsWSL0Dckl0YPRXiBIFQ?pwd=6666) |
|
||||||
| minimind-v1-moe | 4×26M | d_model=512<br/>n_layers=8 | [URL](https://pan.baidu.com/s/1IZdkzPRhbZ_bSsRL8vInjg?pwd=6666) | [URL](https://pan.baidu.com/s/1tqB-GMvuiGQBvEl-yZ-oBw?pwd=6666) | [URL](https://pan.baidu.com/s/1GHJ2T4904EcT1u8l1rVqtg?pwd=6666) |
|
| minimind-v1-moe | 4×26M | d_model=512<br/>n_layers=8 | [URL](https://pan.baidu.com/s/1IZdkzPRhbZ_bSsRL8vInjg?pwd=6666) | [URL](https://pan.baidu.com/s/1tqB-GMvuiGQBvEl-yZ-oBw?pwd=6666) | [URL](https://pan.baidu.com/s/1GHJ2T4904EcT1u8l1rVqtg?pwd=6666) |
|
||||||
| minimind-v1 | 108M | d_model=768<br/>n_layers=16 | [URL](https://pan.baidu.com/s/1B60jYo4T8OmJI0ooqsixaA?pwd=6666) | [URL](https://pan.baidu.com/s/1p713loS7EfwHQf3G9eYI3Q?pwd=6666) | [URL](https://pan.baidu.com/s/12iHGpAs6R0kqsOnGtgK6vQ?pwd=6666) |
|
| minimind-v1 | 108M | d_model=768<br/>n_layers=16 | [URL](https://pan.baidu.com/s/1B60jYo4T8OmJI0ooqsixaA?pwd=6666) | [URL](https://pan.baidu.com/s/1p713loS7EfwHQf3G9eYI3Q?pwd=6666) | [URL](https://pan.baidu.com/s/12iHGpAs6R0kqsOnGtgK6vQ?pwd=6666) |
|
||||||
@ -618,14 +619,26 @@ better with the scaling law for small models.
|
|||||||
|
|
||||||
## 👉 Summary of Effects
|
## 👉 Summary of Effects
|
||||||
|
|
||||||
* The ranking of the minimind series (ABC) aligns with intuition, with minimind-v1(0.1B) scoring the highest, and its responses to common sense questions are mostly error-free and free of hallucinations.
|
* The ranking of the minimind series (ABC) aligns with intuition, with minimind-v1(0.1B) scoring the highest, and its
|
||||||
|
responses to common sense questions are mostly error-free and free of hallucinations.
|
||||||
* Surprisingly, minimind-v1-small(0.02B), with only 26M parameters, can perform nearly as well as minimind-v1(0.1B).
|
* Surprisingly, minimind-v1-small(0.02B), with only 26M parameters, can perform nearly as well as minimind-v1(0.1B).
|
||||||
* minimind-v1(0.1B) underwent less than 2 epochs of SFT (Supervised Fine-Tuning) due to being prematurely killed to free up resources for smaller models. Despite not being fully trained, it still achieved the best performance, demonstrating that larger models generally outperform smaller ones.
|
* minimind-v1(0.1B) underwent less than 2 epochs of SFT (Supervised Fine-Tuning) due to being prematurely killed to
|
||||||
* minimind-v1-moe(0.1B) performed only slightly better than minimind-v1-small(0.02B), also due to early termination to free up resources for other training. However, the MoE (Mixture of Experts) model, with its sparse multi-Experts mode, requires more training epochs to fully activate and train all FFN (Feed-Forward Network) layer experts. In the current setup with 3 epochs, the training is not yet sufficient.
|
free up resources for smaller models. Despite not being fully trained, it still achieved the best performance,
|
||||||
Early experiments with minimind on the Yi-Tokenizer showed that a fully trained MoE version could visibly outperform small dense models. This may have to wait for future training and updates in the v2 and v3 versions, when more server resources are available.
|
demonstrating that larger models generally outperform smaller ones.
|
||||||
|
* minimind-v1-moe(0.1B) performed only slightly better than minimind-v1-small(0.02B), also due to early termination
|
||||||
|
to free up resources for other training. However, the MoE (Mixture of Experts) model, with its sparse
|
||||||
|
multi-Experts mode, requires more training epochs to fully activate and train all FFN (Feed-Forward Network) layer
|
||||||
|
experts. In the current setup with 3 epochs, the training is not yet sufficient.
|
||||||
|
Early experiments with minimind on the Yi-Tokenizer showed that a fully trained MoE version could visibly outperform
|
||||||
|
small dense models. This may have to wait for future training and updates in the v2 and v3 versions,
|
||||||
|
when more server resources are available.
|
||||||
|
|
||||||
* The responses from Model E appear to be quite good to the naked eye, although there are occasional instances of hallucinations and fabrications. However, both GPT-4o and Deepseek's evaluations consistently noted that it "provides overly verbose and repetitive information, and contains hallucinations."
|
* The responses from Model E appear to be quite good to the naked eye, although there are occasional instances of
|
||||||
This evaluation seems somewhat strict, as even a small number of hallucinated words in a 100-word response can easily result in a low score. Given that Model E was pre-trained on longer texts and a larger dataset, its responses appear more comprehensive. In models of similar size, both the quantity and quality of the data are crucial.
|
hallucinations and fabrications. However, both GPT-4o and Deepseek's evaluations consistently noted that it "provides
|
||||||
|
overly verbose and repetitive information, and contains hallucinations."
|
||||||
|
This evaluation seems somewhat strict, as even a small number of hallucinated words in a 100-word response can easily
|
||||||
|
result in a low score. Given that Model E was pre-trained on longer texts and a larger dataset, its responses appear
|
||||||
|
more comprehensive. In models of similar size, both the quantity and quality of the data are crucial.
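
The point above about sparse experts needing more epochs can be illustrated with a toy routing count. This is a sketch under stated assumptions, not MiniMind's routing code: the expert count of 4 is guessed from the "4×26M" entry in the weights table, `top_k=2` is hypothetical, and random scores stand in for a learned gating network.

```python
import random
from collections import Counter


def route_tokens(num_tokens: int, num_experts: int = 4, top_k: int = 2, seed: int = 0) -> Counter:
    """Count how many tokens each expert receives under random top-k routing."""
    rng = random.Random(seed)
    counts = Counter({expert: 0 for expert in range(num_experts)})
    for _ in range(num_tokens):
        # Random scores stand in for the output of a learned gate.
        scores = [rng.random() for _ in range(num_experts)]
        # Keep the top_k highest-scoring experts for this token.
        chosen = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)[:top_k]
        counts.update(chosen)
    return counts


if __name__ == "__main__":
    total = 1000
    for expert, n in sorted(route_tokens(total).items()):
        print(f"expert {expert}: {n} of {total} tokens")
```

A dense FFN is updated by every token, whereas here each expert is updated by only about `top_k / num_experts` of them, which is one way to read the remark that the MoE model needs more epochs to train all of its experts.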
|
||||||
|
|
||||||
> 🙋‍♂️ Personal Subjective Evaluation: E>C>B≈A>D
|
> 🙋‍♂️ Personal Subjective Evaluation: E>C>B≈A>D
|
||||||
|
|
||||||
@ -759,16 +772,22 @@ your model with third-party UIs, such as fastgpt, OpenWebUI, etc.
|
|||||||
|
|
||||||
> [!TIP]
|
> [!TIP]
|
||||||
> If you find `MiniMind` helpful, please give us a ⭐ on GitHub.<br/>
|
> If you find `MiniMind` helpful, please give us a ⭐ on GitHub.<br/>
|
||||||
> Given the length and the limitations of our expertise, there may be errors. We welcome discussions and corrections in the Issues section.<br/>
|
> Given the length and the limitations of our expertise, there may be errors. We welcome discussions and corrections in
|
||||||
|
> the Issues section.<br/>
|
||||||
> Your support is the driving force behind our continuous improvement of the project!
|
> Your support is the driving force behind our continuous improvement of the project!
|
||||||
|
|
||||||
|
|
||||||
> [!NOTE]
|
> [!NOTE]
|
||||||
> An individual's resources, energy, and time are limited, so we encourage everyone to participate and contribute collectively. If you have trained model weights, you are welcome to share them in the Discussions or Issues sections.<br/>
|
> An individual's resources, energy, and time are limited, so we encourage everyone to participate and contribute
|
||||||
> These models can be new versions of MiniMind tailored for specific downstream tasks or vertical domains (such as sentiment recognition, healthcare, psychology, finance, legal Q&A, etc.).<br/>
|
> collectively. If you have trained model weights, you are welcome to share them in the Discussions or Issues
|
||||||
> They can also be new versions of MiniMind models that have undergone extended training, exploring longer text sequences, larger volumes (such as 0.1B+), or more extensive datasets.<br/>
|
> sections.<br/>
|
||||||
|
> These models can be new versions of MiniMind tailored for specific downstream tasks or vertical domains (such as
|
||||||
|
> sentiment recognition, healthcare, psychology, finance, legal Q&A, etc.).<br/>
|
||||||
|
> They can also be new versions of MiniMind models that have undergone extended training, exploring longer text
|
||||||
|
> sequences, larger volumes (such as 0.1B+), or more extensive datasets.<br/>
|
||||||
> Each contribution is unique, and all attempts are valuable and encouraged.<br/>
|
> Each contribution is unique, and all attempts are valuable and encouraged.<br/>
|
||||||
> Any shared contributions will be promptly recognized and compiled in the acknowledgments list. Thank you once again for everyone's support!
|
> Any shared contributions will be promptly recognized and compiled in the acknowledgments list. Thank you once again
|
||||||
|
> for everyone's support!
|
||||||
|
|
||||||
## 🤝[Contributors](https://github.com/jingyaogong/minimind/graphs/contributors)
|
## 🤝[Contributors](https://github.com/jingyaogong/minimind/graphs/contributors)
|
||||||
|
|
||||||
@ -817,7 +836,6 @@ your model with third-party UIs, such as fastgpt, OpenWebUI, etc.
|
|||||||
|
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
|
|
||||||
## 🫶Supporter
|
## 🫶Supporter
|
||||||
|
|
||||||
<a href="https://github.com/jingyaogong/minimind/stargazers">
|
<a href="https://github.com/jingyaogong/minimind/stargazers">
|
||||||
@ -842,8 +860,6 @@ your model with third-party UIs, such as fastgpt, OpenWebUI, etc.
|
|||||||
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=jingyaogong/minimind&type=Date"/>
|
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=jingyaogong/minimind&type=Date"/>
|
||||||
</picture>
|
</picture>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
# License
|
# License
|
||||||
|
|
||||||
This repository is licensed under the [Apache-2.0 License](LICENSE).
|
This repository is licensed under the [Apache-2.0 License](LICENSE).
|