update readme

gongjy 2024-10-09 09:12:55 +08:00
parent 4542ecf858
commit 5dbd6174b3
2 changed files with 13 additions and 14 deletions

View File

@@ -25,9 +25,9 @@
</div>
* This open-source project aims to train a tiny language model, **MiniMind**, only 26M in size, completely from scratch in as little as 3 hours.
* **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT-3, aiming to make fast inference and even training feasible on an ordinary personal GPU.
* **MiniMind** improves on the DeepSeek-V2 and Llama3 architectures; the project covers the full pipeline of data processing, pretraining, SFT, and DPO, and includes a Mixture-of-Experts (MoE) model.
* This open-source project aims to train a tiny language model, **MiniMind**, only 26.88M in size, completely from scratch in as little as 3 hours.
* **MiniMind** is extremely lightweight; its smallest version is roughly $\frac{1}{7000}$ the size of GPT-3, aiming to make fast inference and even training feasible on an ordinary personal GPU.
* **MiniMind** releases a minimalist large-model architecture together with full-stage code for dataset cleaning and preprocessing, supervised pretraining (Pretrain), supervised instruction fine-tuning (SFT), low-rank adaptation (LoRA) fine-tuning, and reward-free direct preference alignment (DPO); it also covers extending to a sparse shared Mixture-of-Experts (MoE) model and to the vision multimodal VLM: [MiniMind-V](https://github.com/jingyaogong/minimind-v)
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large language models (LLMs).
* We hope this project can offer researchers a modest introductory example that helps everyone get started quickly and inspires further exploration and innovation in the LLM field.

View File

@@ -26,18 +26,17 @@
</div>
* This open-source project aims to train a miniature language model **MiniMind** from scratch, with a size of just 26MB.
* **MiniMind** is extremely lightweight, approximately $\frac{1}{7000}$ the size of GPT-3, designed to enable fast
inference and even training on CPUs.
* **MiniMind** is an improvement on the DeepSeek-V2 and Llama3 architectures. The project includes all stages of data
processing, pretraining, SFT, and DPO, and features a Mixture of Experts (MoE) model.
* This is not only the implementation of an open-source model, but also a tutorial for getting started with large
language models (LLMs).
* We hope that this project serves as a stepping stone for researchers and developers, providing an introductory example
to help them quickly get started and foster more exploration and innovation in the LLM field.
* This open-source project aims to train a tiny language model called **MiniMind** from scratch in just 3 hours, with a model size of only 26.88M.
> To avoid any misunderstanding, "fastest 3 hours" assumes hardware with higher specifications than the author's setup. Detailed specifications will be provided below.
* **MiniMind** is extremely lightweight, with the smallest version being approximately $\frac{1}{7000}$ the size of GPT-3, making it possible for even an ordinary personal GPU to perform quick inference and even training (see the size comparison below).
* **MiniMind** provides full-stage code for a simplified large-model architecture, dataset cleaning and preprocessing, supervised pretraining (Pretrain), supervised instruction fine-tuning (SFT), low-rank adaptation (LoRA) fine-tuning, and reward-free direct preference alignment (DPO). It also includes code for extending to sparse Mixture-of-Experts (MoE) models (a minimal routing sketch follows this section) and to the multimodal vision-language model (VLM): [MiniMind-V](https://github.com/jingyaogong/minimind-v).
* This is not just an implementation of an open-source model but also a tutorial for getting started with large language models (LLMs).
* We hope this project will serve as an introductory example for researchers, helping them quickly get started and inspiring more exploration and innovation in the LLM field.
> To avoid misinterpretation, "fastest 3 hours" means you need a machine with hardware configuration superior to mine. Detailed specifications will be provided below.
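
For reference, the $\frac{1}{7000}$ figure is a back-of-the-envelope, order-of-magnitude comparison, assuming the GPT-3 size referenced is the 175B-parameter model (the arithmetic here is an editorial check, not a claim from the project):

$$\frac{26.88 \times 10^{6}}{175 \times 10^{9}} \approx \frac{1}{6510} \approx \frac{1}{7000}$$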
---
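
The MoE variant mentioned above refers to the general top-k expert-routing technique. Below is a minimal, illustrative PyTorch sketch of that idea, not the project's actual implementation; the `TinyMoE` class, dimensions, and expert count are made-up placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Illustrative top-k MoE feed-forward layer (not MiniMind's actual code)."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # token router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). Each token is routed to its top-k experts and the
        # expert outputs are combined using the renormalized gate weights.
        scores = F.softmax(self.gate(x), dim=-1)           # (num_tokens, num_experts)
        weights, expert_idx = scores.topk(self.top_k, -1)  # (num_tokens, top_k)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, k] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TinyMoE(dim=64)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

Only the selected `top_k` experts run for each token, which is what makes the model sparse: parameter count grows with the number of experts while per-token compute stays close to that of a single dense feed-forward block.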