update readme
commit 5dbd6174b3 (parent 4542ecf858)
@@ -25,9 +25,9 @@

 </div>

-* This open-source project aims to train **MiniMind**, a tiny language model of only 26M, completely from scratch in as little as 3 hours!
-* **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT-3, so that even the most ordinary personal GPU can run inference and even training quickly.
-* **MiniMind** improves on the DeepSeek-V2 and Llama3 architectures; the project covers every stage of data processing, pretrain, sft, and dpo, and includes a Mixture of Experts (MoE) model.
+* This open-source project aims to train **MiniMind**, a tiny language model of only 26.88M, completely from scratch in as little as 3 hours!
+* **MiniMind** is extremely lightweight; the smallest version is roughly $\frac{1}{7000}$ the size of GPT-3, so that even the most ordinary personal GPU can run inference and even training quickly.
+* **MiniMind** releases full-stage code for a minimal large-model architecture, dataset cleaning and preprocessing, supervised pretraining (Pretrain), supervised instruction fine-tuning (SFT), low-rank adaptation (LoRA) fine-tuning, and reward-free direct preference alignment (DPO); it also extends to a sparse shared Mixture of Experts (MoE) model and to the vision multimodal VLM: [MiniMind-V](https://github.com/jingyaogong/minimind-v).
 * This is not only an implementation of an open-source model, but also a tutorial for getting started with large language models (LLMs).
 * We hope this project offers researchers an introductory example that helps everyone get started quickly and inspires more exploration and innovation in the LLM field.
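As a quick sanity check on the $\frac{1}{7000}$ figure above, reading "size" as parameter count and assuming GPT-3's commonly cited 175B parameters (an assumption; the README does not state the reference figure):

$$\frac{175 \times 10^9}{26.88 \times 10^6} \approx 6510,$$

so the smallest MiniMind is indeed on the order of $\frac{1}{7000}$ of GPT-3's parameter count.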
README_en.md
@@ -26,18 +26,17 @@

 </div>

-* This open-source project aims to train a miniature language model **MiniMind** from scratch, with a size of just 26MB.
-* **MiniMind** is extremely lightweight, approximately $\frac{1}{7000}$ the size of GPT-3, designed to enable fast
-  inference and even training on CPUs.
-* **MiniMind** is an improvement on the DeepSeek-V2 and Llama3 architectures. The project includes all stages of data
-  processing, pretraining, SFT, and DPO, and features a Mixture of Experts (MoE) model.
-* This is not only the implementation of an open-source model, but also a tutorial for getting started with large
-  language models (LLMs).
-* We hope that this project serves as a stepping stone for researchers and developers, providing an introductory example
-  to help them quickly get started and foster more exploration and innovation in the LLM field.
-
-> To avoid any misunderstanding, "fastest 3 hours" refers to the requirement of using hardware with higher
-  specifications than the author's setup. Detailed specifications will be provided below.
+* This open-source project aims to train a tiny language model called **MiniMind** from scratch in just 3 hours, with a model size of only 26.88M.
+
+* **MiniMind** is extremely lightweight, with the smallest version being approximately $\frac{1}{7000}$ the size of GPT-3, making it possible for even an ordinary personal GPU to perform quick inference and even training.
+
+* **MiniMind** provides full-stage code for a simplified large-model structure, dataset cleaning and preprocessing, supervised pretraining (Pretrain), supervised instruction fine-tuning (SFT), low-rank adaptation (LoRA) fine-tuning, and reward-free direct preference alignment (DPO). It also includes code for extending to sparse Mixture of Experts (MoE) models and to the multimodal vision-language model (VLM): [MiniMind-V](https://github.com/jingyaogong/minimind-v).
+
+* This is not just an implementation of an open-source model but also a tutorial for getting started with large language models (LLMs).
+
+* We hope this project will serve as an introductory example for researchers, helping them quickly get started and inspiring more exploration and innovation in the LLM field.
+
+> To avoid misinterpretation, "fastest 3 hours" means you need a machine with hardware configuration superior to mine. Detailed specifications will be provided below.

 ---
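For readers unfamiliar with the "reward-free direct preference alignment (DPO)" stage listed in the new bullets, below is a minimal sketch of the standard DPO objective. It is illustrative only, not MiniMind's actual implementation; the function and argument names are hypothetical.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: make the policy prefer the chosen response over
    the rejected one, measured against a frozen reference model, without
    training a separate reward model."""
    # Implicit "rewards" are the policy/reference log-probability ratios.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin; minimizing it widens the margin.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

Here each `*_logps` argument is the summed log-probability of a response under the corresponding model, and `beta` controls how far the policy is allowed to drift from the reference.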