From 4d1d4fae0a985cb8ca7c73653d1e745d67490aa4 Mon Sep 17 00:00:00 2001
From: gongjy <2474590974@qq.com>
Date: Wed, 28 Aug 2024 16:50:40 +0800
Subject: [PATCH] update readme format

---
 README.md    | 13 +++++--------
 README_en.md | 10 +++++-----
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 89f593b..863dd98 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,10 @@
+
+

"大道至简"

+
+
 中文 | [English](./README_en.md)
@@ -16,12 +20,6 @@
-

- - “大道至简”
-
-

-
 * 本开源项目旨在完全从0开始,训练出仅为26M大小的微型语言模型**MiniMind**。
 * **MiniMind**极其轻量,体积约是 GPT3 的 $\frac{1}{7000}$,力求做到CPU也可快速推理甚至训练。
 * **MiniMind**改进自DeepSeek-V2、Llama3结构,项目包含整个数据处理、pretrain、sft、dpo的全部阶段,包含混合专家(MoE)模型。
@@ -182,8 +180,7 @@ python 2-eval.py
 ---
--
-📙【Pretrain数据】:[seq-monkey通用文本数据集](https://github.com/mobvoi/seq-monkey-data/blob/main/docs/pretrain_open_corpus.md)
+- 📙【Pretrain数据】:[seq-monkey通用文本数据集](https://github.com/mobvoi/seq-monkey-data/blob/main/docs/pretrain_open_corpus.md)
 是由多种公开来源的数据(如网页、百科、博客、开源代码、书籍等)汇总清洗而成。
 整理成统一的JSONL格式,并经过了严格的筛选和去重,确保数据的全面性、规模、可信性和高质量。
 总量大约在10B token,适合中文大语言模型的预训练。
diff --git a/README_en.md b/README_en.md
index 7801608..ead94e3 100644
--- a/README_en.md
+++ b/README_en.md
@@ -8,17 +8,17 @@
 [![Collection](https://img.shields.io/badge/🤗-MiniMind%20%20Collection-blue)](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5)
+
+
+

"The Greatest Path is the Simplest"

+
+
[中文](./README.md) | English
-

- - "The Greatest Path is the Simplest"
-
-

 * This open-source project aims to train a miniature language model **MiniMind** from scratch, with a size of just 26MB.
 * **MiniMind** is extremely lightweight, approximately $\frac{1}{7000}$ the size of GPT-3, designed to enable fast