update
@@ -1,128 +0,0 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
  overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
  advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
  address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.
README.md
@@ -1,199 +0,0 @@
<div align="center">

</div>

<div align="center">

[](https://github.com/jingyaogong/minimind/stargazers)
[](LICENSE)
[](https://github.com/jingyaogong/minimind/commits/master)
[](https://github.com/jingyaogong/minimind/pulls)
[](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5)

</div>
# 📌 Dataset Introduction

## Ⅰ Tokenizer

The tokenizer maps words from natural language to numbers such as `0, 1, 36` through a "dictionary"; you can think of each number as the page on which that word appears in the dictionary.
You can build your own vocabulary and train such a "dictionary" yourself; the code is in `./scripts/train_tokenizer.py` (for learning reference only; MiniMind already ships with a tokenizer, so there is normally no need to train another one; a rough sketch of the idea follows the overview below).
Alternatively, you can adopt the tokenizer of a well-known open-source LLM.
Using a ready-made one is like using the Xinhua or Oxford dictionary directly: the token compression ratio is excellent, but the drawback is the sheer page count, often hundreds of thousands of words and phrases.
A self-trained tokenizer lets you control the vocabulary size and content freely, but its compression ratio is poor (for example, "hello" may be split into the five separate tokens "h e l l o") and rare words are hard to cover.
The choice of "dictionary" certainly matters: an LLM's output is essentially a SoftMax multi-class prediction over the N vocabulary entries, which is then decoded back into natural language through the "dictionary".
Because MiniMind's size must be kept strictly under control, and to avoid a top-heavy model (the word-embedding layer taking too large a share of the LLM's parameters), the shorter the vocabulary, the better.

<details style="color:rgb(128,128,128)">
<summary>Tokenizer overview</summary>

Vocabulary sizes of tokenizers from strong third-party open-source models such as Yi, qwen, chatglm, mistral, and Llama3:

<table>
<tr><th>Tokenizer model</th><th>Vocab size</th><th>Source</th></tr>
<tr><td>yi tokenizer</td><td>64,000</td><td>01.AI (China)</td></tr>
<tr><td>qwen2 tokenizer</td><td>151,643</td><td>Alibaba Cloud (China)</td></tr>
<tr><td>glm tokenizer</td><td>151,329</td><td>Zhipu AI (China)</td></tr>
<tr><td>mistral tokenizer</td><td>32,000</td><td>Mistral AI (France)</td></tr>
<tr><td>llama3 tokenizer</td><td>128,000</td><td>Meta (USA)</td></tr>
<tr><td>minimind tokenizer</td><td>6,400</td><td>Custom</td></tr>
</table>

> 👉 Update 2024-09-17: to avoid ambiguity with earlier versions and to control model size, all MiniMind models now use minimind_tokenizer; every mistral_tokenizer variant has been deprecated.

```
# Some notes to myself
> Although minimind_tokenizer has a small vocabulary, its encode/decode efficiency is weaker than Chinese-friendly tokenizers such as qwen2 and glm.
> MiniMind nevertheless uses its own minimind_tokenizer to keep the overall parameter count light and to avoid an imbalance between the embedding layer and the compute layers (a top-heavy model), since the vocabulary holds only 6,400 entries.
> In practical testing, minimind has never failed to decode a rare word, and the results are good.
> Because the custom vocabulary is compressed to 6,400 entries, the LLM's total parameter count can be as low as 25.8M.
> The training data `tokenizer_train.jsonl` all comes from the `匠数大模型数据集`; this data is relatively minor, and you are free to choose other data if you want to train a tokenizer.
```

</details>
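
To make the idea concrete, here is a minimal sketch of training a small BPE vocabulary with the `tokenizers` library (pinned in requirements.txt). It is an illustration only, not the actual `./scripts/train_tokenizer.py`, and it assumes `tokenizer_train.jsonl` carries one JSON object with a `text` field per line; the output path is hypothetical.

```python
# Hedged sketch: train a 6,400-entry BPE tokenizer on tokenizer_train.jsonl.
# Illustration only; the repository's real script is ./scripts/train_tokenizer.py.
import json
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def iter_texts(path="./dataset/tokenizer_train.jsonl"):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)["text"]  # assumes a "text" field per line

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=6400,  # a small vocabulary keeps the embedding layer light
    special_tokens=["<unk>", "<s>", "</s>"],
)
tokenizer.train_from_iterator(iter_texts(), trainer=trainer)
tokenizer.save("./my_minimind_tokenizer.json")  # hypothetical output path
```

A vocabulary this small trades compression efficiency for a lighter embedding table, which is exactly the tradeoff described above.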

## Ⅱ Pretraining data

After the lesson of MiniMind-V1, whose low-quality pretraining data made the model talk nonsense, we decided after `2025-02-05` to stop using large-scale unsupervised datasets for pretraining.
Instead, the Chinese portion of the [匠数大模型数据集](https://www.modelscope.cn/datasets/deepctrl/deepctrl-sft-data) was extracted,
and roughly 1.6GB of cleaned samples shorter than `512` characters were concatenated directly into the pretraining data `pretrain_hq.jsonl`; "hq" stands for high
quality (it is not truly "high" yet, of course; improving data quality never ends).

The data format of `pretrain_hq.jsonl` is

```bash
{"text": "如何才能摆脱拖延症? 治愈拖延症并不容易,但以下建议可能有所帮助..."}
```
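
As a rough illustration of how such lines could be consumed for next-token prediction (a hypothetical reader, not the repository's actual dataset class; the tokenizer path and `max_seq_len=512` are assumptions):

```python
# Hedged sketch: turn pretrain_hq.jsonl lines into token-id tensors.
import json
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer

class PretrainJsonl(Dataset):
    def __init__(self, path, tokenizer, max_seq_len=512):
        self.samples = [json.loads(line) for line in open(path, encoding="utf-8")]
        self.tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        ids = self.tokenizer(
            self.samples[idx]["text"],
            max_length=self.max_seq_len,
            truncation=True,
            return_tensors="pt",
        ).input_ids[0]
        # inputs are ids[:-1], labels are ids[1:] (shifted by one position)
        return ids[:-1], ids[1:]

tokenizer = AutoTokenizer.from_pretrained("./model/minimind_tokenizer")  # assumed path
dataset = PretrainJsonl("./dataset/pretrain_hq.jsonl", tokenizer)
```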

## Ⅲ SFT data

The [匠数大模型SFT数据集](https://www.modelscope.cn/datasets/deepctrl/deepctrl-sft-data)
"is a complete, uniformly formatted, and safe resource for large-model training and research.
It collects and organizes a large number of open-source datasets from public web sources, unifies their format and cleans the data,
and contains a Chinese dataset of 10M samples and an English dataset of 2M samples."
That is the official description; the downloaded data totals roughly 4B tokens, which is certainly suitable as SFT data for a Chinese LLM.
However, the officially provided format is messy, and using all of it for SFT would be too costly.
I re-cleaned the official dataset, removing entries polluted by symbols and noise, and again kept only content with a total length `<512`;
at this stage the goal is to use a large volume of dialogue to supplement the knowledge that the pretraining stage lacks.
The exported file is `sft_512.jsonl` (~7.5GB).

The [Magpie-SFT dataset](https://www.modelscope.cn/organization/Magpie-Align)
collects ~1M high-quality dialogues generated by Qwen2/2.5. I cleaned this data further and exported the samples with total length `<2048` as `sft_2048.jsonl` (~9GB),
and the samples with length `<1024` as `sft_1024.jsonl` (~5.5GB). Doing SFT directly on dialogue data produced by a large model falls under "black-box distillation".

Cleaning the SFT data from the previous two steps once more (keeping only content with a high proportion of Chinese characters) and filtering for dialogues with length `<512` yields `sft_mini_512.jsonl` (~1.2GB).

All SFT files `sft_X.jsonl` share the data format

```text
{
    "conversations": [
        {"role": "user", "content": "你好"},
        {"role": "assistant", "content": "你好!"},
        {"role": "user", "content": "再见"},
        {"role": "assistant", "content": "再见!"}
    ]
}
```
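
For illustration only, one way such a record could be flattened into a single training string is with a ChatML-style template; MiniMind's actual chat template may differ, so treat this as a sketch of the idea:

```python
# Hedged sketch: render one sft_X.jsonl record with a generic ChatML-style template.
import json

def render_chatml(conversations):
    # In real SFT the loss is usually masked so only assistant tokens are trained on.
    return "".join(
        f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n"
        for turn in conversations
    )

record = json.loads('{"conversations": [{"role": "user", "content": "你好"}, '
                    '{"role": "assistant", "content": "你好!"}]}')
print(render_chatml(record["conversations"]))
```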

## Ⅳ RLHF data

From the [Magpie-DPO dataset](https://www.modelscope.cn/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1):
roughly 200k preference pairs (all in English) generated by Llama3.1-70B/8B, which can be used to train a reward model and optimize response quality so that it better matches human preferences.
Here the content with total length `<3000` is reassembled into `dpo.jsonl` (~0.9GB), containing the two fields `chosen` and `rejected`: `chosen`
is the preferred response and `rejected` is the dispreferred one.

The data format of `dpo.jsonl` is

```text
{
    "chosen": [
        {"content": "Q", "role": "user"},
        {"content": "good answer", "role": "assistant"}
    ],
    "rejected": [
        {"content": "Q", "role": "user"},
        {"content": "bad answer", "role": "assistant"}
    ]
}
```
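
For orientation, the `chosen`/`rejected` pairs typically enter a DPO objective roughly like the sketch below, where the `logp_*` values are summed log-probabilities of the assistant tokens under the policy and a frozen reference model, and `beta` is a hyperparameter (often around 0.1). This is a generic sketch, not the repository's exact training code:

```python
# Hedged sketch of the DPO loss for a batch of preference pairs.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # How much more the policy prefers "chosen" over "rejected",
    # measured relative to the frozen reference model.
    pi_logratios = logp_chosen - logp_rejected
    ref_logratios = ref_logp_chosen - ref_logp_rejected
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()

# Toy example with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-5.0]), torch.tensor([-9.0]),
                torch.tensor([-6.0]), torch.tensor([-8.0]))
print(loss)
```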

## Ⅴ Reasoning dataset

It has to be said that in February 2025 nothing could outshine DeepSeek...
It also sparked my strong interest in RL-guided reasoning models; I have already reproduced R1-Zero with Qwen2.5.
If time allows and it works (though with 99% probability the base model's capability is insufficient), I will later update MiniMind with a reasoning model trained via RL rather than distillation.
With limited time, the fastest low-cost option is still direct (black-box) distillation.
R1 is simply too popular: within just a few days several R1 distillation datasets appeared, such as [R1-Llama-70B](https://www.modelscope.cn/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B), [R1-Distill-SFT](https://www.modelscope.cn/datasets/AI-ModelScope/R1-Distill-SFT),
[Alpaca-Distill-R1](https://huggingface.co/datasets/shareAI/Alpaca-Distill-R1-ZH),
[deepseek_r1_zh](https://huggingface.co/datasets/jinliuxi/deepseek_r1_zh), and so on; purely Chinese data may be relatively scarce.
They are finally merged and exported as `r1_mix_1024.jsonl`, whose data format is identical to `sft_X.jsonl`.

## Ⅵ More datasets

[HqWu-HITCS/Awesome-Chinese-LLM](https://github.com/HqWu-HITCS/Awesome-Chinese-LLM)
already collects and curates open-source models, applications, datasets, and tutorials related to Chinese LLMs, and keeps track of the latest progress in this area. Comprehensive and professional. Respect!

---

## Ⅷ Dataset download

> [!NOTE]
> Since 2025-02-05, all datasets used for MiniMind's final training are open-sourced, so there is no need to preprocess large-scale datasets yourself, which avoids repetitive data-processing work.

MiniMind training datasets ([ModelScope](https://www.modelscope.cn/datasets/gongjy/minimind-dataset/files) | [HuggingFace](https://huggingface.co/datasets/jingyaogong))

> No need to clone everything; download only the files you need (a download sketch follows the file listing below).

Place the downloaded dataset files in the `./dataset/` directory (✨ marks the recommended, required items):

```bash
./dataset/
├── dpo.jsonl (909MB)
├── lora_identity.jsonl (22.8KB)
├── lora_medical.jsonl (34MB)
├── pretrain_hq.jsonl (1.6GB, ✨)
├── r1_mix_1024.jsonl (340MB)
├── sft_1024.jsonl (5.6GB)
├── sft_2048.jsonl (9GB)
├── sft_512.jsonl (7.5GB)
├── sft_mini_512.jsonl (1.2GB, ✨)
└── tokenizer_train.jsonl (1GB)
```
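
For example, a single file can be fetched with `huggingface_hub` instead of cloning the whole dataset; the `repo_id` below is an assumption inferred from the HuggingFace link above and should be verified before use:

```python
# Hedged sketch: download one dataset file rather than everything.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="jingyaogong/minimind_dataset",  # assumed dataset repo id; please verify
    repo_type="dataset",
    filename="sft_mini_512.jsonl",
    local_dir="./dataset",
)
print(path)
```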

<details style="color:rgb(128,128,128)">
<summary>Note: brief description of each dataset</summary>

* `dpo.jsonl` -- dataset for the RLHF stage
* `lora_identity.jsonl` -- self-identity dataset (e.g. "Who are you?" "I am minimind..."), recommended for LoRA training (also usable for full-parameter SFT; don't be limited by the name)
* `lora_medical.jsonl` -- medical Q&A dataset, recommended for LoRA training (also usable for full-parameter SFT; don't be limited by the name)
* `pretrain_hq.jsonl`✨ -- pretraining dataset, compiled from 匠数科技 data
* `r1_mix_1024.jsonl` -- DeepSeek-R1-1.5B distillation data; each sample is at most 1024 characters long (so set max_seq_len=1024 for training)
* `sft_1024.jsonl` -- compiled from Qwen2.5 distillation data (a subset of sft_2048); each sample is at most 1024 characters long (so set max_seq_len=1024 for training)
* `sft_2048.jsonl` -- compiled from Qwen2.5 distillation data; each sample is at most 2048 characters long (so set max_seq_len=2048 for training)
* `sft_512.jsonl` -- compiled from 匠数科技 SFT data; each sample is at most 512 characters long (so set max_seq_len=512 for training)
* `sft_mini_512.jsonl`✨ -- a minimal blend of 匠数科技 SFT data and Qwen2.5 distillation data (for quickly training a Zero model); each sample is at most 512 characters long (so set max_seq_len=512 for training)
* `tokenizer_train.jsonl` -- entirely from the `匠数大模型数据集`; this data is relatively minor (retraining the tokenizer yourself is not recommended, for the reasons above), but if you do want to train your own tokenizer you are free to choose any dataset.

</details>

<details style="color:rgb(128,128,128)">
<summary>Notes & recommended training plan</summary>

* The MiniMind2 series was trained on roughly 20GB of corpus in total, about 4B tokens, i.e. the result of training on the data combination above (cost: 💰💰💰💰💰💰💰💰, quality: 😊😊😊😊😊😊)

* To build a Zero model from scratch as quickly as possible, the recommended combination is `pretrain_hq.jsonl` + `sft_mini_512.jsonl`; see the table below for concrete cost and quality (cost: 💰, quality: 😊😊)

* Those with some compute resources, or who care more about quality, should consider the former to fully reproduce MiniMind2; those with only a single GPU, or who want a quick short-run reproduction, are strongly advised to take the latter.

* [Middle ground] You can also freely combine medium-scale data such as `sft_mini_512.jsonl` and `sft_1024.jsonl` for training (cost: 💰💰💰, quality: 😊😊😊😊).

</details>
README_en.md
(1509 changed lines; diff not shown)

BIN images/logo.png (was 495 KiB)
BIN images/logo2.png (was 615 KiB)
(a number of other binary image files were also changed or removed; only their pre-change sizes were shown)
@@ -36,6 +36,9 @@ class LMConfig(PretrainedConfig):
            aux_loss_alpha: float = 0.1,
            seq_aux: bool = True,
            norm_topk_prob: bool = True,
            ####################################################
            knowlwdge_num: int = 64*64,
            knowlwdge_length: int = 8,
            **kwargs,
    ):
        self.dim = dim
@@ -66,4 +69,7 @@ class LMConfig(PretrainedConfig):
        self.aux_loss_alpha = aux_loss_alpha  # alpha coefficient of the auxiliary loss
        self.seq_aux = seq_aux  # whether to compute the auxiliary loss at the sequence level
        self.norm_topk_prob = norm_topk_prob  # whether to normalize the top-k probabilities
        ####################################################
        self.knowlwdge_num = knowlwdge_num
        self.knowlwdge_length = knowlwdge_length
        super().__init__(**kwargs)
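
A hedged usage sketch of the two new fields (the import path and the other constructor arguments are assumptions; only `knowlwdge_num` and `knowlwdge_length` come from this diff):

```python
# Hedged sketch: construct the config with the new knowledge-base fields.
# 64*64 = 4096 entries, so sqrt(knowlwdge_num) is an integer, as ExtractDB expects.
from model.LMConfig import LMConfig  # assumed import path

config = LMConfig(
    dim=512,
    n_layers=12,
    max_seq_len=512,
    knowlwdge_num=64 * 64,   # number of knowledge entries (kept a perfect square)
    knowlwdge_length=8,      # per-entry length factor used by ExtractDB
)
print(config.knowlwdge_num, config.knowlwdge_length)
```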
@@ -528,15 +528,15 @@ class ExtractDB(nn.Module):
        self.batch_size = None
        self.dim = params.dim
        self.dim_key = self.dim // 2
        self.num_experts = 10 * 10  # 100 experts; make sure it is a perfect square
        self.knowlwdge_num = params.knowlwdge_num  # number of knowledge entries; must be a perfect square
        # set knowledge_dim to match head_dim so it can be used directly in attention
        self.head_dim = params.dim // params.n_heads
        self.knowledge_dim = 8*params.dim
        self.knowledge_length = params.knowlwdge_length*params.dim

        # use register_buffer instead of nn.Parameter to avoid gradient issues
        self.register_buffer('weight_down_embed', torch.randn(self.num_experts, self.knowledge_dim) * 0.02)
        self.register_buffer('weight_down_embed', torch.randn(self.knowlwdge_num, self.knowledge_length) * 0.02)

        self.num_keys = int(math.sqrt(self.num_experts)) if self.num_experts > 0 else 0
        self.num_keys = int(math.sqrt(self.knowlwdge_num)) if self.knowlwdge_num > 0 else 0
        self.product_key_topk = min(16, self.num_keys)
        self.keys = nn.Parameter(torch.randn(self.num_keys, 2, self.dim_key) * 0.02)
        self.num_experts_per_head_topk = 1
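
For orientation, the buffers above point toward a product-key style lookup: the query is split into two halves, each half is scored against one of the two key sets of size `sqrt(knowlwdge_num)`, and the best combined cells index into `weight_down_embed`. The sketch below is a simplified stand-in under those assumptions, not the repository's exact forward pass:

```python
# Hedged sketch of a product-key lookup over knowlwdge_num = num_keys**2 entries.
import torch

def product_key_lookup(query, keys, weight_down_embed, topk=16, final_topk=1):
    # query: (batch, dim); keys: (num_keys, 2, dim_key) with dim_key = dim // 2
    num_keys, _, dim_key = keys.shape
    q1, q2 = query[:, :dim_key], query[:, dim_key:]

    s1, i1 = (q1 @ keys[:, 0, :].t()).topk(topk, dim=-1)  # candidates on axis 1
    s2, i2 = (q2 @ keys[:, 1, :].t()).topk(topk, dim=-1)  # candidates on axis 2

    # Combine the two axes: topk*topk candidate cells in the num_keys x num_keys grid.
    combined = s1.unsqueeze(2) + s2.unsqueeze(1)              # (batch, topk, topk)
    flat_idx = i1.unsqueeze(2) * num_keys + i2.unsqueeze(1)   # flat knowledge index
    best = combined.view(query.size(0), -1).topk(final_topk, dim=-1).indices
    picked = flat_idx.view(query.size(0), -1).gather(1, best)

    return weight_down_embed[picked]  # (batch, final_topk, knowledge_length)

# Toy shapes mirroring the diff: dim=512, num_keys=64, knowledge_length=8*dim.
dim, num_keys = 512, 64
keys = torch.randn(num_keys, 2, dim // 2) * 0.02
knowledge = torch.randn(num_keys * num_keys, 8 * dim) * 0.02
print(product_key_lookup(torch.randn(2, dim), keys, knowledge).shape)
```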
requirements.txt
@@ -1,30 +1,147 @@
accelerate==1.6.0
aiohappyeyeballs==2.6.1
aiohttp==3.11.17
aiosignal==1.3.2
altair==5.5.0
annotated-types==0.7.0
anyio==4.9.0
async-timeout==5.0.1
attrs==25.3.0
blinker==1.9.0
cachetools==5.5.2
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
contourpy==1.3.2
cycler==0.12.1
datasets==2.21.0
datasketch==1.6.4
deepspeed==0.16.7
dill==0.3.8
distro==1.9.0
docker-pycreds==0.4.0
einops==0.8.1
exceptiongroup==1.2.2
filelock==3.18.0
Flask==3.0.3
Flask_Cors==4.0.0
Flask-Cors==4.0.0
fonttools==4.57.0
frozenlist==1.6.0
fsspec==2024.6.1
gitdb==4.0.12
GitPython==3.1.44
h11==0.14.0
hjson==3.1.0
httpcore==1.0.8
httpx==0.28.1
huggingface-hub==0.30.2
idna==3.10
importlib_metadata==7.2.1
itsdangerous==2.2.0
jieba==0.42.1
Jinja2==3.1.2
jiter==0.9.0
joblib==1.4.2
jsonlines==4.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kiwisolver==1.4.8
markdown-it-py==3.0.0
MarkupSafe==3.0.2
marshmallow==3.22.0
matplotlib==3.10.0
mdurl==0.1.2
modelscope==1.25.0
mpmath==1.3.0
msgpack==1.1.0
multidict==6.4.3
multiprocess==0.70.16
narwhals==1.35.0
networkx==3.4.2
ngrok==1.4.0
ninja==1.11.1.4
nltk==3.8
numpy==1.26.4
nvidia-cublas-cu11==11.11.3.6
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==9.1.0.70
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.3.0.86
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.5.86
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.21.5
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvtx-cu11==11.8.86
nvidia-nvtx-cu12==12.1.105
openai==1.59.6
packaging==23.2
pandas==1.5.3
peft==0.7.1
pillow==10.4.0
platformdirs==4.3.7
propcache==0.3.1
protobuf==4.25.6
psutil==5.9.8
py-cpuinfo==9.0.0
pyarrow==19.0.1
pydantic==2.8.2
pydantic_core==2.20.1
pydeck==0.9.1
Pygments==2.19.1
pyparsing==3.2.3
python-dateutil==2.9.0.post0
pytz==2025.2
PyYAML==6.0.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.7.1
scikit_learn==1.5.1
sentence_transformers==2.3.1
rpds-py==0.24.0
safetensors==0.5.3
scikit-learn==1.5.1
scipy==1.15.2
sentence-transformers==2.3.1
sentencepiece==0.2.0
sentry-sdk==2.26.1
setproctitle==1.3.5
simhash==2.1.2
tiktoken==0.5.1
transformers==4.48.0
jinja2==3.1.2
jsonlines==4.0.0
trl==0.13.0
ujson==5.1.0
wandb==0.18.3
six==1.17.0
smmap==5.0.2
sniffio==1.3.1
streamlit==1.30.0
torch==2.2.2
torchvision==0.17.2
sympy==1.13.3
tenacity==8.5.0
threadpoolctl==3.6.0
tiktoken==0.5.1
tokenizers==0.21.1
toml==0.10.2
torch==2.7.0+cu118
torchvision==0.22.0+cu118
tornado==6.4.2
tqdm==4.67.1
transformers==4.48.0
triton==3.3.0
trl==0.13.0
typing_extensions==4.13.2
tzlocal==5.3.1
ujson==5.1.0
urllib3==2.4.0
validators==0.34.0
wandb==0.18.3
watchdog==6.0.0
Werkzeug==3.1.3
xxhash==3.5.0
yarl==1.20.0
zipp==3.21.0
run_file/DynamicKV-LLM_Mini_Minimind.sh (new file)
@@ -0,0 +1,48 @@
#!/bin/bash

# Activate the conda environment
source $(conda info --base)/etc/profile.d/conda.sh
conda activate ycz_accelerate

# Set environment variables to help with debugging
export NCCL_DEBUG=INFO
export PYTHONFAULTHANDLER=1

# Option 1: use a pre-configured accelerate config file
# accelerate launch --config_file accelerate_config.yaml train_pretrain_accelerate.py \
#     --epochs 3 \
#     --batch_size 24 \
#     --learning_rate 2e-4 \
#     --dtype bfloat16 \
#     --accumulation_steps 32 \
#     --grad_clip 1.0 \
#     --log_interval 100 \
#     --save_interval 10000 \
#     --dim 1024 \
#     --n_layers 32 \
#     --max_seq_len 1024 \
#     --use_flash_attn \
#     --profile \
#     --profile_interval 10

# Option 2: configure accelerate directly via command-line arguments
CUDA_VISIBLE_DEVICES=0 accelerate launch \
    --multi_gpu \
    --num_processes=4 \
    --mixed_precision=bf16 \
    --main_process_port=29500 \
    train_pretrain_accelerate.py \
    --epochs 3 \
    --batch_size 24 \
    --learning_rate 2e-4 \
    --dtype bfloat16 \
    --accumulation_steps 32 \
    --grad_clip 1.0 \
    --log_interval 100 \
    --save_interval 10000 \
    --dim 512 \
    --n_layers 12 \
    --max_seq_len 512 \
    --use_flash_attn \
    --profile \
    --profile_interval 10
@@ -275,6 +275,8 @@ def main():
    parser.add_argument("--profile", action="store_true", default=True, help="enable profiling")
    parser.add_argument("--profile_interval", type=int, default=10, help="profiling print interval (in steps)")
    parser.add_argument("--use_flash_attn", action="store_true", default=True, help="enable FlashAttention")
    parser.add_argument("--knowlwdge_num", type=int, default=64*64, help="number of entries in the knowledge base")
    parser.add_argument("--knowlwdge_length", type=int, default=8, help="length of each knowledge-base entry")
    args = parser.parse_args()

    # Initialize the accelerator
@@ -304,7 +306,9 @@ def main():
        max_seq_len=args.max_seq_len,
        use_moe=args.use_moe,
        disable_db=args.disable_db,
        flash_attn=args.use_flash_attn
        flash_attn=args.use_flash_attn,
        knowlwdge_num=args.knowlwdge_num,
        knowlwdge_length=args.knowlwdge_length
    )

    # Create the save directory