update readme
parent 6210a24b6f
commit 288ab3ccb8
README.md (30 changed lines)
@@ -1,5 +1,3 @@
<div align="center">

@@ -36,13 +34,12 @@
<div align="center">

https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055

[Bilibili video link](https://www.bilibili.com/video/BV12dHPeqE72/?share_source=copy_web&vd_source=670c2504f88726f8cf4a21ef6147c0e8)

</div>

# 📌 Introduction

In the field of large language models (LLMs) such as GPT, LLaMA, and GLM, although their results are stunning,
@@ -132,11 +129,14 @@ git clone https://huggingface.co/jingyaogong/minimind-v1

# step 2
python 2-eval.py
```

Or launch Streamlit to bring up the web chat interface:

```bash
# or step 3, use streamlit
streamlit run fast_inference.py
```
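
If you would rather script against the cloned weights than use the bundled scripts, here is a minimal sketch; it assumes the exported checkpoint loads through the standard `transformers` auto classes, and the local path and prompt are purely illustrative:

```python
# Minimal sketch: chat with the cloned transformers-format weights.
# Assumes the checkpoint loads via the standard auto classes; the
# path and prompt below are illustrative, not from the project docs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./minimind-v1"  # directory created by the git clone above
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)

inputs = tokenizer("Hello, please introduce yourself.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```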

<div align="center">
@@ -230,7 +230,8 @@ streamlit run fast_inference.py

---

- 📙【Pretrain data】: [seq-monkey general-purpose text dataset](https://github.com/mobvoi/seq-monkey-data/blob/main/docs/pretrain_open_corpus.md)
  is compiled and cleaned from data drawn from many public sources (web pages, encyclopedias, blogs, open-source code, books, and so on),
  organized into a unified JSONL format, and strictly filtered and deduplicated to ensure the data is comprehensive, large-scale, credible, and of high quality.
  It totals roughly 10B tokens and is well suited to pretraining Chinese large language models.
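
Because the corpus ships as one JSON object per line, it can be streamed without loading everything into memory. A minimal sketch follows; the `text` field name and the file name are assumptions, so check the dataset docs for the actual schema:

```python
# Minimal sketch: stream a JSONL pretrain corpus line by line.
# The "text" key and the file name are assumed, not confirmed by the docs.
import json

def iter_pretrain_texts(path: str):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)["text"]

for i, text in enumerate(iter_pretrain_texts("pretrain_data.jsonl")):
    print(text[:50])  # peek at the first few samples
    if i >= 2:
        break
```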

@@ -554,7 +555,8 @@ MobileLLM proposes that depth matters more than width for the architecture; a "deep and narrow", "slender"

* minimind-MoE (0.16B) performs poorly, even worse than minimind (0.05B), its dense counterpart with the same configuration. This is not really MoE's fault: that run was likewise killed early to free resources for the smaller models, but the multi-expert setup of an MoE model inherently calls for more training epochs, and at epochs=2 it was severely undertrained. A fully trained MoE version was tried on the Yi tokenizer during recent experiments and was visibly better than the dense model. It stays as-is for now; v2/v3 versions will be retrained once a server frees up.
* Model F's answers look the most polished here, despite some hallucinated fabrication. Yet both GPT-4o's and kimi's scores consistently judged it as "overly verbose, with repeated content, and hallucinating". That judgment is arguably too harsh: once 10 characters out of 100 are hallucinated, an answer is easily scored 0. Because model F defaults to longer training texts and a far larger dataset, its answers look very complete; at similar model sizes, data matters far more than the model.

> 🙋♂️ Personal subjective ranking: F > D > A ≈ B > C > E
@@ -673,7 +675,8 @@ the minimind model itself was not trained on a larger dataset, nor was it tuned for answering

* [./export_model.py](./export_model.py) can export the model to transformers format and push it to huggingface, as sketched after this list
* MiniMind's huggingface collection: [MiniMind](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5)
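
For reference, a generic sketch of the transformers-format export-and-push flow that a script like export_model.py automates; the local paths and repo id here are illustrative only:

```python
# Generic export-and-push sketch; paths and repo id are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./minimind-v1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./minimind-v1", trust_remote_code=True)

model.save_pretrained("./minimind-export")      # write config + weights locally
tokenizer.save_pretrained("./minimind-export")

# Pushing requires `huggingface-cli login` beforehand:
# model.push_to_hub("your-username/minimind-v1")
```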

---
@@ -725,24 +728,18 @@ the minimind model itself was not trained on a larger dataset, nor was it tuned for answering

# 📌 Acknowledge

> [!NOTE]
> If you find `MiniMind` helpful, please give it a ⭐ on GitHub<br/>
> Your support is what keeps this project improving! The write-up is long and our ability limited, so mistakes are inevitable; feel free to discuss and point them out in the issues.

## 🤝Contributors

<br/>

<a href="https://github.com/jingyaogong/minimind/graphs/contributors">
    <img src="https://contrib.rocks/image?repo=jingyaogong/minimind&v=2" />
</a>

## 🫶Thanks for your support!
@@ -751,11 +748,8 @@ the minimind model itself was not trained on a larger dataset, nor was it tuned for answering

[](https://github.com/jingyaogong/minimind/network/members)

# License

This repository is licensed under the [Apache-2.0 License](LICENSE).
@@ -816,8 +816,9 @@ This suggests that the model performs well in logical reasoning, foundational sc

## 🤝Contributors

<br/>

<a href="https://github.com/jingyaogong/minimind/graphs/contributors">
    <img src="https://contrib.rocks/image?repo=jingyaogong/minimind&v=2" />
</a>