update readme

gongjy 2024-10-08 23:40:29 +08:00
parent 000b0a496b
commit 772834148e
4 changed files with 20 additions and 30 deletions


@ -35,11 +35,16 @@
--- ---
<div align="center"> <div align="center">
https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055 ![](./images/minimind-demo.gif)
[Bilibili video link](https://www.bilibili.com/video/BV12dHPeqE72/?share_source=copy_web&vd_source=670c2504f88726f8cf4a21ef6147c0e8) [ModelScope online test](https://www.modelscope.cn/studios/gongjy/minimind) | [Bilibili video link](https://www.bilibili.com/video/BV12dHPeqE72/?share_source=copy_web&vd_source=670c2504f88726f8cf4a21ef6147c0e8)
---
</div> </div>
@ -116,7 +121,7 @@ https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055
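- The project is deployed on ModelScope makerspace and can be tried out on this site - The project is deployed on ModelScope makerspace and can be tried out on this site
- [ModelScope online demo](https://www.modelscope.cn/studios/gongjy/minimind) - [🔗ModelScope online demo🔗](https://www.modelscope.cn/studios/gongjy/minimind)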
</details> </details>
@ -175,16 +180,6 @@ python 2-eval.py
streamlit run fast_inference.py streamlit run fast_inference.py
``` ```
![](./images/streamlit.png)
<div align="center">
The project is deployed on ModelScope makerspace and can be tried out on this site
[ModelScope online demo](https://www.modelscope.cn/studios/gongjy/minimind)
</div>
# 📌 Quick Start Train # 📌 Quick Start Train
@ -198,7 +193,7 @@ streamlit run fast_inference.py
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
``` ```
```python ```text
# Test whether torch can use CUDA # Test whether torch can use CUDA
import torch import torch
print(torch.cuda.is_available()) print(torch.cuda.is_available())


@ -43,12 +43,15 @@
<div align="center"> <div align="center">
https://github.com/user-attachments/assets/88b98128-636e-43bc-a419-b1b1403c2055 ![](./images/minimind-demo.gif)
[Bilibili Video](https://www.bilibili.com/video/BV12dHPeqE72/?share_source=copy_web&vd_source=670c2504f88726f8cf4a21ef6147c0e8) [ModelScope Online Testing](https://www.modelscope.cn/studios/gongjy/minimind) | [Bilibili Video Link](https://www.bilibili.com/video/BV12dHPeqE72/?share_source=copy_web&vd_source=670c2504f88726f8cf4a21ef6147c0e8)
---
</div> </div>
# 📌 Introduction # 📌 Introduction
In the field of large language models (LLMs) such as GPT, LLaMA, GLM, etc., while their performance is impressive, the In the field of large language models (LLMs) such as GPT, LLaMA, GLM, etc., while their performance is impressive, the
@ -187,18 +190,6 @@ or you can run streamlit, launch a web page to chat with minimind-v1
streamlit run fast_inference.py streamlit run fast_inference.py
``` ```
![](./images/streamlit.png)
<div align="center">
The project has been deployed to ModelScope makerspace, where you can experience:
[ModelScope Online](https://www.modelscope.cn/studios/gongjy/minimind)
</div>
# 📌 Quick Start Train # 📌 Quick Start Train
* 0.Clone the project code * 0.Clone the project code
@ -213,7 +204,7 @@ The project has been deployed to ModelScope makerspace, where you can experience
pip install -r requirements.txt pip install -r requirements.txt
``` ```
```python ```text
# Test if torch can use CUDA # Test if torch can use CUDA
import torch import torch
print(torch.cuda.is_available()) print(torch.cuda.is_available())

BIN images/minimind-demo.gif (new file; binary file not shown, size: 4.2 MiB)


@ -27,11 +27,15 @@ class RMSNorm(torch.nn.Module):
return output * self.weight return output * self.weight
def precompute_pos_cis(dim: int, end: int, theta: float = 10000.0): def precompute_pos_cis(dim: int, end: int, theta: float = 10000.0, train_len: int = 512):
freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim)) freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
t = torch.arange(end, device=freqs.device) # type: ignore t = torch.arange(end, device=freqs.device) # type: ignore
freqs = torch.outer(t, freqs).float() # type: ignore freqs = torch.outer(t, freqs).float() # type: ignore
pos_cis = torch.polar(torch.ones_like(freqs), freqs) # complex64 pos_cis = torch.polar(torch.ones_like(freqs), freqs) # complex64
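# # Compute the scaling factor
# scale = train_len / end
# # Scale the rotary embedding to get linear length extrapolation. Commented out because small models fit pos_cis very closely, so direct linear extrapolation does not work well
# pos_cis = pos_cis * scale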
return pos_cis return pos_cis
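
For readers of this hunk, here is a minimal, dependency-free sketch of the table that `precompute_pos_cis` appears to build: one unit-modulus complex number per (position, frequency) pair. The name `precompute_pos_cis_py` is hypothetical and not part of the repository. It uses the standard-library `cmath` module in place of `torch.polar` and ignores the unused `train_len` argument.

```python
import cmath


def precompute_pos_cis_py(dim: int, end: int, theta: float = 10000.0):
    """Pure-Python illustration of precompute_pos_cis (hypothetical helper)."""
    # One frequency per channel pair: theta ** (-k / dim) for k = 0, 2, 4, ...
    freqs = [1.0 / (theta ** (k / dim)) for k in range(0, dim, 2)][: dim // 2]
    # Row t holds exp(1j * t * freq): modulus 1, angle growing linearly with position t.
    return [[cmath.rect(1.0, t * f) for f in freqs] for t in range(end)]


pos_cis = precompute_pos_cis_py(dim=8, end=4)
print(len(pos_cis), len(pos_cis[0]))  # number of positions, dim // 2
```

Every entry has modulus 1, so multiplying the table by a scalar (as the commented-out lines would) changes the magnitudes but leaves the rotation angles unchanged.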