update readme's error
This commit is contained in:
parent 79a616ac15
commit 6d6510eefc
@@ -162,8 +162,8 @@ python 2-eval.py
Because the LLM here is very small, the vocabulary size must also be kept small; otherwise the model becomes top-heavy, with the token embedding layer taking up too large a share of the LLM's total parameters (a rough estimate follows the table below).

Strong open-source models such as 01.AI, Qwen, ChatGLM, Mistral, and Llama3 use tokenizers with the following vocabulary sizes:

| Tokenizer Model | Vocabulary Size | Source                |
|-----------------|-----------------|-----------------------|
| yi tokenizer    | 64,000          | 01-AI (China)         |
| qwen2 tokenizer | 151,643         | Alibaba Cloud (China) |
| glm tokenizer   | 151,329         | Zhipu AI (China)      |
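To make the trade-off concrete, here is a quick back-of-the-envelope sketch. The total parameter count and hidden dimension below are assumed values for illustration, not the project's actual configuration:

```python
# Estimate the embedding table's share of a small LLM's parameter budget.
# TOTAL and HIDDEN are assumed illustrative values, not the real config.

def embedding_share(vocab_size: int, hidden_dim: int, total_params: int) -> float:
    """Fraction of the total parameter budget spent on the token embedding table."""
    return (vocab_size * hidden_dim) / total_params

TOTAL = 26_000_000   # assumed total parameters of a small LLM
HIDDEN = 512         # assumed hidden / embedding dimension

print(f"vocab   6,400: {embedding_share(6_400, HIDDEN, TOTAL):.1%} of parameters")
print(f"vocab 151,643: {embedding_share(151_643, HIDDEN, TOTAL):.1%} of parameters")
# vocab   6,400: 12.6% of parameters
# vocab 151,643: 298.6% -- the embedding table alone would dwarf the model
```

With a qwen2-sized vocabulary, the embedding table alone would exceed the entire parameter budget of a model this small, which is why a much smaller vocabulary is the sensible choice here.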
@@ -192,7 +192,7 @@ git clone https://github.com/jingyaogong/minimind.git
sizes:

| Tokenizer Model | Vocabulary Size | Source                |
|-----------------|-----------------|-----------------------|
| yi tokenizer    | 64,000          | 01-AI (China)         |
| qwen2 tokenizer | 151,643         | Alibaba Cloud (China) |
| glm tokenizer   | 151,329         | Zhipu AI (China)      |