Jax922
|
83f5cfe6ca
|
update
|
2025-05-12 19:11:04 +08:00 |
|
Jax922
|
803d1f1b72
|
Investigate the cause of slow speed
|
2025-05-12 17:46:18 +08:00 |
|
Jax922
|
48f0018432
|
update
|
2025-05-12 14:16:42 +08:00 |
|
|
decec67b78
|
update
|
2025-05-12 12:24:03 +08:00 |
|
Jax922
|
d93889194d
|
update
archive/HPC
|
2025-05-12 11:53:10 +08:00 |
|
|
a3ea93597c
|
DynamicKV-LLM Pretrain v1.1.0
|
2025-05-12 00:21:07 +08:00 |
|
|
da5ac6a5c0
|
Merge branch 'SLM' into HPC
|
2025-05-12 00:05:45 +08:00 |
|
|
8dd7cfaf72
|
Fix the bug where loss becomes NaN
|
2025-05-11 23:57:34 +08:00 |
|
|
cb286d26d1
|
Include config info in wandb
|
2025-05-10 20:23:52 +08:00 |
|
|
0c8c6e5d1a
|
Add a mode that ignores the database
|
2025-05-09 15:19:41 +08:00 |
|
|
b6bd97aaaa
|
Extract the common part of self.downsample_v and self.downsample_q, and use separable convolution to reduce the parameter count
|
2025-05-09 15:01:06 +08:00 |
|
|
bed6faa379
|
DynamicKV-LLM 1.0.1: add multi-head to cross-attention; use bf16 instead of fp16
|
2025-05-08 15:47:00 +00:00 |
|
|
10f15724b4
|
Add train_embedding for pretraining the embedding model
|
2025-05-08 15:41:04 +00:00 |
|
|
0859f54a88
|
DynamicKV-LLM 1.0.0: core architecture complete; the model trains normally
|
2025-04-25 16:49:05 +08:00 |
|
|
e3120f5e62
|
fix
|
2025-04-25 16:29:28 +08:00 |
|
Jax922
|
1ddfd310ec
|
Incorporate the idea of Million MoE
|
2025-04-24 21:29:33 +08:00 |
|
Jax922
|
c55dfc0b46
|
Add comments
|
2025-04-24 15:58:39 +08:00 |
|
Jax922
|
21fdaaa59e
|
Update the ignore list
|
2025-04-24 15:58:33 +08:00 |
|
jingyaogong
|
7da201a944
|
update chat-openai-api
|
2025-04-18 12:43:57 +08:00 |
|
jingyaogong
|
d9453ed9a3
|
update moe note
|
2025-04-09 17:38:31 +08:00 |
|
jingyaogong
|
d503093ec4
|
update eval
|
2025-04-09 16:56:57 +08:00 |
|
jingyaogong
|
4a758564e4
|
fix top_p float bug
|
2025-04-09 16:52:20 +08:00 |
|
jingyaogong
|
4a7c1c49e8
|
update rlaif
|
2025-04-05 16:06:08 +08:00 |
|
jingyaogong
|
9e67798397
|
update generate
|
2025-04-05 15:53:55 +08:00 |
|
jingyaogong
|
399d526fbd
|
add hidden state
|
2025-04-05 14:39:56 +08:00 |
|
jingyaogong
|
885661f47d
|
update inference
|
2025-04-05 12:04:38 +08:00 |
|
jingyaogong
|
ed01c5d84a
|
update inference
|
2025-04-05 12:03:04 +08:00 |
|
jingyaogong
|
7fcc46b39a
|
update seed set
|
2025-04-04 11:39:41 +08:00 |
|
jingyaogong
|
08e9a22a25
|
update web_demo
|
2025-04-04 11:25:40 +08:00 |
|
jingyaogong
|
278ec760a1
|
update dpo_loss
|
2025-04-01 17:32:50 +08:00 |
|
jingyaogong
|
4f95e23a98
|
update structure image
|
2025-04-01 16:15:26 +08:00 |
|
jingyaogong
|
edc8d26189
|
update structure image
|
2025-04-01 16:11:54 +08:00 |
|
jingyaogong
|
bf81fd5f5e
|
rmsnorm float convert
|
2025-04-01 16:03:44 +08:00 |
|
jingyaogong
|
e369b33265
|
fix chat mask bug
|
2025-04-01 13:44:55 +08:00 |
|
jingyaogong
|
258507ff89
|
delete __pycache__
|
2025-04-01 11:51:54 +08:00 |
|
gongjy
|
04b56ea86c
|
update readme
|
2025-02-23 20:07:26 +08:00 |
|
gongjy
|
e34d4e9371
|
update tokenizer load
|
2025-02-19 23:24:29 +08:00 |
|
gongjy
|
45c0d12049
|
update images
|
2025-02-19 22:59:42 +08:00 |
|
gongjy
|
f475e4e407
|
update images
|
2025-02-19 22:54:57 +08:00 |
|
gongjy
|
ef7dff9fd4
|
update structure figure
|
2025-02-18 23:35:16 +08:00 |
|
gongjy
|
dcf5fcdb08
|
update structure figure
|
2025-02-18 23:24:51 +08:00 |
|
gongjy
|
844e79148c
|
update generate args
|
2025-02-15 23:56:09 +08:00 |
|
gongjy
|
19b388cd87
|
update generate args
|
2025-02-15 23:55:10 +08:00 |
|
gongjy
|
5b65bc767e
|
update cis init
|
2025-02-15 20:26:34 +08:00 |
|
gongjy
|
c1a77f5c0f
|
update web_demo
|
2025-02-14 19:38:55 +08:00 |
|
gongjy
|
d519d2a233
|
update web_demo
|
2025-02-14 19:37:27 +08:00 |
|
gongjy
|
e7ed05834b
|
fix bug
|
2025-02-13 21:07:43 +08:00 |
|
gongjy
|
b5d10d9a7d
|
fix bugs
|
2025-02-13 20:56:14 +08:00 |
|
gongjy
|
416cc90b58
|
update ckp-path
|
2025-02-12 20:34:47 +08:00 |
|
gongjy
|
bab480073e
|
update lr
|
2025-02-11 23:53:48 +08:00 |
|