add performance for albert_tiny & albert_xlarge
brightmart committed Oct 16, 2019
1 parent 5376c5a commit 22c6dec
Showing 3 changed files with 44 additions and 45 deletions.
89 changes: 44 additions & 45 deletions README.md
@@ -13,26 +13,49 @@ Different versions of the ALBERT pre-trained model for Chinese, including TensorFlow,

For more datasets, baseline models, and detailed comparisons of model performance on different tasks, see <a href="https://github.com/chineseGLUE/chineseGLUE">chineseGLUE, a Chinese NLP benchmark</a>.

##### Update
**\*\*\*\*\* 2019-10-15: added albert_tiny_zh, about 10x faster than bert_base for training and inference, with accuracy largely retained (only 1.5 points lower) \*\*\*\*\***
<img src="https://github.com/brightmart/albert_zh/blob/master/resources/albert_tiny_compare_s.jpg" width="80%" height="40%" />


Download Pre-trained Chinese Models
-----------------------------------------------
1. <a href="https://storage.googleapis.com/albert_zh/albert_tiny.zip">albert_tiny_zh</a>: file size 16M, 1.8M parameters

Training and inference are about 10x faster, accuracy is largely retained, and the model is about 1/25 the size of bert; it reaches 85.4% on the test set of the semantic-similarity dataset LCQMC, only 1.5 points below bert_base.

LCQMC training uses the following parameters (a command-line sketch follows the usage notes below): --max_seq_length=128 --train_batch_size=64 --learning_rate=1e-4 --num_train_epochs=5

albert_tiny is trained on the same large-scale Chinese corpus, but has only 4 layers, with the hidden size and other vector dimensions greatly reduced.

[Use cases] Relatively simple tasks or tasks with strict latency requirements, such as sentence-pair tasks like semantic similarity and classification tasks; for harder tasks such as reading comprehension, use one of the larger models.
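
Below is a minimal sketch of launching LCQMC fine-tuning with exactly the hyperparameters quoted above. The entry-point script name (`run_classifier.py`), the task name, and all paths are assumptions for illustration; adjust them to this repository's actual scripts and to where you unpacked the data and checkpoint.

```python
import subprocess

# Hyperparameters are the ones quoted in this README for albert_tiny_zh on LCQMC.
# The script name, task name and paths are assumptions -- adapt to your setup.
cmd = [
    "python", "run_classifier.py",                          # assumed fine-tuning entry point
    "--task_name=lcqmc",                                    # assumed task identifier for LCQMC
    "--do_train=true",
    "--do_eval=true",
    "--data_dir=./lcqmc",                                   # assumed location of the LCQMC data
    "--vocab_file=./albert_tiny/vocab.txt",                 # assumed checkpoint layout
    "--bert_config_file=./albert_tiny/albert_config_tiny.json",
    "--init_checkpoint=./albert_tiny/albert_model.ckpt",
    "--max_seq_length=128",
    "--train_batch_size=64",
    "--learning_rate=1e-4",
    "--num_train_epochs=5",
    "--output_dir=./albert_tiny_lcqmc_output",
]
subprocess.run(cmd, check=True)
```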

2. <a href="https://storage.googleapis.com/albert_zh/albert_large_zh.zip">albert_large_zh</a>: 18M parameters, 24 layers, file size 64M

Parameter count and model size are one sixth of bert_base; on the test set of the colloquial similarity dataset LCQMC it is 0.2 points above bert_base.

**\*\*\*\*\* 2019-10-13: added albert_xlarge_zh and albert_base_zh_additional_steps (trained with an additional 150 million instances) \*\*\*\*\***
3. <a href="https://storage.googleapis.com/albert_zh/albert_base_zh_additional_36k_steps.zip">albert_base_zh (trained with an additional 150 million instances, i.e. 36k steps * batch_size 4096)</a>; <a href="https://storage.googleapis.com/albert_zh/albert_base_zh.zip">albert_base_zh (small trial version)</a>: 12M parameters, 12 layers, size 40M

**\*\*\*\*\* 2019-10-11: albert with TensorFlow 2.0 \*\*\*\*\***
Parameter count is one tenth of bert_base, and so is the model size; on the test set of the colloquial similarity dataset LCQMC it is about 0.6~1 point below bert_base;
compared with no pre-training, albert_base gains 14 points.

Use or load the pre-trained model with TensorFlow 2.0 through <a href="https://github.com/kpe/bert-for-tf2">bert-for-tf2</a>
4. <a href="https://storage.googleapis.com/albert_zh/albert_xlarge_zh_177k.zip">albert_xlarge_zh_177k</a>;
<a href="https://storage.googleapis.com/albert_zh/albert_xlarge_zh_183k.zip">albert_xlarge_zh_183k</a>: 59M parameters, 24 layers, file size 230M

Parameter count and model size are half of bert_base; a large GPU is required; a full test comparison will be added later; batch_size should not be too small, or accuracy may suffer.

**\*\*\*\*\* 2019-10-06: albert_xlarge_zh \*\*\*\*\***
##### Update
**\*\*\*\*\* 2019-10-15: added albert_tiny_zh, about 10x faster than bert_base for training and inference, accuracy only about 1.5 points lower \*\*\*\*\***

Released albert_xlarge_zh: 59M parameters, half the parameters of bert_base, about 200M in size.
**\*\*\*\*\* 2019-10-07: more models of albert \*\*\*\*\***

Ranks first on the LCQMC dataset so far, up 0.5 points.
Added albert_xlarge_zh and albert_base_zh_additional_steps, trained with more instances.

**\*\*\*\*\* 2019-10-04: PyTorch and Keras versions of albert are now supported \*\*\*\*\***

Convert to the PyTorch version and run your tasks through <a href="https://github.com/lonePatient/albert_pytorch">albert_pytorch</a>
a. Convert to the PyTorch version and run your tasks through <a href="https://github.com/lonePatient/albert_pytorch">albert_pytorch</a>

b. Load the pre-trained model with Keras in one line of code through <a href="https://github.com/bojone/bert4keras">bert4keras</a>

Load the pre-trained model with Keras in one line of code through <a href="https://github.com/bojone/bert4keras">bert4keras</a>
c. ALBERT with TensorFlow 2.0: use or load the pre-trained model with TF 2.0 through <a href="https://github.com/kpe/bert-for-tf2">bert-for-tf2</a> (see the sketch after this list)
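
As a rough illustration of item c., here is a sketch of loading an albert_zh checkpoint as a Keras layer via bert-for-tf2. The helper names follow the bert-for-tf2 documentation as of late 2019 and may differ across releases; the paths, config/checkpoint file names, and sequence length are assumptions.

```python
import os
import tensorflow as tf
import bert  # pip install bert-for-tf2

model_dir = "./albert_base_zh"   # assumed: directory of the unzipped albert_base_zh checkpoint
max_seq_len = 128                # assumed sequence length

# Read hyperparameters from the checkpoint directory (the config file name inside the
# albert_zh archive may need to match what bert-for-tf2 expects) and build a Keras layer.
bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_albert = bert.BertModelLayer.from_params(bert_params, name="albert")

# Wrap the layer in a small Keras model that outputs per-token embeddings.
input_ids = tf.keras.layers.Input(shape=(max_seq_len,), dtype=tf.int32, name="input_ids")
sequence_output = l_albert(input_ids)
model = tf.keras.Model(inputs=input_ids, outputs=sequence_output)

# Load the pre-trained ALBERT weights; helper name per the bert-for-tf2 docs,
# checkpoint file name assumed from the albert_zh release layout.
bert.load_albert_weights(l_albert, os.path.join(model_dir, "albert_model.ckpt"))
```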

Releasing albert_xlarge on 6th Oct

@@ -46,29 +46,6 @@ Released albert_large_zh with only 16% of the parameters of bert_base (64M)

Added code and test functions for the three main changes of albert relative to bert

Download Pre-trained Chinese Models
-----------------------------------------------
1. <a href="https://storage.googleapis.com/albert_zh/albert_tiny.zip">albert_tiny_zh</a>: file size 16M

Training and inference are 10x faster, accuracy is largely retained, and the model is 1/25 the size of bert; it reaches 85.4% on the test set of the semantic-similarity dataset LCQMC, only 1.5 points below bert_base.

LCQMC training uses the following parameters: --max_seq_length=128 --train_batch_size=64 --learning_rate=1e-4 --num_train_epochs=5

2. <a href="https://storage.googleapis.com/albert_zh/albert_large_zh.zip">albert_large_zh</a>: 18M parameters, 24 layers, file size 64M

Parameter count and model size are one sixth of bert_base; on the test set of the colloquial similarity dataset LCQMC it is 0.2 points above bert_base.

3. <a href="https://storage.googleapis.com/albert_zh/albert_base_zh_additional_36k_steps.zip">albert_base_zh (trained with an additional 150 million instances, i.e. 36k steps * batch_size 4096)</a>; <a href="https://storage.googleapis.com/albert_zh/albert_base_zh.zip">albert_base_zh (small trial version)</a>: 12M parameters, 12 layers, size 40M

Parameter count is one tenth of bert_base, and so is the model size; on the test set of the colloquial similarity dataset LCQMC it is about 0.6~1 point below bert_base;
compared with no pre-training, albert_base gains 14 points.

4. <a href="https://storage.googleapis.com/albert_zh/albert_xlarge_zh_177k.zip">albert_xlarge_zh_177k</a>;
<a href="https://storage.googleapis.com/albert_zh/albert_xlarge_zh_183k.zip">albert_xlarge_zh_183k</a>: 59M parameters, 24 layers, file size 230M

Parameter count and model size are half of bert_base; a large GPU is required; a full test comparison will be added later; batch_size should not be too small, or accuracy may suffer.


Introduction to ALBERT
-----------------------------------------------
ALBERT is an improved version of BERT. Unlike other recent state-of-the-art models, this time the pre-trained model is smaller, with better performance and fewer parameters.
@@ -119,15 +119,6 @@ ALBERT is an improved version of BERT. Unlike other recent state-of-the-art models
That is, n-grams are masked with different probabilities: uni-grams with the highest probability, bi-grams next, and tri-grams with the lowest probability.
This project currently uses whole word masking on Chinese; a comparison with n-gram masking will be added later. The n-gram idea comes from SpanBERT.
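
For illustration only (this is not the exact code used in this repository), here is a small sketch of sampling mask-span lengths with decreasing probability, assuming the p(n) proportional to 1/n scheme with a maximum span of 3 from the ALBERT paper:

```python
import random

def sample_ngram_length(max_n: int = 3) -> int:
    """Sample a mask span length: uni-gram most likely, bi-gram next, tri-gram least likely.
    Uses p(n) proportional to 1/n, as described in the ALBERT paper."""
    weights = [1.0 / n for n in range(1, max_n + 1)]  # [1, 0.5, 0.333] -> ~[0.55, 0.27, 0.18]
    return random.choices(range(1, max_n + 1), weights=weights, k=1)[0]

def choose_mask_spans(num_words: int, mask_ratio: float = 0.15):
    """Illustrative only: pick word spans to mask until ~15% of the words are covered.
    Overlap handling and whole-word alignment are omitted for brevity."""
    budget = int(num_words * mask_ratio)
    spans, covered = [], 0
    while covered < budget:
        n = sample_ngram_length()
        start = random.randrange(0, max(1, num_words - n + 1))
        spans.append((start, start + n))
        covered += n
    return spans

print(choose_mask_spans(num_words=100))
```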

Release Plan
-----------------------------------------------
1. albert_base: 12M parameters, 12 layers, Oct 7

2. albert_large: 18M parameters, 24 layers, Oct 13

3. albert_xlarge: 59M parameters, 24 layers, Oct 6

4. albert_xxlarge: 233M parameters, 12 layers, Oct 7 (best-performing model)

Training Data & Configuration
-----------------------------------------------
@@ -142,7 +133,6 @@ ALBERT is an improved version of BERT. Unlike other recent state-of-the-art models
Training uses a TPU v3 Pod; we use a v3-256, which contains 32 v3-8 machines. Each v3-8 machine has 128G of memory.



Performance and Comparison (English)
-----------------------------------------------
<img src="https://github.com/brightmart/albert_zh/blob/master/resources/state_of_the_art.jpg" width="80%" height="40%" />
@@ -195,6 +185,9 @@ ALBERT is an improved version of BERT. Unlike other recent state-of-the-art models

Note: ALBERT-xlarge was only run once; results may still improve.

<img src="https://github.com/brightmart/albert_zh/blob/master/resources/crmc2018_compare_s.jpg" width="80%" height="40%" />


### Masked Language Model Accuracy, Sentence-Order Prediction Accuracy & Training Time

| Model | MLM eval acc | SOP eval acc | Training (Hours) | Loss eval |
@@ -295,7 +288,7 @@ We will use the LCQMC dataset for fine-tuning; it is an oral-language corpus used
Load the pre-trained model with bert4keras
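
A minimal sketch of that one-line load, assuming a recent bert4keras release that exposes `build_transformer_model`; the config and checkpoint paths are placeholders for wherever the albert_zh archive was unzipped.

```python
from bert4keras.models import build_transformer_model  # pip install bert4keras

# Placeholders: point these at the unzipped albert_zh checkpoint files.
config_path = "./albert_base_zh/albert_config_base.json"
checkpoint_path = "./albert_base_zh/albert_model.ckpt"

# The advertised one-line load; model='albert' selects the ALBERT weight layout.
model = build_transformer_model(config_path, checkpoint_path, model='albert')
model.summary()
```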


Machine with 12G of GPU memory: trade-off between batch size and sequence length
Trade-off between batch size and sequence length, with 12G of GPU memory
-------------------------------------------------

System | Seq Length | Max Batch Size
@@ -355,7 +348,13 @@ Reference

7. <a href="http://baijiahao.baidu.com/s?id=1645712785366950083&wfr=spider&for=pc">A small pre-trained model can also win 13 NLP tasks: ALBERT's three major changes top the GLUE benchmark</a>

8. <a href="https://github.com/bojone/bert4keras">bert4keras</a>
8. <a href="https://github.com/lonePatient/albert_pytorch">albert_pytorch</a>

9. <a href="https://github.com/bojone/bert4keras">load albert with keras</a>

10. <a href="https://github.com/kpe/bert-for-tf2">load albert with tf2.0</a>

11. <a href="https://github.com/chineseGLUE/chineseGLUE">chineseGLUE, a Chinese NLP benchmark: more publicly available datasets, baseline models, extensive evaluation and performance comparisons</a>



Binary file added resources/albert_tiny_compare_s.jpg
Binary file added resources/crmc2018_compare_s.jpg
