We have shown that the standard BERT recipe (including model architecture and training objective) is effective on a wide range of model sizes, beyond BERT-Base and BERT-Large. The smaller BERT models are intended for environments with restricted computational resources. They can be fine-tuned in the same manner as the original BERT models. However, they are most effective in the context of knowledge distillation, where the fine-tuning labels are produced by a larger and more accurate teacher.
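For illustration only, here is a generic soft-label distillation sketch in PyTorch; it is not the exact Pre-trained Distillation recipe from the paper, and the temperature value is an assumption. It shows the basic idea of training a compact student to match a larger teacher's output distribution:

```python
# Generic knowledge-distillation sketch (assumed setup, not the paper's exact recipe):
# the student is trained to match the teacher's softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student label distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```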
Our goal is to enable research in institutions with fewer computational resources and encourage the community to seek directions of innovation alternative to increasing model capacity.
You can download the 24 BERT miniatures either from the official BERT GitHub page, or via HuggingFace from the links below:
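As a minimal sketch of loading this checkpoint (assuming the `transformers` library and the Hugging Face model id `google/bert_uncased_L-4_H-256_A-4`, i.e. BERT-Mini with 4 layers and hidden size 256):

```python
# Minimal sketch: load BERT-Mini (L-4, H-256) from the Hugging Face Hub.
# Assumes `transformers` is installed; the model id below is the hosted copy
# of this checkpoint.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("google/bert_uncased_L-4_H-256_A-4")
model = AutoModel.from_pretrained("google/bert_uncased_L-4_H-256_A-4")

# Encode a sentence and inspect the hidden size (256 for this miniature).
inputs = tokenizer("BERT miniatures are small but useful.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 256)
```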
Note that the BERT-Base model in this release is included for completeness only; it was re-trained under the same regime as the original model.
Here are the corresponding GLUE scores on the test set:
| Model | Score | CoLA | SST-2 | MRPC | STS-B | QQP | MNLI-m | MNLI-mm | QNLI(v2) | RTE | WNLI | AX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT-Tiny | 64.2 | 0.0 | 83.2 | 81.1/71.1 | 74.3/73.6 | 62.2/83.4 | 70.2 | 70.3 | 81.5 | 57.2 | 62.3 | 21.0 |
| BERT-Mini | 65.8 | 0.0 | 85.9 | 81.1/71.8 | 75.4/73.3 | 66.4/86.2 | 74.8 | 74.3 | 84.1 | 57.9 | 62.3 | 26.1 |
| BERT-Small | 71.2 | 27.8 | 89.7 | 83.4/76.2 | 78.8/77.0 | 68.1/87.0 | 77.6 | 77.0 | 86.4 | 61.8 | 62.3 | 28.6 |
| BERT-Medium | 73.5 | 38.0 | 89.6 | 86.6/81.6 | 80.4/78.4 | 69.6/87.9 | 80.0 | 79.1 | 87.7 | 62.2 | 62.3 | 30.5 |
For each task, we selected the best fine-tuning hyperparameters from the lists below and trained for 4 epochs (a sweep sketch in code follows the lists):
- batch sizes: 8, 16, 32, 64, 128
- learning rates: 3e-4, 1e-4, 5e-5, 3e-5
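As a rough illustration of such a sweep (assuming the Hugging Face `Trainer`, the `datasets` library, MRPC as the example task, and selection by evaluation loss; none of these specifics come from the original release):

```python
# Hypothetical hyperparameter sweep over the grids above, using the Hugging Face
# Trainer on MRPC as an example task (the original release does not prescribe this code).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "google/bert_uncased_L-4_H-256_A-4"  # BERT-Mini
tokenizer = AutoTokenizer.from_pretrained(model_id)

raw = load_dataset("glue", "mrpc")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

encoded = raw.map(tokenize, batched=True)

best = None
for batch_size in (8, 16, 32, 64, 128):
    for lr in (3e-4, 1e-4, 5e-5, 3e-5):
        model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
        args = TrainingArguments(
            output_dir=f"mrpc_bs{batch_size}_lr{lr}",
            per_device_train_batch_size=batch_size,
            learning_rate=lr,
            num_train_epochs=4,  # 4 epochs, as in the release notes
        )
        trainer = Trainer(model=model, args=args,
                          train_dataset=encoded["train"],
                          eval_dataset=encoded["validation"])
        trainer.train()
        metrics = trainer.evaluate()
        if best is None or metrics["eval_loss"] < best[0]:
            best = (metrics["eval_loss"], batch_size, lr)

print("best (eval_loss, batch_size, learning_rate):", best)
```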
If you use these models, please cite the following paper:
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}