electra-base-turkish-mc4-cased-discriminator
🇹🇷 Turkish ELECTRA model
We present community-driven BERT, DistilBERT, ELECTRA and ConvBERT models for Turkish 🎉
Some datasets used for pretraining and evaluation were contributed by the
awesome Turkish NLP community, which also chose the name for the BERT model: BERTurk.
We've also trained an ELECTRA (cased) model on the recently released Turkish part of the
multilingual C4 (mC4) corpus from the AI2 team.
After filtering out documents with a broken encoding, the training corpus has a size of 242 GB,
amounting to 31,240,963,926 tokens.
We used the original 32k vocab (instead of creating a new one).
mC4 ELECTRA
In addition to the ELECTRA base model, we also trained an ELECTRA model on the Turkish part
of the mC4 corpus. We use a sequence length of 512 over the full training time and train the
model for 1M steps on a v3-32 TPU.
Model usage
All trained models can be used from the DBMDZ Hugging Face model hub page using their model name.
Example usage with 🤗/Transformers:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dbmdz/electra-base-turkish-mc4-cased-discriminator")
model = AutoModel.from_pretrained("dbmdz/electra-base-turkish-mc4-cased-discriminator")
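Since this checkpoint is the ELECTRA discriminator, its pretraining head outputs one logit per token indicating whether that token is "replaced" (fake) or "original". As a minimal sketch of how those per-token logits are interpreted, without downloading the model, here is a pure-Python example; the token strings and logit values are made up for illustration, and in practice the logits would come from the discriminator head (e.g. `ElectraForPreTraining` in 🤗 Transformers):

```python
def flag_replaced_tokens(tokens, logits, threshold=0.0):
    """Pair each token with a boolean: True if the discriminator
    predicts it was replaced (logit above the threshold)."""
    return [(tok, logit > threshold) for tok, logit in zip(tokens, logits)]

# Hypothetical per-token logits for an example Turkish sentence.
# A positive logit marks a token the discriminator flags as replaced.
tokens = ["Ankara", "Türkiye'nin", "en", "büyük", "okyanusudur"]
logits = [-2.1, -1.7, -0.9, -1.3, 3.4]  # made-up numbers, not model output

print(flag_replaced_tokens(tokens, logits))
# flags only "okyanusudur" (the implausible token) as replaced
```

The threshold of 0 corresponds to a sigmoid probability of 0.5, which is the conventional decision boundary for ELECTRA's replaced-token-detection objective.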
Citation
You can use the following BibTeX entry for citation:
@software{stefan_schweter_2020_3770924,
author = {Stefan Schweter},
title = {BERTurk - BERT models for Turkish},
month = apr,
year = 2020,
publisher = {Zenodo},
version = {1.0.0},
doi = {10.5281/zenodo.3770924},
url = {https://doi.org/10.5281/zenodo.3770924}
}
Acknowledgments
Thanks to Kemal Oflazer for providing us additional large corpora for Turkish.
Many thanks to Reyyan Yeniterzi for providing us the Turkish NER dataset for evaluation.
We would like to thank Merve Noyan for the awesome logo!
Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
Thanks for providing access to the TFRC ❤️