dbmdz / convbert-base-turkish-mc4-uncased

huggingface.co
Total runs: 451
24-hour runs: 0
7-day runs: -339
30-day runs: -1.3K
Model's Last Updated: September 11 2023
fill-mask

Model Details of convbert-base-turkish-mc4-uncased

🇹🇷 Turkish ConvBERT model

We present community-driven BERT, DistilBERT, ELECTRA and ConvBERT models for Turkish 🎉

Some of the datasets used for pretraining and evaluation were contributed by the awesome Turkish NLP community, which also chose the name of the BERT model: BERTurk.

Logo is provided by Merve Noyan.

Stats

We've trained an (uncased) ConvBERT model on the recently released Turkish part of the multilingual C4 (mC4) corpus from the AI2 team.

After filtering out documents with broken encoding, the training corpus has a size of 242GB, resulting in 31,240,963,926 tokens.
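As a quick sanity check on these two figures, the sketch below derives the average number of bytes per token. The source does not say whether "GB" means 10^9 or 2^30 bytes, nor which tokenization the token count refers to, so both unit readings are computed and the helper name `bytes_per_token` is purely illustrative.

```python
# Figures quoted in the text above.
CORPUS_SIZE_GB = 242
TOTAL_TOKENS = 31_240_963_926


def bytes_per_token(size_gb: float, tokens: int, bytes_per_gb: int) -> float:
    """Average corpus bytes per token for a given definition of a gigabyte."""
    return size_gb * bytes_per_gb / tokens


# Assumption: the unit is not specified in the source, so try both readings.
decimal_estimate = bytes_per_token(CORPUS_SIZE_GB, TOTAL_TOKENS, 10**9)
binary_estimate = bytes_per_token(CORPUS_SIZE_GB, TOTAL_TOKENS, 2**30)

print(f"{decimal_estimate:.2f} bytes/token (GB = 10^9 bytes)")
print(f"{binary_estimate:.2f} bytes/token (GB = 2^30 bytes)")
```

Either reading gives roughly 8 bytes per token on average, so the two quoted numbers are at least plausible together.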

We used the original 32k vocab (instead of creating a new one).

mC4 ConvBERT

In addition to the ELECTRA base model, we also trained a ConvBERT model on the Turkish part of the mC4 corpus. We use a sequence length of 512 over the full training time and train the model for 1M steps on a v3-32 TPU.

Model usage

All trained models can be used from the DBMDZ Hugging Face model hub page using their model name.

Example usage with 🤗 Transformers:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/convbert-base-turkish-mc4-uncased")

model = AutoModel.from_pretrained("dbmdz/convbert-base-turkish-mc4-uncased")
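Once loaded, the tokenizer and model can be used to extract contextual token representations. The following is a minimal sketch, assuming PyTorch and the `transformers` package are installed and the model hub is reachable; the helper names `load_model` and `embed` are illustrative and not part of the original model card.

```python
MODEL_NAME = "dbmdz/convbert-base-turkish-mc4-uncased"


def load_model(model_name: str = MODEL_NAME):
    """Load the tokenizer and encoder (weights are downloaded on first use)."""
    # Imported lazily so that defining the helpers does not require the package.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed(sentence: str, tokenizer, model):
    """Return per-token hidden states of shape (1, sequence_length, hidden_size)."""
    import torch

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():  # no gradients are needed for feature extraction
        outputs = model(**inputs)
    return outputs.last_hidden_state
```

A typical call would be `tokenizer, model = load_model()` followed by `embed("Merhaba dünya", tokenizer, model)`. Note that `AutoModel` returns the bare encoder without a task-specific head.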

Citation

You can use the following BibTeX entry for citation:

@software{stefan_schweter_2020_3770924,
  author       = {Stefan Schweter},
  title        = {BERTurk - BERT models for Turkish},
  month        = apr,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {1.0.0},
  doi          = {10.5281/zenodo.3770924},
  url          = {https://doi.org/10.5281/zenodo.3770924}
}

Acknowledgments

Thanks to Kemal Oflazer for providing us with additional large corpora for Turkish. Many thanks to Reyyan Yeniterzi for providing us with the Turkish NER dataset for evaluation.

We would like to thank Merve Noyan for the awesome logo!

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️

More Information About convbert-base-turkish-mc4-uncased huggingface.co Model

The convbert-base-turkish-mc4-uncased model is released under the MIT license:

https://choosealicense.com/licenses/mit

convbert-base-turkish-mc4-uncased huggingface.co Url

https://huggingface.co/dbmdz/convbert-base-turkish-mc4-uncased

Provider of convbert-base-turkish-mc4-uncased huggingface.co

dbmdz
ORGANIZATIONS
