ai4bharat / indic-bert

huggingface.co
Last updated: August 08 2022

Introduction

Model Details

IndicBERT

IndicBERT is a multilingual ALBERT model pretrained exclusively on 12 major Indian languages. It is pre-trained on our novel monolingual corpus of around 9 billion tokens and subsequently evaluated on a diverse set of tasks. IndicBERT has far fewer parameters than other multilingual models (mBERT, XLM-R, etc.) while achieving performance on par with or better than these models.

The 12 languages covered by IndicBERT are: Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.

The code can be found here. For more information, check out our project page or our paper.
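
The card itself does not include a usage snippet, so the following is a minimal sketch of loading the published checkpoint with the Hugging Face transformers library; the sentencepiece dependency and the printed tensor shape are assumptions based on IndicBERT being an ALBERT-style model, not part of the original card.

import torch  # transformers returns PyTorch tensors here
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "ai4bharat/indic-bert"  # model id as listed on huggingface.co

# AutoTokenizer/AutoModel resolve to the ALBERT classes from the repo's config;
# the tokenizer needs the `sentencepiece` package to be installed.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Encode a Hindi sentence and obtain contextual embeddings.
inputs = tokenizer("भारत एक विशाल देश है", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)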

Pretraining Corpus

We pre-trained indic-bert on AI4Bharat's monolingual corpus. The corpus has the following distribution of languages:

Language        as      bn      en      gu      hi      kn
No. of Tokens   36.9M   815M    1.34B   724M    1.84B   712M

Language        ml      mr      or      pa      ta      te      all
No. of Tokens   767M    560M    104M    814M    549M    671M    8.9B

Evaluation Results

IndicBERT is evaluated on IndicGLUE and some additional tasks. The results are summarized below. For more details about the tasks, refer to our official repo.

IndicGLUE

Task                                      mBERT   XLM-R   IndicBERT
News Article Headline Prediction          89.58   95.52   95.87
Wikipedia Section Title Prediction        73.66   66.33   73.31
Cloze-style multiple-choice QA            39.16   27.98   41.87
Article Genre Classification              90.63   97.03   97.34
Named Entity Recognition (F1-score)       73.24   65.93   64.47
Cross-Lingual Sentence Retrieval Task     21.46   13.74   27.12
Average                                   64.62   61.09   66.66
Additional Tasks

Task                                     Task Type                     mBERT   XLM-R   IndicBERT
BBC News Classification                  Genre Classification          60.55   75.52   74.60
IIT Product Reviews                      Sentiment Analysis            74.57   78.97   71.32
IITP Movie Reviews                       Sentiment Analysis            56.77   61.61   59.03
Soham News Article                       Genre Classification          80.23   87.60   78.45
Midas Discourse                          Discourse Analysis            71.20   79.94   78.44
iNLTK Headlines Classification           Genre Classification          87.95   93.38   94.52
ACTSA Sentiment Analysis                 Sentiment Analysis            48.53   59.33   61.18
Winograd NLI                             Natural Language Inference    56.34   55.87   56.34
Choice of Plausible Alternatives (COPA)  Natural Language Inference    54.92   51.13   58.33
Amrita Exact Paraphrase                  Paraphrase Detection          93.81   93.02   93.75
Amrita Rough Paraphrase                  Paraphrase Detection          83.38   82.20   84.33
Average                                                                69.84   74.42   73.66

* Note: all models have been restricted to a max_seq_length of 128.
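
The fine-tuning setups behind these numbers live in the official repo; as a rough, hedged illustration of how a comparable classification run could be wired up with the transformers library (the task data, label count, and the single backward step below are placeholders, not the official configuration):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "ai4bharat/indic-bert"
MAX_SEQ_LENGTH = 128  # matches the note above: all models restricted to 128

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels is a placeholder; use the label count of the actual downstream task.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

texts = ["उदाहरण शीर्षक एक", "আরেকটি উদাহরণ শিরোনাম"]  # placeholder headlines
labels = torch.tensor([0, 2])                           # placeholder genre labels

batch = tokenizer(texts, padding="max_length", truncation=True,
                  max_length=MAX_SEQ_LENGTH, return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()  # a real run wraps this in an optimizer/epoch loop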

Downloads

The model can be downloaded here. Both TF checkpoints and PyTorch binaries are included in the archive. Alternatively, you can also download it from Hugging Face.
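
For readers who only need the Hugging Face copy, a hedged sketch using the huggingface_hub client (the exact set of files in the repo is not listed in this card and is therefore an assumption):

from huggingface_hub import snapshot_download

# Download the repo contents (PyTorch weights, config, tokenizer files) into the local cache.
local_dir = snapshot_download(repo_id="ai4bharat/indic-bert")
print(local_dir)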

Citing

If you are using any of the resources, please cite the following article:

@inproceedings{kakwani2020indicnlpsuite,
    title={{IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages}},
    author={Divyanshu Kakwani and Anoop Kunchukuttan and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
    year={2020},
    booktitle={Findings of EMNLP},
}

We would like to hear from you if:

  • You are using our resources. Please let us know how you are putting these resources to use.
  • You have any feedback on these resources.
License

The IndicBERT code (and models) are released under the MIT License.

Contributors
  • Divyanshu Kakwani
  • Anoop Kunchukuttan
  • Gokul NC
  • Satish Golla
  • Avik Bhattacharyya
  • Mitesh Khapra
  • Pratyush Kumar

This work is the outcome of a volunteer effort as part of the AI4Bharat initiative.

Contact

Runs of ai4bharat indic-bert on huggingface.co

  • Total runs: 1.2M
  • 24-hour runs: 645
  • 3-day runs: 45.5K
  • 7-day runs: 206.5K
  • 30-day runs: 783.0K

More Information About the indic-bert huggingface.co Model

More about the indic-bert license (MIT) here:

https://choosealicense.com/licenses/mit

indic-bert huggingface.co

indic-bert is an AI model hosted on huggingface.co, where the ai4bharat indic-bert model can be used instantly. huggingface.co supports a free trial of the indic-bert model and also provides paid use of indic-bert. The model can be called through an API from Node.js, Python, or plain HTTP.
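
As a hedged example of the HTTP route, the snippet below posts a fill-mask style request to the Hugging Face Inference API; the endpoint pattern is the standard one for hosted models, but whether hosted inference is currently enabled for this repo, and the placeholder token, are assumptions:

import requests

API_URL = "https://api-inference.huggingface.co/models/ai4bharat/indic-bert"
headers = {"Authorization": "Bearer <your_hf_api_token>"}  # placeholder token

# IndicBERT is a masked language model, so a fill-mask payload is the natural request shape.
payload = {"inputs": "भारत एक [MASK] देश है"}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())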

ai4bharat indic-bert online for free

indic-bert on huggingface.co is an online platform for trying the model and calling its API. It integrates indic-bert's capabilities, including API services, and provides a free online trial of indic-bert; you can try indic-bert online for free through the link below.

ai4bharat indic-bert free online trial URL on huggingface.co:

https://huggingface.co/ai4bharat/indic-bert

indic-bert install

indic-bert is an open-source model whose code is available on GitHub, so any user can find and install it from there. At the same time, huggingface.co hosts the model itself, so users can debug and try indic-bert directly on huggingface.co. API access is also supported.

indic-bert install URL on huggingface.co:

https://huggingface.co/ai4bharat/indic-bert

Url of indic-bert

indic-bert huggingface.co Url:

https://huggingface.co/ai4bharat/indic-bert

Provider of indic-bert on huggingface.co

ai4bharat
Organization
