fastText (Language Identification)

fastText is a free, open-source, lightweight library for learning text representations and text classifiers. It runs on standard, generic hardware, and models can later be reduced in size to fit even on mobile devices. It was introduced in the paper Enriching Word Vectors with Subword Information ([1] below). The official website is fasttext.cc.

This LID (Language IDentification) model predicts the language of the input text. The hosted version (lid218e) was released as part of the NLLB project and can detect 217 languages. Older versions, which can identify 157 languages, are available on the official fastText website.

Model description

fastText is a library for efficient learning of word representations and sentence classification. fastText is designed to be simple to use for developers, domain experts, and students. It's dedicated to text classification and learning word representations, and was designed to allow for quick model iteration and refinement without specialized hardware. fastText models can be trained on more than a billion words on any multicore CPU in less than a few minutes.

It includes models pre-trained on Wikipedia in more than 157 languages. fastText can be used as a command-line tool, linked to a C++ application, or used as a library, for use cases ranging from experimentation and prototyping to production.

Intended uses & limitations

You can use pre-trained word vectors for text classification or language identification. See the tutorials and resources on its official website to look for tasks that interest you.

How to use

Here is how to use this model to detect the language of a given text:

>>> import fasttext
>>> from huggingface_hub import hf_hub_download

>>> model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.predict("Hello, world!")

(('__label__eng_Latn',), array([0.81148803]))

>>> model.predict("Hello, world!", k=5)

(('__label__eng_Latn', '__label__vie_Latn', '__label__nld_Latn', '__label__pol_Latn', '__label__deu_Latn'), 
 array([0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]))
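The labels returned by predict carry a `__label__` prefix followed by a language code and a script name. The following is a minimal sketch of how such output could be post-processed; the helper names `parse_label` and `best_language` and the `min_prob` threshold are hypothetical and not part of fastText. It uses the labels and probabilities shown above as plain Python data, so it runs without loading the model.

```python
from typing import Optional, Sequence, Tuple

LABEL_PREFIX = "__label__"


def parse_label(label: str) -> Tuple[str, str]:
    """Split a label such as '__label__eng_Latn' into ('eng', 'Latn')."""
    code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
    language, _, script = code.partition("_")
    return language, script


def best_language(
    labels: Sequence[str],
    probs: Sequence[float],
    min_prob: float = 0.5,  # hypothetical cut-off, tune for your data
) -> Optional[Tuple[str, str]]:
    """Return the most probable (language, script), or None if it is too uncertain."""
    if not labels:
        return None
    top_label, top_prob = max(zip(labels, probs), key=lambda pair: pair[1])
    if top_prob < min_prob:
        return None
    return parse_label(top_label)


# Values copied from the k=5 prediction shown above.
labels = (
    "__label__eng_Latn",
    "__label__vie_Latn",
    "__label__nld_Latn",
    "__label__pol_Latn",
    "__label__deu_Latn",
)
probs = [0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]

print(best_language(labels, probs))                # ('eng', 'Latn')
print(best_language(labels, probs, min_prob=0.9))  # None
```

Rejecting low-probability predictions is one possible design choice for short or mixed-language inputs; whether a threshold is appropriate depends on the application.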

Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions.

Cosine similarity can be used to measure the similarity between two different word vectors. If two vectors are identical, the cosine similarity will be 1. For two completely unrelated vectors, the value will be 0. If two vectors have an opposite relationship, the value will be -1.

>>> import numpy as np

>>> def cosine_similarity(word1, word2):
...     return np.dot(model[word1], model[word2]) / (np.linalg.norm(model[word1]) * np.linalg.norm(model[word2]))

>>> cosine_similarity("man", "boy")

0.061653383

>>> cosine_similarity("man", "ceo")

0.11989131

>>> cosine_similarity("woman", "ceo")

-0.08834904
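To make the three reference values of cosine similarity (1, 0, and -1) concrete, here is a small self-contained sketch on toy vectors. It uses only the standard library rather than the model, and the vectors are made up for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


identical = cosine_similarity([3.0, 4.0], [3.0, 4.0])    # same direction
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # orthogonal
opposite = cosine_similarity([3.0, 4.0], [-3.0, -4.0])   # opposite direction

print(identical, unrelated, opposite)  # 1.0 0.0 -1.0
```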

Training data

Pre-trained word vectors for 157 languages were trained on Common Crawl and Wikipedia using fastText. These models were trained using CBOW with position-weights, in dimension 300, with character n-grams of length 5, a window of size 5 and 10 negatives. We also distribute three new word analogy datasets, for French, Hindi and Polish.
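As an illustration of what "character n-grams of length 5" means, the sketch below enumerates the n-grams of a single word. It assumes the word boundary markers `<` and `>` described in the subword paper [1]; the function name `char_ngrams` and the handling of short words are choices made for this example, not fastText's actual implementation.

```python
from typing import List


def char_ngrams(word: str, n: int = 5) -> List[str]:
    """List the character n-grams of a word wrapped in boundary markers."""
    marked = f"<{word}>"
    if len(marked) <= n:
        # Assumption for this sketch: short words yield the whole marked word.
        return [marked]
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


print(char_ngrams("where"))       # ['<wher', 'where', 'here>']
print(char_ngrams("where", n=3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```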

Training procedure

Tokenization

We used the Stanford word segmenter for Chinese, Mecab for Japanese and UETsegmenter for Vietnamese. For languages using the Latin, Cyrillic, Hebrew or Greek scripts, we used the tokenizer from the Europarl preprocessing tools. For the remaining languages, we used the ICU tokenizer.

More information about the training of these models can be found in the article Learning Word Vectors for 157 Languages.
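The tokenizer selection rule described above can be summarized as a small lookup. This is only a sketch of the decision logic: the language codes (`zh`, `ja`, `vi`), the script names, and the function `choose_tokenizer` are assumptions made for illustration, and the function returns the name of the tool rather than invoking it.

```python
# Languages with a dedicated word segmenter (codes are assumed for this sketch).
SEGMENTERS = {
    "zh": "Stanford word segmenter",
    "ja": "Mecab",
    "vi": "UETsegmenter",
}

# Scripts handled by the tokenizer from the Europarl preprocessing tools.
EUROPARL_SCRIPTS = {"Latin", "Cyrillic", "Hebrew", "Greek"}


def choose_tokenizer(language: str, script: str) -> str:
    """Return the name of the tokenizer the text above assigns to a language."""
    # Dedicated segmenters take priority: Vietnamese is written in Latin
    # script but is still segmented with UETsegmenter.
    if language in SEGMENTERS:
        return SEGMENTERS[language]
    if script in EUROPARL_SCRIPTS:
        return "Europarl tokenizer"
    return "ICU tokenizer"


print(choose_tokenizer("vi", "Latin"))       # UETsegmenter
print(choose_tokenizer("fr", "Latin"))       # Europarl tokenizer
print(choose_tokenizer("hi", "Devanagari"))  # ICU tokenizer
```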

License

The language identification model is distributed under the Creative Commons Attribution-NonCommercial 4.0 International Public License.

Evaluation datasets

The analogy evaluation datasets described in the paper are available for French, Hindi, and Polish.

BibTeX entry and citation info

Please cite [1] if using this code for learning word representations or [2] if using for text classification.

[1] P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}

[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification

@article{joulin2016bag,
  title={Bag of Tricks for Efficient Text Classification},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.01759},
  year={2016}
}

[3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models

@article{joulin2016fasttext,
  title={FastText.zip: Compressing text classification models},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, H{\'e}rve and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1612.03651},
  year={2016}
}

If you use these word vectors, please cite the following paper:

[4] E. Grave*, P. Bojanowski*, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages

@inproceedings{grave2018learning,
  title={Learning Word Vectors for 157 Languages},
  author={Grave, Edouard and Bojanowski, Piotr and Gupta, Prakhar and Joulin, Armand and Mikolov, Tomas},
  booktitle={Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)},
  year={2018}
}

(* These authors contributed equally.)

More information about fasttext-language-identification

License summary (CC BY-NC 4.0):

https://choosealicense.com/licenses/cc-by-nc-4.0

Model page on huggingface.co:

https://huggingface.co/facebook/fasttext-language-identification
