cointegrated / SONAR_200_text_encoder

huggingface.co
Total runs: 2.9K
24-hour runs: 0
7-day runs: -2.8K
30-day runs: -5.3K
Model's Last Updated: November 8, 2024
sentence-similarity

Introduction of SONAR_200_text_encoder

Model Details of SONAR_200_text_encoder

This is a port of the multilingual SONAR text encoder (https://huggingface.co/facebook/SONAR) from fairseq2 to the transformers format.

Its embeddings are expected to be equal to those of the official implementation (https://github.com/facebookresearch/SONAR), but the latter remains the source of truth.

The encoder supports the same 202 languages as NLLB-200 (see also the source model card and the FLORES-200 language code mapping).
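The encoding function in the usage example below takes one `lang` code per call, so mixed-language input has to be split by language first. The following is a minimal sketch of one way to do that; `group_by_lang` and `FLORES_CODE` are hypothetical helpers, not part of the model or of transformers. The pattern only checks the general shape of a FLORES-200 code (three-letter language code, underscore, four-letter script tag), not whether the language is among the supported ones.

```python
import re
from collections import defaultdict

# Assumed shape of a FLORES-200 code: three lowercase letters (ISO 639-3),
# an underscore, and a four-letter script tag (ISO 15924), e.g. "eng_Latn".
FLORES_CODE = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")

def group_by_lang(items):
    """Group (lang, text) pairs by language, remembering original positions."""
    groups = defaultdict(list)
    for position, (lang, text) in enumerate(items):
        if not FLORES_CODE.match(lang):
            raise ValueError(f"not a FLORES-200 style code: {lang!r}")
        groups[lang].append((position, text))
    return dict(groups)

items = [
    ("eng_Latn", "Hello, world!"),
    ("fra_Latn", "Bonjour le monde !"),
    ("eng_Latn", "How are you?"),
]
groups = group_by_lang(items)
print(groups["eng_Latn"])
# [(0, 'Hello, world!'), (2, 'How are you?')]
```

Each group can then be encoded with its own `lang` value, and the stored positions can be used to put the resulting embeddings back in the original order.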

How to compute embeddings:

# !pip install transformers sentencepiece -q

import torch
from transformers import AutoTokenizer
from transformers.models.m2m_100.modeling_m2m_100 import M2M100Encoder

model_name = "cointegrated/SONAR_200_text_encoder"
encoder = M2M100Encoder.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

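def encode_mean_pool(texts, tokenizer, encoder, lang='eng_Latn', norm=False):
    """Embed a list of texts by mean-pooling the encoder's token states."""
    tokenizer.src_lang = lang  # FLORES-200 code of the input language
    with torch.inference_mode():
        batch = tokenizer(texts, return_tensors='pt', padding=True)
        seq_embs = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)
        mask = batch.attention_mask  # 1 for real tokens, 0 for padding
        # Average over real tokens only, so padding does not affect the result.
        mean_emb = (seq_embs * mask.unsqueeze(-1)).sum(1) / mask.unsqueeze(-1).sum(1)
        if norm:
            # Optionally rescale each embedding to unit L2 norm.
            mean_emb = torch.nn.functional.normalize(mean_emb)
    return mean_emb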

sentences = ['My name is SONAR.', 'I can embed the sentences into vectorial space.']
embs = encode_mean_pool(sentences, tokenizer, encoder, lang="eng_Latn")
print(embs.shape)  
# torch.Size([2, 1024])
print(embs)
# tensor([[-0.0053,  0.0020, -0.0006,  ...,  0.0094, -0.0009,  0.0070],
#         [-0.0003, -0.0071,  0.0076,  ...,  0.0055,  0.0022, -0.0083]])
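The model is tagged for sentence similarity, but the card does not show how to compare two embeddings. A common choice, assumed here rather than taken from the source, is cosine similarity. The sketch below is dependency-free and uses toy 3-dimensional vectors in place of real embeddings; `cosine_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional vectors standing in for real 1024-dimensional embeddings.
a = [1.0, 0.0, 0.0]
b = [2.0, 0.0, 0.0]  # same direction as a, different length
c = [0.0, 3.0, 0.0]  # orthogonal to a
print(cosine_similarity(a, b))  # 1.0
print(cosine_similarity(a, c))  # 0.0
```

With real embeddings, a call such as `cosine_similarity(embs[0].tolist(), embs[1].tolist())` should work. If the embeddings were computed with `norm=True`, they already have unit length, so the cosine reduces to a plain dot product.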

For advanced examples of usage, please take a look at the readme in https://github.com/facebookresearch/SONAR.

The model was repacked in this notebook.

More Information About SONAR_200_text_encoder huggingface.co Model

License of SONAR_200_text_encoder (CC BY-NC 4.0):

https://choosealicense.com/licenses/cc-by-nc-4.0

URL of SONAR_200_text_encoder on huggingface.co:

https://huggingface.co/cointegrated/SONAR_200_text_encoder
