This is the model card for the IndicTrans2 Indic-En Distilled 200M variant.
Please refer to section 7.6: Distilled Models in the TMLR submission for further details on model training, data, and metrics.
Usage Instructions
Please refer to the GitHub repository for a detailed description of how to use the HF-compatible IndicTrans2 models for inference.
import torch
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)
from IndicTransTokenizer import IndicProcessor

model_name = "ai4bharat/indictrans2-indic-en-dist-200M"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

ip = IndicProcessor(inference=True)

input_sentences = [
    "जब मैं छोटा था, मैं हर रोज़ पार्क जाता था।",
    "हमने पिछले सप्ताह एक नई फिल्म देखी जो कि बहुत प्रेरणादायक थी।",
    "अगर तुम मुझे उस समय पास मिलते, तो हम बाहर खाना खाने चलते।",
    "मेरे मित्र ने मुझे उसके जन्मदिन की पार्टी में बुलाया है, और मैं उसे एक तोहफा दूंगा।",
]

src_lang, tgt_lang = "hin_Deva", "eng_Latn"

# Preprocess the sentences (normalization, language tags, entity handling)
batch = ip.preprocess_batch(
    input_sentences,
    src_lang=src_lang,
    tgt_lang=tgt_lang,
)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Move the model to the same device as the inputs; otherwise generation fails on GPU machines
model = model.to(DEVICE)

# Tokenize the sentences and generate input encodings
inputs = tokenizer(
    batch,
    truncation=True,
    padding="longest",
    return_tensors="pt",
    return_attention_mask=True,
).to(DEVICE)

# Generate translations using the model
with torch.no_grad():
    generated_tokens = model.generate(
        **inputs,
        use_cache=True,
        min_length=0,
        max_length=256,
        num_beams=5,
        num_return_sequences=1,
    )

# Decode the generated tokens into text
with tokenizer.as_target_tokenizer():
    generated_tokens = tokenizer.batch_decode(
        generated_tokens.detach().cpu().tolist(),
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True,
    )

# Postprocess the translations, including entity replacement
translations = ip.postprocess_batch(generated_tokens, lang=tgt_lang)

for input_sentence, translation in zip(input_sentences, translations):
    print(f"{src_lang}: {input_sentence}")
    print(f"{tgt_lang}: {translation}")
Note: IndicTrans2 is now compatible with AutoTokenizer; however, you need to use IndicProcessor from IndicTransTokenizer for preprocessing before tokenization.
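The example above translates a single small batch. For longer inputs, one possible approach is to split the sentences into fixed-size chunks and run the same preprocess, tokenize, generate, decode, and postprocess steps on each chunk. The sketch below shows only the chunking logic. The names chunked, translate_in_batches, and translate_fn are hypothetical helpers introduced for illustration and are not part of IndicTrans2 or IndicTransTokenizer; the default batch size of 4 is an arbitrary assumption, not a recommended setting.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_in_batches(
    sentences: Sequence[str],
    translate_fn: Callable[[List[str]], List[str]],
    batch_size: int = 4,  # arbitrary illustrative default
) -> List[str]:
    """Apply `translate_fn` to each chunk and return results in input order.

    `translate_fn` is a hypothetical callable that takes a list of source
    sentences and returns one translation per sentence.
    """
    translations: List[str] = []
    for batch in chunked(sentences, batch_size):
        outputs = translate_fn(batch)
        if len(outputs) != len(batch):
            raise RuntimeError("translate_fn must return one output per input")
        translations.extend(outputs)
    return translations
```

In practice, translate_fn would wrap the steps shown in the usage example (ip.preprocess_batch, the tokenizer call, model.generate, tokenizer.batch_decode, and ip.postprocess_batch), so that each chunk is padded only to the longest sentence within it.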
Citation
If you use our work, please cite it using:
@article{gala2023indictrans,
title={IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
author={Jay Gala and Pranjal A Chitale and A K Raghavan and Varun Gumma and Sumanth Doddapaneni and Aswanth Kumar M and Janki Atul Nawale and Anupama Sujatha and Ratish Puduppully and Vivek Raghavan and Pratyush Kumar and Mitesh M Khapra and Raj Dabre and Anoop Kunchukuttan},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=vfT4YuzAYA},
note={}
}