For the first half of training, the model was trained on a small portion of the full dataset (1%, 3 GB) and without task prefixes.
For RussianSuperGLUE (RSG), we trained as described in the T5 paper: first we trained a multitask model on all tasks, then took the best checkpoint for each task and fine-tuned it further.
RSG submission:
https://russiansuperglue.com/login/submit_info/2060
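As a minimal sketch of the second RSG stage described above (loading the best multitask checkpoint and continuing training on a single task): the checkpoint path, the task prefix, and the toy examples below are placeholders, not the authors' actual pipeline.
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

# Placeholders: the real pipeline starts from the best multitask checkpoint and the RSG task data.
tokenizer = GPT2Tokenizer.from_pretrained('ai-forever/FRED-T5-1.7B', eos_token='</s>')
model = T5ForConditionalGeneration.from_pretrained('path/to/best_multitask_checkpoint')  # hypothetical path

# Toy (input, target) pairs for one task; the text-to-text format here is illustrative only.
pairs = [
    ('terra: premise ... hypothesis ...', 'entailment'),
    ('terra: premise ... hypothesis ...', 'not_entailment'),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for epoch in range(3):
    for src, tgt in pairs:
        enc = tokenizer(src, return_tensors='pt')
        labels = tokenizer(tgt + '</s>', return_tensors='pt').input_ids
        loss = model(input_ids=enc.input_ids,
                     attention_mask=enc.attention_mask,
                     labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()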
Total training time was around 35 days on 160 V100 GPUs plus 5 days on 80 A100 GPUs.
Usage (HuggingFace Models Repository)
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

tokenizer = GPT2Tokenizer.from_pretrained('ai-forever/FRED-T5-1.7B', eos_token='</s>')
model = T5ForConditionalGeneration.from_pretrained('ai-forever/FRED-T5-1.7B')
device = 'cuda'
model.to(device)

# Prefix <LM>: language-model-style continuation
lm_text = '<LM>Принялся Кутузов рассказывать свою историю как он сюда попал. Началось'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
print(tokenizer.decode(outputs[0][1:]))
# print result: , как водится, с того, что он был в плену.</s>

# Prefix <SC1>: span corruption, the model fills in <extra_id_0>
lm_text = '<SC1>Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии, служил в артиллерии.'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
print(tokenizer.decode(outputs[0][1:]))
# print result: '<extra_id_0>, как он жил</s>'

# Prefix <SC5>
lm_text = '<SC5>Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии, служил в артиллерии.'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True, max_length=100)
print(tokenizer.decode(outputs[0][1:]))
# print result: '<extra_id_0> </s>'
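The generate calls above use greedy defaults, which can return very short completions (as with <SC5>). As a hedged example, standard Hugging Face generation parameters such as sampling can be passed; the specific values below are illustrative, not tuned settings.
outputs = model.generate(input_ids,
                         eos_token_id=tokenizer.eos_token_id,
                         do_sample=True,          # sample instead of greedy decoding
                         top_p=0.95,              # nucleus sampling
                         temperature=0.8,
                         max_length=100,
                         no_repeat_ngram_size=3)  # avoid repeated trigrams
print(tokenizer.decode(outputs[0][1:]))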
Citation
@misc{zmitrovich2023family,
title={A Family of Pretrained Transformer Language Models for Russian},
author={Dmitry Zmitrovich and Alexander Abramov and Andrey Kalmykov and Maria Tikhonova and Ekaterina Taktasheva and Danil Astafurov and Mark Baushenko and Artem Snegirev and Tatiana Shavrina and Sergey Markov and Vladislav Mikhailov and Alena Fenogenova},
year={2023},
eprint={2309.10931},
archivePrefix={arXiv},
primaryClass={cs.CL}
}