Language model for Russian. As its name suggests, the model has 13B parameters. This is our biggest model so far, and it was used in training GigaChat (read more about it in the article).
Dataset
The model was pretrained on 300 GB of data from various domains, then additionally trained on 100 GB of code and legal documents. Here is the dataset structure:
The training data was deduplicated: each text in the corpus was hashed to a 64-bit value, and only texts with a unique hash were kept. We also filtered the documents by their compression rate under zlib, discarding the deduplicated texts that compress most strongly and most weakly.
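The two filtering steps can be sketched roughly as follows. This is not the authors' pipeline: the hash function (BLAKE2b truncated to 8 bytes), the definition of compression ratio (raw size divided by zlib-compressed size) and the `drop_fraction` cutoff are all assumptions made for illustration. Exact deduplication by a 64-bit hash treats any two texts with the same hash as duplicates, so rare collisions would silently drop a distinct text.

```python
import hashlib
import zlib


def hash64(text: str) -> int:
    """64-bit hash of a text (BLAKE2b with an 8-byte digest; the hash choice is an assumption)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def compression_ratio(text: str) -> float:
    """Raw size / zlib-compressed size: higher means more repetitive text."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def dedup_and_filter(texts, drop_fraction=0.05):
    """Exact dedup by 64-bit hash, then drop the drop_fraction of texts with the lowest
    and the highest compression ratios. drop_fraction is a hypothetical knob."""
    seen, unique = set(), []
    for text in texts:
        h = hash64(text)
        if h not in seen:  # keep the first text seen for each hash
            seen.add(h)
            unique.append(text)
    if not unique:
        return []
    ratios = sorted(compression_ratio(t) for t in unique)
    k = int(len(ratios) * drop_fraction)
    low, high = ratios[k], ratios[len(ratios) - 1 - k]
    return [t for t in unique if low <= compression_ratio(t) <= high]
```

Highly repetitive texts (boilerplate, spam) compress very well, while very short or noisy texts barely compress at all; trimming both tails is one simple way to act on the "most strongly and weakly compressing" criterion described above.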
Technical details
The model was trained with the DeepSpeed and Megatron libraries on a 300B-token dataset for 3 epochs, which took around 45 days on 512 V100 GPUs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048 on the additional data (see above), which took around 20 days on 200 A100 GPUs.
After the final training stage, the model's perplexity on Russian was around 8.8.
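A back-of-envelope reading of the training budget above. The GPU counts and durations come from the card; the exact parameter count of 13e9, the interpretation "300B tokens x 3 epochs = 900B tokens processed" and the use of the common 6·N·D rule of thumb for dense-transformer training compute are assumptions, so the FLOP figure is only an order-of-magnitude estimate.

```python
# Figures from the card; N_PARAMS and TOKENS_SEEN are assumptions for this estimate.
PRETRAIN_GPUS, PRETRAIN_DAYS = 512, 45  # V100
FINETUNE_GPUS, FINETUNE_DAYS = 200, 20  # A100
N_PARAMS = 13e9
TOKENS_SEEN = 300e9 * 3  # assumed: 300B-token dataset, 3 epochs


def gpu_hours(gpus: int, days: float) -> float:
    """Total accelerator time: number of GPUs x wall-clock hours."""
    return gpus * days * 24


pretrain_gpu_hours = gpu_hours(PRETRAIN_GPUS, PRETRAIN_DAYS)
finetune_gpu_hours = gpu_hours(FINETUNE_GPUS, FINETUNE_DAYS)

# 6 * N * D rule of thumb: ~6 FLOPs per parameter per training token (forward + backward)
pretrain_flops = 6 * N_PARAMS * TOKENS_SEEN

# Implied average sustained throughput per V100 over the pretraining run
per_gpu_flops_per_s = pretrain_flops / (pretrain_gpu_hours * 3600)

print(f"pretraining: {pretrain_gpu_hours:,.0f} V100 GPU-hours")
print(f"fine-tuning: {finetune_gpu_hours:,.0f} A100 GPU-hours")
print(f"~{pretrain_flops:.2e} FLOPs, ~{per_gpu_flops_per_s / 1e12:.0f} TFLOP/s per V100")
```

The implied per-GPU throughput is only meaningful if the assumed token count matches what was actually processed; restarts, evaluation time and idle periods would all lower the real figure.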
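For reference, perplexity is the exponential of the mean per-token negative log-likelihood. The card does not say which evaluation set or tokenization the 8.8 figure refers to, so the sketch below only shows the standard definition (token-level, natural logarithm assumed) and what a perplexity of 8.8 means in cross-entropy terms.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Inverting the definition: a perplexity of 8.8 means a mean cross-entropy of ln(8.8) nats/token,
# i.e. on average the model is as uncertain as a uniform choice among about 8.8 tokens.
mean_nll_at_reported_ppl = math.log(8.8)
print(f"{mean_nll_at_reported_ppl:.3f} nats/token")
```

With the transformers library, a causal LM called as `model(**enc, labels=enc["input_ids"])` returns the mean token NLL in its `loss` field, so `torch.exp(loss)` gives the perplexity on that particular text.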
Examples of usage
Try different generation strategies to get better results.
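import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Loading step added for completeness; fp16 on one GPU (about 26 GB of weights) is an assumption
tokenizer = AutoTokenizer.from_pretrained("ai-forever/ruGPT-3.5-13B")
model = AutoModelForCausalLM.from_pretrained(
    "ai-forever/ruGPT-3.5-13B", torch_dtype=torch.float16
).to("cuda:0")

request = "Стих про программиста может быть таким:"
encoded_input = tokenizer(request, return_tensors="pt",
                          add_special_tokens=False).to("cuda:0")
# Beam-search multinomial sampling: 2 beams with sampling enabled
output = model.generate(
    **encoded_input,
    num_beams=2,
    do_sample=True,
    max_new_tokens=100,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))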
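>>> Стих про программиста может быть таким:
Программист сидит в кресле,
Стих сочиняет он про любовь,
Он пишет, пишет, пишет, пишет...
И не выходит ни черта!
(English: "A poem about a programmer could go like this: / A programmer sits in an armchair, / He is composing a poem about love, / He writes and writes and writes and writes... / And nothing comes of it at all!")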
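>>> Нейронная сеть — это математическая модель, состоящая из большого
количества нейронов, соединенных между собой электрическими связями.
Нейронная сеть может быть смоделирована на компьютере, и с ее помощью
можно решать задачи, которые не поддаются решению с помощью традиционных
математических методов.
(English: "A neural network is a mathematical model consisting of a large number of neurons connected to each other by electrical connections. A neural network can be simulated on a computer, and it can be used to solve problems that cannot be solved with traditional mathematical methods.")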
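>>> Гагарин полетел в космос в 1961 году. Это было первое в истории
человечества космическое путешествие. Юрий Гагарин совершил его
на космическом корабле Восток-1. Корабль был запущен с космодрома
Байконур.
(English: "Gagarin flew into space in 1961. It was the first space journey in human history. Yuri Gagarin made it aboard the Vostok-1 spacecraft. The spacecraft was launched from the Baikonur Cosmodrome.")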
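One way to follow the advice to try different generation strategies is to run the same prompt through several decoding settings and compare the outputs side by side. In this sketch the preset names and their values are hypothetical and untuned; only the keyword names (`do_sample`, `num_beams`, `early_stopping`, `top_k`, `top_p`, `temperature`, `repetition_penalty`, `max_new_tokens`) are standard arguments of `generate()` in transformers.

```python
# Hypothetical preset names; every key is a standard keyword argument of
# transformers' generate(). Values are illustrative, not tuned for this model.
GENERATION_PRESETS = {
    "greedy": {"do_sample": False, "num_beams": 1},
    "beam_search": {"do_sample": False, "num_beams": 4, "early_stopping": True},
    "beam_sample": {"do_sample": True, "num_beams": 2},  # the setting used in the example
    "top_k": {"do_sample": True, "top_k": 50, "temperature": 0.8},
    "nucleus": {"do_sample": True, "top_p": 0.9, "temperature": 0.8},
    "nucleus_no_repeat": {"do_sample": True, "top_p": 0.9, "repetition_penalty": 1.2},
}


def compare_strategies(model, tokenizer, prompt, max_new_tokens=100, device="cuda:0"):
    """Generate one continuation of `prompt` per preset and return {preset_name: text}."""
    encoded = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(device)
    results = {}
    for name, kwargs in GENERATION_PRESETS.items():
        output = model.generate(**encoded, max_new_tokens=max_new_tokens, **kwargs)
        results[name] = tokenizer.decode(output[0], skip_special_tokens=True)
    return results
```

With `model` and `tokenizer` loaded as in the usage example, `compare_strategies(model, tokenizer, "Стих про программиста может быть таким:")` returns one text per preset. Greedy and plain beam search are deterministic, while the sampling presets give a different continuation on each call unless a seed is fixed.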