RUssian Decoder On LanguagE Picture Hyper-tasking (RUDOLPH) 2.7B is the largest text-image-text transformer, designed for easy fine-tuning on a range of tasks: from generating images from text descriptions and image classification to visual question answering and more. This model demonstrates the power of Hyper-tasking Transformers.
A Hyper-tasking model is a generalized multi-tasking model, i.e., a model that can solve almost all tasks within its supported modalities, necessarily including mutual pairwise translations between modalities (two modalities in the case of RUDOLPH: images and Russian text).
Tasks:
text2image generation, self-reranking, text ranking, image ranking, image2text generation, zero-shot image classification, text2text generation, text QA, math QA, image captioning, image generation, text recognition in the wild, visual QA, and more (an illustrative task-to-layout sketch follows the fields below)
Language:
Russian
Type:
decoder
Num Parameters:
2.7B
Training Data Volume:
119 million text-image pairs, 60 million text paragraphs
Fine-tuning Data Volume:
43,334 text question-answer pairs; 100,000 math tasks; 85,000 text-image pairs (for captioning and generation); 85,759 visual question-answer pairs; 140,000 image-text pairs for text recognition
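To make the hyper-tasking idea concrete, the sketch below maps a few of the listed tasks onto the model's left-text / image / right-text sequence layout. The segment roles are inferred from the task names and the description above; this is an illustration, not an official specification.
```python
# Illustrative only: which segments serve as the condition and which are
# generated for a few of the tasks listed above. The mapping is inferred
# from the task names; it is not taken from the authors' code.
TASK_LAYOUT = {
    "text2image generation": {"condition": ["left text"], "generated": ["image"]},
    "image captioning":      {"condition": ["image"], "generated": ["right text"]},
    "visual QA":             {"condition": ["left text", "image"], "generated": ["right text"]},
    "text QA":               {"condition": ["left text"], "generated": ["right text"]},
    "zero-shot image classification": {"condition": ["image"], "generated": ["right text (class name)"]},
}

for task, layout in TASK_LAYOUT.items():
    print(f"{task}: condition={layout['condition']} -> generate={layout['generated']}")
```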
The model was prepared as a baseline for the FusionBrain Challenge 2.0 (part of the AI Journey Contest 2022). It is a version of the pre-trained RuDOLPH 2.7B model fine-tuned on six tasks, including:
Text Recognition in the Wild: on the START dataset (SynThesized and Annotated dataset for Text Recognition), consisting of synthetic and real-world human-annotated data for the text recognition task.
Details of architecture
The maximum sequence length that this model supports depends on the modality: 384 tokens for the left text, 576 for the image, and 128 for the right text.
RUDOLPH 2.7B is a Transformer-based decoder model with the following parameters:
num_layers (32) — Number of hidden layers in the Transformer decoder.
hidden_size (2560) — Dimensionality of the hidden layers.
num_attention_heads (32) — Number of attention heads for each attention layer.
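For reference, the hyperparameters above and the per-segment sequence lengths can be collected in a single configuration object. This is a minimal sketch with assumed field names, not the actual RUDOLPH configuration class:
```python
# A plain dataclass recording the numbers stated in this section.
# Field names are assumptions for illustration, not the real API.
from dataclasses import dataclass

@dataclass(frozen=True)
class RudolphConfig:
    num_layers: int = 32           # hidden layers in the Transformer decoder
    hidden_size: int = 2560        # dimensionality of the hidden layers
    num_attention_heads: int = 32  # attention heads per attention layer
    l_text_seq_length: int = 384   # left text tokens
    image_seq_length: int = 576    # image tokens (a 24 x 24 grid, since 24 * 24 = 576)
    r_text_seq_length: int = 128   # right text tokens

cfg = RudolphConfig()
assert cfg.hidden_size % cfg.num_attention_heads == 0  # 2560 / 32 = 80 dims per head
```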
Sparse Attention Masks
The primary proposed method is to modify the sparse transformer's attention mask to better control the modalities. This allows the model to compute transitions between modalities in both directions, unlike the similar DALL-E Transformer, which supports only one direction, "text to image". The proposed "image to right text" direction is achieved by extending the sparse attention mask to the right, enabling autoregressive text generation conditioned on both the image and the left text.
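To make the mask structure concrete, here is a minimal sketch of a three-segment causal mask over the concatenated left-text / image / right-text sequence (384 + 576 + 128 = 1088 positions). This is an illustration of the idea, not the authors' implementation: the real model additionally applies DALL-E-style sparse row/column patterns inside the image block, which are omitted here.
```python
# Minimal sketch (not the authors' code): a causal mask over the
# three-segment sequence. Plain causality over this ordering already
# yields both transition directions described above: image tokens can
# attend to the left text ("text to image"), and right-text tokens can
# attend to the image and the left text ("image to right text").
import torch

L_TEXT, IMAGE, R_TEXT = 384, 576, 128  # segment lengths from the model card

def three_segment_causal_mask() -> torch.Tensor:
    total = L_TEXT + IMAGE + R_TEXT  # 1088 positions in total
    return torch.ones(total, total).tril().bool()

mask = three_segment_causal_mask()
print(mask.shape)                        # torch.Size([1088, 1088])
first_image_token = L_TEXT               # row index of the first image token
print(mask[first_image_token, :L_TEXT].all())            # sees all left text: True
first_r_text_token = L_TEXT + IMAGE      # row index of the first right-text token
print(mask[first_r_text_token, :L_TEXT + IMAGE].all())   # sees text + image: True
```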
More Information About RUDOLPH-2.7B-FBC2
RUDOLPH-2.7B-FBC2 is hosted on huggingface.co, which offers a free online trial of the model as well as paid API access (callable from Node.js, Python, or plain HTTP). The model is also open source: it can be found and installed from GitHub, and the installed model can be tried and debugged directly on huggingface.co.