Pile-T5 XL is an Encoder-Decoder model trained on the Pile using the T5x library. The model was trained for 2 million steps, or roughly 2 trillion tokens, using a masked language modeling (MLM) objective similar to that of the original T5 model.
The Hugging Face version of Pile-T5 XL borrows UMT5's model implementation, because Pile-T5 uses the scalable model implementation from T5x, and it uses the LlamaTokenizer.
Contact: to ask questions about this model, join the EleutherAI Discord and post them in #release-discussion.
Please read the existing GPT-NeoX-20B documentation before asking about the model on Discord. For general correspondence: contact@eleuther.ai.
| Hyperparameter   | Value      |
|------------------|------------|
| n_parameters     | 2849804288 |
| n_encoder_layers | 24         |
| n_decoder_layers | 24         |
| d_model          | 5120       |
| d_emb            | 2048       |
| n_heads          | 32         |
| d_head           | 64         |
| n_vocab          | 32128      |
| Sequence Length  | 512        |
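As a rough guide to what the parameter count implies for loading the model, the sketch below estimates the memory needed for the weights alone. The parameter count comes from the table; the bytes-per-parameter figures (4 for float32, 2 for bfloat16) are standard sizes, and the helper name `weight_memory_gib` is made up for this illustration. Activations, optimizer state, and framework overhead are not included.

```python
# Parameter count taken from the hyperparameter table.
N_PARAMETERS = 2_849_804_288


def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed to hold the weights only, in GiB."""
    return n_params * bytes_per_param / 2**30


# Standard storage sizes for two common dtypes.
for name, size in {"float32": 4, "bfloat16": 2}.items():
    print(f"{name}: {weight_memory_gib(N_PARAMETERS, size):.2f} GiB")
```

This works out to about 10.6 GiB in float32 and about 5.3 GiB in bfloat16, before any runtime overhead.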
Uses and limitations
Intended use
Pile-T5 was developed primarily for research purposes. It learns an inner
representation of the English language that can be used to extract features
useful for downstream tasks.
In addition to scientific uses, you may also further fine-tune and adapt
Pile-T5 for deployment, as long as your use is in accordance with the
Apache 2.0 license. This model works with the Transformers Library. If you decide to use
pre-trained Pile-T5 as a basis for your fine-tuned model, please note that
you need to conduct your own risk and bias assessment.
Out-of-scope use
Pile-T5 is not intended for deployment as-is. It is not a product
and cannot be used for human-facing interactions without supervision.
Pile-T5 has not been fine-tuned for downstream tasks for which language
models are commonly deployed, such as writing genre prose, or commercial
chatbots. This means Pile-T5 will likely not respond to a given prompt
the way products such as ChatGPT do. This is because, unlike Pile-T5,
ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human
Feedback (RLHF) to better “understand” human instructions and dialogue.
This model is English-language only, and thus cannot be used for translation
or generating text in other languages.
Limitations and biases
The core functionality of Pile-T5 is to take a string of text that has been
partially replaced with mask tokens and predict a sequence of tokens that would
replace those mask tokens. Remember that the statistically most likely sequence
of tokens need not result in the most “accurate” text. Never rely on Pile-T5 to produce
factually accurate output.
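To make the masked-input format concrete, here is a minimal sketch of how a span-corrupted input and its target can be built from plain text. It assumes T5-style sentinel tokens of the form `<extra_id_N>`; whether the Pile-T5 tokenizer uses exactly these strings is an assumption to verify against the tokenizer's special tokens. The function `span_corrupt` and the choice of spans are illustrative only and are not the training pipeline's code.

```python
def span_corrupt(words, spans, sentinel="<extra_id_{}>"):
    """Replace each (start, end) word span with a sentinel token.

    `spans` must be sorted, non-overlapping, with `end` exclusive.
    Returns the masked input and the target the model should predict.
    The sentinel format is an assumption borrowed from T5.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        inputs.extend(words[cursor:start])   # keep text before the span
        inputs.append(sentinel.format(i))    # one sentinel per masked span
        targets.append(sentinel.format(i))   # target repeats the sentinel...
        targets.extend(words[start:end])     # ...followed by the hidden words
        cursor = end
    inputs.extend(words[cursor:])
    targets.append(sentinel.format(len(spans)))  # closing sentinel, as in T5
    return " ".join(inputs), " ".join(targets)


words = "Thank you for inviting me to your party last week".split()
masked, target = span_corrupt(words, [(2, 4), (7, 8)])
print(masked)
print(target)
```

The model sees only the masked string and is trained to emit the target, so its output is a sequence of sentinel-delimited fragments rather than a full sentence.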
This model was trained on the Pile, a dataset known to contain profanity and texts that are lewd or otherwise offensive. See Section 6 of the Pile paper for a discussion of documented biases with regard to gender, religion, and race. Pile-T5 may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.
We recommend curating the outputs of this model before presenting them to a human
reader. Please inform your audience that you are using artificially generated
text.
How to use
Pile-T5 can be loaded using the AutoModelForSeq2SeqLM functionality:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pile-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("EleutherAI/pile-t5-xl")
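Once loaded, the model can be asked to fill in masked spans. The following is a minimal sketch rather than an official recipe: the helper `fill_spans` and the example prompt are made up for illustration, and the sentinel string `<extra_id_0>` is an assumption carried over from T5 that should be checked against this tokenizer's special tokens. Because the checkpoint is pretrained only, the generated text may be low quality or factually wrong.

```python
def fill_spans(masked_text, tokenizer, model, max_new_tokens=32):
    """Generate the model's prediction for the masked spans in `masked_text`."""
    inputs = tokenizer(masked_text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep special tokens so the sentinels separating the spans stay visible.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)


def main():
    # Imported here so the helper above does not require transformers to import.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pile-t5-xl")
    model = AutoModelForSeq2SeqLM.from_pretrained("EleutherAI/pile-t5-xl")

    # Assumption: "<extra_id_0>" is a valid sentinel token for this tokenizer.
    prompt = "The Pile is a large dataset for <extra_id_0> language models."
    print(fill_spans(prompt, tokenizer, model))
```

Calling `main()` downloads the full weights, so it is best run on a machine with enough memory and disk space for a model of this size.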
Training
Training dataset
The Pile is an 825 GiB general-purpose dataset in English. It was created by
EleutherAI specifically for training large language models. It contains texts
from 22 diverse sources, roughly broken down into five categories: academic
writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project
Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub,
Enron Emails). See the Pile paper for a breakdown of all data sources, methodology, and a discussion of ethical implications. Consult the datasheet for more detailed documentation about the Pile and its component datasets. The Pile can be downloaded from the official website, or from a community mirror.
The Pile was deduplicated before being used to train Pile-T5.
Training procedure
Pile-T5 was trained with a batch size of approximately 1M tokens
(2048 sequences of 512 tokens each), for a total of 2,000,000 steps. Pile-T5 was trained
with the span-corruption objective.
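The batch and step figures above can be checked with a few lines of arithmetic. This sketch only multiplies the numbers stated in this section; it counts every position in each 512-token sequence, so it is an upper bound if any positions were padding.

```python
# Figures stated in the training procedure.
SEQUENCES_PER_BATCH = 2048
TOKENS_PER_SEQUENCE = 512
TRAINING_STEPS = 2_000_000

tokens_per_step = SEQUENCES_PER_BATCH * TOKENS_PER_SEQUENCE
total_tokens = tokens_per_step * TRAINING_STEPS

print(f"tokens per step: {tokens_per_step:,}")
print(f"total tokens:    {total_tokens:,}")
```

The per-step count is 1,048,576 tokens (the "approximately 1M" above), and the total comes to about 2.1 trillion tokens, consistent with the "roughly 2 trillion" figure quoted for the full run.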
Training checkpoints
Intermediate checkpoints for Pile-T5 are accessible within this repository.
There are 200 checkpoints in total, spaced 10,000 steps apart. For T5x-native
checkpoints that can be used for finetuning with the T5x library, refer to
here.
The training loss (in tfevent format) and validation perplexity (in jsonl) can be found
here.
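For scripts that iterate over the intermediate checkpoints, the step numbers can be enumerated as sketched below. This assumes the first checkpoint is at step 10,000 and the last at the final training step of 2,000,000, which is what 200 checkpoints at 10,000-step spacing implies but is not stated explicitly. How each checkpoint is named or exposed in the repository is not covered here and should be looked up there.

```python
CHECKPOINT_INTERVAL = 10_000
TOTAL_STEPS = 2_000_000

# Assumption: checkpoints fall on every multiple of the interval,
# from the first interval up to and including the final step.
checkpoint_steps = list(
    range(CHECKPOINT_INTERVAL, TOTAL_STEPS + 1, CHECKPOINT_INTERVAL)
)

print(len(checkpoint_steps), checkpoint_steps[0], checkpoint_steps[-1])
```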
Evaluations
Pile-T5 XL was evaluated on SuperGLUE and CodeXGLUE. A Flan-finetuned version was evaluated on Flan Held In tasks, MMLU, and BBH.
Results can be seen in the blogpost.
BibTeX
@misc{2024PileT5,
author = {Lintang Sutawika and Aran Komatsuzaki and Colin Raffel},
title = {Pile-T5},
year = {2024},
url = {https://blog.eleuther.ai/pile-t5/},
note = {Blog post},
}