In this repository we release (yet another) GPT-2 model that was trained on various German texts.
The model is meant to be an entry point for fine-tuning on other texts, and it is definitely not as good or "dangerous" as the English GPT-3 model. We do not plan extensive PR or staged releases for this model 😉
Note: The model was initially released under an anonymous alias (anonymous-german-nlp/german-gpt2), so we now "de-anonymize" it.
More details about GPT-2 can be found in the great Hugging Face documentation.
Changelog
16.08.2021: Public release of re-trained version of our German GPT-2 model with better results.
We use pretty much the same corpora as used for training the DBMDZ BERT model, which can be found in this repository.
Thanks to the awesome Hugging Face team, it is possible to create a byte-level BPE vocabulary with their awesome Tokenizers library.
With the previously mentioned awesome Tokenizers library we created a 50K byte-level BPE vocab based on the training corpora.
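To make the idea of a byte-level BPE vocab more concrete, here is a toy pure-Python sketch of the merge-learning step. It is not the Tokenizers implementation and not the script we used: the function names (train_byte_level_bpe, apply_merge, encode) are hypothetical, and the sketch skips the regex pre-tokenization and byte-to-unicode mapping that GPT-2 style tokenizers apply.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with the joined token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_byte_level_bpe(texts, num_merges):
    """Learn up to `num_merges` merge rules over UTF-8 bytes (toy sketch)."""
    # Every text starts as a list of single-byte tokens, so the base
    # vocabulary is the 256 byte values and no input is ever "unknown".
    sequences = [[bytes([b]) for b in text.encode("utf-8")] for text in texts]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        if pair_counts[best] < 2:
            break  # no pair occurs more than once, so stop merging
        merges.append(best)
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges


def encode(text, merges):
    """Tokenize `text` by replaying the learned merges in training order."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq
```

The final vocab size in such a scheme is 256 plus the number of learned merges (plus any special tokens), so a 50K vocab corresponds to roughly 50K merge rules. Because the base units are bytes, German umlauts and "ß" never fall out of vocabulary; their two UTF-8 bytes simply get merged early when they are frequent.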
After creating the vocab, we trained the German GPT-2 model on a v3-8 TPU over the complete training corpus for 20 epochs. All hyperparameters can be found here in the official JAX/FLAX documentation from Transformers.
Using the model
The model itself can be used in this way:
from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2 style models
tokenizer = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")
model = AutoModelForCausalLM.from_pretrained("dbmdz/german-gpt2")
However, text generation is a bit more interesting, so here's an example that shows how to use the great Transformers Pipelines for generating text:
from transformers import pipeline
pipe = pipeline('text-generation', model="dbmdz/german-gpt2",
                tokenizer="dbmdz/german-gpt2")
text = pipe("Der Sinn des Lebens ist es", max_length=100)[0]["generated_text"]
print(text)
This could output this beautiful text:
Der Sinn des Lebens ist es, im Geist zu verweilen, aber nicht in der Welt zu sein, sondern ganz im Geist zu leben.
Die Menschen beginnen, sich nicht nach der Natur und nach der Welt zu richten, sondern nach der Seele,
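With sampling enabled, repeated calls return different continuations. The following is a minimal sketch of how several candidates could be generated in one call, assuming the standard Transformers generation keyword arguments (do_sample, top_k, top_p, num_return_sequences) and set_seed. The helper names load_pipe and generate_variants and the parameter values are hypothetical and are not settings we tuned for this model.

```python
def load_pipe(seed=42):
    """Build the text-generation pipeline (downloads the model on first use)."""
    # Imported lazily so generate_variants works with any compatible callable.
    from transformers import pipeline, set_seed

    set_seed(seed)  # makes sampled text repeatable on the same setup
    return pipeline("text-generation", model="dbmdz/german-gpt2",
                    tokenizer="dbmdz/german-gpt2")


def generate_variants(pipe, prompt, n=3, max_length=100):
    """Return n sampled continuations of prompt as plain strings."""
    outputs = pipe(
        prompt,
        max_length=max_length,   # total length in tokens, prompt included
        do_sample=True,          # sample instead of greedy decoding
        top_k=50,                # illustrative value (assumption)
        top_p=0.95,              # illustrative value (assumption)
        num_return_sequences=n,
    )
    return [out["generated_text"] for out in outputs]
```

Calling generate_variants(load_pipe(), "Der Sinn des Lebens ist es") would then return three candidate texts to choose from.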