oliverguhr / fullstop-punctuation-multilang-large

huggingface.co
Total runs: 202.1K
24-hour runs: 1.2K
7-day runs: -4.6K
30-day runs: -150.4K
Model's Last Updated: November 16 2023
token-classification

Introduction of fullstop-punctuation-multilang-large

Model Details of fullstop-punctuation-multilang-large

This model predicts the punctuation of English, Italian, French and German texts. We developed it to restore the punctuation of transcribed spoken language.

This multilingual model was trained on the Europarl dataset provided by the SEPP-NLG Shared Task. Please note that this dataset consists of political speeches. Therefore, the model might perform differently on texts from other domains.

The model restores the following punctuation markers: "." "," "?" "-" ":"
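To try the model on text that already contains punctuation, it can help to strip these markers first and compare the restored result with the original. The following is a minimal sketch under that assumption; the helper `strip_markers` is hypothetical and is not part of the package.

```python
# The five markers the model restores, as listed above.
MARKERS = ".,?-:"


def strip_markers(text):
    """Remove trailing punctuation markers from every word (hypothetical helper)."""
    words = []
    for word in text.split():
        cleaned = word.rstrip(MARKERS)
        if cleaned:  # skip tokens that consisted only of markers, e.g. a lone "-"
            words.append(cleaned)
    return " ".join(words)


punctuated = "My name is Clara and I live in Berkeley, California. Ist das eine Frage, Frau Müller?"
print(strip_markers(punctuated))
```

The stripped text can then be passed to the model and the restored punctuation compared against the original sentence.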

Sample Code

We provide a simple Python package that allows you to process text of any length.
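How the package handles long inputs internally is not described here. One common approach for models with a limited input length is to split the word list into overlapping chunks and process each chunk separately. The sketch below illustrates that idea only; the function name `split_with_overlap` and the values for `chunk_size` and `overlap` are assumptions chosen for illustration.

```python
def split_with_overlap(words, chunk_size=200, overlap=5):
    """Split a list of words into chunks that share `overlap` words with their neighbor.

    Illustrative sketch only: the parameter values are arbitrary assumptions.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in split_with_overlap(words, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap gives the model some context on both sides of a chunk boundary; predictions for the overlapping words would need to be merged, for example by keeping the prediction from only one of the two chunks.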

Install

To get started, install the package from PyPI:

pip install deepmultilingualpunctuation

Restore Punctuation

from deepmultilingualpunctuation import PunctuationModel

model = PunctuationModel()
text = "My name is Clara and I live in Berkeley California Ist das eine Frage Frau Müller"
result = model.restore_punctuation(text)
print(result)

output

My name is Clara and I live in Berkeley, California. Ist das eine Frage, Frau Müller?

Predict Labels

from deepmultilingualpunctuation import PunctuationModel

model = PunctuationModel()
text = "My name is Clara and I live in Berkeley California Ist das eine Frage Frau Müller"
clean_text = model.preprocess(text)
labeled_words = model.predict(clean_text)
print(labeled_words)

output

[['My', '0', 0.9999887], ['name', '0', 0.99998665], ['is', '0', 0.9998579], ['Clara', '0', 0.6752215], ['and', '0', 0.99990904], ['I', '0', 0.9999877], ['live', '0', 0.9999839], ['in', '0', 0.9999515], ['Berkeley', ',', 0.99800044], ['California', '.', 0.99534047], ['Ist', '0', 0.99998784], ['das', '0', 0.99999154], ['eine', '0', 0.9999918], ['Frage', ',', 0.99622655], ['Frau', '0', 0.9999889], ['Müller', '?', 0.99863917]]
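Each entry in this output is a `[word, label, score]` triple, where the label `'0'` means that no punctuation follows the word. A minimal sketch of how such a list could be turned back into text is shown below. The function `join_labeled_words` and its `min_score` threshold are hypothetical and not part of the package.

```python
def join_labeled_words(labeled_words, min_score=0.0):
    """Rebuild text from [word, label, score] triples.

    A label of "0" means no punctuation after the word. Predicted markers with
    a score below `min_score` are dropped (hypothetical confidence threshold).
    """
    parts = []
    for word, label, score in labeled_words:
        if label != "0" and score >= min_score:
            parts.append(word + label)
        else:
            parts.append(word)
    return " ".join(parts)


# Triples copied from the tail of the output above.
sample = [
    ["Ist", "0", 0.99998784],
    ["das", "0", 0.99999154],
    ["eine", "0", 0.9999918],
    ["Frage", ",", 0.99622655],
    ["Frau", "0", 0.9999889],
    ["Müller", "?", 0.99863917],
]
print(join_labeled_words(sample))
```

Raising `min_score` keeps only the markers the model is most confident about, which may be useful when false punctuation is more costly than missing punctuation.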

Results

Performance differs between punctuation markers because hyphens and colons are often optional and can be replaced by a comma or a full stop. The model achieves the following F1 scores for each language:

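| Label | EN | DE | FR | IT |
| --- | --- | --- | --- | --- |
| 0 | 0.991 | 0.997 | 0.992 | 0.989 |
| . | 0.948 | 0.961 | 0.945 | 0.942 |
| ? | 0.890 | 0.893 | 0.871 | 0.832 |
| , | 0.819 | 0.945 | 0.831 | 0.798 |
| : | 0.575 | 0.652 | 0.620 | 0.588 |
| - | 0.425 | 0.435 | 0.431 | 0.421 |
| macro average | 0.775 | 0.814 | 0.782 | 0.762 |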
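
The macro average row appears to be the unweighted mean of the six per-label F1 scores, so rare labels such as "-" count as much as the very frequent "0". A short check under that assumption, using the values from the table (the helper `macro_average` is our own):

```python
# Per-label F1 scores copied from the table above.
f1_scores = {
    "EN": {"0": 0.991, ".": 0.948, "?": 0.890, ",": 0.819, ":": 0.575, "-": 0.425},
    "DE": {"0": 0.997, ".": 0.961, "?": 0.893, ",": 0.945, ":": 0.652, "-": 0.435},
    "FR": {"0": 0.992, ".": 0.945, "?": 0.871, ",": 0.831, ":": 0.620, "-": 0.431},
    "IT": {"0": 0.989, ".": 0.942, "?": 0.832, ",": 0.798, ":": 0.588, "-": 0.421},
}


def macro_average(per_label):
    """Unweighted mean over labels: every label counts equally."""
    return sum(per_label.values()) / len(per_label)


for lang, scores in f1_scores.items():
    print(lang, round(macro_average(scores), 3))
```

Rounded to three decimals, the results match the macro average row, which also shows how strongly the low hyphen and colon scores pull the average down.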

Languages

Models

Community Models

| Languages | Model |
| --- | --- |
| English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, Slovenian | kredor/punctuate-all |
| Catalan | softcatala/fullstop-catalan-punctuation-prediction |
| Welsh | techiaith/fullstop-welsh-punctuation-prediction |

You can use different models by setting the model parameter:

model = PunctuationModel(model="oliverguhr/fullstop-dutch-punctuation-prediction")

Where do I find the code and can I train my own model?

Yes, you can! For the complete code of the research project, take a look at this repository.

There is also a guide on how to fine-tune this model for your data or language.

References
@article{guhr-EtAl:2021:fullstop,
  title={FullStop: Multilingual Deep Models for Punctuation Prediction},
  author    = {Guhr, Oliver  and  Schumann, Anne-Kathrin  and  Bahrmann, Frank  and  Böhme, Hans Joachim},
  booktitle      = {Proceedings of the Swiss Text Analytics Conference 2021},
  month          = {June},
  year           = {2021},
  address        = {Winterthur, Switzerland},
  publisher      = {CEUR Workshop Proceedings},  
  url       = {http://ceur-ws.org/Vol-2957/sepp_paper4.pdf}
}

More Information About fullstop-punctuation-multilang-large

License: MIT (https://choosealicense.com/licenses/mit)

fullstop-punctuation-multilang-large huggingface.co Url

https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large
