tiiuae / visper

huggingface.co
Total runs: 0
24-hour runs: 0
7-day runs: 0
30-day runs: 0
Model's Last Updated: June 06 2024

Model Details of visper

ViSpeR: Multilingual Audio-Visual Speech Recognition

ViSpeR is a model for visual and audio-visual speech recognition (VSR/AVSR), trained on 5,500 hours of labelled video data.

Training details:

We use our proposed dataset to train an encoder-decoder model in a fully supervised manner under a multilingual setting. The encoder has 12 layers and the decoder has 6. The hidden size, MLP size, and number of attention heads are set to 768, 3072, and 12, respectively. The unigram tokenizers are learned for all languages combined and have a vocabulary size of 21k. The models are trained for 150 epochs on 64 Nvidia A100 GPUs (40GB) using the AdamW optimizer with a maximum learning rate of 1e-3 and a weight decay of 0.1. A cosine scheduler with a warm-up of 5 epochs is used for training. The maximum batch size per GPU is set to 1800 video frames.
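
The learning-rate schedule described above can be sketched as follows. This is a minimal illustration, not the authors' training code: the maximum LR, warm-up length, and epoch count come from the text, while the linear shape of the warm-up, the final learning rate of 0, and the function name `lr_at_epoch` are assumptions made for the example.

```python
import math

MAX_LR = 1e-3        # stated in the training details
WARMUP_EPOCHS = 5    # stated in the training details
TOTAL_EPOCHS = 150   # stated in the training details
MIN_LR = 0.0         # assumption: the final learning rate is not stated


def lr_at_epoch(epoch: float) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Hypothetical helper: linear warm-up (assumed) followed by cosine decay.
    """
    if epoch < WARMUP_EPOCHS:
        # Assumed linear ramp from 0 up to MAX_LR during warm-up.
        return MAX_LR * epoch / WARMUP_EPOCHS
    # Fraction of the post-warm-up phase completed, clamped to 1.
    progress = min(1.0, (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS))
    # Cosine decay from MAX_LR down to MIN_LR.
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 1, 5, 50, 100, 150):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```

In a real training loop the schedule would normally be evaluated per optimizer step rather than per epoch; the per-epoch form is used here only to keep the example short.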

Performance:

We provide the results of the model on our proposed benchmarks in this table:

Language   VSR (WER/CER)   AVSR (WER/CER)
French     29.8            5.7
Spanish    39.4            4.4
Arabic     47.8            8.4
Chinese    51.3 (CER)      15.4 (CER)
English    49.1            8.1
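
For readers unfamiliar with the metrics in the table, word error rate (WER) and character error rate (CER) are conventionally computed from the edit distance between a reference transcript and the model output. The sketch below is a generic illustration, not the evaluation script used for these results. The function names are hypothetical, and two choices are assumptions: scores are returned as percentages, and spaces are ignored when computing CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (whitespace tokenization is an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces are dropped; an assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sat on a mat"))
    print(cer("你好世界", "你好世间"))
```

CER is the usual choice for languages such as Chinese, where word boundaries are not marked by whitespace, which is consistent with the Chinese row being reported as CER.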

Broader impact:

We hope that ViSpeR will open the door to new research questions and opportunities, and it should only be used for this purpose. There are also potential dual-use concerns that come with releasing ViSpeR (dataset and models), which is trained on a substantial corpus of multilingual video data. While the technology behind ViSpeR offers significant advances in multimodal speech recognition, it should only be used for research purposes.

ViSpeR paper coming soon
Check our VSR-related works:

@inproceedings{djilali2023lip2vec,
  title={Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping},
  author={Djilali, Yasser Abdelaziz Dahou and Narayan, Sanath and Boussaid, Haithem and Almazrouei, Ebtessam and Debbah, Merouane},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13790--13801},
  year={2023}
}

@inproceedings{djilali2024vsr,
  title={Do VSR Models Generalize Beyond LRS3?},
  author={Djilali, Yasser Abdelaziz Dahou and Narayan, Sanath and LeBihan, Eustache and Boussaid, Haithem and Almazrouei, Ebtesam and Debbah, Merouane},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={6635--6644},
  year={2024}
}


tiiuae visper model page on huggingface.co:

https://huggingface.co/tiiuae/visper

Provider of visper on huggingface.co: tiiuae (organization)
