pyannote / segmentation

huggingface.co
Total runs: 7.6M
24-hour runs: 0
7-day runs: 254.8K
30-day runs: 1.9M
Model's Last Updated: May 10, 2024
voice-activity-detection

Using this open-source model in production?
Consider switching to pyannoteAI for better and faster options.

🎹 Speaker segmentation

Paper | Demo | Blog post

Example

Usage

Relies on pyannote.audio 2.1.1: see installation instructions.

# 1. visit hf.co/pyannote/segmentation and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/segmentation", 
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")
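
Hard-coding the access token in a script makes it easy to leak. A minimal sketch of one alternative, assuming the token is exported in an environment variable (the name `HF_TOKEN` and the helper `read_access_token` are illustrative choices, not part of pyannote.audio):

```python
import os


def read_access_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in `env_var`, or raise if it is missing."""
    # `HF_TOKEN` is an assumed variable name; use whichever name you exported.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your access token."
        )
    return token


# Hypothetical usage with the loading call shown above (not executed here):
# model = Model.from_pretrained("pyannote/segmentation",
#                               use_auth_token=read_access_token())
```

Failing early with an explicit error is a design choice here: it avoids passing an empty token to the download step.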

Voice activity detection

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {
  # onset/offset activation thresholds
  "onset": 0.5, "offset": 0.5,
  # remove speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions
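
To build intuition for the four hyper-parameters, the sketch below applies them to a toy list of frame-level speech scores. It is a simplified, pure-Python illustration and not pyannote.audio's own implementation. The function name `binarize`, the `frame_duration` argument, and the order of the two post-processing steps are assumptions made for the example.

```python
def binarize(scores, frame_duration, onset=0.5, offset=0.5,
             min_duration_on=0.0, min_duration_off=0.0):
    """Turn frame-level scores into a list of (start, end) regions in seconds."""
    regions = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        t = i * frame_duration
        if not active and score > onset:
            # A region opens once the score rises above `onset`.
            active, start = True, t
        elif active and score < offset:
            # It closes once the score falls below `offset`.
            active = False
            regions.append((start, t))
    if active:
        # Close a region that is still open at the end of the signal.
        regions.append((start, len(scores) * frame_duration))

    # Fill gaps shorter than `min_duration_off` by merging neighbours.
    merged = []
    for region_start, region_end in regions:
        if merged and region_start - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], region_end)
        else:
            merged.append((region_start, region_end))

    # Remove regions shorter than `min_duration_on`.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


toy_scores = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.9, 0.1]
print(binarize(toy_scores, frame_duration=1.0))
print(binarize(toy_scores, frame_duration=1.0,
               min_duration_on=2.0, min_duration_off=2.0))
```

With `onset` higher than `offset`, a region needs a confident score to open but tolerates weaker scores before closing, which reduces flickering around the threshold.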

Overlapped speech detection

from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation=model)
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions

Resegmentation

from pyannote.audio.pipelines import Resegmentation
pipeline = Resegmentation(segmentation=model, 
                          diarization="baseline")
pipeline.instantiate(HYPER_PARAMETERS)
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
# where `baseline` should be provided as a pyannote.core.Annotation instance

Raw scores

from pyannote.audio import Inference
inference = Inference(model)
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature
# instance containing raw segmentation scores like the 
# one pictured above (output)
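
One way to relate raw scores to the pipelines above is to reduce the per-speaker activations of each frame to a single value. The sketch below assumes each frame carries one activation score per speaker (an assumption about the data layout, shown with toy numbers). It takes the highest activation as a speech score and the second highest as an overlap score. The helper name `speech_and_overlap_scores` is hypothetical, and the code conveys the intuition only; it is not taken from the pipelines' source.

```python
def speech_and_overlap_scores(frames):
    """Reduce per-speaker activations to per-frame speech and overlap scores.

    `frames` is a list with one entry per frame; each entry is a list of
    activation scores, one per speaker (assumed layout).
    """
    speech, overlap = [], []
    for activations in frames:
        ranked = sorted(activations, reverse=True)
        # Speech is likely if at least one speaker is active.
        speech.append(ranked[0] if ranked else 0.0)
        # Overlap is likely if at least two speakers are active.
        overlap.append(ranked[1] if len(ranked) > 1 else 0.0)
    return speech, overlap


toy_frames = [
    [0.9, 0.1, 0.0],  # one active speaker
    [0.8, 0.7, 0.1],  # two active speakers
    [0.2, 0.1, 0.0],  # no active speaker
]
speech, overlap = speech_and_overlap_scores(toy_frames)
print(speech)
print(overlap)
```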

Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}

Reproducible research

To reproduce the results of the paper "End-to-end speaker segmentation for overlap-aware resegmentation", use pyannote/segmentation@Interspeech2021 with the following hyper-parameters:

| Voice activity detection | onset | offset | min_duration_on | min_duration_off |
|---|---|---|---|---|
| AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037 |
| DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067 |
| VoxConverse | 0.767 | 0.713 | 0.182 | 0.501 |

| Overlapped speech detection | onset | offset | min_duration_on | min_duration_off |
|---|---|---|---|---|
| AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187 |
| DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144 |
| VoxConverse | 0.587 | 0.426 | 0.337 | 0.112 |

| Resegmentation of VBx | onset | offset | min_duration_on | min_duration_off |
|---|---|---|---|---|
| AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705 |
| DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182 |
| VoxConverse | 0.537 | 0.724 | 0.410 | 0.563 |

Expected outputs (and VBx baseline) are also provided in the /reproducible_research sub-directories.
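
For convenience, the values above can be kept in a small lookup so that the right dictionary is passed to `pipeline.instantiate`. This is only a sketch: the task keys and the helper name `hyper_parameters` are hypothetical, while the numbers are copied from the hyper-parameter values listed above.

```python
FIELDS = ("onset", "offset", "min_duration_on", "min_duration_off")

# Values copied from the reproducible-research tables; task keys are illustrative.
REPRODUCIBLE_HYPER_PARAMETERS = {
    "voice_activity_detection": {
        "AMI Mix-Headset": (0.684, 0.577, 0.181, 0.037),
        "DIHARD3": (0.767, 0.377, 0.136, 0.067),
        "VoxConverse": (0.767, 0.713, 0.182, 0.501),
    },
    "overlapped_speech_detection": {
        "AMI Mix-Headset": (0.448, 0.362, 0.116, 0.187),
        "DIHARD3": (0.430, 0.320, 0.091, 0.144),
        "VoxConverse": (0.587, 0.426, 0.337, 0.112),
    },
    "resegmentation_of_vbx": {
        "AMI Mix-Headset": (0.542, 0.527, 0.044, 0.705),
        "DIHARD3": (0.592, 0.489, 0.163, 0.182),
        "VoxConverse": (0.537, 0.724, 0.410, 0.563),
    },
}


def hyper_parameters(task: str, dataset: str) -> dict:
    """Return the hyper-parameter dictionary for a given task and dataset."""
    return dict(zip(FIELDS, REPRODUCIBLE_HYPER_PARAMETERS[task][dataset]))


# Hypothetical usage with a pipeline built as in the Usage section:
# pipeline.instantiate(hyper_parameters("voice_activity_detection", "AMI Mix-Headset"))
print(hyper_parameters("voice_activity_detection", "AMI Mix-Headset"))
```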

License

MIT: https://choosealicense.com/licenses/mit

Model page

The pyannote segmentation model is hosted on huggingface.co at:

https://huggingface.co/pyannote/segmentation

Provider

pyannote (organization)
