pyannote / segmentation-3.0

huggingface.co
Total runs: 14.9M
24-hour runs: 0
7-day runs: 43.5K
30-day runs: 4.7M
Model's Last Updated: May 10 2024
voice-activity-detection

Model Details of segmentation-3.0

Using this open-source model in production?
Consider switching to pyannoteAI for better and faster options.

🎹 "Powerset" speaker segmentation

This model ingests 10 seconds of mono audio sampled at 16 kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are non-speech, speaker #1, speaker #2, speaker #3, speakers #1 and #2, speakers #1 and #3, and speakers #2 and #3.
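The 7 classes listed above are the subsets of 3 speakers that contain at most 2 members (1 empty set, 3 single speakers, 3 pairs). A minimal sketch that enumerates them in the same order, using only the standard library; the function name `powerset_classes` is made up for this illustration and is not part of pyannote.audio:

```python
from itertools import combinations


def powerset_classes(max_speakers_per_chunk=3, max_speakers_per_frame=2):
    """List every speaker subset with at most `max_speakers_per_frame` members."""
    speakers = range(1, max_speakers_per_chunk + 1)
    classes = []
    # Subsets are grouped by size: the empty set first, then singles, then pairs.
    for size in range(max_speakers_per_frame + 1):
        classes.extend(combinations(speakers, size))
    return classes


classes = powerset_classes()
for index, members in enumerate(classes):
    label = "non-speech" if not members else " and ".join(f"#{m}" for m in members)
    print(index, label)
```

Changing either argument shows how the number of classes grows with the number of speakers allowed per chunk or per frame.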

Example output

import torch

# waveform (first row)
batch_size = 1  # assumed value; the original snippet leaves batch_size undefined
duration, sample_rate, num_channels = 10, 16000, 1
waveform = torch.randn(batch_size, num_channels, duration * sample_rate)

# powerset multi-class encoding (second row)
# `model` is the pretrained model loaded as shown under "Usage"
powerset_encoding = model(waveform)

# multi-label encoding (third row)
from pyannote.audio.utils.powerset import Powerset
max_speakers_per_chunk, max_speakers_per_frame = 3, 2
to_multilabel = Powerset(
    max_speakers_per_chunk,
    max_speakers_per_frame).to_multilabel
multilabel_encoding = to_multilabel(powerset_encoding)
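To make the powerset-to-multilabel step more concrete, here is a rough pure-Python sketch of a hard-decision conversion: for each frame, pick the highest-scoring powerset class and mark its member speakers as active. This is an illustration only, not pyannote.audio's implementation; the helper name and the toy scores are invented, and the class order is assumed to follow the list given earlier (non-speech, single speakers, then pairs).

```python
from itertools import combinations


def powerset_to_multilabel(frame_scores, num_speakers=3, max_per_frame=2):
    """Map per-frame powerset scores to multi-hot speaker activity vectors."""
    # Class order assumed: (), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)
    subsets = [
        subset
        for size in range(max_per_frame + 1)
        for subset in combinations(range(num_speakers), size)
    ]
    multilabel = []
    for scores in frame_scores:
        best = scores.index(max(scores))  # hard decision: most likely class
        active = subsets[best]
        multilabel.append([1 if s in active else 0 for s in range(num_speakers)])
    return multilabel


# Toy scores for three frames (made up for illustration).
frames = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01],  # non-speech wins
    [0.05, 0.10, 0.70, 0.05, 0.05, 0.03, 0.02],  # speaker #2 wins
    [0.05, 0.10, 0.05, 0.10, 0.10, 0.55, 0.05],  # speakers #1 and #3 win
]
decoded = powerset_to_multilabel(frames)
print(decoded)
```

Each output row has one entry per speaker, so a frame assigned to "speakers #1 and #3" becomes `[1, 0, 1]`.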

The various concepts behind this model are described in detail in this paper.

It has been trained by Séverin Baroudi with pyannote.audio 3.0.0 using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.

This companion repository by Alexis Plaquet also provides instructions on how to train or finetune such a model on your own data.

Requirements
  1. Install pyannote.audio 3.0 with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Create access token at hf.co/settings/tokens.
Usage
# instantiate the model
from pyannote.audio import Model
model = Model.from_pretrained(
  "pyannote/segmentation-3.0", 
  use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
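Rather than pasting the access token into source code, one option is to read it from an environment variable. The sketch below assumes a variable named `HF_TOKEN` (the name is an assumption made for this example) and passes the token through the same `use_auth_token` argument shown above; both helper functions are hypothetical.

```python
import os


def read_access_token(variable="HF_TOKEN"):
    """Return the token stored in an environment variable, or raise if it is unset."""
    token = os.environ.get(variable)
    if not token:
        raise RuntimeError(f"Set the {variable} environment variable to your access token")
    return token


def load_segmentation_model():
    """Load the model with a token taken from the environment."""
    # Imported lazily so read_access_token() works without pyannote.audio installed.
    from pyannote.audio import Model
    return Model.from_pretrained(
        "pyannote/segmentation-3.0",
        use_auth_token=read_access_token())
```

Calling `load_segmentation_model()` still requires that the user conditions have been accepted for the account that owns the token.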
Speaker diarization

This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunks).

See the pyannote/speaker-diarization-3.0 pipeline, which uses an additional speaker embedding model to perform speaker diarization of full recordings.
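To illustrate what "only processes 10s chunks" implies for longer audio, the following sketch computes the start and end times of fixed 10-second windows that cover a recording. The 5-second step and the function name are assumptions chosen for the example; the real pipeline's windowing, score aggregation, and speaker clustering are not shown here.

```python
def chunk_offsets(total_duration, chunk_duration=10.0, step=5.0):
    """Return (start, end) times of fixed-length windows covering a recording."""
    if total_duration <= chunk_duration:
        # A short recording fits in one window (it would need padding to 10 s).
        return [(0.0, chunk_duration)]
    offsets = []
    start = 0.0
    while start + chunk_duration < total_duration:
        offsets.append((start, start + chunk_duration))
        start += step
    # Last window is aligned with the end so that no audio is left uncovered.
    offsets.append((total_duration - chunk_duration, total_duration))
    return offsets


windows = chunk_offsets(27.0)
print(windows)
```

Because speaker indices are local to each chunk, stitching the per-chunk outputs together also needs a way to match speakers across windows, which is the role of the speaker embedding model mentioned above.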

Voice activity detection
from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {
  # remove speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions
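The effect of the two hyper-parameters can be sketched on plain (start, end) tuples. The helper below is a simplified illustration, not the pipeline's code: it fills gaps of at most `min_duration_off` seconds first and then drops regions shorter than `min_duration_on` seconds. That ordering is an assumption, as are the function name and the sample values.

```python
def postprocess(regions, min_duration_on=0.0, min_duration_off=0.0):
    """Fill short gaps, then drop short regions; `regions` is a list of (start, end)."""
    merged = []
    for start, end in sorted(regions):
        # Merge with the previous region when the gap is at most min_duration_off.
        if merged and start - merged[-1][1] <= min_duration_off:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Keep only regions that last at least min_duration_on seconds.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


speech = [(0.0, 1.0), (1.2, 3.0), (5.0, 5.1), (8.0, 9.5)]
cleaned = postprocess(speech, min_duration_on=0.25, min_duration_off=0.5)
print(cleaned)
```

With both values left at 0.0, as in the configuration above, only touching or overlapping regions are merged and nothing is dropped.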
Overlapped speech detection
from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation=model)
HYPER_PARAMETERS = {
  # remove overlapped speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-overlapped speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
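Conceptually, overlapped speech corresponds to frames in which two or more speakers are active at once. The sketch below finds runs of such frames in a multi-label matrix given as nested lists; it is an illustration with invented names and toy data, not the `OverlappedSpeechDetection` implementation, and it returns frame indices rather than times because frame timing is not specified here.

```python
def overlap_runs(multilabel_frames):
    """Return (first_frame, last_frame) index pairs where two or more speakers are active."""
    runs, run_start = [], None
    for index, frame in enumerate(multilabel_frames):
        overlapped = sum(frame) >= 2
        if overlapped and run_start is None:
            run_start = index  # a new overlapped run begins
        elif not overlapped and run_start is not None:
            runs.append((run_start, index - 1))  # the run ended on the previous frame
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(multilabel_frames) - 1))
    return runs


# Toy multi-label frames: one row per frame, one column per speaker.
frames = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1]]
runs = overlap_runs(frames)
print(runs)
```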
Citations
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}


More Information About segmentation-3.0 huggingface.co Model

segmentation-3.0 is released under the MIT license:

https://choosealicense.com/licenses/mit


segmentation-3.0 huggingface.co Url

https://huggingface.co/pyannote/segmentation-3.0


Provider of segmentation-3.0 huggingface.co

pyannote