speaker-diarization-3.0 huggingface.co api & pyannote speaker-diarization-3.0 github AI Model

Introduction of speaker-diarization-3.0

Model Details of speaker-diarization-3.0

Using this open-source model in production?
Consider switching to pyannoteAI for better and faster options.

🎹 Speaker diarization 3.0

This pipeline has been trained by Séverin Baroudi with pyannote.audio 3.0.0 using a combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.

It ingests mono audio sampled at 16kHz and outputs speaker diarization as an Annotation instance:

stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
audio files sampled at a different rate are resampled to 16kHz automatically upon loading.

Requirements

Install pyannote.audio 3.0 with pip install pyannote.audio
Accept pyannote/segmentation-3.0 user conditions
Accept pyannote/speaker-diarization-3.0 user conditions
Create access token at hf.co/settings/tokens .

Usage

# instantiate the pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
  "pyannote/speaker-diarization-3.0",
  use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# run the pipeline on an audio file
diarization = pipeline("audio.wav")

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)

Processing on GPU

pyannote.audio pipelines run on CPU by default. You can send them to GPU with the following lines:

import torch
pipeline.to(torch.device("cuda"))

Real-time factor is around 2.5% using one Nvidia Tesla V100 SXM2 GPU (for the neural inference part) and one Intel Cascade Lake 6248 CPU (for the clustering part).

In other words, it takes approximately 1.5 minutes to process a one hour conversation.

Processing from memory

Pre-loading audio files in memory may result in faster processing:

waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

Monitoring progress

Hooks are available to monitor the progress of the pipeline:

from pyannote.audio.pipelines.utils.hook import ProgressHook
with ProgressHook() as hook:
    diarization = pipeline("audio.wav", hook=hook)

Controlling the number of speakers

In case the number of speakers is known in advance, one can use the num_speakers option:

diarization = pipeline("audio.wav", num_speakers=2)

One can also provide lower and/or upper bounds on the number of speakers using min_speakers and max_speakers options:

diarization = pipeline("audio.wav", min_speakers=2, max_speakers=5)

Benchmark

This pipeline has been benchmarked on a large collection of datasets.

Processing is fully automatic:

no manual voice activity detection (as is sometimes the case in the literature)
no manual number of speakers (though it is possible to provide it to the pipeline)
no fine-tuning of the internal models nor tuning of the pipeline hyper-parameters to each dataset

... with the least forgiving diarization error rate (DER) setup (named "Full" in this paper ):

no forgiveness collar
evaluation of overlapped speech

Benchmark	DER%	FA%	Miss%	Conf%	Expected output	File-level evaluation
AISHELL-4	12.3	3.8	4.4	4.1	RTTM	eval
AliMeeting ( channel 1 )	24.3	4.4	10.0	9.9	RTTM	eval
AMI ( headset mix, only_words )	19.0	3.6	9.5	5.9	RTTM	eval
AMI ( array1, channel 1, only_words)	22.2	3.8	11.2	7.3	RTTM	eval
AVA-AVD	49.1	10.8	15.7	22.5	RTTM	eval
DIHARD 3 ( Full )	21.7	6.2	8.1	7.3	RTTM	eval
MSDWild	24.6	5.8	8.0	10.7	RTTM	eval
REPERE ( phase 2 )	7.8	1.8	2.6	3.5	RTTM	eval
VoxConverse ( v0.3 )	11.3	4.1	3.4	3.8	RTTM	eval

Citations

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Runs of pyannote speaker-diarization-3.0 on huggingface.co

2.5M

Total runs

24-hour runs

-37.4K

3-day runs

-143.1K

7-day runs

1.1M

30-day runs

More Information About speaker-diarization-3.0 huggingface.co Model

More speaker-diarization-3.0 license Visit here:

https://choosealicense.com/licenses/mit

speaker-diarization-3.0 huggingface.co

speaker-diarization-3.0 huggingface.co is an AI model on huggingface.co that provides speaker-diarization-3.0's model effect (), which can be used instantly with this pyannote speaker-diarization-3.0 model. huggingface.co supports a free trial of the speaker-diarization-3.0 model, and also provides paid use of the speaker-diarization-3.0. Support call speaker-diarization-3.0 model through api, including Node.js, Python, http.

speaker-diarization-3.0 huggingface.co Url

https://huggingface.co/pyannote/speaker-diarization-3.0

pyannote speaker-diarization-3.0 online free

speaker-diarization-3.0 huggingface.co is an online trial and call api platform, which integrates speaker-diarization-3.0's modeling effects, including api services, and provides a free online trial of speaker-diarization-3.0, you can try speaker-diarization-3.0 online for free by clicking the link below.

pyannote speaker-diarization-3.0 online free url in huggingface.co:

https://huggingface.co/pyannote/speaker-diarization-3.0

speaker-diarization-3.0 install

speaker-diarization-3.0 is an open source model from GitHub that offers a free installation service, and any user can find speaker-diarization-3.0 on GitHub to install. At the same time, huggingface.co provides the effect of speaker-diarization-3.0 install, users can directly use speaker-diarization-3.0 installed effect in huggingface.co for debugging and trial. It also supports api for free installation.

speaker-diarization-3.0 install url in huggingface.co:

https://huggingface.co/pyannote/speaker-diarization-3.0

huggingface.co

pyannote/segmentation-3.0

Total runs: 14.9M

Run Growth: 4.7M

Growth Rate: 31.26%

Updated: 5月 10 2024

huggingface.co

pyannote/wespeaker-voxceleb-resnet34-LM

Total runs: 14.8M

Run Growth: 3.6M

Growth Rate: 24.51%

Updated: 5月 10 2024

huggingface.co

pyannote/speaker-diarization-3.1

Total runs: 12.3M

Run Growth: 3.4M

Growth Rate: 28.36%

Updated: 5月 10 2024

huggingface.co

pyannote/segmentation

Total runs: 7.6M

Run Growth: 1.9M

Growth Rate: 25.36%

Updated: 5月 10 2024

huggingface.co

pyannote/speaker-diarization

Total runs: 7.2M

Run Growth: 2.1M

Growth Rate: 29.12%

Updated: 5月 10 2024

huggingface.co

pyannote/embedding

Total runs: 318.5K

Run Growth: -55.3K

Growth Rate: -17.63%

Updated: 5月 10 2024

huggingface.co

pyannote/voice-activity-detection

Total runs: 309.3K

Run Growth: 59.5K

Growth Rate: 21.33%

Updated: 5月 10 2024

huggingface.co

pyannote/brouhaha

Total runs: 100.8K

Run Growth: 86.7K

Growth Rate: 85.75%

Updated: 11月 15 2022

huggingface.co

pyannote/overlapped-speech-detection

Total runs: 30.8K

Run Growth: -3.8K

Growth Rate: -12.02%

Updated: 5月 10 2024

huggingface.co

pyannote/speech-separation-ami-1.0

Total runs: 3.3K

Run Growth: -5.1K

Growth Rate: -158.20%

Updated: 11月 11 2024

huggingface.co

pyannote/speaker-segmentation

Total runs: 109

Run Growth: -527

Growth Rate: -497.17%

Updated: 5月 10 2024

huggingface.co

pyannote/ci-segmentation

Total runs: 86

Run Growth: 64

Growth Rate: 74.42%

Updated: 2月 12 2025

huggingface.co

pyannote/TestModelForContinuousIntegration

Total runs: 6

Run Growth: 3

Growth Rate: 50.00%

Updated: 3月 23 2022

huggingface.co

pyannote/separation-ami-1.0

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated: 7月 16 2024

pyannote / speaker-diarization-3.0

Introduction of speaker-diarization-3.0

Model Details of speaker-diarization-3.0

🎹 Speaker diarization 3.0

Requirements

Usage

Processing on GPU

Processing from memory

Monitoring progress

Controlling the number of speakers

Benchmark

Citations

Runs of pyannote speaker-diarization-3.0 on huggingface.co

More Information About speaker-diarization-3.0 huggingface.co Model

More speaker-diarization-3.0 license Visit here:

speaker-diarization-3.0 huggingface.co

speaker-diarization-3.0 huggingface.co Url

pyannote speaker-diarization-3.0 online free

pyannote speaker-diarization-3.0 online free url in huggingface.co:

speaker-diarization-3.0 install

speaker-diarization-3.0 install url in huggingface.co:

Url of speaker-diarization-3.0

speaker-diarization-3.0 huggingface.co Url

Provider of speaker-diarization-3.0 huggingface.co

Other API from pyannote