Using this open-source model in production? Consider switching to pyannoteAI for better and faster options.
# 🎹 Wrapper around wespeaker-voxceleb-resnet34-LM
This model requires `pyannote.audio` version 3.1 or higher.
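If you are unsure which version is installed, one quick way to check (using only the Python standard library) is:

```python
# check the installed pyannote.audio version (should be >= 3.1)
from importlib.metadata import version
print(version("pyannote.audio"))
```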
This is a wrapper around the WeSpeaker `wespeaker-voxceleb-resnet34-LM` pretrained speaker embedding model, for use in `pyannote.audio`.
## Basic usage
```python
# instantiate pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")

from pyannote.audio import Inference
inference = Inference(model, window="whole")
embedding1 = inference("speaker1.wav")
embedding2 = inference("speaker2.wav")
# `embeddingX` is (1 x D) numpy array extracted from the file as a whole.

from scipy.spatial.distance import cdist
distance = cdist(embedding1, embedding2, metric="cosine")[0, 0]
# `distance` is a `float` describing how dissimilar speakers 1 and 2 are.
```
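In practice, this distance is often turned into a binary same/different-speaker decision by thresholding. Continuing from the snippet above, here is a minimal sketch; note that the threshold value is an arbitrary placeholder, not a calibrated one, and should be tuned on labelled trial pairs from your own data:

```python
# sketch: turn the cosine distance into a verification decision.
# NOTE: 0.5 is a made-up placeholder threshold, not a calibrated value.
THRESHOLD = 0.5

same_speaker = distance < THRESHOLD
print(f"distance={distance:.3f} -> same speaker: {same_speaker}")
```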
## Extract embedding from an excerpt

```python
from pyannote.audio import Inference
from pyannote.core import Segment

inference = Inference(model, window="whole")
excerpt = Segment(13.37, 19.81)
embedding = inference.crop("audio.wav", excerpt)
# `embedding` is (1 x D) numpy array extracted from the file excerpt.
```
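Since `crop` returns the same kind of (1 x D) embedding as whole-file inference, excerpt embeddings can be compared with each other in exactly the same way. A short sketch comparing two excerpts of the same file (the segment boundaries below are arbitrary illustrative values):

```python
from pyannote.core import Segment
from scipy.spatial.distance import cdist

# compare two excerpts of the same file
emb_a = inference.crop("audio.wav", Segment(0.0, 5.0))
emb_b = inference.crop("audio.wav", Segment(30.0, 35.0))
distance = cdist(emb_a, emb_b, metric="cosine")[0, 0]
# a small distance suggests both excerpts come from the same speaker
```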
## Extract embeddings using a sliding window
```python
from pyannote.audio import Inference

inference = Inference(model, window="sliding",
                      duration=3.0, step=1.0)
embeddings = inference("audio.wav")
# `embeddings` is a (N x D) pyannote.core.SlidingWindowFeature
# `embeddings[i]` is the embedding of the ith position of the
# sliding window, i.e. from [i * step, i * step + duration].
```
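A `SlidingWindowFeature` pairs the raw numpy array with the temporal extent of each window. Assuming the `.data` and `.sliding_window` attributes of `pyannote.core.SlidingWindowFeature` (present in recent pyannote.core releases), a sketch of how to map each embedding back to its time span:

```python
import numpy as np

data = embeddings.data              # (N x D) numpy array
window = embeddings.sliding_window  # pyannote.core.SlidingWindow

for i in range(data.shape[0]):
    segment = window[i]             # pyannote.core.Segment of the ith window
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] "
          f"norm={np.linalg.norm(data[i]):.2f}")
```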
## License

The pretrained models in WeSpeaker follow the license of their corresponding training datasets. For example, the model pretrained on VoxCeleb follows the Creative Commons Attribution 4.0 International License, since that is the license of the VoxCeleb dataset; see https://mm.kaist.ac.kr/datasets/voxceleb/.
## Citation
```bibtex
@inproceedings{Wang2023,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```