stable-codec-speech-16k is a Transformer-based codec model designed for high-quality, low-bitrate audio coding. It encodes audio waveforms into discrete tokens, which can later be decoded to reconstruct the audio waveform.
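To make the waveform-to-tokens-to-waveform interface concrete, here is a minimal toy sketch. It is not how this model works: the model is Transformer-based, whereas this example swaps in classic mu-law scalar quantization purely to illustrate the interface. The function names `encode` and `decode` are hypothetical and are not taken from the model's library.

```python
import math

MU = 255  # 256 quantization levels, so each token fits in 8 bits


def encode(waveform):
    """Map float samples in [-1, 1] to integer tokens in [0, 255] (mu-law companding)."""
    tokens = []
    for x in waveform:
        x = max(-1.0, min(1.0, x))  # clip out-of-range samples
        # Compress the amplitude logarithmically, preserving the sign.
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        # Shift from [-1, 1] to [0, MU] and round to the nearest level.
        tokens.append(int(round((y + 1) / 2 * MU)))
    return tokens


def decode(tokens):
    """Invert the companding to get an approximate waveform back from tokens."""
    waveform = []
    for t in tokens:
        y = 2 * t / MU - 1  # back to [-1, 1]
        x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
        waveform.append(x)
    return waveform


# Toy input: 10 ms of a 440 Hz sine sampled at 16 kHz (illustrative values only).
sample_rate = 16_000
signal = [0.8 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(160)]

tokens = encode(signal)
reconstructed = decode(tokens)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
print(f"{len(tokens)} tokens, max reconstruction error {max_error:.4f}")
```

As with any lossy codec, the reconstruction approximates the input rather than reproducing it exactly. A learned codec aims for far fewer tokens per second than this one-token-per-sample toy.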
Please note: For individuals or organizations generating annual revenue of US $1,000,000 (or local currency equivalent) or more, regardless of the source of that revenue, you must obtain an enterprise commercial license directly from Stability AI before commercially using Stable Codec, any derivative work of Stable Codec (such as a “fine tune” model), or their outputs. You may submit a request for an Enterprise License at https://stability.ai/enterprise. Please refer to Stability AI's Community License, available at https://stability.ai/license, for more information.
Model details: This released model is a speech codec designed to compress real-world speech data into a suitable format for generative modeling. It provides a foundational tool for developing downstream applications in speech understanding and generation, such as text-to-speech systems and conversational AI models.
Please check our arXiv page and GitHub repo for details.
License
Community License: Free for research, non-commercial, and commercial use by organizations and individuals generating annual revenue of less than US $1,000,000 (or local currency equivalent), regardless of the source of that revenue. If your annual revenue is US $1,000,000 or more, any commercial use of this model or derivative works thereof requires obtaining an Enterprise License directly from Stability AI. You may submit a request for an Enterprise License at https://stability.ai/enterprise. Please refer to Stability AI's Community License, available at https://stability.ai/license, for more information.
Intended uses include the following:
Efficient compression of speech signals for storage or streaming purposes.
Enhancing speech-based applications, such as telecommunication systems and real-time communication platforms.
Research and development in audio coding and speech synthesis, including understanding and improving codec performance.
Development of downstream applications including speech recognition and generation.
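For the storage and streaming use case, the arithmetic behind "low-bitrate" can be sketched as below. All codec figures in it are hypothetical placeholders, not published specifications of this model: the token rate and codebook size are chosen for illustration, and the 16 kHz sample rate is an assumption based on the `16k` in the model name, which this card does not state explicitly.

```python
import math


def pcm_bitrate(sample_rate_hz, bits_per_sample):
    """Bitrate in bits per second of uncompressed mono PCM audio."""
    return sample_rate_hz * bits_per_sample


def token_bitrate(tokens_per_second, codebook_size):
    """Bitrate in bits per second of a stream of tokens drawn from a fixed codebook."""
    return tokens_per_second * math.log2(codebook_size)


# Assumed input format: 16 kHz, 16-bit mono PCM.
pcm_bps = pcm_bitrate(16_000, 16)

# Hypothetical codec settings, for illustration only.
codec_bps = token_bitrate(tokens_per_second=25, codebook_size=65_536)

print(f"PCM:   {pcm_bps} bit/s")
print(f"Codec: {codec_bps:.0f} bit/s")
print(f"Compression ratio: {pcm_bps / codec_bps:.0f}x")
```

The same two functions can be reused with the actual token rate and codebook size once those are taken from the model's documentation.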
This model was trained exclusively on clean, non-overlapping English speech and performs best on input of that kind. It is not suitable for applications requiring high-fidelity music or environmental sound coding.
Contact
Please report any issues with the model or contact us: