Convert audio to text with OpenAI Whisper
Table of Contents:
- Introduction
- Setting Up the Project
- Installing Dependencies
- Obtaining the OpenAI API Key
- Understanding the Pricing
- Creating the Transcription Function
- Testing the Function
- Exploring the Whisper Model
- Playing the Transcribed Audio
- Conclusion
Introduction
In this article, we will learn how to easily transcribe audio files to text using Node.js and the Whisper model from OpenAI. We will cover the steps needed to set up the project, install dependencies, obtain the OpenAI API key, create the transcription function, and test the function. Additionally, we will explore the Whisper model and play the transcribed audio. So, let's get started.
Setting Up the Project
Before we begin, make sure you have an audio file ready for transcription and a valid API key from OpenAI. Create a new directory for your project and open it in your preferred code editor. We will be using Visual Studio Code for this tutorial.
Installing Dependencies
We need two dependencies: dotenv and axios. dotenv loads environment variables from a .env file, and axios makes HTTP requests. Install both by running the following command in your terminal:
npm install dotenv axios
Obtaining the OpenAI API Key
To make requests to the OpenAI API, we need an API key. If you don't have an OpenAI account, sign up for one. Once logged in, go to the API Docs and click on your avatar at the top right corner. Select "View API keys" and create a new secret key. Never share your API key or commit it to version control; keep it in a .env file in the project root so that dotenv can load it.
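As a minimal sketch of how the key might be read in code, assuming it is stored in an environment variable named OPENAI_API_KEY (both that variable name and the getApiKey helper are illustrative assumptions, not something OpenAI or dotenv requires):

```javascript
// Hypothetical helper: reads the API key from an environment object.
// OPENAI_API_KEY is an assumed variable name; use whatever name your .env file defines.
function getApiKey(env = process.env) {
  const key = env.OPENAI_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("OPENAI_API_KEY is not set; add it to your .env file");
  }
  // Trim stray whitespace that sometimes sneaks in when copying the key.
  return key.trim();
}
```

In the project, call require("dotenv").config() once at startup, before getApiKey(), so that the values in .env are copied into process.env. For this sketch, the .env file would contain a single line of the form OPENAI_API_KEY=your-secret-key.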
Understanding the Pricing
The OpenAI API is not free, although a new account may come with some free credits for testing. The Whisper model we are using in this tutorial costs approximately $0.006 per minute of audio. For detailed and current pricing information on each model, refer to the OpenAI website.
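To get a feel for what a recording will cost, the arithmetic can be sketched as below. The rate constant is simply the approximate figure quoted above, and estimateCostUsd is a hypothetical helper name; check the OpenAI pricing page before relying on the number.

```javascript
// Assumed rate, taken from the approximate figure in this article.
const WHISPER_USD_PER_MINUTE = 0.006;

// Estimates the cost in US dollars of transcribing a clip of the given length.
function estimateCostUsd(durationSeconds, usdPerMinute = WHISPER_USD_PER_MINUTE) {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError("durationSeconds must be a non-negative number");
  }
  return (durationSeconds / 60) * usdPerMinute;
}

// A 45-minute recording at the assumed rate: 45 * 0.006 dollars.
console.log(estimateCostUsd(45 * 60).toFixed(2)); // prints 0.27
```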
Creating the Transcription Function
Now, let's create the function that sends a request to the OpenAI API. We will name this function "transcribe". It takes an audio file as input and uses axios to make a POST request to the OpenAI API endpoint: https://api.openai.com/v1/audio/transcriptions. The request must include the file and the model name (whisper-1). File uploads are limited to 25 megabytes, and the supported input file types are mp3, mp4, mpeg, mpga, m4a, wav, and webm. If your file is larger than 25 megabytes, you will need to split it into smaller chunks or convert it to a more compressed audio format. The Whisper model tries to match the style of the prompt you provide, so a well-chosen prompt can improve the quality of the transcripts. You can also specify the response format (the default is JSON), the temperature (a value between 0 and 1, where higher values produce more random output), and the language of the audio (an ISO-639-1 code such as "en"; if you omit it, the model detects the language automatically).
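A minimal sketch of such a function follows. It assumes axios 1.x, which, as far as we know, serializes a plain object into multipart form data when the Content-Type header is set to multipart/form-data. It also assumes the key is available as process.env.OPENAI_API_KEY. The options and httpClient parameters are our own additions for illustration: the first carries the optional request fields, and the second exists only so that a stub can replace axios in tests.

```javascript
const TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Sends an audio file to the transcription endpoint and returns the transcript.
// file:       a readable stream of the audio, e.g. from fs.createReadStream
// options:    optional request fields (prompt, response_format, temperature, language)
// httpClient: defaults to axios; a parameter only so tests can inject a stub
async function transcribe(file, options = {}, httpClient = require("axios")) {
  const response = await httpClient.post(
    TRANSCRIPTIONS_URL,
    { file, model: "whisper-1", ...options },
    {
      headers: {
        // Asks axios to encode the object above as multipart form data.
        "Content-Type": "multipart/form-data",
        // Assumed environment variable name for the secret key.
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
    }
  );
  // JSON responses carry the transcript in `text`; plain-text formats
  // (for example response_format "text") arrive as a bare string.
  return typeof response.data === "string" ? response.data : response.data.text;
}
```

Passing the optional fields is then a matter of adding them to the second argument, for example transcribe(file, { language: "en", temperature: 0 }).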
Testing the Function
To test the function, we need to create a read stream of our audio file using the native Node module "fs". Once we have the file stream, we can call the transcribe function and pass the file as an argument. The function will make the request to the OpenAI API and return the text from the response. Finally, we can log the transcript to the console.
Exploring the Whisper Model
The Whisper model used for transcription is powerful and well suited to building speech-driven applications. Its usage is not limited to English transcription: it was trained on multilingual audio, so it can transcribe many languages, the API also offers a translation endpoint that turns speech in other languages into English text, and the output can be steered with prompts. This gives developers a lot of flexibility.
Playing the Transcribed Audio
To check that the transcription function works, we can play the original audio file while reading the transcript. Listening to the recording lets us judge the accuracy of the transcription and confirm that the text matches what was actually said.
Conclusion
In this article, we covered the process of transcribing audio files to text using Node.js and the Whisper model from OpenAI. We learned how to set up the project, install dependencies, obtain the OpenAI API key, create the transcription function, and test the function. Additionally, we explored the capabilities of the Whisper model and played the transcribed audio. With this knowledge, you can now efficiently convert audio to text and leverage the power of artificial intelligence in your projects.
Highlights
- Learn how to transcribe audio files to text with Node.js and the Whisper model from OpenAI.
- Set up the project, install dependencies, and obtain the OpenAI API key.
- Create a transcription function to make requests to the OpenAI API.
- Test the function and explore the Whisper model's capabilities.
- Play the transcribed audio for validation.
FAQ
Q: Can I transcribe large audio files using this method?
A: File uploads are limited to 25 megabytes. If your audio file exceeds this limit, you will need to split it into smaller chunks or convert it to a more compressed audio format.
Q: Is the transcription process accurate?
A: The accuracy of the transcription depends on the quality of the audio and the Whisper model's capabilities. It is recommended to provide clear audio and experiment with different prompts to improve accuracy.
Q: How much does the OpenAI API cost?
A: The Whisper model used in this tutorial costs approximately $0.006 per minute of audio. For detailed and current pricing information on each model, refer to the OpenAI website.
Q: Can I transcribe audio files in languages other than English?
A: Yes. Whisper supports many languages, and if you omit the language parameter the model detects the language automatically. Specifying the language as an ISO-639-1 code (for example "de" for German) can improve accuracy and speed.
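For the first question above, a quick size check before uploading can save a failed request. The sketch below uses only the built-in fs module; the helper names are hypothetical, and counting 25 megabytes as 25 * 1024 * 1024 bytes is our assumption about how the limit is measured.

```javascript
const fs = require("node:fs");

// 25 megabytes, interpreted here as 25 * 1024 * 1024 bytes (an assumption).
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Returns true when a file of the given size fits within the upload limit.
function isWithinUploadLimit(sizeInBytes, limit = MAX_UPLOAD_BYTES) {
  return sizeInBytes <= limit;
}

// Reads the file size from disk and reports whether it can be uploaded as-is.
function checkAudioFile(audioPath) {
  const { size } = fs.statSync(audioPath);
  return { size, ok: isWithinUploadLimit(size) };
}
```

If checkAudioFile reports ok as false, split or re-encode the file before calling the transcription function.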