Revolutionizing Speech to Text and Document AI with Power Platform and Azure OpenAI
Table of Contents
- Introduction
- Leveraging the Power of AI and the Microsoft Power Platform
- Converting Speech to Text Using OpenAI's Whisper API
- Creating Content with Azure OpenAI GPT
- Setting Up the Custom Connector
- Creating a New Custom Connector
- Defining the Custom Connector's Parameters
- Building the PowerApp
- Creating a New Blank Canvas App
- Adding the Custom Connector and API Key
- Adding the Microphone Control and Button
- Calling the Whisper API and Storing the Response
- Showcasing the Speech-to-Text Response
- Creating a Document with Azure OpenAI Service GPT
- Creating a New Flow
- Customizing the Flow and Instructions
- Creating a File and Converting it to PDF
- Sending the PDF as an Email Attachment
- Calling the GPT Generate Document Flow
- Notifying the User and Displaying the Document
- Conclusion
Leveraging the Power of AI and the Microsoft Power Platform to Convert Speech to a Document
In this article, we will explore how to leverage the power of AI and the Microsoft Power Platform to convert speech to a document. Specifically, we will look at how to use OpenAI's Whisper API to convert speech to text and then take advantage of the new Azure OpenAI GPT action to create content for the document based on the speech-to-text conversion.
1. Introduction
Speech-to-text conversion is a powerful tool that can enhance various applications, including transcription services, voice assistants, and more. With advancements in AI and cloud technologies, it has become easier than ever to implement speech-to-text conversion in your applications. In this article, we will walk you through the process of leveraging AI and the Microsoft Power Platform to convert speech to a document.
2. Leveraging the Power of AI and the Microsoft Power Platform
2.1 Converting Speech to Text Using OpenAI's Whisper API
Before we dive into the process of creating a document, let's first understand how we can convert speech to text using OpenAI's Whisper API. This API transcribes speech into text and exposes two speech-to-text endpoints, one for transcription and one for translation into English, that we can call from PowerApps.
To get started, we need to create a custom connector that integrates with OpenAI's Whisper API. We can do this by creating a new custom connector from scratch and defining its parameters. The custom connector will serve as the bridge between PowerApps and the Whisper API, enabling us to convert speech to text seamlessly.
2.2 Creating Content with Azure OpenAI GPT
Now that we have the ability to convert speech to text, let's explore how we can create content based on the converted text. For this, we will leverage the Azure OpenAI GPT action, which is a powerful tool for generating text based on given instructions.
Using the Azure OpenAI GPT action, we can create blog posts, answer questions, summarize documents, and perform various other text generation tasks. It comes with a standard set of templates, making it easy to generate content quickly. In our scenario, we will focus on creating a document, specifically a blog post.
3. Setting Up the Custom Connector
3.1 Creating a New Custom Connector
To get started, we need to create a custom connector in PowerApps that integrates with OpenAI's Whisper API. This custom connector will allow us to send requests to the API and retrieve speech-to-text transcriptions.
To create the custom connector, we will follow these steps:
- Open the PowerApps portal and navigate to the custom connectors section.
- Click on "New custom connector" and select the option to create a new connector from scratch.
- Provide a name for the custom connector and upload an icon if desired.
- Enter the API host as "api.openai.com" and set the security to require an API key.
- Define the required parameter for authorization as "API key".
- Create a new action and set the operation ID as "speech to text".
- Import a sample POST request for the URL "https://api.openai.com/v1/audio/transcriptions".
- Import a sample response in JSON format, which should include the "text" property.
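The request that the connector ends up sending can be sketched outside PowerApps as follows. This is a minimal illustration in Python using only the standard library, not the connector itself: the function name, the placeholder key, and the file name are hypothetical, and the request is built but never sent.

```python
import uuid
import urllib.request

WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key, model="whisper-1"):
    """Build (but do not send) a multipart/form-data POST for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field carrying the model name.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        # File field: part headers followed by the raw audio bytes.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         'Content-Type: application/octet-stream\r\n\r\n').encode() + audio_bytes + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ]
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(WHISPER_URL, data=b"".join(parts), headers=headers, method="POST")


# Placeholder audio and key, for illustration only.
request = build_transcription_request(b"fake-audio", "recording.webm", "sk-placeholder")
```

Passing the resulting object to urllib.request.urlopen with a real key and real audio would perform the call; the JSON that comes back carries the transcription in its "text" property.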
3.2 Defining the Custom Connector's Parameters
Now that we have created the custom connector, we need to define the parameters required for the speech-to-text functionality. Open the Swagger editor and add the form-data parameters that the endpoint expects (the audio file and the model name). Also declare that the action consumes multipart/form-data.
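As a rough picture of what the definition in the Swagger editor might contain, the sketch below assembles an OpenAPI 2.0 style document as a Python dictionary and prints it as JSON. The title, the operation ID spelling "SpeechToText", and the security definition name "api_key" are assumptions made for illustration; the definition produced by the connector wizard may differ in detail.

```python
import json

# Hypothetical connector definition; names marked "assumed" are not from the article.
connector_definition = {
    "swagger": "2.0",
    "info": {"title": "Whisper Speech To Text", "version": "1.0"},  # assumed title
    "host": "api.openai.com",
    "basePath": "/",
    "schemes": ["https"],
    "paths": {
        "/v1/audio/transcriptions": {
            "post": {
                "operationId": "SpeechToText",  # assumed spelling of the operation ID
                "consumes": ["multipart/form-data"],
                "produces": ["application/json"],
                "parameters": [
                    # The recorded audio, sent as a file part.
                    {"name": "file", "in": "formData", "type": "file", "required": True},
                    # The model name, sent as a plain form field.
                    {"name": "model", "in": "formData", "type": "string",
                     "required": True, "default": "whisper-1"},
                ],
                "responses": {
                    "200": {
                        "description": "Transcription result",
                        "schema": {
                            "type": "object",
                            "properties": {"text": {"type": "string"}},
                        },
                    }
                },
            }
        }
    },
    # API key passed in the Authorization header.
    "securityDefinitions": {
        "api_key": {"type": "apiKey", "in": "header", "name": "Authorization"}  # assumed name
    },
    "security": [{"api_key": []}],
}

print(json.dumps(connector_definition, indent=2))
```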
Once the custom connector is defined, we can start leveraging it in our PowerApps application to convert speech to text.
4. Building the PowerApp
To convert speech to a document, we need to build a PowerApp that integrates with our custom connector and provides the necessary controls to record audio and send requests to the API.
4.1 Creating a New Blank Canvas App
Start by creating a new blank canvas app in PowerApps. Give your app a name and click on "Create".
4.2 Adding the Custom Connector and API Key
In the new app, go to the Data section and search for your custom connector. Select the custom connector and insert your API key. The API key should be in the format "Bearer [API Key]".
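The expected key format is easy to get wrong when pasting, so here is a small sketch of the normalization involved. The helper name is hypothetical, and the check is only illustrative; PowerApps itself does not run this code.

```python
def as_bearer_value(api_key: str) -> str:
    """Return the key in the "Bearer <key>" form expected by the connection."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    # Accept keys that were already pasted with a "Bearer " prefix, in any casing.
    if key.lower().startswith("bearer "):
        return "Bearer " + key[len("bearer "):].strip()
    return f"Bearer {key}"


# Placeholder key, for illustration only.
connection_value = as_bearer_value("sk-placeholder")
```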
4.3 Adding the Microphone Control and Button
To enable speech recording, add the microphone control to your app. This control allows users to record audio that can be converted to text.
Next, add a button control to your app. On the button's "OnSelect" property, add the formula to call the API through the custom connector. Use the "speech to text" action and provide the necessary parameters, such as the audio file reference from the microphone control.
5. Calling the Whisper API and Storing the Response
With the PowerApp set up, we can now call the Whisper API and store the response. When the user clicks the button to convert speech to text, the app will send a request to the Whisper API through the custom connector. The API will transcribe the speech and provide a response in JSON format, which includes the converted text.
To keep the result available to the rest of the app, store the API response in a variable when the button is selected (for example, with the Set function). Other controls, such as a label, can then reference that variable.
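Handling the JSON response amounts to reading its "text" property. A minimal sketch in Python, assuming a response body shaped like the sample imported into the connector; the function name and the sample sentence are made up for illustration.

```python
import json


def extract_transcript(response_body: str) -> str:
    """Parse the JSON response and return the transcribed text."""
    payload = json.loads(response_body)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        raise ValueError("response has no 'text' property")
    return text.strip()


# Made-up sample body with the same shape as the connector's sample response.
sample_body = '{"text": " Write a blog post about remote work. "}'
transcript = extract_transcript(sample_body)
```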
6. Showcasing the Speech-to-Text Response
To display the converted text from the speech-to-text response, we can leverage a label control in our PowerApp. Set the text property of the label control to the variable containing the API response's "text" property.
Now, when the user records their speech and converts it to text, the converted text will be displayed in the label control, providing them with a clear view of the transcription.
7. Creating a Document with Azure OpenAI Service GPT
Now that we have the converted text from the speech-to-text API, let's explore how we can create a document using Azure OpenAI's GPT (Generative Pre-trained Transformer) service.
7.1 Creating a New Flow
To generate a document, we will create a new flow in Power Automate. Power Automate provides a visual interface for creating automated workflows that integrate different services, including AI Builder and Azure OpenAI.
Start by creating a new flow from scratch in the Power Automate portal. Delete the default trigger and add the PowerApps (V2) trigger. This trigger runs the flow whenever our PowerApp calls it.
7.2 Customizing the Flow and Instructions
The flow requires some inputs to generate the desired document. We will provide two input parameters: the text instruction for the blog post and the email address of the user.
To generate the document, we will use the Azure OpenAI GPT action and select the "Create a blog post" template. This template comes with predefined instructions for creating a blog post.
Now, we can customize the instructions to meet our requirements. We can specify that the blog post should be shorter than one page and written in HTML with relevant HTML tags and inline styling. Additionally, we can include dynamic content from the PowerApp's trigger to generate personalized blog posts.
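The customized instruction is essentially the transcribed request plus a list of constraints. The sketch below shows one way to compose such a prompt; the exact wording and the function name are assumptions, not the text of the built-in template.

```python
def build_instruction(transcript: str) -> str:
    """Combine the transcribed request with the formatting constraints."""
    constraints = [
        "Keep the blog post shorter than one page.",
        "Return HTML with relevant HTML tags and inline styling.",
    ]
    lines = ["Create a blog post based on the following request:", transcript.strip(), ""]
    lines.extend(f"- {constraint}" for constraint in constraints)
    return "\n".join(lines)


# Made-up transcript, for illustration only.
instruction = build_instruction("  A post about remote work habits.  ")
print(instruction)
```

Keeping the constraints in a list makes it easy to add or remove requirements without rewriting the whole instruction.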
8. Creating a File and Converting it to PDF
Once we have the content generated by Azure OpenAI's GPT action, we can create a file in OneDrive. Set the destination of the file to the root folder in OneDrive and provide a name for the file.
The content of the file will be the dynamic property "text" from Azure OpenAI's GPT action. Later steps can then reference the newly created file through the dynamic content ID returned by the "Create file" action.
To prepare the document for sharing, we can convert it to a PDF. This can be done using OneDrive's built-in capabilities or by using third-party tools or connectors.
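The file-creation step can be pictured as writing the generated HTML to a named file. The sketch below does this locally with Python's standard library; the helper name and file name are hypothetical, and appending an ".html" extension is an assumption made so that a later conversion step can recognize the source format. The PDF conversion itself is not shown.

```python
import tempfile
from pathlib import Path


def save_generated_html(html: str, folder: Path, name: str) -> Path:
    """Write the generated HTML to a file, adding an .html extension if missing."""
    if not name.lower().endswith(".html"):
        name += ".html"
    path = folder / name
    path.write_text(html, encoding="utf-8")
    return path


# A temporary folder stands in for the OneDrive root folder in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_generated_html("<h1>Title</h1><p>Body</p>", Path(tmp), "blog-post")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
```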
9. Sending the PDF as an Email Attachment
To wrap up the process, we can send the generated document as an email attachment to the user who initiated the flow. The email can be sent using the built-in Office 365 Outlook connector or other email-related connectors.
Provide the email subject and body, including any necessary dynamic content from the PowerApp's trigger action. Attach the generated PDF to the email by specifying the file name and content.
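For readers who want to see what "file name and content" means for an attachment, here is a small Python analogue using the standard library's email package. It only builds the message and does not send it; the subject, body text, and default file name are placeholders, and the Outlook connector is not involved.

```python
from email.message import EmailMessage


def build_email(recipient: str, pdf_bytes: bytes, filename: str = "blog-post.pdf") -> EmailMessage:
    """Build an email with the generated PDF attached (nothing is sent)."""
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = "Your generated blog post"  # placeholder subject
    message.set_content("The document generated from your recording is attached.")
    # An attachment needs two things: a file name and the file content.
    message.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=filename)
    return message


# Placeholder recipient and bytes, for illustration only.
email_message = build_email("user@example.com", b"%PDF-1.4 placeholder")
```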
10. Calling the GPT Generate Document Flow
To trigger the document generation, we need to call the Power Automate flow from our PowerApp. This can be done by invoking the flow's Run method and passing the required input parameters.
In our case, we will pass the text instruction from the speech-to-text API response as the instruction input for the blog post. Additionally, we will pass the user's email address as the email input.
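The two inputs can be thought of as a small payload that is checked before the flow is called. A minimal sketch, assuming the inputs are named "instruction" and "email"; these names and the validation rules are hypothetical, not taken from the flow.

```python
def build_flow_inputs(transcript: str, email: str) -> dict:
    """Assemble the two inputs the flow expects, rejecting obviously bad values."""
    instruction = transcript.strip()
    address = email.strip()
    if not instruction:
        raise ValueError("nothing was transcribed")
    if "@" not in address:
        raise ValueError("email address looks invalid")
    return {"instruction": instruction, "email": address}


# Made-up values, for illustration only.
flow_inputs = build_flow_inputs(" Write a blog post about remote work. ", "user@example.com")
```

Checking for an empty transcription before calling the flow avoids generating a document from a silent or failed recording.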
11. Notifying the User and Displaying the Document
To provide feedback to the user, we can leverage the PowerApps notification function. After calling the GPT Generate Document flow, we can use the Notify function to display a success message to the user. This message can inform them that the document generation request has been sent.
Additionally, we can display the generated document to the user. This can be done by opening the document in a browser or presenting it within the PowerApp itself. The method will depend on the specific requirements and capabilities of the PowerApp and the generated document.
12. Conclusion
In this article, we have explored the process of leveraging the power of AI and the Microsoft Power Platform to convert speech to a document. We started by setting up a custom connector to integrate with OpenAI's Whisper API for speech-to-text conversion. Then, we built a PowerApp that allows users to record speech and convert it to text using the custom connector.
Next, we created a flow in Power Automate to generate a document based on the converted text. We customized the instructions and used Azure OpenAI GPT to generate a blog post. The generated document was saved as a file and converted to a PDF for sharing with the user.
Finally, we called the flow from the PowerApp and notified the user about the document generation request. The generated document was displayed to the user either in a browser or within the PowerApp itself.
By leveraging the power of AI and the Microsoft Power Platform, we can automate the process of converting speech to a document, which streamlines workflows and increases productivity.