Deploy AI Models with Nvidia Triton Inference Server and Azure VM
Table of Contents
- Introduction
- What is the Nvidia Triton Inference Server?
- Setting up the Azure VM
- Installing Prerequisites on the VM
- Pulling the Docker Image and Setting up Inference
- Creating the Customized Image
- Filtering and Selecting the GPU
- Opening the SSH Port
- Creating and Configuring the VM
- Downloading Python Dev Packages
- Updating the Path
- Installing the Container Engine
- Installing the Moby Engine
- Pulling the Docker Container to the VM
- Retrieving the Source Code
- Installing the Unzip Package
- Unzipping the Demo Files
- Updating Permissions
- Executing the Scripts
- Viewing the Inference Results
- Analyzing the Object Detection
- Printing Images with chafa
- Conclusion
Introduction
Today, we will explore the Nvidia Triton Inference Server and how to set it up on an Azure VM for inferencing with ONNX Runtime. Microsoft recently released a comprehensive learning path covering everything from acquiring a dataset with the auto-labeling features in Azure Machine Learning to training a model with AutoML, making it well suited to those without an in-depth understanding of neural networks. The learning path culminates in deploying the model with Triton, which is the focus of this article.
What is the Nvidia Triton Inference Server?
The Nvidia Triton Inference Server is open-source inference serving software that enables teams to deploy trained AI models from any framework. It supports deployment on any GPU- or CPU-based infrastructure: in the data center, in the cloud, on-premises, or on edge devices. This versatility makes it a powerful tool for deploying AI models seamlessly.
Setting up the Azure VM
To begin, we will create an Azure VM and install the necessary prerequisites on it. Azure offers pre-built images to start from, or a customized VM image can be created, which saves time on future VM deployments by shipping with the components pre-installed.
Installing Prerequisites on the VM
After configuring the VM, we will install the required software packages, including the Python development packages. Updating the path to include the local bin directory will ensure smooth execution of the subsequent steps.
Pulling the Docker Image and Setting up Inference
Next, we will install the container engine required for the Triton Inference Server. Microsoft packages a suitable container engine, the open-source Moby engine, making installation hassle-free. Once all prerequisites are installed, we will pull the Docker image to the VM.
Creating the Customized Image
If desired, the customized image created during the prerequisite installations can be set as the default image for future VM deployments. This will save time and effort by having all the required components pre-installed.
Filtering and Selecting the GPU
Azure offers the flexibility to filter VM sizes and select a specific GPU type for the VM. In this case, we will choose the NC6 size, which provides a GPU suitable for our requirements. Furthermore, we will open the SSH port (22) to enable secure remote access to the VM.
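The same selections can be made from the command line. The following is a hypothetical sketch using the Azure CLI; the resource group, VM name, region, and OS image are assumptions (the walkthrough uses the portal UI), but the `Standard_NC6` size and the port-22 rule match the choices above.

```shell
# Create a resource group and a GPU VM (names and region are illustrative).
az group create --name triton-demo-rg --location eastus

az vm create \
  --resource-group triton-demo-rg \
  --name triton-demo-vm \
  --size Standard_NC6 \
  --image Ubuntu2204 \
  --admin-username azureuser \
  --generate-ssh-keys

# Open port 22 so we can SSH into the machine.
az vm open-port --resource-group triton-demo-rg --name triton-demo-vm --port 22
```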
Creating and Configuring the VM
Once the necessary selections and configurations are made, we can create the VM. The creation process may take a few minutes; once it completes, we can SSH into our machine.
Downloading Python Dev Packages
Inside the VM, we will download the required Python dev packages to ensure compatibility with the subsequent steps.
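On an Ubuntu VM (an assumption based on the packages used later), the dev packages come from apt; the exact package list is a sketch, not the article's verbatim commands:

```shell
# Refresh the package index, then install pip and the Python dev headers.
sudo apt-get update
sudo apt-get install -y python3-pip python3-dev
```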
Updating the Path
Updating the path to include the local bin directory will prevent any path-related issues during the installation and execution of the container engine.
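Concretely, this means prepending `~/.local/bin` (where `pip install --user` places executables) to `PATH`:

```shell
# Prepend the user-local bin directory so pip-installed tools resolve first.
# Add this line to ~/.bashrc to make it persist across sessions.
export PATH="$HOME/.local/bin:$PATH"
```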
Installing the Container Engine
To leverage the power of Triton Inference Server, we need to install the container engine provided by Microsoft. This engine ensures smooth operation of the Docker containers.
Installing the Moby Engine
The container engine Microsoft provides is built on the open-source Moby project; we will install the Moby engine and its command-line tooling to run our inferencing containers.
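A sketch of the install, assuming Ubuntu and Microsoft's package repository (the repository registration step is an assumption; adjust the release path for your Ubuntu version):

```shell
# Register Microsoft's package repository (20.04 shown; pick your release).
curl -sSL -O https://packages.microsoft.com/config/ubuntu/20.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb

# Install the Moby container engine and its CLI.
sudo apt-get update
sudo apt-get install -y moby-engine moby-cli
```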
Pulling the Docker Container to the VM
With the Moby container engine in place, we are ready to pull the Docker container containing the Triton Inference Server to our VM.
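The Triton images live in NVIDIA's NGC registry. The release tag below is an assumption; substitute a current one:

```shell
# Pull the Triton server image from NGC (tag 23.04-py3 is illustrative).
sudo docker pull nvcr.io/nvidia/tritonserver:23.04-py3
```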
Retrieving the Source Code
To proceed with inferencing, we need to retrieve the source code required to run the system smoothly. We can use the 'wget' command to quickly download the necessary files.
Installing the Unzip Package
Once the source code is downloaded, we need to install the unzip package to facilitate the extraction of the demo files.
Unzipping the Demo Files
After installing the unzip package, we will unzip the demo files to access the necessary resources for inferencing.
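The download-and-extract steps above look roughly like this; the archive URL and filename are placeholders (substitute the link given in the learning path), and `unzip` is installed via apt on the assumed Ubuntu VM:

```shell
# Download the demo bundle (placeholder URL -- use the learning path's link).
wget -O demo_files.zip "https://example.com/triton-demo-files.zip"

# Install unzip and extract the demo files into a working directory.
sudo apt-get install -y unzip
unzip demo_files.zip -d demo
```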
Updating Permissions
Before executing the inference scripts, we need to update the permissions of the files to ensure smooth execution.
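"Updating the permissions" here means marking the scripts executable with `chmod +x`. The script below is a self-contained stand-in (the real demo ships its own script names):

```shell
# Illustrative only: create a placeholder script, mark it executable, run it.
printf '#!/bin/sh\necho "inference demo"\n' > run_inference.sh
chmod +x run_inference.sh
./run_inference.sh   # runs without "Permission denied"
```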
Executing the Scripts
With all preparations complete, we can now kick off the required scripts to start inferencing. These scripts will utilize the Docker container and execute the inference steps.
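Under the hood, the demo scripts launch something like Triton's standard quickstart invocation; the tag, port mappings, and model-repository path below are assumptions, not the actual script contents:

```shell
# Hypothetical sketch: start Triton serving a local model repository.
# Ports: 8000 = HTTP, 8001 = gRPC, 8002 = metrics.
sudo docker run --gpus=1 --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$(pwd)/model_repository:/models" \
  nvcr.io/nvidia/tritonserver:23.04-py3 \
  tritonserver --model-repository=/models
```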
Viewing the Inference Results
Once the inference scripts have run successfully, we can examine the results. By scrolling through the inference images, we can observe the detected objects, the type of objects, and their bounding boxes.
Analyzing the Object Detection
Analyzing the object detection, we can identify the various soda cans that have been successfully detected. The bounding boxes provide information about the precise location of each object.
Printing Images with chafa
To get a closer look at the detected objects, we can use a package called chafa to print the images directly in the terminal. Although the images appear pixelated when rendered as terminal characters, the green bounding boxes still clearly highlight the soda cans.
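chafa is available in the Ubuntu repositories; the image filename below is an assumption, so point it at one of the inference output images:

```shell
# Install chafa and render an inference result as character art in the terminal.
sudo apt-get install -y chafa
chafa output/result_0.jpg
```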
Conclusion
In conclusion, we have successfully leveraged Azure VMs and the Triton Inference Server to perform inferencing with ONNX Runtime. This powerful combination allows for seamless deployment and execution of AI models in a variety of environments: on any GPU or CPU, in the data center, in the cloud, on-premises, or on edge devices.
Highlights:
- Explore the Nvidia Triton Inference Server and its open-source capabilities
- Set up an Azure VM and install the necessary prerequisites for inferencing
- Pull the Docker image and configure the Triton Inference Server
- Create a customized VM image with pre-installed components for future deployments
- Optimize performance by selecting the appropriate GPU and opening the SSH port
- Retrieve the source code and install the required packages for smooth inferencing
- Execute scripts and analyze the inferencing results
- Utilize the chafa package to print images and visualize object detection