Deploy AI Models with Nvidia Triton Inference Server and Azure VM
Table of Contents
- Introduction
- What is the Nvidia Triton Inference Server?
- Setting up the Azure VM
- Installing Prerequisites on the VM
- Pulling the Docker Image and Setting up Inference
- Creating the Customized Image
- Filtering and Selecting the GPU
- Opening the SSH Port
- Creating and Configuring the VM
- Downloading Python Dev Packages
- Updating the Path
- Installing the Container Engine
- Installing the Moby Engine
- Pulling the Docker Container to the VM
- Retrieving the Source Code
- Installing the Unzip Package
- Unzipping the Demo Files
- Updating Permissions
- Executing the Scripts
- Viewing the Inference Results
- Analyzing the Object Detection
- Printing Images with chafa
- Conclusion
Introduction
Today, we will explore the Nvidia Triton Inference Server and how to set it up on an Azure VM for inferencing with ONNX Runtime. Microsoft recently released a comprehensive learning path covering everything from acquiring a dataset with the auto-labeling features in Azure Machine Learning to training a model with AutoML, making it well suited to those without an in-depth understanding of neural networks. The learning path culminates in deploying the model with Triton, which is the focus of this article.
What is the Nvidia Triton Inference Server?
The Nvidia Triton Inference Server is open-source inference serving software that enables teams to deploy trained AI models from any framework. It supports deployment on any GPU- or CPU-based infrastructure: in the data center, in the cloud, on-premises, or on edge devices. This versatility makes it a powerful tool for deploying AI models seamlessly.
Setting up the Azure VM
To begin, we will create an Azure VM and install the necessary prerequisites on it. Azure offers pre-built images to start from, or a customized VM image can be created, which saves time on future VM deployments by shipping with the components pre-installed.
Installing Prerequisites on the VM
After configuring the VM, we will install the required software packages, including the Python development packages. Updating the path to include the local bin directory will ensure smooth execution of the subsequent steps.
Pulling the Docker Image and Setting up Inference
Next, we will install the container engine required for the Triton Inference Server. Microsoft packages a suitable container engine, the open-source Moby engine, making installation hassle-free. Once all prerequisites are installed, we will pull the Docker image to the VM.
Creating the Customized Image
If desired, the customized image created during the prerequisite installations can be set as the default image for future VM deployments. This will save time and effort by having all the required components pre-installed.
Filtering and Selecting the GPU
Azure offers the flexibility to filter VM sizes and select a specific GPU type for the VM. In this case, we will choose the NC6 size, which provides a GPU suitable for our requirements. Furthermore, we will open the SSH port (22) to enable secure remote access to the VM.
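The same selections can be made from the command line. The following is a hypothetical sketch using the Azure CLI; the resource group, VM name, region, and OS image are assumptions (the walkthrough uses the portal UI), but the `Standard_NC6` size and the port-22 rule match the choices above.

```shell
# Create a resource group and a GPU VM (names and region are illustrative).
az group create --name triton-demo-rg --location eastus

az vm create \
  --resource-group triton-demo-rg \
  --name triton-demo-vm \
  --size Standard_NC6 \
  --image Ubuntu2204 \
  --admin-username azureuser \
  --generate-ssh-keys

# Open port 22 so we can SSH into the machine.
az vm open-port --resource-group triton-demo-rg --name triton-demo-vm --port 22
```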
Creating and Configuring the VM
Once the necessary selections and configurations are made, we can create the VM. The creation process may take a few minutes; once it completes, we can SSH into our machine.
Downloading Python Dev Packages
Inside the VM, we will download the required Python dev packages to ensure compatibility with the subsequent steps.
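On an Ubuntu VM (an assumption based on the packages used later), the dev packages come from apt; the exact package list is a sketch, not the article's verbatim commands:

```shell
# Refresh the package index, then install pip and the Python dev headers.
sudo apt-get update
sudo apt-get install -y python3-pip python3-dev
```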
Updating the Path
Updating the path to include the local bin directory will prevent any path-related issues during the installation and execution of the container engine.
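Concretely, this means prepending `~/.local/bin` (where `pip install --user` places executables) to `PATH`:

```shell
# Prepend the user-local bin directory so pip-installed tools resolve first.
# Add this line to ~/.bashrc to make it persist across sessions.
export PATH="$HOME/.local/bin:$PATH"
```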
Installing the Container Engine
To leverage the power of Triton Inference Server, we need to install the container engine provided by Microsoft. This engine ensures smooth operation of the Docker containers.
Installing the Moby Engine
The container engine Microsoft provides is built on the open-source Moby project; we will install the Moby engine and its command-line tooling to run our inferencing containers.
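A sketch of the install, assuming Ubuntu and Microsoft's package repository (the repository registration step is an assumption; adjust the release path for your Ubuntu version):

```shell
# Register Microsoft's package repository (20.04 shown; pick your release).
curl -sSL -O https://packages.microsoft.com/config/ubuntu/20.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb

# Install the Moby container engine and its CLI.
sudo apt-get update
sudo apt-get install -y moby-engine moby-cli
```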
Pulling the Docker Container to the VM
With the Moby container engine in place, we are ready to pull the Docker container containing the Triton Inference Server to our VM.
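The Triton images live in NVIDIA's NGC registry. The release tag below is an assumption; substitute a current one:

```shell
# Pull the Triton server image from NGC (tag 23.04-py3 is illustrative).
sudo docker pull nvcr.io/nvidia/tritonserver:23.04-py3
```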
Retrieving the Source Code
To proceed with inferencing, we need to retrieve the source code required to run the system smoothly. We can use the 'wget' command to quickly download the necessary files.
Installing the Unzip Package
Once the source code is downloaded, we need to install the unzip package to facilitate the extraction of the demo files.
Unzipping the Demo Files
After installing the unzip package, we will unzip the demo files to access the necessary resources for inferencing.
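The download-and-extract steps above look roughly like this; the archive URL and filename are placeholders (substitute the link given in the learning path), and `unzip` is installed via apt on the assumed Ubuntu VM:

```shell
# Download the demo bundle (placeholder URL -- use the learning path's link).
wget -O demo_files.zip "https://example.com/triton-demo-files.zip"

# Install unzip and extract the demo files into a working directory.
sudo apt-get install -y unzip
unzip demo_files.zip -d demo
```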
Updating Permissions
Before executing the inference scripts, we need to update the permissions of the files to ensure smooth execution.
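"Updating the permissions" here means marking the scripts executable with `chmod +x`. The script below is a self-contained stand-in (the real demo ships its own script names):

```shell
# Illustrative only: create a placeholder script, mark it executable, run it.
printf '#!/bin/sh\necho "inference demo"\n' > run_inference.sh
chmod +x run_inference.sh
./run_inference.sh   # runs without "Permission denied"
```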
Executing the Scripts
With all preparations complete, we can now kick off the required scripts to start inferencing. These scripts will utilize the Docker container and execute the inference steps.
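Under the hood, the demo scripts launch something like Triton's standard quickstart invocation; the tag, port mappings, and model-repository path below are assumptions, not the actual script contents:

```shell
# Hypothetical sketch: start Triton serving a local model repository.
# Ports: 8000 = HTTP, 8001 = gRPC, 8002 = metrics.
sudo docker run --gpus=1 --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$(pwd)/model_repository:/models" \
  nvcr.io/nvidia/tritonserver:23.04-py3 \
  tritonserver --model-repository=/models
```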
Viewing the Inference Results
Once the inference scripts have run successfully, we can examine the results. By scrolling through the inference images, we can observe the detected objects, the type of objects, and their bounding boxes.
Analyzing the Object Detection
Analyzing the object detection, we can identify the various soda cans that have been successfully detected. The bounding boxes provide information about the precise location of each object.
Printing Images with chafa
To get a closer look at the detected objects, we can use a package called chafa to print the images directly in the terminal. Although the images appear pixelated when rendered as terminal characters, the green bounding boxes still clearly highlight the soda cans.
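chafa is available in the Ubuntu repositories; the image filename below is an assumption, so point it at one of the inference output images:

```shell
# Install chafa and render an inference result as character art in the terminal.
sudo apt-get install -y chafa
chafa output/result_0.jpg
```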
Conclusion
In conclusion, we have successfully leveraged Azure VMs and the Triton Inference Server to perform inferencing with ONNX Runtime. This powerful combination allows for seamless deployment and execution of AI models in a variety of environments: on any GPU or CPU, in the data center, in the cloud, on-premises, or on edge devices.
Highlights:
- Explore the Nvidia Triton Inference Server and its open-source capabilities
- Set up an Azure VM and install the necessary prerequisites for inferencing
- Pull the Docker image and configure the Triton Inference Server
- Create a customized VM image with pre-installed components for future deployments
- Optimize performance by selecting the appropriate GPU and opening the SSH port
- Retrieve the source code and install the required packages for smooth inferencing
- Execute scripts and analyze the inferencing results
- Utilize the chafa package to print images and visualize object detection