Deploy AI Models with Nvidia Triton Inference Server and Azure VM

Table of Contents

  1. Introduction
  2. What is the Nvidia Triton Inference Server?
  3. Setting up the Azure VM
  4. Installing Prerequisites on the VM
  5. Pulling the Docker Image and Setting up Inference
  6. Creating the Customized Image
  7. Filtering and Selecting the GPU
  8. Opening the SSH Port
  9. Creating and Configuring the VM
  10. Downloading Python Dev Packages
  11. Updating the Path
  12. Installing the Container Engine
  13. Installing the Moby Engine
  14. Pulling the Docker Container to the VM
  15. Retrieving the Source Code
  16. Installing the Unzip Package
  17. Unzipping the Demo Files
  18. Updating Permissions
  19. Executing the Scripts
  20. Viewing the Inference Results
  21. Analyzing the Object Detection
  22. Printing Images with chafa
  23. Conclusion

Introduction

Today, we will explore the Nvidia Triton Inference Server and how to set it up on an Azure VM for inference with ONNX Runtime. Microsoft has recently released a comprehensive learning path covering everything from acquiring a dataset with the auto-labeling features in Azure Machine Learning to training a model with AutoML, making it well suited to readers without an in-depth understanding of neural networks. The learning path culminates in deploying the model with Triton, which is the focus of this article.

What is the Nvidia Triton Inference Server?

The Nvidia Triton Inference Server is open-source software that lets teams deploy trained AI models from any framework. It supports deployment on any GPU or CPU, whether in the data center, in the cloud, on-premises, or on edge devices. This versatility makes it a powerful tool for deploying AI models seamlessly.

Setting up the Azure VM

To begin, we will create an Azure VM and install the necessary prerequisites on it. Azure offers pre-built images that can be used as-is, or a customized VM image can be created so that future VM deployments start with the required components already installed.

Installing Prerequisites on the VM

After configuring the VM, we will install the required software packages, including the Python development packages, and update the PATH to include the local bin directory so the subsequent steps run smoothly.

Pulling the Docker Image and Setting up Inference

Next, we will install the container engine required by the Triton Inference Server. Microsoft provides a suitable container engine, making installation hassle-free. Additionally, we will install the Moby engine to run our containers. Once all prerequisites are installed, we will pull the Docker image to the VM.

Creating the Customized Image

If desired, the customized image created during the prerequisite installations can be set as the default image for future VM deployments. This will save time and effort by having all the required components pre-installed.
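
If you go the custom-image route, the capture can be done with the Azure CLI. A minimal sketch, using illustrative resource group and VM names (note that generalizing a VM makes it unusable afterwards, so capture only a copy you are finished with):

    # Stop, generalize, and capture the configured VM as a reusable image
    az vm deallocate --resource-group triton-rg --name triton-vm
    az vm generalize --resource-group triton-rg --name triton-vm
    az image create --resource-group triton-rg --name triton-base-image --source triton-vm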

Filtering and Selecting the GPU

Azure offers the flexibility to filter the available VM sizes and select a specific GPU type. In this case, we will choose the NC6 size, which fits our requirements. We will also open the SSH port (22) to enable secure remote access to the VM.
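
In the portal this is done with the size filter and the inbound port rules checkbox; an approximate Azure CLI equivalent, with illustrative region and resource names, looks like this:

    # List NC-series (GPU) VM sizes available in the target region
    az vm list-sizes --location eastus --output table | grep NC

    # Allow inbound SSH traffic on port 22 once the VM exists
    az vm open-port --resource-group triton-rg --name triton-vm --port 22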

Creating and Configuring the VM

Once the necessary selections and configurations are made, we can create the VM. The creation process may take a few minutes; once it completes, we can SSH into the machine.
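
A rough Azure CLI equivalent of the portal steps, using an illustrative Ubuntu image URN and resource names (the learning path may use a different image):

    # Create the GPU VM and generate SSH keys if none exist
    az vm create \
      --resource-group triton-rg \
      --name triton-vm \
      --image Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest \
      --size Standard_NC6 \
      --admin-username azureuser \
      --generate-ssh-keys

    # Connect using the public IP address returned by the previous command
    ssh azureuser@<public-ip-address>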

Downloading Python Dev Packages

Inside the VM, we will install the required Python development packages to ensure compatibility with the subsequent steps.
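
The exact package list depends on the learning path, but a typical minimum on Ubuntu looks like this:

    sudo apt update
    sudo apt install -y python3 python3-dev python3-pip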

Updating the Path

Updating the PATH to include the local bin directory will prevent any path-related issues during the installation and execution of the container engine.
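
Assuming the local bin directory refers to ~/.local/bin (where pip places user-installed scripts), the PATH can be updated like this:

    # Make user-installed tools visible in the current session
    export PATH="$PATH:$HOME/.local/bin"

    # Persist the change for future shells
    echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc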

Installing the Container Engine

To leverage the power of the Triton Inference Server, we need to install the container engine provided by Microsoft. This engine ensures smooth operation of the Docker containers.
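
The engine Microsoft distributes is the Moby engine, published through the packages.microsoft.com repository, so the repository has to be registered before the engine itself is installed in the next step. A sketch assuming an Ubuntu 20.04 VM:

    # Add Microsoft's signing key and package repository
    curl -sSL https://packages.microsoft.com/keys/microsoft.asc | \
        gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
    curl -sSL https://packages.microsoft.com/config/ubuntu/20.04/prod.list | \
        sudo tee /etc/apt/sources.list.d/microsoft-prod.list
    sudo apt update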

Installing the Moby Engine

With the Microsoft package repository registered, we can install the Moby engine itself, which will run the Docker containers used by our inference workflow.
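
The install itself is a couple of commands (package names as published by Microsoft; the service name may differ on other distributions):

    sudo apt install -y moby-engine moby-cli

    # Start the engine and confirm the Docker CLI can reach it
    sudo systemctl enable --now docker
    sudo docker --version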

Pulling the Docker Container to the VM

With the Moby engine in place, we are ready to pull the Docker container image for the Triton Inference Server to our VM.
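
Triton images are published in NVIDIA's NGC registry as nvcr.io/nvidia/tritonserver; the tag below is illustrative, so substitute the release referenced by the learning path:

    sudo docker pull nvcr.io/nvidia/tritonserver:22.04-py3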

Retrieving the Source Code

To proceed with inferencing, we need to retrieve the source code required to run the system smoothly. We can use the 'wget' command to quickly download the necessary files.
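
The download URL comes from the learning path itself; the placeholder below just shows the shape of the command:

    # Save the demo archive under a predictable name
    wget -O demo-files.zip "<demo-files-url-from-the-learning-path>"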

Installing the Unzip Package

Once the source code is downloaded, we need to install the unzip package to facilitate the extraction of the demo files.

Unzipping the Demo Files

After installing the unzip package, we will unzip the demo files to access the necessary resources for inferencing.
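
A sketch of these two steps, reusing the archive name from the earlier download example and an illustrative target directory:

    sudo apt install -y unzip
    unzip -o demo-files.zip -d ~/triton-demo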

Updating Permissions

Before executing the inference scripts, we need to update the file permissions so the scripts are executable.
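
Assuming the extracted demo ships shell scripts, marking them executable looks like this:

    cd ~/triton-demo
    chmod +x *.sh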

Executing the Scripts

With all preparations complete, we can now kick off the required scripts to start the inference run. These scripts launch the Docker container and execute the inference steps.
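
The script name below is a placeholder for whatever the demo archive actually ships; under the hood such a script typically wraps a docker run of the Triton image along these lines (GPU access also requires the NVIDIA container toolkit):

    # Run the demo's wrapper script (hypothetical name)
    cd ~/triton-demo
    sudo ./run_inference.sh

    # Roughly what such a script does: start Triton against a model repository
    sudo docker run --gpus=all --rm \
        -p 8000:8000 -p 8001:8001 -p 8002:8002 \
        -v "$(pwd)/model_repository:/models" \
        nvcr.io/nvidia/tritonserver:22.04-py3 \
        tritonserver --model-repository=/models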

Viewing the Inference Results

Once the inference scripts have run successfully, we can examine the results. By scrolling through the inference images, we can observe the detected objects, their predicted classes, and their bounding boxes.

Analyzing the Object Detection

Looking at the object detection output, we can see the various soda cans that have been successfully detected, with bounding boxes marking the precise location of each object.

Printing Images with chafa

To get a closer look at the detected objects, we can use a package called chafa to print the images directly in the terminal. Although the images appear pixelated when rendered in a terminal, the green bounding boxes clearly highlight the detected soda cans.
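
chafa is available from the standard Ubuntu repositories; the image path below is illustrative and should point at one of the generated result images:

    sudo apt install -y chafa
    chafa results/output_image_0.jpg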

Conclusion

In conclusion, we have successfully leveraged an Azure VM and the Triton Inference Server to perform inference with ONNX Runtime. This powerful combination allows AI models to be deployed and executed seamlessly across a variety of environments: on any GPU or CPU, in the data center, in the cloud, on-premises, or on edge devices.

Highlights:

  • Explore the Nvidia Triton Inference Server and its open-source capabilities
  • Set up an Azure VM and install the necessary prerequisites for inferencing
  • Pull the Docker image and configure the Triton Inference Server
  • Create a customized VM image with pre-installed components for future deployments
  • Select an appropriate GPU VM size and open the SSH port for remote access
  • Retrieve the source code and install the required packages for smooth inferencing
  • Execute scripts and analyze the inferencing results
  • Utilize the chafa package to print images and visualize the object detection results
