Achieving Peak Performance with NVIDIA DGX A100 SuperPOD and DDN AI400X


Table of Contents

  1. Introduction
  2. The DGX A100 SuperPOD: A System Built for Capability Computing
  3. The Importance of Data Access in Deep Learning
    • Understanding Data Intensity in Deep Learning
    • The Impact of Data Size on Caching
  4. The DDN AI400X Appliance with Exascaler
    • System Configuration and Performance
    • Read Performance on Different Block Sizes
    • Performance Comparison with Local RAID
  5. Real Performance in Model Training
    • Performance Comparison on Different Node Sets
    • Using Remote File System for Training
  6. Conclusion

🚀 Introduction

In this article, we will dive into our experience deploying the DDN AI400X and Exascaler with our DGX A100 SuperPOD. As a solution architect at NVIDIA specializing in large-scale data center deployments, I will share our journey to achieving top performance in AI and HPC applications.

🔬 The DGX A100 SuperPOD: A System Built for Capability Computing

The DGX A100 SuperPOD is a purpose-built system designed for capability computing in both the AI and HPC domains. With this system, we have achieved all eight MLPerf training records. Ranked number seven on the Top500 list, the DGX A100 SuperPOD delivers a staggering 27.6 petaflops of performance, giving it the capacity to accommodate AI training at scale as well as high-end HPC workloads.

📊 The Importance of Data Access in Deep Learning

Understanding Data Intensity in Deep Learning

Deep learning training is a highly data-intensive process. Because training is iterative, the model needs constant access to the data: each iteration runs a forward pass and then uses an optimizer, such as gradient descent, to update the weights that fit the model. Since this process sweeps through the entire data set many times, once per epoch, read I/O performance is critical.
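The epoch structure described above can be sketched in a few lines. This is a minimal illustration of iterative gradient descent on a toy linear model, not the SuperPOD training stack; the point is that every epoch re-reads the full data set, which is why read I/O dominates at scale.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))            # stand-in for a data set on storage
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w

w = np.zeros(4)
lr = 0.1
for epoch in range(200):                  # each epoch = one full pass over the data
    grad = X.T @ (X @ w - y) / len(X)     # gradient of mean squared error
    w -= lr * grad                        # gradient descent update

print(np.allclose(w, true_w, atol=1e-4))  # → True
```

In a real training run the `X` read on every epoch comes off the file system rather than memory, so the storage layer sees the whole data set again and again.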

The Impact of Data Size on Caching

While caching is important for performance optimization, some data sets are simply too large to be cached entirely. In such cases, the underlying file system must sustain the full read workload itself. We have categorized the I/O requirements into levels based on file formats and data types; ranging from "good" to "best," these categories address the varying needs of different models and data formats.
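The "too large to cache" condition is easy to estimate. The sketch below is an illustrative heuristic of ours, not part of the DDN or Exascaler tooling: it compares a data set size against the node's physical memory, which bounds what the OS page cache could ever hold.

```python
import os

def dataset_fits_in_page_cache(dataset_bytes: int) -> bool:
    """Rough heuristic: can one node's page cache hold the whole data set?"""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    total_ram = page_size * phys_pages
    # Leave headroom for the application itself.
    return dataset_bytes < 0.5 * total_ram

# A multi-petabyte data set clearly cannot be cached on a single node,
# so every epoch's reads hit the file system.
print(dataset_fits_in_page_cache(10 * 1024**5))  # → False
```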

📦 The DDN AI400X Appliance with Exascaler

Our reliance on the DDN AI400X appliance with Exascaler has been instrumental in achieving optimal performance. Our configuration consists of 40 AI400X appliances, providing 10 petabytes of storage. This setup delivers up to 2 terabytes per second of peak read performance across the 280 nodes in our existing system. Metadata is distributed using DNE (Lustre's Distributed Namespace), and each node has two HDR links to the InfiniBand storage network, enabling up to 50 gigabytes per second per node.
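A quick back-of-the-envelope check shows these figures are self-consistent. The 200 Gb/s HDR link speed and the roughly 50 GB/s per-appliance read throughput below are our assumptions, chosen to match the totals in the text rather than quoted specs:

```python
HDR_GBIT_S = 200             # HDR InfiniBand link speed, gigabits per second
links_per_node = 2
appliances = 40
per_appliance_gbyte_s = 50   # assumed AI400X read throughput, GB/s

node_limit_gbyte_s = links_per_node * HDR_GBIT_S / 8        # bits -> bytes
aggregate_tbyte_s = appliances * per_appliance_gbyte_s / 1000

print(node_limit_gbyte_s)    # 50.0 GB/s per node, matching the text
print(aggregate_tbyte_s)     # 2.0 TB/s aggregate, matching the text
```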

Read Performance on Different Block Sizes

We evaluated the appliance's read performance across block sizes of 128 kilobytes, 1 megabyte, and 16 megabytes, using both sequential and random access patterns. The larger block sizes showed similar performance, exceeding 45 gigabytes per second per node. For 128-kilobyte reads, sequential throughput scales linearly with the number of threads, while random reads reach up to 10 gigabytes per second.
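The shape of such a block-size sweep can be sketched with a simple single-threaded microbenchmark. This is a simplified illustration, not the benchmark used in the tests above: a real evaluation (e.g. with fio) would drop the page cache, use direct I/O, and run many threads across many nodes.

```python
import os
import tempfile
import time

def read_throughput(path: str, block_size: int) -> float:
    """Sequential read throughput in MB/s for a given block size."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6

# Create a small scratch file and sweep the block sizes from the text.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(64 * 1024 * 1024))   # 64 MiB test file
for bs in (128 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    print(f"{bs >> 10:>6} KiB blocks: {read_throughput(tmp.name, bs):.0f} MB/s")
os.unlink(tmp.name)
```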

Performance Comparison with Local RAID

To assess the DDN AI400X appliance against a local RAID setup, we ran tests on node counts ranging from one to 96, using popular deep learning models such as ResNet-50. The results were promising: the appliance delivered performance comparable to the local RAID, giving our users the flexibility to run production workloads directly from the AI400X file system.

🎯 Real Performance in Model Training

To gauge real performance in model training, we ran further tests on one, two, and 96 nodes, comparing reads from the local RAID against reads from the remote file system for both buffered I/O and mmap access. The remote file system consistently delivered 97 to 98 percent of local RAID performance, and matched it outright on the smaller runs. This flexibility eliminates the need for local data caching and enables efficient training of larger models.
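The two read paths compared above differ in mechanism: buffered I/O copies data through the page cache into user buffers, while mmap maps the file into the process address space and pages it in on access. A minimal sketch of the two paths (illustrative only, not the data loaders used in the training tests):

```python
import mmap
import os
import tempfile

# Create a scratch file to read back both ways.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * (8 * 1024 * 1024))
path = tmp.name

# Buffered I/O: read() copies data through the page cache into a user buffer.
with open(path, "rb") as f:
    buffered = f.read()

# mmap: the file is mapped into the address space and faulted in on access.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mapped = bytes(mm)

print(buffered == mapped)  # → True: both paths return identical data
os.unlink(path)
```

Frameworks that memory-map training data (for example, formats read via mmap) exercise the second path, which is why both were measured.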

🏁 Conclusion

The DDN AI400X appliance with Exascaler has proven to be a valuable addition to our DGX A100 SuperPOD, allowing us to achieve peak performance when training a variety of models and data formats. With its impressive scalability and performance, the AI400X appliance meets the demanding needs of our NVIDIA users and scientists. The ability to efficiently access and process vast amounts of data sets a new standard for AI and HPC applications.
