Supercharging Workflows with Batch Inference

Table of Contents

  1. Introduction
  2. The Batch Inference Feature
  3. User Pain Points
    • Lack of Standardization
    • Computationally Heavy
    • Difficulty in Implementing
  4. Value Adds of Batch Inference
    • First Party Integration
    • Speeding up Inference with Limited Resources
    • Increased Prototyping Velocity
  5. Use Case: Text Embedding Computation
    • Overview of Text Embedding
    • Retrieval Augmented Use Cases
  6. Demo: Running Batch Inference for Text Embedding
    • Code Overview
    • Example User Flow
  7. Materials and Resources
    • Blog Post
    • Code Repository
    • Documentation
  8. Conclusion
  9. FAQs

The Batch Inference Feature: Enhancing Workflows with Determined AI

Batch inference, also known as offline inference, is a process that involves generating predictions on a batch of observations. Unlike real-time or online inference, which generates predictions based on individual observations at runtime, batch inference runs on a scheduled basis, processing large sets of data to store predictions for future use.

In this article, we will discuss the benefits, use cases, and implementation of the batch inference feature offered by Determined AI. We will also provide a detailed walkthrough of a specific use case, text embedding computation, and demonstrate how to run batch inference in this context.

Introduction

Hello everyone! My name is Ram, and I am one of the project managers at Determined AI, specializing in AI development. Today, I am excited to share our insights on the batch inference feature with all of you. This new feature expands the capabilities of Determined AI beyond training workloads into non-training tasks like inference. In this article, we will take you through a range of topics, from a high-level introduction to batch inference to a detailed use case of text embedding computation.

The Batch Inference Feature

The batch inference feature is a significant addition to the Determined AI platform. While most of the existing features revolve around training models, the batch inference feature stands out as the first step towards supporting non-training workloads. By enabling batch inference, Determined AI aims to empower users to expand their workflows and cover different use cases.

User Pain Points

Before delving into the specifics, it is essential to understand the pain points faced by users when interacting with batch inference. After conducting numerous interviews and surveys with internal and external customers, we have identified and categorized the pain points into three main areas.

Lack of Standardization

Previously, Determined AI did not offer a first-party solution for batch inference, forcing users to resort to low-level parallelization programming or homegrown solutions. These homegrown solutions lacked native Determined AI functionality, such as preemption, experiment tracking, and metrics reporting. As a result, users struggled to keep their batch inference workflows standardized.

Computationally Heavy

Batch inference can be computationally heavy, especially when processing large datasets. Users often had to write complex scripts to manage distributed workloads and handle parallelization manually. This process was time-consuming and resource-intensive, requiring users to allocate significant GPU resources to complete batch inference efficiently.

Difficulty in Implementing

Many users struggled with implementing batch inference because establishing an efficient data pipeline is hard. During the prototyping stage, users often worked with small-scale datasets, which did not highlight the need for batch inference. As their projects expanded and matured, they ran into the difficulties and complexities of running batch inference workflows, including distributed inference and acquiring additional GPU resources.

Value Adds of Batch Inference

To address the pain points mentioned above, Determined AI provides several value-added benefits to users who leverage the batch inference feature.

First Party Integration

Determined AI aims to make batch inference a native, first-party feature seamlessly integrated into the platform. Users can now treat batch inference experiments much like training jobs, with full visibility in the Determined AI web UI. This integration allows users to view and analyze metrics, pause or resume experiments, and access experiment checkpoints from anywhere. The addition of native functionality ensures standardization and simplifies the adoption of batch inference.

Speeding up Inference with Limited Resources

With the batch processing API, users can significantly speed up their inference tasks while efficiently utilizing limited GPU resources. The API provides the ability to scale up or down the number of GPUs used for embedding generation with a simple configuration change. This functionality abstracts away the complexities of parallelization and enables users to process large datasets effectively.
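As a concrete illustration, here is a minimal sketch of what that configuration change can look like when submitting the job through Determined's Python SDK. The script name `embed.py` is hypothetical and the config is abridged, so treat this as an outline under those assumptions and refer to the batch processing documentation for the authoritative workflow.

```python
# Hedged sketch: scaling batch inference by editing only the experiment
# configuration. Field names follow Determined's experiment config schema;
# "embed.py" is a hypothetical entry script that calls the batch processing API.
from determined.experimental import client

config = {
    "name": "batch-embedding-inference",
    "entrypoint": "python3 embed.py",      # hypothetical batch inference script
    "resources": {"slots_per_trial": 4},   # 4 GPUs; change to 1 or 8 to scale down or up
    # ...remaining required experiment config fields omitted for brevity
}

# Submits the job to the cluster; the batch processing API shards the dataset
# across however many slots the config requests.
client.create_experiment(config=config, model_dir=".")
```

The Python entry point itself does not change when you scale; only the `slots_per_trial` value does.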

Increased Prototyping Velocity

The flexibility of the batch processing API allows users to increase their prototyping velocity regardless of their development stage. Users can launch the API anytime and experiment with different configurations, environments, and resources. This flexibility allows for rapid prototyping and experimentation, facilitating quick iterations to find optimal solutions for specific use cases.

Use Case: Text Embedding Computation

One of the most significant use cases where batch inference proves valuable is text embedding computation. Text embedding refers to the mapping of high-dimensional textual data (such as words, sentences, or documents) to lower-dimensional vectors that capture their semantic meaning. By generating text embeddings, users can enhance workflows such as retrieval-augmented applications and question-answering systems.

Overview of Text Embedding

Text embedding is a process that maps high-dimensional textual data to lower-dimensional vectors. These vectors capture the semantic meaning of the original words, sentences, or documents. The proximity of the embeddings in the embedding space reflects the relevance or similarity between the corresponding texts. This property allows users to perform retrieval-based operations, such as finding similar documents or answering questions effectively.
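To make the idea concrete, the short sketch below (using the open-source sentence-transformers library, independent of Determined AI) embeds three sentences and compares them with cosine similarity; the two sentences about login problems land much closer to each other than either does to the unrelated one.

```python
# A small illustration of what text embeddings give you: semantically related
# sentences end up close together in the embedding space.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works here

sentences = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "The weather is nice today.",
]
embeddings = model.encode(sentences)  # shape: (3, embedding_dim)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cosine(embeddings[0], embeddings[1]))  # high: both are about login issues
print(cosine(embeddings[0], embeddings[2]))  # low: unrelated topics
```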

Retrieval Augmented Use Cases

In the realm of large language models (LLMs) like OpenAI's GPT-3, retrieval-augmented techniques have gained popularity. These techniques draw on external data sources, such as document repositories or knowledge bases, to provide additional context and relevant information to the LLM. By leveraging batch inference, users can generate embeddings for their document collections, create a vector database, and use this database to enrich the prompts given to LLMs during inference. This approach makes LLM outputs more accurate and relevant, addressing the limitations of relying on the LLM alone.
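The sketch below illustrates the retrieval step of that pattern. For brevity it uses a brute-force nearest-neighbor search over a handful of in-memory embeddings; in the workflow described here, those embeddings would instead come from a batch inference job and live in a vector database.

```python
# A minimal sketch of the retrieval-augmented pattern: embed the question,
# find the most similar document, and prepend it to the LLM prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Documents whose embeddings would normally be produced by the batch inference job.
documents = [
    "Determined experiments can be paused and resumed from the web UI.",
    "Checkpoints store model state so a job can restart where it left off.",
    "Batch inference processes large datasets on a schedule, not per request.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

question = "Can I pause a running experiment?"
query_embedding = model.encode([question], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
scores = doc_embeddings @ query_embedding
best = documents[int(np.argmax(scores))]

# The retrieved passage enriches the prompt sent to the LLM.
prompt = f"Answer using this context:\n{best}\n\nQuestion: {question}"
print(prompt)
```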

Demo: Running Batch Inference for Text Embedding

In this demo, we will showcase the process of running batch inference for text embedding using Determined AI's Batch Processing API. The provided code demonstrates the steps involved, but note that the specifics may vary depending on your use case.

The code initializes the necessary resources, including the tokenizer, model, and database client. It then defines the processing logic for each batch of data. In this example, the data is tokenized, passed through the model to extract embeddings, and associated with the original text for future reference. The code also handles checkpoints, allowing users to save their progress and resume from the last checkpoint.
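The following condensed sketch mirrors that structure. It assumes Determined's `TorchBatchProcessor` / `torch_batch_process` interface as described in the batch processing API documentation (check the docs for the exact class and function signatures), and it keeps results in an in-memory list where the actual demo writes them to a vector database.

```python
# Condensed, hedged sketch of a batch embedding job. The dataset and model are
# illustrative stand-ins; the real demo tokenizes actual documents and stores
# embeddings in a vector database rather than a Python list.
import torch
from torch.utils.data import Dataset
from transformers import AutoModel, AutoTokenizer
from determined.pytorch import experimental


class TextDataset(Dataset):
    def __init__(self, texts):
        self.texts = texts

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.texts[idx]


class EmbeddingProcessor(experimental.TorchBatchProcessor):
    def __init__(self, context):
        self.context = context
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # Initialize the tokenizer and model once per worker.
        name = "sentence-transformers/all-MiniLM-L6-v2"
        self.tokenizer = AutoTokenizer.from_pretrained(name)
        self.model = AutoModel.from_pretrained(name).to(self.device).eval()
        self.results = []  # stand-in for the demo's vector database client

    def process_batch(self, batch, batch_idx) -> None:
        # Tokenize the batch and run it through the model to extract embeddings.
        tokens = self.tokenizer(
            list(batch), padding=True, truncation=True, return_tensors="pt"
        ).to(self.device)
        with torch.no_grad():
            outputs = self.model(**tokens)
        # Mean-pool token embeddings into one vector per input text.
        embeddings = outputs.last_hidden_state.mean(dim=1).cpu()
        # Keep the original text alongside its embedding for later retrieval.
        self.results.extend(zip(batch, embeddings))


if __name__ == "__main__":
    # Run as a job on a Determined cluster; this call expects a cluster context.
    dataset = TextDataset(["first document", "second document", "third document"])
    experimental.torch_batch_process(
        EmbeddingProcessor,
        dataset,
        batch_size=2,
        checkpoint_interval=5,  # lets a preempted job resume from the last checkpoint
    )
```

When launched with more than one slot, each worker runs its own copy of the processor over its shard of the dataset, and the checkpoint interval lets a paused or preempted job pick up from the last completed batch.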

To further illustrate the demo, we will provide a user flow that guides you through the setup, execution, and querying phases of the batch inference process. This walkthrough will help you understand how to integrate batch inference into your own projects effectively.

Materials and Resources

To get started with batch inference and explore the detailed implementation steps, we have prepared several resources for you:

  1. Blog Post: We have published a blog post that outlines the problems users face with batch inference, highlights the benefits of the feature, and presents the text embedding use case. You can access the blog post here.

  2. Code Repository: The code used in the demo, as well as additional examples and documentation, is available in the Determined AI GitHub repository. You can find the repository here.

  3. Documentation: For in-depth instructions, best practices, and additional materials, refer to the batch processing API documentation. The documentation provides step-by-step guidance and covers various aspects of batch inference. You can access the documentation here.

Feel free to explore these resources, share your feedback, and reach out to our team for any questions or further assistance.

Conclusion

In this article, we introduced Determined AI's batch inference feature, discussed its benefits, and explored a specific use case: text embedding computation. The demo and accompanying materials provide step-by-step instructions on how to run batch inference for text embedding effectively. Leveraging the batch processing API, Determined AI aims to simplify and optimize batch inference workflows, empowering users to scale their projects and enhance their machine learning capabilities.

We encourage you to dive into the resources provided, experiment with batch inference, and share your feedback with the Determined AI community. We look forward to your success and continued collaboration!

Highlights

  • Determined AI introduces the batch inference feature, expanding its capabilities beyond training workloads.
  • Users benefit from native integration, speeding up inference, and increased prototyping velocity.
  • Text embedding computation is a valuable use case for batch inference.
  • A demo showcases the process of running batch inference for text embedding using Determined AI's Batch Processing API.
  • Resources, including a blog post, code repository, and documentation, are available for users to explore and implement batch inference effectively.

FAQs

Q: Is the batch inference API available for trial API users?

A: Yes, the batch inference API is accessible for trial API users. It provides an easy and efficient way to perform batch inference on test datasets.

Q: How does batch size impact latency?

A: The batch size determines how many data samples are processed simultaneously. Larger batches can lead to faster processing times but may require more memory. Finding the right balance between batch size and available resources is key to achieving good latency.

Q: Are there any optimization strategies for batch sizing?

A: Currently, Determined AI does not optimize batch size automatically. Users need to experiment manually and find the optimal batch size based on resource availability and model requirements.
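For example, a quick timing sweep in plain PyTorch (run outside Determined; the model name is chosen purely for illustration) is usually enough to pick a reasonable batch size before launching the full job:

```python
# Time a few candidate batch sizes on a sample of the data to see where
# throughput stops improving for the available hardware.
import time

import torch
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).to(device).eval()

texts = ["sample sentence number %d" % i for i in range(512)]

for batch_size in (8, 32, 128):
    start = time.perf_counter()
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            tokens = tokenizer(
                texts[i:i + batch_size], padding=True, return_tensors="pt"
            ).to(device)
            model(**tokens)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f}s for {len(texts)} texts")
```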
