Enhance your Mercari experience with Image Search

Table of Contents

  1. Introduction
  2. The Need for Image Search
  3. Basic Architecture of Image Search
  4. Challenges in Indexing for Image Search
    • Quality Variation in Titles and Descriptions
    • Keyword-based Search Limitations
    • Scaling for a Large Number of Images
  5. Addressing the Challenges
    • Feature Extraction and Conversion
    • Indexing Similar Images
    • Compressing and Storing Feature Vectors
  6. Updating the Image Index
    • Packaging and Deploying Docker Images
    • Reconstructing Time-based Indexes
  7. Removing Duplicate Images
    • Managing Business-related Users
    • Implementing Duplicate Removal
  8. Conclusion

🔍 Introduction

In this article, we discuss our experience running large-scale approximate vector search for the image search feature at Mercari, a leading C2C marketplace in Japan. Mercari provides a platform for individuals to buy and sell products, with millions of daily active users. However, the quality of listing titles and descriptions varies significantly. We observed that sellers are often better at taking good photos of their items than at writing detailed descriptions, and that buyers prefer visual search when shopping. Hence, we introduced image search to improve the experience for both buyers and sellers.

🔍 The Need for Image Search

Mercari's listings consist of multiple images alongside titles and short descriptions. Traditional keyword-based search was hindered by the inconsistent quality of those titles and descriptions, which hurt the overall user experience. Recognizing the importance of visual content, we introduced image search so that buyers can find visually similar items on Mercari. Product discovery then relies on the images themselves rather than on how well a seller described the item.

🔍 Basic Architecture of Image Search

Image search at Mercari converts each input image into a feature vector using a deep neural network-based feature extractor. The feature vector is then sent to the image index service, which uses it as a search key to retrieve similar listings within Mercari. The image index represents images in a way that preserves similarity, so that similar images can be found efficiently at scale. To achieve this, we used the Faiss library from Facebook AI Research, which compresses vectors and provides an indexing structure for efficient approximate nearest neighbor search.
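As a point of reference, the exact search that the index approximates can be sketched in a few lines. This is a minimal illustration and not Mercari's code: the listing ids, the 3-dimensional vectors, and the function names are all hypothetical, and a real system would delegate this work to Faiss rather than scan every vector in pure Python.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(query, index, k):
    """Return the k (distance, listing_id) pairs closest to the query.

    This scans every stored vector, which is the exhaustive baseline
    that an approximate index is designed to avoid.
    """
    scored = ((squared_l2(query, vec), listing_id)
              for listing_id, vec in index.items())
    return heapq.nsmallest(k, scored)


# Hypothetical feature vectors keyed by listing id (real ones are much longer).
index = {
    "listing-a": [0.9, 0.1, 0.0],
    "listing-b": [0.8, 0.2, 0.1],
    "listing-c": [0.0, 0.1, 0.9],
}

# The query vector would come from the feature extractor.
results = exact_search([1.0, 0.0, 0.0], index, k=2)
nearest_ids = [listing_id for _, listing_id in results]
print(nearest_ids)
```

The cost of this scan grows linearly with the number of stored vectors, which is why an approximate index becomes necessary at Mercari's scale.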

🔍 Challenges in Indexing for Image Search

The implementation of image search at Mercari faced several challenges. The quality variation in titles and descriptions posed a challenge to traditional keyword-based approaches. Additionally, with over 150 million product images, performing exhaustive searches for accurate nearest neighbor results was not viable. Therefore, we needed to rely on fast and approximate nearest neighbor search techniques. With the vast search space, we also had to find a way to compress and store feature vectors to fit within memory limitations.
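To make the memory constraint concrete, here is a back-of-the-envelope calculation. Only the figure of 150 million images comes from the text above; the vector dimension (1024) and the compressed code size (64 bytes) are assumptions chosen for illustration, not values reported for Mercari's system.

```python
NUM_IMAGES = 150_000_000      # from the article
DIM = 1024                    # assumed feature dimension (hypothetical)
BYTES_PER_FLOAT32 = 4
CODE_BYTES_PER_VECTOR = 64    # assumed compressed code size (hypothetical)

# Storing every vector uncompressed as float32.
raw_bytes = NUM_IMAGES * DIM * BYTES_PER_FLOAT32

# Storing one short compressed code per vector instead.
compressed_bytes = NUM_IMAGES * CODE_BYTES_PER_VECTOR

compression_ratio = raw_bytes // compressed_bytes

print(f"uncompressed: {raw_bytes / 1e9:.1f} GB")
print(f"compressed:   {compressed_bytes / 1e9:.1f} GB")
print(f"ratio:        {compression_ratio}x")
```

Under these assumptions the uncompressed vectors would need hundreds of gigabytes, while the compressed codes fit in the memory of a single ordinary server.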

🔍 Addressing the Challenges

We took several approaches to address these indexing challenges. First, we used a deep neural network-based feature extractor to convert images into feature vectors that capture their visual characteristics. Similar images map to vectors that lie close together, so nearest neighbors in vector space correspond to visually similar items. Second, by using the Faiss library through its Python bindings, we obtained fast approximate nearest neighbor search with good accuracy. This enabled us to show buyers visually similar listings on Mercari, improving their overall search experience.
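One common way to make nearest neighbor search approximate is an inverted-file layout: vectors are grouped by their closest centroid, and a query scans only the few groups nearest to it. The sketch below illustrates that general idea in pure Python; it is not Faiss's implementation or Mercari's configuration. The centroids are hand-picked here (in practice they are learned by clustering), and all names and data are hypothetical. The `nprobe` parameter name mirrors the Faiss setting of the same name.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)),
               key=lambda i: squared_l2(vec, centroids[i]))


def build_inverted_lists(vectors, centroids):
    """Assign every item id to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for item_id, vec in vectors.items():
        lists[nearest_centroid(vec, centroids)].append(item_id)
    return lists


def approximate_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cells = sorted(range(len(centroids)),
                   key=lambda i: squared_l2(query, centroids[i]))[:nprobe]
    candidates = [item_id for cell in cells for item_id in lists[cell]]
    scored = sorted((squared_l2(query, vectors[i]), i) for i in candidates)
    return scored[:k]


# Hand-picked centroids and toy 2-dimensional vectors (both hypothetical).
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {
    "a": [0.5, 0.5],
    "b": [1.0, 0.0],
    "c": [9.0, 10.0],
    "d": [10.0, 9.5],
}
lists = build_inverted_lists(vectors, centroids)

hits = approximate_search([0.6, 0.4], vectors, centroids, lists, k=2, nprobe=1)
hit_ids = [item_id for _, item_id in hits]
print(lists, hit_ids)
```

Raising `nprobe` scans more cells, trading speed for a better chance of finding the true nearest neighbors.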

🔍 Updating the Image Index

Updating the image index in a timely manner was essential so that newly listed items are included and users see the latest results. To accomplish this, we packaged each index into its own Docker image and deployed it as an individual service. A new service was deployed every hour, for up to 24 services per day. Daily and monthly cron jobs then reorganized the indexes. This architecture, although complex, allowed for timely updates and kept search results accurate and relevant.
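With one index per time window, a query has to be sent to every shard and the partial results merged into a single ranking. The following sketch shows that fan-out and merge step under simplifying assumptions: shards are plain in-memory dictionaries rather than separate services, and the shard names, item ids, and vectors are hypothetical.

```python
import heapq


def search_shard(shard, query, k):
    """Top-k (distance, item_id) pairs within one shard."""
    scored = ((sum((q - x) ** 2 for q, x in zip(query, vec)), item_id)
              for item_id, vec in shard.items())
    return heapq.nsmallest(k, scored)


def search_all_shards(shards, query, k):
    """Query each shard for its own top k, then merge into a global top k."""
    partial_results = [search_shard(shard, query, k)
                       for shard in shards.values()]
    merged = (hit for hits in partial_results for hit in hits)
    return heapq.nsmallest(k, merged)


# Two hypothetical hourly shards with toy 2-dimensional vectors.
shards = {
    "hour-00": {"a": [0.0, 0.0], "b": [5.0, 5.0]},
    "hour-01": {"c": [0.1, 0.0], "d": [4.0, 4.0]},
}

top = search_all_shards(shards, [0.0, 0.0], k=2)
top_ids = [item_id for _, item_id in top]
print(top_ids)
```

Asking each shard for its own top k is sufficient, because any item in the global top k must also be in the top k of the shard that holds it.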

🔍 Removing Duplicate Images

Because business users sometimes list multiple items with the same image, removing duplicate images was crucial for search result relevance and user experience. Initially, we searched the vector index before adding each new image in order to prevent duplicates. However, this method was time-consuming because every vector had to be looked up individually. To overcome this, we introduced a mechanism that merges the constructed indexes and removes duplicates efficiently in bulk. By using the "find_duplicates()" function in Faiss, we achieved significant speed improvements.
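The idea behind bulk duplicate detection can be sketched as grouping items by their compressed code: items whose codes are byte-for-byte identical land in the same group. This is only an illustration of the concept and not the Faiss implementation of find_duplicates(); the function names, item ids, and 2-byte codes below are hypothetical. Since compressed codes are lossy, identical codes can in principle also occur for images that are merely very similar.

```python
from collections import defaultdict


def find_duplicate_groups(codes):
    """Group item ids that share exactly the same compressed code."""
    groups = defaultdict(list)
    for item_id, code in codes.items():
        groups[bytes(code)].append(item_id)
    # Only groups with more than one member contain duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


def deduplicate(codes):
    """Keep the first item seen for each distinct code."""
    seen = set()
    kept = []
    for item_id, code in codes.items():
        key = bytes(code)
        if key not in seen:
            seen.add(key)
            kept.append(item_id)
    return kept


# Hypothetical compressed codes; "a" and "c" were listed with the same image.
codes = {
    "a": b"\x01\x02",
    "b": b"\x03\x04",
    "c": b"\x01\x02",
}

duplicate_groups = find_duplicate_groups(codes)
kept_ids = deduplicate(codes)
print(duplicate_groups, kept_ids)
```

A single pass over all codes replaces one index lookup per inserted image, which is the source of the speedup in this style of approach.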

🔍 Conclusion

In conclusion, the implementation of image search at Mercari has greatly enhanced the buyer and seller experience. By prioritizing visual content through image search, we have overcome several challenges related to inconsistent textual descriptions and scale. The architecture and techniques described have streamlined the indexing process, reduced maintenance complexity, and improved search result relevancy. Mercari's image search service continues to evolve, providing users with intuitive ways to discover visually similar items and fostering a vibrant marketplace environment.

Highlights:

  1. The Power of Image Search: Enhancing the buyer experience on Mercari through visual search capabilities.
  2. Addressing Quality Variation: Overcoming the challenge of inconsistent titles and descriptions in listings.
  3. Efficient Nearest Neighbor Search: Using deep neural network-based feature extraction and Faiss library for scalable and accurate results.
  4. Timely Index Updates: Packaging and deploying Docker images to ensure the inclusion of newly listed items.
  5. Removing Duplicate Images: Implementing efficient duplicate removal methods to improve search result relevance.
  6. Streamlined Architecture: Unifying indexes into a single service to simplify system management and maintenance.

FAQ:

Q: How does image search work at Mercari?
A: Image search at Mercari converts input images into feature vectors using a deep neural network-based feature extractor. These vectors are then used to search for visually similar listings within Mercari.

Q: What challenges did Mercari face in implementing image search?
A: Mercari faced challenges such as inconsistent quality in titles and descriptions, scaling to a large number of images, and removing duplicate images from the index.

Q: How is the image index updated at Mercari?
A: The image index is packaged into separate Docker images that are deployed as individual services. Daily and monthly cron jobs reorganize the indexes.

Q: How are duplicate images removed from the index?
A: Duplicate images are removed efficiently using the "find_duplicates()" function in Faiss, which compares vectors using PQ codes to find exact matches.

Q: What are the benefits of image search at Mercari?
A: Image search lets buyers find visually similar items, overcoming the limitations of traditional keyword-based search. It also improves the quality and relevance of search results.
