TencentARC / QA-CLIP-ViT-B-16

Last updated: May 16, 2023
Task: zero-shot-image-classification


Introduction

This project aims to provide a better Chinese CLIP model. The training data consists of publicly accessible image URLs and associated Chinese text descriptions, totaling roughly 400 million image-text pairs; after screening, we ultimately used about 100 million pairs for training. The project is produced by the QQ-ARC Joint Lab, Tencent PCG. For more detailed information, please refer to the main page of the QA-CLIP project. We have also open-sourced our code on GitHub as QA-CLIP, and stars are welcome!

Results

We conducted zero-shot image-text retrieval tests on the MUGE Retrieval, Flickr30K-CN, and COCO-CN datasets, and zero-shot image classification on ImageNet. The results are shown in the tables below:

Flickr30K-CN Zero-shot Retrieval (Official Test Set):

| Model | Text-to-Image R@1 | R@5 | R@10 | Image-to-Text R@1 | R@5 | R@10 |
|---|---|---|---|---|---|---|
| CN-CLIP RN50 | 48.8 | 76.0 | 84.6 | 60.0 | 85.9 | 92.0 |
| QA-CLIP RN50 | 50.5 | 77.4 | 86.1 | 67.1 | 87.9 | 93.2 |
| CN-CLIP ViT-B/16 | 62.7 | 86.9 | 92.8 | 74.6 | 93.5 | 97.1 |
| QA-CLIP ViT-B/16 | 63.8 | 88.0 | 93.2 | 78.4 | 96.1 | 98.5 |
| CN-CLIP ViT-L/14 | 68.0 | 89.7 | 94.4 | 80.2 | 96.6 | 98.2 |
| AltCLIP ViT-L/14 | 69.7 | 90.1 | 94.8 | 84.8 | 97.7 | 99.1 |
| QA-CLIP ViT-L/14 | 69.3 | 90.3 | 94.7 | 85.3 | 97.9 | 99.2 |

MUGE Zero-shot Retrieval (Official Validation Set):

| Model | Text-to-Image R@1 | R@5 | R@10 | Image-to-Text R@1 | R@5 | R@10 |
|---|---|---|---|---|---|---|
| CN-CLIP RN50 | 42.6 | 68.5 | 78.0 | 30.0 | 56.2 | 66.9 |
| QA-CLIP RN50 | 44.0 | 69.9 | 79.5 | 32.4 | 59.5 | 70.3 |
| CN-CLIP ViT-B/16 | 52.1 | 76.7 | 84.4 | 38.7 | 65.6 | 75.1 |
| QA-CLIP ViT-B/16 | 53.2 | 77.7 | 85.1 | 40.7 | 68.2 | 77.2 |
| CN-CLIP ViT-L/14 | 56.4 | 79.8 | 86.2 | 42.6 | 69.8 | 78.6 |
| AltCLIP ViT-L/14 | 29.6 | 49.9 | 58.8 | 21.4 | 42.0 | 51.9 |
| QA-CLIP ViT-L/14 | 57.4 | 81.0 | 87.7 | 45.5 | 73.0 | 81.4 |

COCO-CN Zero-shot Retrieval (Official Test Set):

| Model | Text-to-Image R@1 | R@5 | R@10 | Image-to-Text R@1 | R@5 | R@10 |
|---|---|---|---|---|---|---|
| CN-CLIP RN50 | 48.1 | 81.3 | 90.5 | 50.9 | 81.1 | 90.5 |
| QA-CLIP RN50 | 50.1 | 82.5 | 91.7 | 56.7 | 85.2 | 92.9 |
| CN-CLIP ViT-B/16 | 62.2 | 87.1 | 94.9 | 56.3 | 84.0 | 93.3 |
| QA-CLIP ViT-B/16 | 62.9 | 87.7 | 94.7 | 61.5 | 87.6 | 94.8 |
| CN-CLIP ViT-L/14 | 64.9 | 88.8 | 94.2 | 60.6 | 84.4 | 93.1 |
| AltCLIP ViT-L/14 | 63.5 | 87.6 | 93.5 | 62.6 | 88.5 | 95.9 |
| QA-CLIP ViT-L/14 | 65.7 | 90.2 | 95.0 | 64.5 | 88.3 | 95.1 |
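The R@K numbers in the retrieval tables above measure the fraction of queries whose ground-truth match appears among the top-K retrieved candidates. As a minimal sketch of the metric, the toy similarity matrix below stands in for real image-text similarity scores (the values and the diagonal ground-truth convention are illustrative assumptions, not data from the benchmarks):

```python
import numpy as np

def recall_at_k(similarity, k):
    """Fraction of queries whose matching item ranks in the top-k.

    similarity: (num_queries, num_candidates) score matrix where the
    correct candidate for query i is assumed to sit at column i.
    """
    # rank candidates for each query by descending similarity score
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    hits = [i in top_k[i] for i in range(similarity.shape[0])]
    return float(np.mean(hits))

# toy 4x4 similarity matrix: the diagonal holds the correct pairs
sim = np.array([
    [0.9, 0.1, 0.2, 0.3],
    [0.2, 0.8, 0.1, 0.4],
    [0.3, 0.7, 0.6, 0.1],   # correct item only ranks 2nd here
    [0.1, 0.2, 0.3, 0.9],
])
print(recall_at_k(sim, 1))  # 0.75
print(recall_at_k(sim, 5))  # 1.0
```

With real embeddings, the similarity matrix would be the dot product of L2-normalized text and image features over the whole test set.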

Zero-shot Image Classification on ImageNet:

| Model | ImageNet accuracy |
|---|---|
| CN-CLIP RN50 | 33.5 |
| QA-CLIP RN50 | 35.5 |
| CN-CLIP ViT-B/16 | 48.4 |
| QA-CLIP ViT-B/16 | 49.7 |
| CN-CLIP ViT-L/14 | 54.7 |
| QA-CLIP ViT-L/14 | 55.8 |
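Zero-shot classification with a CLIP-style model reduces to retrieval over class-name texts: encode the image, encode one (Chinese) prompt per class, and pick the class whose text embedding has the highest cosine similarity. A minimal sketch with random stand-in embeddings (in practice these would come from the model's get_image_features and get_text_features):

```python
import torch

def zero_shot_classify(image_features, text_features):
    """Return the index of the best-matching class text for each image.

    Both feature tensors are L2-normalized first, so the dot product
    is cosine similarity, as in CLIP.
    """
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    logits = image_features @ text_features.T  # (num_images, num_classes)
    return logits.argmax(dim=-1)

# stand-in embeddings: 3 classes, 2 images, 8-dim features;
# each image is constructed close to one class embedding
torch.manual_seed(0)
text_features = torch.randn(3, 8)
image_features = text_features[[2, 0]] + 0.01 * torch.randn(2, 8)
print(zero_shot_classify(image_features, text_features))  # tensor([2, 0])
```

The reported ImageNet numbers apply this procedure with a prompt per class over the full label set.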



Getting Started

Inference Code

Inference code example:

from PIL import Image
import requests
from transformers import ChineseCLIPProcessor, ChineseCLIPModel

model = ChineseCLIPModel.from_pretrained("TencentARC/QA-CLIP-ViT-B-16")
processor = ChineseCLIPProcessor.from_pretrained("TencentARC/QA-CLIP-ViT-B-16")

url = "https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/pokemon.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
# the Chinese names of Squirtle, Bulbasaur, Charmander, Pikachu
texts = ["杰尼龟", "妙蛙种子", "小火龙", "皮卡丘"]

# compute image features
inputs = processor(images=image, return_tensors="pt")
image_features = model.get_image_features(**inputs)
image_features = image_features / image_features.norm(p=2, dim=-1, keepdim=True)  # normalize

# compute text features
inputs = processor(text=texts, padding=True, return_tensors="pt")
text_features = model.get_text_features(**inputs)
text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)  # normalize

# compute image-text similarity scores
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)
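The two halves of the example above agree: logits_per_image is, in effect, the scaled dot product of the normalized image and text features. A self-contained sketch with random stand-in features (the 100.0 temperature is an assumption mirroring CLIP's typical learned exp(logit_scale), not a value read from this checkpoint):

```python
import torch

# stand-in normalized features: 1 image, 4 candidate texts, 8-dim
torch.manual_seed(0)
image_features = torch.nn.functional.normalize(torch.randn(1, 8), dim=-1)
text_features = torch.nn.functional.normalize(torch.randn(4, 8), dim=-1)

# CLIP scales cosine similarities by a learned temperature before the
# softmax; 100.0 stands in for the model's exp(logit_scale)
logit_scale = 100.0
logits_per_image = logit_scale * image_features @ text_features.T
probs = logits_per_image.softmax(dim=-1)

best = probs.argmax(dim=-1).item()
print(best, probs[0, best].item())
```

Because of the large temperature, the softmax is sharply peaked: small differences in cosine similarity translate into near-one-hot probabilities.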



Acknowledgments

The project code is based on the implementation of Chinese-CLIP, and we are very grateful for their outstanding open-source contribution.


License: Apache-2.0 (https://choosealicense.com/licenses/apache-2.0)


Model page: https://huggingface.co/TencentARC/QA-CLIP-ViT-B-16

