TencentARC / QA-CLIP

Task: zero-shot-classification
Last updated: August 28, 2023


Introduction

This project aims to provide a better Chinese CLIP model. The training data consists of publicly accessible image URLs and associated Chinese text descriptions, totaling 400 million pairs; after filtering, roughly 100 million pairs were used for training. The project is produced by the QQ-ARC Joint Lab, Tencent PCG. We have also open-sourced our code on GitHub, QA-CLIP, and welcome everyone to star it!

Models and Results

Model Card

QA-CLIP currently has three open-source models of different sizes; their model information and download links are shown in the table below:

| Model | Ckpt | Params | Vision Backbone | Vision Params | Text Backbone | Text Params | Resolution |
|---|---|---|---|---|---|---|---|
| QA-CLIP RN50 | Download | 77M | ResNet50 | 38M | RBT3 | 39M | 224 |
| QA-CLIP ViT-B/16 | Download | 188M | ViT-B/16 | 86M | RoBERTa-wwm-Base | 102M | 224 |
| QA-CLIP ViT-L/14 | Download | 406M | ViT-L/14 | 304M | RoBERTa-wwm-Base | 102M | 224 |

Results

We conducted zero-shot tests for image-text retrieval on the MUGE Retrieval, Flickr30K-CN, and COCO-CN datasets, and zero-shot image classification on the ImageNet dataset. The results are shown in the tables below:

Flickr30K-CN Zero-shot Retrieval (Official Test Set); T2I = text-to-image retrieval, I2T = image-to-text retrieval:

| Model | T2I R@1 | T2I R@5 | T2I R@10 | I2T R@1 | I2T R@5 | I2T R@10 |
|---|---|---|---|---|---|---|
| CN-CLIP RN50 | 48.8 | 76.0 | 84.6 | 60.0 | 85.9 | 92.0 |
| QA-CLIP RN50 | 50.5 | 77.4 | 86.1 | 67.1 | 87.9 | 93.2 |
| CN-CLIP ViT-B/16 | 62.7 | 86.9 | 92.8 | 74.6 | 93.5 | 97.1 |
| QA-CLIP ViT-B/16 | 63.8 | 88.0 | 93.2 | 78.4 | 96.1 | 98.5 |
| CN-CLIP ViT-L/14 | 68.0 | 89.7 | 94.4 | 80.2 | 96.6 | 98.2 |
| AltCLIP ViT-L/14 | 69.7 | 90.1 | 94.8 | 84.8 | 97.7 | 99.1 |
| QA-CLIP ViT-L/14 | 69.3 | 90.3 | 94.7 | 85.3 | 97.9 | 99.2 |

MUGE Zero-shot Retrieval (Official Validation Set):

| Model | T2I R@1 | T2I R@5 | T2I R@10 | I2T R@1 | I2T R@5 | I2T R@10 |
|---|---|---|---|---|---|---|
| CN-CLIP RN50 | 42.6 | 68.5 | 78.0 | 30.0 | 56.2 | 66.9 |
| QA-CLIP RN50 | 44.0 | 69.9 | 79.5 | 32.4 | 59.5 | 70.3 |
| CN-CLIP ViT-B/16 | 52.1 | 76.7 | 84.4 | 38.7 | 65.6 | 75.1 |
| QA-CLIP ViT-B/16 | 53.2 | 77.7 | 85.1 | 40.7 | 68.2 | 77.2 |
| CN-CLIP ViT-L/14 | 56.4 | 79.8 | 86.2 | 42.6 | 69.8 | 78.6 |
| AltCLIP ViT-L/14 | 29.6 | 49.9 | 58.8 | 21.4 | 42.0 | 51.9 |
| QA-CLIP ViT-L/14 | 57.4 | 81.0 | 87.7 | 45.5 | 73.0 | 81.4 |

COCO-CN Zero-shot Retrieval (Official Test Set):

| Model | T2I R@1 | T2I R@5 | T2I R@10 | I2T R@1 | I2T R@5 | I2T R@10 |
|---|---|---|---|---|---|---|
| CN-CLIP RN50 | 48.1 | 81.3 | 90.5 | 50.9 | 81.1 | 90.5 |
| QA-CLIP RN50 | 50.1 | 82.5 | 91.7 | 56.7 | 85.2 | 92.9 |
| CN-CLIP ViT-B/16 | 62.2 | 87.1 | 94.9 | 56.3 | 84.0 | 93.3 |
| QA-CLIP ViT-B/16 | 62.9 | 87.7 | 94.7 | 61.5 | 87.6 | 94.8 |
| CN-CLIP ViT-L/14 | 64.9 | 88.8 | 94.2 | 60.6 | 84.4 | 93.1 |
| AltCLIP ViT-L/14 | 63.5 | 87.6 | 93.5 | 62.6 | 88.5 | 95.9 |
| QA-CLIP ViT-L/14 | 65.7 | 90.2 | 95.0 | 64.5 | 88.3 | 95.1 |

Zero-shot Image Classification on ImageNet:

| Model | ImageNet |
|---|---|
| CN-CLIP RN50 | 33.5 |
| QA-CLIP RN50 | 35.5 |
| CN-CLIP ViT-B/16 | 48.4 |
| QA-CLIP ViT-B/16 | 49.7 |
| CN-CLIP ViT-L/14 | 54.7 |
| QA-CLIP ViT-L/14 | 55.8 |



Getting Started

Installation Requirements

Environment configuration requirements:

  • python >= 3.6.4
  • pytorch >= 1.8.0 (with torchvision >= 0.9.0)
  • CUDA Version >= 10.2

Install required packages:

cd /yourpath/QA-CLIP-main
pip install --upgrade pip
pip install -r requirements.txt

Inference Code

export PYTHONPATH=/yourpath/QA-CLIP-main

Inference code example:

import torch
from PIL import Image

import clip
from clip import load_from_name, available_models

print("Available models:", available_models())
# Available models: ['ViT-B-16', 'ViT-L-14', 'RN50']

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_from_name("ViT-B-16", device=device, download_root='./')
model.eval()
image = preprocess(Image.open("examples/pokemon.jpeg")).unsqueeze(0).to(device)
text = clip.tokenize(["杰尼龟", "妙蛙种子", "小火龙", "皮卡丘"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize the features. Use the normalized features for downstream tasks.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # get_similarity re-encodes the raw inputs and returns scaled logits.
    logits_per_image, logits_per_text = model.get_similarity(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)



Prediction and Evaluation
Download Image-text Retrieval Test Dataset

In the Chinese-CLIP project, the test sets have already been preprocessed. Here are the download links they provide:

MUGE dataset: download link

Flickr30K-CN dataset: download link

Additionally, obtaining the COCO-CN dataset requires applying to the original author.

Download ImageNet Dataset

Please download the raw data yourself. The Chinese and English label files are provided by the Chinese-CLIP project.

Image-text Retrieval Evaluation

The image-text retrieval evaluation can be run as follows:

split=test  # compute features for the valid or test split
resume=your_ckp_path
DATAPATH=your_DATAPATH
dataset_name=Flickr30k-CN
# dataset_name=MUGE

# 1) Extract image and text features
python -u eval/extract_features.py \
    --extract-image-feats \
    --extract-text-feats \
    --image-data="${DATAPATH}/datasets/${dataset_name}/lmdb/${split}/imgs" \
    --text-data="${DATAPATH}/datasets/${dataset_name}/${split}_texts.jsonl" \
    --img-batch-size=32 \
    --text-batch-size=32 \
    --context-length=52 \
    --resume=${resume} \
    --vision-model=ViT-B-16 \
    --text-model=RoBERTa-wwm-ext-base-chinese

# 2) Top-k text-to-image predictions
python -u eval/make_topk_predictions.py \
    --image-feats="${DATAPATH}/datasets/${dataset_name}/${split}_imgs.img_feat.jsonl" \
    --text-feats="${DATAPATH}/datasets/${dataset_name}/${split}_texts.txt_feat.jsonl" \
    --top-k=10 \
    --eval-batch-size=32768 \
    --output="${DATAPATH}/datasets/${dataset_name}/${split}_predictions.jsonl"

# 3) Top-k image-to-text predictions
python -u eval/make_topk_predictions_tr.py \
    --image-feats="${DATAPATH}/datasets/${dataset_name}/${split}_imgs.img_feat.jsonl" \
    --text-feats="${DATAPATH}/datasets/${dataset_name}/${split}_texts.txt_feat.jsonl" \
    --top-k=10 \
    --eval-batch-size=32768 \
    --output="${DATAPATH}/datasets/${dataset_name}/${split}_tr_predictions.jsonl"

# 4) Evaluate text-to-image retrieval
python eval/evaluation.py \
    ${DATAPATH}/datasets/${dataset_name}/${split}_texts.jsonl \
    ${DATAPATH}/datasets/${dataset_name}/${split}_predictions.jsonl \
    ${DATAPATH}/datasets/${dataset_name}/output1.json
cat  ${DATAPATH}/datasets/${dataset_name}/output1.json

# 5) Convert annotations for image-to-text evaluation
python eval/transform_ir_annotation_to_tr.py \
    --input ${DATAPATH}/datasets/${dataset_name}/${split}_texts.jsonl

# 6) Evaluate image-to-text retrieval
python eval/evaluation_tr.py \
    ${DATAPATH}/datasets/${dataset_name}/${split}_texts.tr.jsonl \
    ${DATAPATH}/datasets/${dataset_name}/${split}_tr_predictions.jsonl \
    ${DATAPATH}/datasets/${dataset_name}/output2.json
cat ${DATAPATH}/datasets/${dataset_name}/output2.json
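
The evaluation scripts report the recall metrics shown in the tables above (R@1/5/10). As a rough illustration of the metric, here is a minimal Recall@K sketch over the jsonl files; the field names ("text_id", "image_ids") follow Chinese-CLIP's data format and are an assumption here:

import json

def recall_at_k(gt_path, pred_path, k=10):
    # Ground truth: text_id -> set of relevant image ids (assumed schema).
    gt = {}
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            obj = json.loads(line)
            gt[obj["text_id"]] = set(obj["image_ids"])
    hits = total = 0
    with open(pred_path, encoding="utf-8") as f:
        for line in f:
            obj = json.loads(line)
            total += 1
            # A hit if any ground-truth image appears in the top-k predictions.
            if gt[obj["text_id"]] & set(obj["image_ids"][:k]):
                hits += 1
    return hits / total

print("R@10:", recall_at_k("test_texts.jsonl", "test_predictions.jsonl", k=10))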
ImageNet Zero-shot Classification

The ImageNet zero-shot classification can be run as follows:

bash scripts/zeroshot_eval.sh 0 \
    ${DATAPATH} imagenet \
    ViT-B-16 RoBERTa-wwm-ext-base-chinese \
    ./pretrained_weights/QA-CLIP-base.pt
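
Under the hood, zero-shot classification builds a text feature for each class name (typically wrapped in prompt templates), then assigns each image to the class with the highest similarity. A minimal sketch using the inference API from above; the label list and prompt template are placeholders, not the script's actual ImageNet label set:

import torch
import clip
from clip import load_from_name
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_from_name("ViT-B-16", device=device, download_root='./')
model.eval()

labels = ["猫", "狗"]  # placeholder class names
texts = clip.tokenize(["一张{}的照片".format(label) for label in labels]).to(device)

with torch.no_grad():
    # Encode class prompts once, then reuse for every image.
    text_features = model.encode_text(texts)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    image = preprocess(Image.open("examples/pokemon.jpeg")).unsqueeze(0).to(device)
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)

    # Pick the class whose prompt is most similar to the image.
    similarity = image_features @ text_features.t()
    pred = labels[similarity.argmax(dim=-1).item()]
print("Predicted label:", pred)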



Huggingface Model and Online Demo

We have open-sourced our models on Hugging Face for easier access and use. We have also prepared a simple online demo for zero-shot classification so everyone can experience it firsthand; we encourage you to give it a try!

⭐️QA-CLIP-ViT-B-16⭐️

⭐️QA-CLIP-ViT-L-14⭐️

Demonstration examples are shown in the sample image on the model card.
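
The Hugging Face checkpoints can also be loaded without the repository code. A minimal sketch, assuming the checkpoints are compatible with transformers' ChineseCLIP classes; consult the model card for the officially supported loading path:

import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

# Assumption: QA-CLIP checkpoints load via the ChineseCLIP classes.
model = ChineseCLIPModel.from_pretrained("TencentARC/QA-CLIP-ViT-B-16")
processor = ChineseCLIPProcessor.from_pretrained("TencentARC/QA-CLIP-ViT-B-16")

image = Image.open("examples/pokemon.jpeg")
labels = ["杰尼龟", "妙蛙种子", "小火龙", "皮卡丘"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print("Label probs:", probs)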



Acknowledgments

The project code is based on the implementation of Chinese-CLIP, and we are very grateful for their outstanding open-source contribution.

License: Apache-2.0 (https://choosealicense.com/licenses/apache-2.0)
Hugging Face model page: https://huggingface.co/TencentARC/QA-CLIP
