We propose a simple and efficient method to train a better multilingual CLIP model, named AltCLIP-m9. AltCLIP-m9 is trained on data from the WuDao dataset and LAION.
The AltCLIP-m9 model provides support for the AltDiffusion-m9 model in this project. Specific information on the AltDiffusion model can be found in this tutorial.
The model code has been open-sourced on FlagAI and the weights are available on modelhub. We also provide scripts for fine-tuning, inference, and validation, so feel free to try them out.
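As a rough illustration of loading the model through FlagAI, the sketch below uses FlagAI's AutoLoader; the exact task_name, model_name, and checkpoint directory are assumptions here, so please check the FlagAI repository for the authoritative values.

# Hypothetical sketch of loading AltCLIP-m9 via FlagAI's AutoLoader.
# The task_name / model_name strings and model_dir below are assumptions;
# consult the FlagAI repository and modelhub for the exact values.
from flagai.auto_model.auto_loader import AutoLoader

loader = AutoLoader(
    task_name="txt_img_matching",    # assumed task name for CLIP-style matching
    model_name="AltCLIP-XLMR-L-m9",  # assumed model name on modelhub
    model_dir="./checkpoints",       # weights are downloaded to this directory
)
model = loader.get_model()
tokenizer = loader.get_tokenizer()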
Citation
We have released a technical report on AltCLIP with more details. If you find this work helpful, please consider citing it:
@article{https://doi.org/10.48550/arxiv.2211.06679,
doi = {10.48550/ARXIV.2211.06679},
url = {https://arxiv.org/abs/2211.06679},
author = {Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
Training proceeds in two phases.
In the parallel knowledge distillation phase, we use only parallel corpus texts for distillation (parallel corpora are easier to obtain and much larger than image-text pair datasets). In the multilingual contrastive learning phase, we use a relatively small number of text-image pairs (about 6 million per language) to train our text encoder to better align with the image encoder.
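As a rough sketch of the two objectives (not the exact training code; the distance function, temperature, and batch construction are assumptions here), the distillation phase can be written as matching the multilingual student's text embedding to the teacher CLIP text embedding of the parallel English sentence, and the contrastive phase as the usual CLIP-style symmetric InfoNCE loss over image-text pairs:

import torch
import torch.nn.functional as F

# Phase 1: parallel knowledge distillation (sketch).
# teacher_emb: CLIP text-encoder embeddings of the English sentences
# student_emb: multilingual text-encoder embeddings of the parallel translations
def distillation_loss(teacher_emb, student_emb):
    # mean squared error between the two embeddings (distance choice is an assumption)
    return F.mse_loss(student_emb, teacher_emb)

# Phase 2: multilingual contrastive learning (CLIP-style InfoNCE, sketch).
# image_emb, text_emb: L2-normalized embeddings for a batch of matching pairs
def contrastive_loss(image_emb, text_emb, temperature=0.07):
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # symmetric cross-entropy over image->text and text->image directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2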
Performance
Visualization
Based on AltCLIP, we have also developed the AltDiffusion model, whose visual results are shown below. The following example shows how to run AltCLIP-m9 through the Hugging Face Transformers interface.
from PIL import Image
import requests

# transformers version >= 4.21.0 is required
from modeling_altclip import AltCLIP
from processing_altclip import AltCLIPProcessor

# if the repository is private, pass `use_auth_token=True` to `from_pretrained`
model = AltCLIP.from_pretrained("BAAI/AltCLIP-m9")
processor = AltCLIPProcessor.from_pretrained("BAAI/AltCLIP-m9")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
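Since AltCLIP-m9 is multilingual, the same pipeline also works with non-English prompts; a minimal continuation of the example above (the prompt strings are purely illustrative):

# the multilingual text encoder accepts prompts in the other supported languages,
# e.g. Chinese prompts for the same zero-shot classification (illustrative strings)
inputs_zh = processor(text=["一张猫的照片", "一张狗的照片"], images=image, return_tensors="pt", padding=True)
outputs_zh = model(**inputs_zh)
probs_zh = outputs_zh.logits_per_image.softmax(dim=1)
print(probs_zh)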
More Information About the AltCLIP-m9 Model on huggingface.co
AltCLIP-m9 is available on huggingface.co, where the BAAI AltCLIP-m9 model can be used instantly. huggingface.co offers a free trial of AltCLIP-m9 as well as paid usage, and the model can be called through an API from Node.js, Python, or plain HTTP.
huggingface.co serves as an online trial and API platform that integrates AltCLIP-m9, including its API services, and provides a free online trial; you can try AltCLIP-m9 for free by clicking the link below.
BAAI AltCLIP-m9 free online URL on huggingface.co:
AltCLIP-m9 is also an open-source model available on GitHub, which any user can find and install for free. huggingface.co additionally lets users debug and try the installed model directly, and supports free use of the API.