Testing is performed with VTAB+ (a combination of VTAB, https://arxiv.org/abs/1910.04867, with additional robustness datasets) for classification, and with COCO and Flickr for retrieval.
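As a rough illustration of how such a zero-shot evaluation is wired up (not the exact VTAB+ harness used for the numbers below), here is a minimal sketch with open_clip; the model/pretrained tags, the local image path, and the class prompts are assumptions for illustration:

```python
# Minimal zero-shot classification sketch (illustrative, not the full VTAB+ harness).
import torch
from PIL import Image
import open_clip

# Assumed open_clip tags for this checkpoint.
model, _, preprocess = open_clip.create_model_and_transforms(
    "xlm-roberta-base-ViT-B-32", pretrained="laion5b_s13b_b90k"
)
tokenizer = open_clip.get_tokenizer("xlm-roberta-base-ViT-B-32")
model.eval()

# Placeholder image and class prompts; a real evaluation iterates over whole datasets.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text = tokenizer(prompts)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(prompts, probs[0].tolist())))
```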
Results
The model achieves:
- ImageNet-1k: 62.33% (vs. 62.9% for the baseline)
- MS COCO: 63.4% (vs. 60.8% for the baseline)
- Flickr30k: 86.2% (vs. 85.4% for the baseline)
A preliminary multilingual evaluation was also run: 43% on Italian ImageNet-1k (vs. 21% for the English-only B/32) and 37% on Japanese ImageNet-1k (vs. 1% for the English-only B/32 and 50% for the Japanese CLIP B/16). This confirms that the expected multilingual capability is present; larger models should perform even better.
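For the multilingual behaviour, a minimal probe looks like the sketch below: the same image is scored against the same prompt written in English, Italian, and Japanese. The tags, image path, and prompts are illustrative assumptions, not the translated-ImageNet protocol used for the numbers above:

```python
# Sketch: the XLM-RoBERTa text tower accepts non-English prompts directly.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "xlm-roberta-base-ViT-B-32", pretrained="laion5b_s13b_b90k"  # assumed tags
)
tokenizer = open_clip.get_tokenizer("xlm-roberta-base-ViT-B-32")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image
prompts = [
    "a photo of a cat",      # English
    "una foto di un gatto",  # Italian
    "猫の写真",               # Japanese
]
text = tokenizer(prompts)

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    # Comparable similarity scores across languages reflect the multilingual alignment.
    print((img @ txt.T)[0].tolist())
```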
Acknowledgements
Acknowledging stability.ai for the compute used to train this model.
Citation
OpenAI CLIP paper
@inproceedings{Radford2021LearningTV,
title={Learning Transferable Visual Models From Natural Language Supervision},
author={Alec Radford and Jong Wook Kim and Chris Hallacy and A. Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
booktitle={ICML},
year={2021}
}
OpenCLIP software
@software{ilharco_gabriel_2021_5143773,
author = {Ilharco, Gabriel and
Wortsman, Mitchell and
Wightman, Ross and
Gordon, Cade and
Carlini, Nicholas and
Taori, Rohan and
Dave, Achal and
Shankar, Vaishaal and
Namkoong, Hongseok and
Miller, John and
Hajishirzi, Hannaneh and
Farhadi, Ali and
Schmidt, Ludwig},
title = {OpenCLIP},
month = jul,
year = 2021,
note = {If you use this software, please cite it as below.},
publisher = {Zenodo},
version = {0.1},
doi = {10.5281/zenodo.5143773},
url = {https://doi.org/10.5281/zenodo.5143773}
}