Disclaimer: The team releasing SigLIP did not write a model card for this model, so this model card has been written by the Hugging Face team.
Model description
SigLIP is CLIP, a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes.
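The pairwise sigmoid loss described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: it assumes L2-normalized embeddings, takes the temperature (10) and bias (-10) from the initial values reported in the paper, and uses hypothetical helper names and toy embeddings.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch of L2-normalized embeddings.

    Pair (i, j) is a positive when i == j and a negative otherwise. Each pair
    contributes its own binary term, so no softmax over the batch is needed.
    """
    n = len(image_embs)
    total = 0.0
    for i, img in enumerate(image_embs):
        for j, txt in enumerate(text_embs):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = temperature * similarity + bias
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * logit)
    return total / n


# Toy batch: two unit-length embeddings per modality (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = sigmoid_loss(images, matched_texts)
loss_swapped = sigmoid_loss(images, swapped_texts)
print(loss_matched < loss_swapped)
```

Every term depends only on a single image-text pair, which is why no batch-wide normalization is required.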
A TLDR of SigLIP by one of the authors can be found here.
Intended uses & limitations
You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the model hub to look for other versions on a task that interests you.
How to use
Here is how to use this model to perform zero-shot image classification:
from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch
model = AutoModel.from_pretrained("google/siglip-large-patch16-384")
processor = AutoProcessor.from_pretrained("google/siglip-large-patch16-384")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of 2 cats", "a photo of 2 dogs"]
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image)  # these are the probabilities
print(f"{probs[0][0]:.1%} that image 0 is '{texts[0]}'")
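Because the logits are passed through a sigmoid rather than a softmax, each candidate text is scored independently and the resulting probabilities need not sum to 1. A minimal sketch of the difference, using made-up logit values rather than real model outputs:

```python
import math

# Hypothetical logits for one image against three candidate texts.
logits = [2.0, -1.0, -4.0]

# Sigmoid: each text gets its own independent probability.
sigmoid_probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Softmax (CLIP-style): probabilities are normalized across the texts.
exps = [math.exp(z) for z in logits]
softmax_probs = [e / sum(exps) for e in exps]

print([round(p, 3) for p in sigmoid_probs])
print([round(p, 3) for p in softmax_probs])
```

In practice this means an image can score high, or low, against several texts at once.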
Alternatively, one can use the pipeline API, which abstracts away the complexity for the user:
from transformers import pipeline
from PIL import Image
import requests
# load pipe
image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip-large-patch16-384")
# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
# inference
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
outputs = [{"score": round(output["score"], 4), "label": output["label"] } for output in outputs]
print(outputs)
SigLIP is pre-trained on the English image-text pairs of the WebLI dataset (Chen et al., 2023).
Preprocessing
Images are resized/rescaled to the same resolution (384x384) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
Texts are tokenized and padded to the same length (64 tokens).
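As a rough illustration of these two steps, the sketch below normalizes a single RGB pixel and pads a token sequence in plain Python. It is not the processor's actual code: the helper names, the 1/255 rescale factor, the token ids, and the padding id of 0 are assumptions made for the example.

```python
MEAN = (0.5, 0.5, 0.5)
STD = (0.5, 0.5, 0.5)
MAX_LENGTH = 64


def normalize_pixel(rgb):
    """Rescale 0-255 channel values to [0, 1], then apply (x - mean) / std."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))


def pad_tokens(token_ids, pad_id=0, max_length=MAX_LENGTH):
    """Right-pad token ids to max_length (pad_id is a placeholder value).

    Sequences already at or beyond max_length are returned unchanged.
    """
    ids = list(token_ids)
    return ids + [pad_id] * (max_length - len(ids))


# With mean 0.5 and std 0.5, channel values end up in the range [-1, 1].
black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
padded = pad_tokens([101, 102, 103])  # hypothetical token ids
print(black, white, len(padded))
```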
Compute
The model was trained on 16 TPU-v4 chips for three days.
Evaluation results
Evaluation of SigLIP compared to CLIP is shown below (taken from the paper).
BibTeX entry and citation info
@misc{zhai2023sigmoid,
      title={Sigmoid Loss for Language Image Pre-Training},
      author={Xiaohua Zhai and Basil Mustafa and Alexander Kolesnikov and Lucas Beyer},
      year={2023},
      eprint={2303.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}