
Japanese Stable CLIP ViT-L/16

Please note: for commercial usage of this model, please see https://stability.ai/license

For inquiries in Japanese regarding commercial use, please contact [email protected].

Model Details

Japanese Stable CLIP is a Japanese CLIP (Contrastive Language-Image Pre-Training) model that maps both Japanese text and images into the same embedding space. On its own, the model is capable of tasks such as zero-shot image classification and text-to-image retrieval. Combined with other components, it can also serve as part of generative pipelines, such as image-to-text and text-to-image generation.

Usage
  1. Install packages
pip install ftfy pillow requests transformers torch sentencepiece protobuf
  2. Run!
from typing import Union, List
import ftfy, html, re, io
import requests
from PIL import Image
import torch
from transformers import AutoModel, AutoTokenizer, AutoImageProcessor, BatchFeature

# taken from https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/tokenizer.py#L65C8-L65C8
def basic_clean(text):
    text = ftfy.fix_text(text)
    text = html.unescape(html.unescape(text))
    return text.strip()

def whitespace_clean(text):
    text = re.sub(r"\s+", " ", text)
    text = text.strip()
    return text

def tokenize(
    tokenizer,
    texts: Union[str, List[str]],
    max_seq_len: int = 77,
):
    """
    This is a function that have the original clip's code has.
    https://github.com/openai/CLIP/blob/main/clip/clip.py#L195
    """
    if isinstance(texts, str):
        texts = [texts]
    texts = [whitespace_clean(basic_clean(text)) for text in texts]

    inputs = tokenizer(
        texts,
        max_length=max_seq_len - 1,  # reserve one position for the BOS token prepended below
        padding="max_length",
        truncation=True,
        add_special_tokens=False,
    )
    # add bos token at first place
    input_ids = [[tokenizer.bos_token_id] + ids for ids in inputs["input_ids"]]
    attention_mask = [[1] + am for am in inputs["attention_mask"]]
    position_ids = [list(range(0, len(input_ids[0])))] * len(texts)

    return BatchFeature(
        {
            "input_ids": torch.tensor(input_ids, dtype=torch.long),
            "attention_mask": torch.tensor(attention_mask, dtype=torch.long),
            "position_ids": torch.tensor(position_ids, dtype=torch.long),
        }
    )

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "stabilityai/japanese-stable-clip-vit-l-16"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoImageProcessor.from_pretrained(model_path)

# Run!
image = Image.open(io.BytesIO(requests.get('https://images.pexels.com/photos/2253275/pexels-photo-2253275.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260').content))
image = processor(images=image, return_tensors="pt").to(device)
text = tokenize(
    tokenizer=tokenizer,
    texts=["犬", "猫", "象"],
).to(device)

with torch.no_grad():
    image_features = model.get_image_features(**image)
    text_features = model.get_text_features(**text)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs) 
# [[1.0, 0.0, 0.0]]
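
The same embeddings support text-to-image retrieval: encode a set of candidate images once, then rank them by similarity to a text query. The sketch below is a hypothetical extension of the script above, not code from the model card; the urls list is a placeholder you would supply, and it reuses model, processor, tokenizer, and device from the previous step.

import torch.nn.functional as F

urls = [...]  # placeholder: URLs of candidate images to search over
images = [Image.open(io.BytesIO(requests.get(u).content)) for u in urls]
image_inputs = processor(images=images, return_tensors="pt").to(device)
query = tokenize(tokenizer=tokenizer, texts=["犬の写真"]).to(device)

with torch.no_grad():
    # L2-normalize so the dot product ranks candidates by cosine similarity.
    image_emb = F.normalize(model.get_image_features(**image_inputs), dim=-1)
    text_emb = F.normalize(model.get_text_features(**query), dim=-1)
    scores = (text_emb @ image_emb.T).squeeze(0)

best = scores.argmax().item()
print("Best match:", urls[best], "score:", round(scores[best].item(), 3))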
Model Details

[Evaluation score table not preserved in this copy.]

* Computed scores based on https://github.com/rinnakk/japanese-clip.

Training

The model uses a ViT-L/16 Transformer architecture as the image encoder and a 12-layer BERT as the text encoder, with the Japanese tokenizer from rinna/japanese-roberta-base. During training, the image encoder was initialized from the AugReg vit-large-patch16-224 model, and we applied SigLIP (Sigmoid loss for Language-Image Pre-training).
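
For intuition, here is a minimal sketch of the SigLIP objective in PyTorch. It is an illustration based on the SigLIP paper (Zhai et al., 2023), not the actual training code used for this model; the names siglip_loss, logit_scale, and logit_bias are hypothetical.

import torch
import torch.nn.functional as F

def siglip_loss(image_features, text_features, logit_scale, logit_bias):
    # Hypothetical sketch of the sigmoid loss, not this model's training code.
    # L2-normalize so the dot products below are cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # Pairwise logits for every image-text combination in the batch.
    logits = logit_scale * image_features @ text_features.T + logit_bias
    # Targets: +1 for matching pairs (the diagonal), -1 for all other pairs.
    n = logits.size(0)
    labels = 2 * torch.eye(n, device=logits.device) - 1
    # Each pair contributes an independent binary (sigmoid) term, so no
    # batch-wide softmax normalization is needed, unlike the original CLIP loss.
    return -F.logsigmoid(labels * logits).sum() / n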

Training Dataset

The training dataset includes the following public datasets:

Use and Limitations
Intended Use

This model is intended to be used by the open-source community in vision-language applications.

Limitations and bias

The training dataset may have contained offensive or inappropriate content even though we applied data filters. We recommend users exercise reasonable caution when using these models in production systems. Do not use the model for any applications that may cause harm or distress to individuals or groups.

How to cite
@misc{JapaneseStableCLIP, 
    url    = {https://huggingface.co/stabilityai/japanese-stable-clip-vit-l-16}, 
    title  = {Japanese Stable CLIP ViT-L/16}, 
    author = {Shing, Makoto and Akiba, Takuya}
}
Contact
