Japanese Stable VLM is a vision-language instruction-following model that generates Japanese descriptions for input images, optionally conditioned on input text such as questions.
Usage
import torch
from transformers import AutoTokenizer, AutoModelForVision2Seq, AutoImageProcessor
from PIL import Image
import requests
# helper function to format input prompts
TASK2INSTRUCTION = {
"caption": "画像を詳細に述べてください。",
"tag": "与えられた単語を使って、画像を詳細に述べてください。",
"vqa": "与えられた画像を下に、質問に答えてください。",
}
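# English glosses of the Japanese instructions above (translations are ours,
# not part of the prompts the model was trained on):
#   caption: "Describe the image in detail."
#   tag:     "Using the given words, describe the image in detail."
#   vqa:     "Based on the given image, answer the question."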
def build_prompt(task="caption", input=None, sep="\n\n### "):
    assert (
        task in TASK2INSTRUCTION
    ), f"Please choose from {list(TASK2INSTRUCTION.keys())}"
    if task in ["tag", "vqa"]:
        assert input is not None, "Please fill in `input`!"
        if task == "tag" and isinstance(input, list):
            input = "、".join(input)
    else:
        assert input is None, f"`{task}` mode does not accept an `input` question"
    sys_msg = "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
    p = sys_msg
    roles = ["指示", "応答"]
    instruction = TASK2INSTRUCTION[task]
    msgs = [": \n" + instruction, ": \n"]
    if input:
        roles.insert(1, "入力")
        msgs.insert(1, ": \n" + input)
    for role, msg in zip(roles, msgs):
        p += sep + role + msg
    return p
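# For reference, build_prompt(task="caption") returns a string of the form
# (system message, then the instruction under "### 指示: ", then an empty
# "### 応答: " section for the model to complete):
#
#   以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。
#
#   ### 指示: 
#   画像を詳細に述べてください。
#
#   ### 応答: 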
# load model
device = "cuda"if torch.cuda.is_available() else"cpu"
model = AutoModelForVision2Seq.from_pretrained("stabilityai/japanese-stable-vlm", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("stabilityai/japanese-stable-vlm")
tokenizer = AutoTokenizer.from_pretrained("stabilityai/japanese-stable-vlm")
model.to(device)
# prepare inputs
url = "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = build_prompt(task="caption")
# prompt = build_prompt(task="tag", input=["河津桜", "青空"])# prompt = build_prompt(task="vqa", input="季節はいつですか?")
inputs = processor(images=image, return_tensors="pt")
text_encoding = tokenizer(prompt, add_special_tokens=False, return_tensors="pt")
inputs.update(text_encoding)
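# `inputs` now carries both the image tensors from the processor
# (pixel_values) and the tokenized prompt (input_ids, attention_mask),
# which is everything generate() needs below.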
# generate
outputs = model.generate(
    **inputs.to(device, dtype=model.dtype),
    do_sample=False,         # deterministic beam search, no sampling
    num_beams=5,             # keep 5 beams
    max_new_tokens=128,
    min_length=1,
    repetition_penalty=1.5,  # discourage repeated phrases
)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(generated_text)
# 桜越しの東京スカイツリー ("Tokyo Skytree seen through cherry blossoms")
This model is a vision-language instruction-following model with the LLaVA 1.5 architecture. It uses stabilityai/japanese-stablelm-instruct-gamma-7b as its language model and openai/clip-vit-large-patch14 as its image encoder. Training proceeded in two stages: in the first, the MLP projection was trained from scratch; in the second, the language model and the MLP projection were trained further together.
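For intuition, the sketch below shows the LLaVA-style wiring this describes: patch features from the CLIP encoder pass through a small MLP into the language model's embedding space and are prepended to the embedded prompt. It is a minimal illustration under assumed dimensions (1024-d CLIP features, 4096-d LM hidden size); module names and shapes are ours, not the model's actual implementation.

import torch
import torch.nn as nn

class LlavaStyleProjector(nn.Module):
    # Hypothetical sketch: a 2-layer MLP mapping CLIP patch features into the
    # language model's hidden size. Dimensions are illustrative assumptions.
    def __init__(self, vision_dim=1024, lm_dim=4096):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_features):   # (batch, num_patches, vision_dim)
        return self.mlp(patch_features)  # (batch, num_patches, lm_dim)

projector = LlavaStyleProjector()
image_tokens = projector(torch.randn(1, 256, 1024))  # projected visual "tokens"
text_embeds = torch.randn(1, 32, 4096)               # embedded prompt tokens
lm_inputs = torch.cat([image_tokens, text_embeds], dim=1)  # sequence fed to the LM

In this picture, the first training stage updates only the projector while the vision encoder and language model stay frozen; the second stage additionally updates the language model's weights.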
Training Dataset
The training dataset is composed of public image-caption and visual question answering datasets.
This model is intended to be used by the open-source community in vision-language applications.
Limitations and bias
The training dataset may have contained offensive or inappropriate content even though we applied data filters.
We recommend users exercise reasonable caution when using these models in production systems. Do not use the model for any applications that may cause harm or distress to individuals or groups.
How to cite
@misc{JapaneseStableVLM,
    url = {https://huggingface.co/stabilityai/japanese-stable-vlm},
    title = {Japanese Stable VLM},
    author = {Shing, Makoto and Akiba, Takuya}
}