cuuupid / glm-4v-9b

GLM-4V is a multimodal model released by Tsinghua University that is competitive with GPT-4o and establishes a new SOTA on several benchmarks, including OCR.

replicate.com
Total runs: 75.6K
24-hour runs: 100
7-day runs: 1.1K
30-day runs: 4.3K
Github
Model last updated: July 02 2024

Introduction and Model Details of glm-4v-9b

Readme

This is a vision language model that is highly competitive with GPT-4's vision capabilities. Credit to THUDM; the original README follows below.

GLM-4

Model Introduction

GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat show performance surpassing Llama-3-8B. In addition to multi-round conversation, GLM-4-9B-Chat also has advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation of models adds multi-language support, covering 26 languages including Japanese, Korean, and German.

We have also launched the GLM-4-9B-Chat-1M model, which supports a 1M context length (about 2 million Chinese characters), and the multimodal model GLM-4V-9B based on GLM-4-9B. GLM-4V-9B has dialogue capabilities in both Chinese and English at a high resolution of 1120 × 1120. In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

Model List

| Model | Type | Seq Length | Download | Online Demo |
|---|---|---|---|---|
| GLM-4-9B | Base | 8K | 🤗 Huggingface · 🤖 ModelScope · 🟣 WiseModel | / |
| GLM-4-9B-Chat | Chat | 128K | 🤗 Huggingface · 🤖 ModelScope · 🟣 WiseModel | 🤖 ModelScope CPU · 🤖 ModelScope vLLM |
| GLM-4-9B-Chat-1M | Chat | 1M | 🤗 Huggingface · 🤖 ModelScope · 🟣 WiseModel | / |
| GLM-4V-9B | Chat | 8K | 🤗 Huggingface · 🤖 ModelScope · 🟣 WiseModel | 🤖 ModelScope |
Projects

The following excellent open-source repositories provide in-depth support for the GLM-4-9B model, and everyone is welcome to explore and learn from them.

Inference acceleration:

  • chatglm.cpp : Real-time inference on your laptop accelerated by quantization, similar to llama.cpp.
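As a rough illustration only, the snippet below shows how local quantized inference through the chatglm-cpp Python binding typically looks. The Pipeline/ChatMessage interface, the converted GGML model path, and GLM-4 support in a given release are all assumptions to verify against the chatglm.cpp README.

# Hedged sketch: quantized local inference via the chatglm-cpp Python binding.
# The API surface and the converted model path below are assumptions; follow the
# chatglm.cpp README for model conversion and the exact interface.
import chatglm_cpp

# Path to a GGML file produced by chatglm.cpp's conversion script (assumed name).
pipeline = chatglm_cpp.Pipeline("models/glm-4-9b-chat-ggml.bin")
reply = pipeline.chat([chatglm_cpp.ChatMessage(role="user", content="Hello")])
print(reply.content)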
Benchmark

Typical Tasks

| Model | AlignBench | MT-Bench | IFEval | MMLU | C-Eval | GSM8K | MATH | HumanEval | NaturalCodeBench |
|---|---|---|---|---|---|---|---|---|---|
| Llama-3-8B-Instruct | 6.40 | 8.00 | 68.58 | 68.4 | 51.3 | 79.6 | 30.0 | 62.2 | 24.7 |
| ChatGLM3-6B | 5.18 | 5.50 | 28.1 | 66.4 | 69.0 | 72.3 | 25.7 | 58.5 | 11.3 |
| GLM-4-9B-Chat | 7.01 | 8.35 | 69.0 | 72.4 | 75.6 | 79.6 | 50.6 | 71.8 | 32.2 |
Base Model

| Model | MMLU | C-Eval | GPQA | GSM8K | MATH | HumanEval |
|---|---|---|---|---|---|---|
| Llama-3-8B | 66.6 | 51.2 | - | 45.8 | - | 33.5 |
| Llama-3-8B-Instruct | 68.4 | 51.3 | 34.2 | 79.6 | 30.0 | 62.2 |
| ChatGLM3-6B-Base | 61.4 | 69.0 | 26.8 | 72.3 | 25.7 | 58.5 |
| GLM-4-9B | 74.7 | 77.1 | 34.3 | 84.0 | 30.4 | 70.1 |

Because GLM-4-9B was pre-trained with some math-, reasoning-, and code-related instruction data, Llama-3-8B-Instruct is also included in the comparison.

Long Context

The needle-in-a-haystack experiment was conducted at a context length of 1M; the result plots are available in the original repository.
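To make the setup concrete, the sketch below illustrates the general idea of a needle-in-a-haystack test: a short "needle" fact is buried at varying depths inside long filler text and the model is asked to retrieve it. The `chat` helper is an assumed wrapper around any GLM-4-9B-Chat backend; this is not the harness used to produce the reported results.

# Minimal needle-in-a-haystack sketch (illustrative only).
# `chat(prompt: str) -> str` is an assumed helper wrapping any GLM-4-9B-Chat backend.

NEEDLE = "The secret passphrase is 'blue-harbor-42'."
QUESTION = "What is the secret passphrase mentioned in the document?"
FILLER = "Grass is green and the sky is blue. " * 20000  # long distractor text

def build_haystack(depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    cut = int(len(FILLER) * depth)
    return FILLER[:cut] + NEEDLE + FILLER[cut:]

def run_trial(chat, depth: float) -> bool:
    prompt = build_haystack(depth) + "\n\n" + QUESTION
    return "blue-harbor-42" in chat(prompt)

# Example sweep over insertion depths:
# for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
#     print(depth, run_trial(chat, depth))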

Multi Language

The tests for GLM-4-9B-Chat and Llama-3-8B-Instruct are conducted on six multilingual datasets. The test results and the corresponding languages selected for each dataset are shown in the table below:

| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|---|---|---|---|
| M-MMLU | 49.6 | 56.6 | all |
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |
Function Call

Tested on the Berkeley Function Calling Leaderboard. A tool-calling sketch follows the results table below.

| Model | Overall Acc. | AST Summary | Exec Summary | Relevance |
|---|---|---|---|---|
| Llama-3-8B-Instruct | 58.88 | 59.25 | 70.01 | 45.83 |
| gpt-4-turbo-2024-04-09 | 81.24 | 82.14 | 78.61 | 88.75 |
| ChatGLM3-6B | 57.88 | 62.18 | 69.78 | 5.42 |
| GLM-4-9B-Chat | 81.00 | 80.26 | 84.40 | 87.92 |
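As a rough illustration of how tool calling can be exercised, the sketch below passes an OpenAI-style tool schema through `tokenizer.apply_chat_template`. The `tools=` argument is only honored by recent transformers versions and by chat templates that support it, and the `get_weather` tool is a hypothetical example, so treat both as assumptions and consult the repository's function-call demos for the officially supported format.

# Hedged sketch: building a tool-calling prompt with the chat template.
# The tool schema and the `tools=` keyword are assumptions; see the GLM-4
# repository's function-call demos for the supported layout.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Beijing?"}]

# Recent transformers releases forward `tools` into the chat template when the
# template defines how to render them.
prompt = tokenizer.apply_chat_template(messages, tools=tools,
                                       add_generation_prompt=True, tokenize=False)
print(prompt)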
Multi-Modal

GLM-4V-9B is a multimodal language model with visual understanding capabilities. The evaluation results of its related classic tasks are as follows:

| Model | MMBench-EN-Test | MMBench-CN-Test | SEEDBench_IMG | MMStar | MMMU | MME | HallusionBench | AI2D | OCRBench |
|---|---|---|---|---|---|---|---|---|---|
| gpt-4o-2024-05-13 | 83.4 | 82.1 | 77.1 | 63.9 | 69.2 | 2310.3 | 55 | 84.6 | 736 |
| gpt-4-turbo-2024-04-09 | 81.0 | 80.2 | 73.0 | 56.0 | 61.7 | 2070.2 | 43.9 | 78.6 | 656 |
| gpt-4-1106-preview | 77.0 | 74.4 | 72.3 | 49.7 | 53.8 | 1771.5 | 46.5 | 75.9 | 516 |
| InternVL-Chat-V1.5 | 82.3 | 80.7 | 75.2 | 57.1 | 46.8 | 2189.6 | 47.4 | 80.6 | 720 |
| LLaVA-Next-Yi-34B | 81.1 | 79 | 75.7 | 51.6 | 48.8 | 2050.2 | 34.8 | 78.9 | 574 |
| Step-1V | 80.7 | 79.9 | 70.3 | 50.0 | 49.9 | 2206.4 | 48.4 | 79.2 | 625 |
| MiniCPM-Llama3-V2.5 | 77.6 | 73.8 | 72.3 | 51.8 | 45.8 | 2024.6 | 42.4 | 78.4 | 725 |
| Qwen-VL-Max | 77.6 | 75.7 | 72.7 | 49.5 | 52 | 2281.7 | 41.2 | 75.7 | 684 |
| Gemini 1.0 Pro | 73.6 | 74.3 | 70.7 | 38.6 | 49 | 2148.9 | 45.7 | 72.9 | 680 |
| Claude 3 Opus | 63.3 | 59.2 | 64 | 45.7 | 54.9 | 1586.8 | 37.8 | 70.6 | 694 |
| GLM-4V-9B | 81.1 | 79.4 | 76.8 | 58.7 | 47.2 | 2163.8 | 46.6 | 81.1 | 786 |
Quick call

For hardware configuration and system requirements, please check here.

Use the following method to quickly call the GLM-4-9B-Chat language model

Use the transformers backend for inference:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)

query = "你好"  # "Hello"

# Build chat-formatted input ids with the model's chat template
inputs = tokenizer.apply_chat_template([{"role": "user", "content": query}],
                                       add_generation_prompt=True,
                                       tokenize=True,
                                       return_tensors="pt",
                                       return_dict=True
                                       )

inputs = inputs.to(device)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Strip the prompt tokens and decode only the newly generated reply
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
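If you prefer tokens to be printed as they are generated rather than after the full completion, a streamer can be attached to `generate`. This is a minimal sketch building on the `model`, `tokenizer`, `inputs`, and `gen_kwargs` objects prepared above.

# Optional: stream the reply to stdout as it is generated.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
with torch.no_grad():
    model.generate(**inputs, **gen_kwargs, streamer=streamer)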

Use the vLLM backend for inference:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# GLM-4-9B-Chat
# If you encounter OOM, you can try to reduce max_model_len or increase tp_size
max_model_len, tp_size = 131072, 1
model_name = "THUDM/glm-4-9b-chat"
prompt = [{"role": "user", "content": "你好"}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
    # if you encounter OOM in GLM-4-9B-Chat-1M, you can try to enable the following parameters
    # enable_chunked_prefill=True,
    # max_num_batched_tokens=8192
)
stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(temperature=0.95, max_tokens=1024, stop_token_ids=stop_token_ids)

inputs = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
outputs = llm.generate(prompts=inputs, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
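vLLM can also expose the model through an OpenAI-compatible HTTP server (for example, `python -m vllm.entrypoints.openai.api_server --model THUDM/glm-4-9b-chat --trust-remote-code`), after which any OpenAI client can talk to it. The port and served model name in the sketch below assume the server defaults.

# Hedged sketch: querying a locally running vLLM OpenAI-compatible server.
# Assumes the server was started as described above and listens on port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # any placeholder key works
response = client.chat.completions.create(
    model="THUDM/glm-4-9b-chat",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.95,
    max_tokens=1024,
)
print(response.choices[0].message.content)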

Use the following method to quickly call the GLM-4V-9B multimodal model

Use the transformers backend for inference:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)

query = 'describe this image'
image = Image.open("your image").convert('RGB')  # replace "your image" with the path to your file
# Multimodal chat input: the image is passed alongside the text query
inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
                                       add_generation_prompt=True, tokenize=True, return_tensors="pt",
                                       return_dict=True)  # chat mode

inputs = inputs.to(device)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4v-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

Note: GLM-4V-9B does not yet support inference via vLLM.

Complete project list

If you want to learn more about the GLM-4-9B series of open-source models, the open-source repository provides developers with basic GLM-4-9B usage and development code through the following content:

  • basic_demo : contains
    • interaction code using the transformers and vLLM backends
    • OpenAI API backend interaction code
    • batch inference code

  • composite_demo : contains
    • fully functional demonstration code for the GLM-4-9B and GLM-4V-9B open-source models, including All Tools capabilities, long-document interpretation, and multimodal capabilities

  • finetune_demo : contains
    • PEFT (LoRA, P-Tuning) fine-tuning code (a minimal LoRA sketch follows this list)
    • SFT fine-tuning code
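As referenced above, this is a minimal sketch of a LoRA setup with PEFT for GLM-4-9B-Chat. The target module name is an assumption based on GLM's attention naming; check finetune_demo in the repository for the officially supported configurations and training scripts.

# Hedged sketch: LoRA fine-tuning setup with PEFT (configuration only, no training loop).
# The target module name "query_key_value" is an assumption; see finetune_demo
# for the supported configs.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # assumed projection name for GLM blocks
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()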
Friendly Links
  • LLaMA-Factory : Efficient open-source fine-tuning framework, already supports GLM-4-9B-Chat language model fine-tuning.
  • SWIFT : LLM/VLM training framework from ModelScope, supports GLM4-9B-Chat/GLM4v-9b-chat fine-tuning.
  • Xorbits Inference : Performance-enhanced and comprehensive global inference framework, easily deploy your own models or import cutting-edge open source models with one click.
  • self-llm : Datawhale’s self-llm project, which includes the GLM-4-9B open source model cookbook.
License
  • The use of GLM-4 model weights must follow the Model License.

  • The code in this open source repository follows the Apache 2.0 license.

Please strictly follow the open source license.

Reference

If you find our work helpful, please consider citing the following paper.

@inproceedings{zeng2022glm,
  title={{GLM-130B:} An Open Bilingual Pre-trained Model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  booktitle={The Eleventh International Conference on Learning Representations, {ICLR} 2023, Kigali, Rwanda, May 1-5, 2023},
  year={2023}
}

@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}

@misc{wang2023cogvlm,
  title={CogVLM: Visual Expert for Pretrained Language Models},
  author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
  year={2023},
  eprint={2311.03079},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Pricing of glm-4v-9b on replicate.com

Run time and cost

This model costs approximately $0.26 to run on Replicate, or about 3 runs per $1, but this varies depending on your inputs. It is also open source, and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 4 minutes, but the predict time varies significantly based on the inputs.

Runs of cuuupid/glm-4v-9b on replicate.com

Total runs: 75.6K
24-hour runs: 100
3-day runs: 500
7-day runs: 1.1K
30-day runs: 4.3K

More Information About the glm-4v-9b Model on replicate.com

glm-4v-9b on replicate.com

glm-4v-9b is an AI model hosted on replicate.com. GLM-4V is a multimodal model released by Tsinghua University that is competitive with GPT-4o and establishes a new SOTA on several benchmarks, including OCR, and it can be used instantly through the cuuupid/glm-4v-9b model. replicate.com supports a free trial of glm-4v-9b as well as paid use, and the model can be called through an API from Node.js, Python, or plain HTTP.
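For example, with the official Replicate Python client the model can be called roughly as follows. The input field names ("image", "prompt") are assumptions; check the model's API tab on Replicate for the exact input schema.

# Hedged sketch: calling cuuupid/glm-4v-9b through the Replicate Python client.
# Requires `pip install replicate` and a REPLICATE_API_TOKEN environment variable.
# The input keys below are assumptions; see the model's API page for the real schema.
import replicate

output = replicate.run(
    "cuuupid/glm-4v-9b",
    input={
        "image": open("example.jpg", "rb"),
        "prompt": "Describe this image.",
    },
)
print(output)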

cuuupid glm-4v-9b online free

replicate.com is an online trial and API platform that integrates glm-4v-9b, including API services, and provides a free online trial of the model. You can try glm-4v-9b online for free at the link below.

cuuupid glm-4v-9b free online trial on replicate.com:

https://replicate.com/cuuupid/glm-4v-9b

glm-4v-9b install

glm-4v-9b is an open-source model whose code is freely available on GitHub, where any user can find and install it locally. At the same time, replicate.com provides a hosted deployment of glm-4v-9b, so users can debug and try the model directly on replicate.com, or use the API without a local installation.

glm-4v-9b on replicate.com:

https://replicate.com/cuuupid/glm-4v-9b

glm-4v-9b source on GitHub:

https://github.com/THUDM/GLM-4


Other APIs from cuuupid

| Model description (on replicate.com) | Total runs | Run growth | Growth rate | Updated |
|---|---|---|---|---|
| Best-in-class clothing virtual try on in the wild (non-commercial use only) | 534.5K | 66.6K | 12.52% | August 25 2024 |
| Embed text with Qwen2-7b-Instruct | 244.3K | 95.3K | 42.30% | August 06 2024 |
| Convert scanned or electronic documents to markdown, very very very fast | 2.3K | 100 | 4.35% | December 08 2023 |
| Generate high quality videos from a prompt | 1.6K | 100 | 6.25% | August 28 2024 |
| Flux finetuned for black and white line art | 1.4K | 300 | 23.08% | August 23 2024 |
| SDXL finetuned on line art | 1.1K | 0 | 0.00% | June 06 2024 |
| Microsoft's tool to convert Office documents, PDFs, images, audio, and more to LLM-ready markdown | 836 | 55 | 75.34% | January 18 2025 |
| Translate audio while keeping the original style, pronunciation and tone of your original audio | 767 | 132 | 17.21% | December 06 2023 |
| SOTA open-source model for chatting with videos and the newest model in the Qwen family | 437 | 27 | 6.19% | August 31 2024 |
| F5-TTS, a new state-of-the-art in open source voice cloning | 171 | 0 | 0.00% | October 14 2024 |
| Finetuned E5 embeddings for instruct based on Mistral | 131 | 0 | 0.00% | February 04 2024 |
| MiniCPM LLama3-V 2.5, a new SOTA open-source VLM that surpasses GPT-4V-1106 and Phi-128k on a number of benchmarks | 127 | 0 | 0.00% | June 05 2024 |
| Llama-3-8B finetuned with ReFT to hyperfocus on New Jersey, the Garden State, the best state, the only state! | 105 | 0 | 0.00% | June 04 2024 |
| make meow emojis! | 68 | 7 | 10.29% | January 12 2024 |
| An example using Garden State Llama to ReFT on the Golden Gate bridge | 30 | 0 | 0.00% | June 04 2024 |