cuuupid / minicpm-llama3-v-2.5

MiniCPM-Llama3-V 2.5, a new SOTA open-source VLM that surpasses GPT-4V-1106 and Phi-128k on a number of benchmarks.

replicate.com
Total runs: 127
24-hour runs: 0
7-day runs: 0
30-day runs: 0
GitHub
Last updated: June 04, 2024


Readme

All credit to OpenBMB; check this model out on GitHub!

MiniCPM-Llama3-V 2.5

MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:

  • 🔥 Leading Performance. MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max and greatly outperforms other Llama 3-based MLLMs.

  • 💪 Strong OCR Capabilities. MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a 700+ score on OCRBench and surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro. Based on recent user feedback, MiniCPM-Llama3-V 2.5 now offers enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex-reasoning abilities, improving the multimodal interaction experience.

  • 🏆 Trustworthy Behavior. Leveraging the latest RLAIF-V method (the newest technique in the RLHF-V [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a 10.3% hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), the best performance within the open-source community. Data released.

  • 🌏 Multilingual Support. Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from VisCPM, MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to over 30 languages, including German, French, Spanish, Italian, Korean, etc. All Supported Languages.

  • 🚀 Efficient Deployment. MiniCPM-Llama3-V 2.5 systematically employs model quantization, CPU optimizations, NPU optimizations and compilation optimizations, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a 150x acceleration in end-side MLLM image encoding and a 3x speedup in language decoding.

  • 💫 Easy Usage. MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) llama.cpp and ollama support for efficient CPU inference on local devices, (2) GGUF format quantized models in 16 sizes, (3) efficient LoRA fine-tuning with only 2 V100 GPUs, (4) streaming output, (5) quick local WebUI demo setup with Gradio and Streamlit, and (6) interactive demos on HuggingFace Spaces.
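As a quick sanity check on the resolution claim above: 1344x1344 is 1,806,336 pixels, i.e. roughly 1.8 million. The sketch below is illustrative only; the helper function and its name are assumptions for the arithmetic, not the model's actual image-preprocessing code.

```python
# "Any aspect ratio, up to 1.8 million pixels": 1344 x 1344 is the stated example.
MAX_PIXELS = 1344 * 1344  # 1,806,336 pixels, ~1.8M

def fits_pixel_budget(width: int, height: int, max_pixels: int = MAX_PIXELS) -> bool:
    """Illustrative check: would an image of this size fit the stated pixel budget?"""
    return width * height <= max_pixels

print(MAX_PIXELS)                      # 1806336
print(fits_pixel_budget(1344, 1344))   # square image exactly at the limit
print(fits_pixel_budget(448, 3584))    # extreme aspect ratio, fewer pixels, still fits
print(fits_pixel_budget(2048, 2048))   # over budget
```

The point of the "any aspect ratio" phrasing is that the budget is on total pixels, not on a fixed square resolution, so very wide or tall images (e.g. receipts, screenshots) are handled without forced square resizing.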

Citation

If you find our model/code/paper helpful, please consider citing our papers 📝 and starring us ⭐️!

@article{yu2023rlhf,
  title={RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
  author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
  journal={arXiv preprint arXiv:2312.00849},
  year={2023}
}
@article{viscpm,
  title={Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages},
  author={Jinyi Hu and Yuan Yao and Chongyi Wang and Shan Wang and Yinxu Pan and Qianyu Chen and Tianyu Yu and Hanghao Wu and Yue Zhao and Haoye Zhang and Xu Han and Yankai Lin and Jiao Xue and Dahai Li and Zhiyuan Liu and Maosong Sun},
  journal={arXiv preprint arXiv:2308.12038},
  year={2023}
}
@article{xu2024llava-uhd,
  title={{LLaVA-UHD}: an LMM Perceiving Any Aspect Ratio and High-Resolution Images},
  author={Xu, Ruyi and Yao, Yuan and Guo, Zonghao and Cui, Junbo and Ni, Zanlin and Ge, Chunjiang and Chua, Tat-Seng and Liu, Zhiyuan and Huang, Gao},
  journal={arXiv preprint arXiv:2403.11703},
  year={2024}
}
@article{yu2024rlaifv,
  title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness}, 
  author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
  journal={arXiv preprint arXiv:2405.17220},
  year={2024}
}

Pricing of minicpm-llama3-v-2.5 on replicate.com

Run time and cost

This model runs on Nvidia T4 GPU hardware. There are not yet enough runs of this model to provide performance information.


More Information About the minicpm-llama3-v-2.5 Model on replicate.com

For the minicpm-llama3-v-2.5 license, visit:

https://github.com/OpenBMB/MiniCPM-V/blob/main/LICENSE

minicpm-llama3-v-2.5 replicate.com

minicpm-llama3-v-2.5 on replicate.com is an AI model that provides the capabilities of MiniCPM-Llama3-V 2.5 (a new SOTA open-source VLM that surpasses GPT-4V-1106 and Phi-128k on a number of benchmarks) and can be used instantly through cuuupid's minicpm-llama3-v-2.5 deployment. replicate.com supports a free trial of the model as well as paid usage, and the model can be called through an API from Node.js, Python, or plain HTTP.
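Calling the model over plain HTTP follows Replicate's standard prediction format: a POST to the predictions endpoint with a model version hash and an `input` object. The sketch below only builds the JSON request body and does not send it; the input field names (`image`, `prompt`) and the version placeholder are assumptions about this deployment's schema, so check the model page for the real values.

```python
import json

# Replicate's standard prediction endpoint; requests need an
# "Authorization: Bearer <REPLICATE_API_TOKEN>" header when actually sent.
API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(version: str, image_url: str, prompt: str) -> str:
    """Build the JSON body for a Replicate prediction request.
    The `image`/`prompt` input names are assumed for this model."""
    body = {
        "version": version,
        "input": {"image": image_url, "prompt": prompt},
    }
    return json.dumps(body)

payload = build_prediction_request(
    "VERSION_HASH_GOES_HERE",  # copy the actual version hash from the model page
    "https://example.com/receipt.png",
    "Transcribe all text in this image.",
)
print(payload)
```

In practice you would send this body with an HTTP client, or skip the manual request entirely and use Replicate's official Python or Node.js client libraries, which wrap the same endpoint.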

minicpm-llama3-v-2.5 replicate.com URL

https://replicate.com/cuuupid/minicpm-llama3-v-2.5

cuuupid minicpm-llama3-v-2.5 online free

replicate.com is an online trial and API platform that integrates minicpm-llama3-v-2.5's capabilities, including API services, and provides a free online trial of minicpm-llama3-v-2.5. You can try minicpm-llama3-v-2.5 online for free via the link below.

cuuupid minicpm-llama3-v-2.5 free online trial URL on replicate.com:

https://replicate.com/cuuupid/minicpm-llama3-v-2.5

minicpm-llama3-v-2.5 install

minicpm-llama3-v-2.5 is an open-source model that any user can find on GitHub and install for free. At the same time, replicate.com provides a ready-to-use deployment of minicpm-llama3-v-2.5, so users can debug and trial the model directly on replicate.com, with free API access as well.

minicpm-llama3-v-2.5 install URL on replicate.com:

https://replicate.com/cuuupid/minicpm-llama3-v-2.5

minicpm-llama3-v-2.5 install URL on GitHub:

https://github.com/OpenBMB/MiniCPM-V


Other API from cuuupid

  • Best-in-class clothing virtual try on in the wild (non-commercial use only)
    Total runs: 581.3K · Run growth: 65.2K · Growth rate: 11.26% · Updated: August 24, 2024

  • Embed text with Qwen2-7b-Instruct
    Total runs: 337.6K · Run growth: 155.8K · Growth rate: 46.48% · Updated: August 06, 2024

  • GLM-4V is a multimodal model released by Tsinghua University that is competitive with GPT-4o and establishes a new SOTA on several benchmarks, including OCR.
    Total runs: 76.9K · Run growth: 2.9K · Growth rate: 3.77% · Updated: July 02, 2024

  • Microsoft's tool to convert Office documents, PDFs, images, audio, and more to LLM-ready markdown.
    Total runs: 3.8K · Run growth: 3.1K · Growth rate: 85.83% · Updated: January 17, 2025

  • Convert scanned or electronic documents to markdown, very very very fast
    Total runs: 2.3K · Run growth: 0 · Growth rate: 0.00% · Updated: December 07, 2023

  • Generate high quality videos from a prompt
    Total runs: 1.7K · Run growth: 100 · Growth rate: 5.88% · Updated: August 27, 2024

  • Flux finetuned for black and white line art.
    Total runs: 1.4K · Run growth: 100 · Growth rate: 7.14% · Updated: August 23, 2024

  • SDXL finetuned on line art
    Total runs: 1.1K · Run growth: 0 · Growth rate: 0.00% · Updated: June 05, 2024

  • Translate audio while keeping the original style, pronunciation and tone of your original audio.
    Total runs: 767 · Run growth: 70 · Growth rate: 9.13% · Updated: December 06, 2023

  • SOTA open-source model for chatting with videos and the newest model in the Qwen family
    Total runs: 448 · Run growth: 21 · Growth rate: 4.69% · Updated: August 31, 2024

  • F5-TTS, a new state-of-the-art in open source voice cloning
    Total runs: 171 · Run growth: 0 · Growth rate: 0.00% · Updated: October 14, 2024

  • Zonos-v0.1 beta, a SOTA text-to-speech Transformer model with extraordinary expressive range, built by Zyphra.
    Total runs: 164 · Run growth: 93 · Growth rate: 56.71% · Updated: February 11, 2025

  • Finetuned E5 embeddings for instruct based on Mistral.
    Total runs: 131 · Run growth: 0 · Growth rate: 0.00% · Updated: February 03, 2024

  • Llama-3-8B finetuned with ReFT to hyperfocus on New Jersey, the Garden State, the best state, the only state!
    Total runs: 105 · Run growth: 0 · Growth rate: 0.00% · Updated: June 03, 2024

  • make meow emojis!
    Total runs: 68 · Run growth: 0 · Growth rate: 0.00% · Updated: January 11, 2024

  • An example using Garden State Llama to ReFT on the Golden Gate bridge.
    Total runs: 30 · Run growth: 0 · Growth rate: 0.00% · Updated: June 03, 2024