Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual features, followed by large language models (LLMs) for vision-language tasks. However, vision encoders impose a strong inductive bias on the abstracted visual representation, e.g., resolution, aspect ratio, and semantic priors, which can impede the flexibility and efficiency of VLMs. Training pure VLMs that accept vision and language inputs seamlessly, i.e., without vision encoders, remains challenging and rarely explored. Empirical observations reveal that direct training without encoders results in slow convergence and large performance gaps. In this work, we bridge the gap between encoder-based and encoder-free models and present a simple yet effective training recipe towards pure VLMs. Specifically, through thorough experiments we identify two key aspects of training encoder-free VLMs efficiently: (1) bridging vision-language representation inside one unified decoder; (2) enhancing visual recognition capability via extra supervision. With these strategies, we launch EVE, an encoder-free vision-language model that can be trained and forwarded efficiently. Notably, using only 35M publicly accessible data, EVE rivals encoder-based VLMs of similar capacity across multiple vision-language benchmarks. It significantly outperforms its counterpart Fuyu-8B, whose training procedures and training data are undisclosed. We believe that EVE provides a transparent and efficient route for developing a pure decoder-only architecture across modalities.
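To make the idea of feeding vision and language into one decoder without a vision encoder more concrete, the following is a minimal, illustrative sketch and not EVE's actual input layer. It assumes a grayscale image stored as nested lists, a single linear projection from flattened patches to the decoder width, and visual tokens placed before text tokens. The function names (`patchify`, `project`, `build_decoder_input`) and the token ordering are hypothetical choices made for illustration only.

```python
from typing import List

Vector = List[float]
Image = List[List[float]]  # grayscale H x W image (an assumption made for brevity)


def patchify(image: Image, patch: int) -> List[Vector]:
    """Split an H x W image into non-overlapping patch x patch tiles,
    each flattened row-major into a vector of length patch * patch."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                image[top + dy][left + dx]
                for dy in range(patch)
                for dx in range(patch)
            ])
    return patches


def project(vectors: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Apply a linear map: each vector (length d_in) times weight (d_in x d_model)."""
    d_model = len(weight[0])
    return [
        [sum(v[i] * weight[i][j] for i in range(len(v))) for j in range(d_model)]
        for v in vectors
    ]


def build_decoder_input(image: Image, text_embeddings: List[Vector],
                        weight: List[Vector], patch: int) -> List[Vector]:
    """Concatenate projected patch tokens and text tokens into one sequence.
    Placing visual tokens first is an assumption of this sketch."""
    visual_tokens = project(patchify(image, patch), weight)
    return visual_tokens + text_embeddings
```

Because the number of visual tokens follows directly from the image height and width, this kind of input path does not fix a resolution or aspect ratio in advance, which is the flexibility the abstract attributes to encoder-free models.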
Model Weights
We release the pretrained and instruction-tuned weights of EVE.
EVE-7B-HD-v1.0 is an open-source model whose code is available on GitHub. The model is also hosted on huggingface.co, where it can be tried and debugged directly.