[2024.01.27] 🤗 Hugging Face demo and all code & datasets are available now! Welcome to watch 👀 this repository for the latest updates.
😮 Highlights
MoE-LLaVA shows excellent performance in multi-modal learning.
🔥 High performance, but with fewer parameters
With just 3B sparsely activated parameters, MoE-LLaVA matches LLaVA-1.5-7B on various visual understanding benchmarks and even surpasses LLaVA-1.5-13B on object hallucination benchmarks.
🚀 Simple baseline, learning multi-modal interactions with sparse pathways
With the addition of a simple MoE tuning stage, training MoE-LLaVA can be completed on 8 V100 GPUs within 2 days. A minimal sketch of a sparsely routed MoE layer is shown below.
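To make "sparsely activated parameters" concrete, here is a minimal, self-contained sketch of a top-k routed mixture-of-experts layer: a learned router scores each token, only the top-k experts actually run on that token, and their outputs are combined with renormalized router weights. This illustrates the general mechanism only; it is not MoE-LLaVA's implementation, and the names (SparseMoELayer, num_experts, top_k) are invented for the example.

import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    # Illustrative top-k routed MoE layer; not MoE-LLaVA's actual code.
    def __init__(self, dim=512, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, dim)
        scores = self.router(x).softmax(dim=-1)                # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)         # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue  # this expert is skipped entirely for this batch
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(16, 512)
print(SparseMoELayer()(tokens).shape)  # torch.Size([16, 512])

Because only top_k of the experts run per token, the activated parameter count per forward pass stays close to that of a dense model of the same width, even though the total parameter count grows with the number of experts.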
🤗 Demo
Gradio Web UI
We highly recommend trying out our web demo with the following command, which incorporates all features currently supported by MoE-LLaVA. We also provide an online demo on Hugging Face Spaces.
# use phi2
deepspeed --include localhost:0 moellava/serve/gradio_web_server.py --model-path "LanguageBind/MoE-LLaVA-Phi2-2.7B-4e"
# use qwen
deepspeed --include localhost:0 moellava/serve/gradio_web_server.py --model-path "LanguageBind/MoE-LLaVA-Qwen-1.8B-4e"
# use stablelm
deepspeed --include localhost:0 moellava/serve/gradio_web_server.py --model-path "LanguageBind/MoE-LLaVA-StableLM-1.6B-4e"
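Once the web server is up, you can optionally confirm from Python that it is reachable and list the endpoints it exposes. This is a hedged sketch, not part of the official instructions: it assumes Gradio's default local address http://127.0.0.1:7860 (use whatever URL the launcher prints) and the gradio_client package, which recent Gradio releases install as a dependency.

# Optional: check that the locally launched Gradio demo is reachable.
# The URL assumes Gradio's default port 7860; adjust to the address the launcher prints.
from gradio_client import Client

client = Client("http://127.0.0.1:7860")
client.view_api()  # prints the named endpoints and the inputs/outputs they expect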
git clone https://github.com/PKU-YuanGroup/MoE-LLaVA
cd MoE-LLaVA
conda create -n moellava python=3.10 -y
conda activate moellava
pip install --upgrade pip # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
# The steps below are optional and only needed for the Qwen model.
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# Below are optional. Installing them might be slow.
# pip install csrc/layer_norm
# If the version of flash-attn is higher than 2.1.1, the following is not needed.
# pip install csrc/rotary
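After the installation steps above, a quick optional check is to import the key dependencies from Python and confirm CUDA is visible. This is an illustrative sanity check rather than an official step; flash_attn will only report as installed if you ran the optional flash-attention steps, and the package list below is simply inferred from the commands in this section.

# Optional sanity check for the freshly created moellava environment.
import importlib.util
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
for pkg in ("deepspeed", "flash_attn", "gradio"):
    status = "installed" if importlib.util.find_spec(pkg) is not None else "missing"
    print(f"{pkg}: {status}")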
🙌 Related Projects
Video-LLaVA
This framework empowers the model to efficiently utilize united visual tokens.
LanguageBind
An open-source, language-based retrieval framework covering five modalities.
👍 Acknowledgement
LLaVA
The codebase we built upon; an efficient large language and vision assistant.
🔒 License
The majority of this project is released under the Apache 2.0 license, as found in the LICENSE file.
The service is a research preview intended for non-commercial use only, subject to the model License of LLaMA, the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please contact us if you find any potential violation.
✏️ Citation
If you find our paper and code useful in your research, please consider giving a star :star: and a citation :pencil:.
@misc{lin2024moellava,
title={MoE-LLaVA: Mixture of Experts for Large Vision-Language Models},
author={Bin Lin and Zhenyu Tang and Yang Ye and Jiaxi Cui and Bin Zhu and Peng Jin and Junwu Zhang and Munan Ning and Li Yuan},
year={2024},
eprint={2401.15947},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@article{lin2023video,
title={Video-LLaVA: Learning United Visual Representation by Alignment Before Projection},
author={Lin, Bin and Zhu, Bin and Ye, Yang and Ning, Munan and Jin, Peng and Yuan, Li},
journal={arXiv preprint arXiv:2311.10122},
year={2023}
}
✨ Star History
🤝 Contributors