If you like our project, please give us a star ⭐ on GitHub for the latest updates.
📰 News
[2024.01.27] 👀👀👀 Our MoE-LLaVA is released! A sparse model with 3B parameters outperforms a dense model with 7B parameters.
[2024.01.17] 🔥🔥🔥 Our LanguageBind has been accepted at ICLR 2024!
[2024.01.16] 🔥🔥🔥 We have reorganized the code and now support LoRA fine-tuning; see finetune_lora.sh (an example invocation follows this list).
[2023.11.30] 🤝 Thanks to the generous contributions of the community, the OpenXLab demo is now accessible.
[2023.11.23] We are training a new and powerful model.
[2023.11.21] 🤝 Check out the replicate demo, created by @nateraw, who has generously supported our research!
[2023.11.20] 🤗 The Hugging Face demo and all code & datasets are now available! You are welcome to watch 👀 this repository for the latest updates.
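For the LoRA fine-tuning mentioned above, a minimal launch sketch: the scripts/v1_5/ location is an assumption based on the LLaVA-style repository layout, so locate finetune_lora.sh in your checkout and adjust the data and checkpoint paths inside it before running.

```bash
# Hypothetical invocation: the script path below is assumed from the LLaVA-style repo layout.
# Edit the data, image/video folder, and output paths inside finetune_lora.sh first.
bash scripts/v1_5/finetune_lora.sh
```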
😮 Highlights
Video-LLaVA exhibits remarkable interactive capabilities between images and videos, despite the absence of image-video pairs in the dataset.
💡 Simple baseline, learning united visual representation by alignment before projection
By binding unified visual representations to the language feature space, we enable an LLM to perform visual reasoning on both images and videos simultaneously.
🔥 High performance, complementary learning with video and image
Extensive experiments demonstrate the complementarity of the two modalities, showing significant gains over models designed specifically for either images or videos.
🤗 Demo
Gradio Web UI
We highly recommend trying our web demo with the following command, which incorporates all features currently supported by Video-LLaVA. We also provide an online demo on Hugging Face Spaces.
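A minimal launch sketch, assuming the repository and its dependencies are installed locally; the module path follows the LLaVA-style package layout this codebase builds on, so verify it against your checkout:

```bash
# Launch the local Gradio web demo (module path assumed from the package layout).
python -m videollava.serve.gradio_web_server
```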
👍 Acknowledgement
LLaVA: the codebase we build upon, an efficient large language and vision assistant.
Video-ChatGPT: great job contributing the evaluation code and dataset.
🙌 Related Projects
LanguageBind: an open-source, language-based retrieval framework spanning five modalities.
Chat-UniVi: a framework that empowers the model to efficiently utilize a limited number of visual tokens.
🔒 License
The majority of this project is released under the Apache 2.0 license, as found in the LICENSE file.
The service is a research preview intended for non-commercial use only, subject to the model License of LLaMA, the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please contact us if you find any potential violation.
✏️ Citation
If you find our paper and code useful in your research, please consider giving a star :star: and a citation :pencil:.
@article{lin2023video,
title={Video-LLaVA: Learning United Visual Representation by Alignment Before Projection},
author={Lin, Bin and Zhu, Bin and Ye, Yang and Ning, Munan and Jin, Peng and Yuan, Li},
journal={arXiv preprint arXiv:2311.10122},
year={2023}
}
@article{zhu2023languagebind,
title={LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment},
author={Zhu, Bin and Lin, Bin and Ning, Munan and Yan, Yang and Cui, Jiaxi and Wang, HongFa and Pang, Yatian and Jiang, Wenhao and Zhang, Junwu and Li, Zongwei and others},
journal={arXiv preprint arXiv:2310.01852},
year={2023}
}
✨ Star History
🤝 Contributors