LanguageBind / Video-LLaVA-7B-hf

huggingface.co
Total runs: 15.9K
24-hour runs: 0
3-day runs: 380
7-day runs: 547
30-day runs: -4.4K
Model last updated: May 16, 2024

Model Card for Video-LLaVA

Model Details

Model type: Video-LLaVA is an open-source multimodal model trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: lmsys/vicuna-7b-v1.5

Model Description: The model can handle interleaved images and videos, despite the absence of image-video pairs in the training dataset. Video-LLaVA uses an encoder trained for a unified visual representation, aligning image and video features before they are projected into the language model. Extensive experiments demonstrate the complementarity of the two modalities, showing significant gains over models designed specifically for either images or videos.

Figure: Video-LLaVA example, taken from the original paper.

Paper or resources for more information: https://github.com/PKU-YuanGroup/Video-LLaVA
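
A quick way to see this design in code is to load the checkpoint with 🤗 Transformers and print the module tree, which shows the image and video encoders feeding a shared projector in front of the language model. This is a minimal sketch; it assumes a transformers release with Video-LLaVA support, and exact sub-module names may vary between versions.

from transformers import VideoLlavaForConditionalGeneration

model = VideoLlavaForConditionalGeneration.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")
# Printing the model lists its sub-modules: the image/video encoders,
# the multimodal projector, and the underlying language model.
print(model)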

🗝️ Training Dataset
  • The image pretraining dataset is from LLaVA.
  • The image tuning dataset is from LLaVA.
  • The video pretraining dataset is from Valley.
  • The video tuning dataset is from Video-ChatGPT.
How to Get Started with the Model

Use the code below to get started with the model.

from PIL import Image
import requests
import numpy as np
import av
from huggingface_hub import hf_hub_download
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.

    Args:
        container (av.container.input.InputContainer): PyAV container.
        indices (List[int]): List of frame indices to decode.

    Returns:
        np.ndarray: np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])

# load the model and its processor
model = VideoLlavaForConditionalGeneration.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")
processor = VideoLlavaProcessor.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")

prompt = "USER: <video>Why is this video funny? ASSISTANT:"
# download a sample clip from the Hub and open it with PyAV
video_path = hf_hub_download(repo_id="raushan-testing-hf/videos-test", filename="sample_demo_1.mp4", repo_type="dataset")
container = av.open(video_path)

# sample uniformly 8 frames from the video
total_frames = container.streams.video[0].frames
indices = np.arange(0, total_frames, total_frames / 8).astype(int)
clip = read_video_pyav(container, indices)

inputs = processor(text=prompt, videos=clip, return_tensors="pt")

# Generate
generate_ids = model.generate(**inputs, max_length=80)
print(processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
>>> 'USER:  Why is this video funny? ASSISTANT: The video is funny because the baby is sitting on the bed and reading a book, which is an unusual and amusing sight.'

# Generate from images and videos mix
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = [
    "USER: <image> How many cats are there in the image? ASSISTANT:",
    "USER: <video>Why is this video funny? ASSISTANT:"
]
inputs = processor(text=prompt, images=image, videos=clip, padding=True, return_tensors="pt")

# Generate
generate_ids = model.generate(**inputs, max_length=50)
print(processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True))
>>> ['USER:   How many cats are there in the image? ASSISTANT: There are two cats in the image.\nHow many cats are sleeping on the couch?\nThere are', 'USER:  Why is this video funny? ASSISTANT: The video is funny because the baby is sitting on the bed and reading a book, which is an unusual and amusing']
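
The example above loads the model in full precision on the CPU by default. For faster inference on a GPU with limited memory, the checkpoint can also be loaded in half precision or with 4-bit quantization via bitsandbytes. The settings below are an illustrative sketch rather than part of the original card; they assume a CUDA GPU and that the bitsandbytes and accelerate packages are installed.

import torch
from transformers import BitsAndBytesConfig, VideoLlavaForConditionalGeneration

# Illustrative low-memory loading (assumption: bitsandbytes + accelerate installed, CUDA GPU available)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = VideoLlavaForConditionalGeneration.from_pretrained(
    "LanguageBind/Video-LLaVA-7B-hf",
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
    device_map="auto",
)

When the model lives on the GPU, remember to move the processor outputs to the same device before generating, e.g. inputs = inputs.to(model.device).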
👍 Acknowledgement
  • LLaVA: the codebase we built upon, an efficient large language and vision assistant.
  • Video-ChatGPT: great work contributing the evaluation code and dataset.
🔒 License
  • The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
  • The service is a research preview intended for non-commercial use only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Please contact us if you find any potential violation.
✏️ Citation

If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:.

@article{lin2023video,
  title={Video-LLaVA: Learning United Visual Representation by Alignment Before Projection},
  author={Lin, Bin and Zhu, Bin and Ye, Yang and Ning, Munan and Jin, Peng and Yuan, Li},
  journal={arXiv preprint arXiv:2311.10122},
  year={2023}
}
@article{zhu2023languagebind,
  title={LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment},
  author={Zhu, Bin and Lin, Bin and Ning, Munan and Yan, Yang and Cui, Jiaxi and Wang, HongFa and Pang, Yatian and Jiang, Wenhao and Zhang, Junwu and Li, Zongwei and others},
  journal={arXiv preprint arXiv:2310.01852},
  year={2023}
}


More Information About Video-LLaVA-7B-hf on huggingface.co

Video-LLaVA-7B-hf on huggingface.co

Video-LLaVA-7B-hf is an AI model hosted on huggingface.co, where it can be used instantly. huggingface.co supports a free trial of the Video-LLaVA-7B-hf model and also offers paid usage, and the model can be called through an API from Node.js, Python, or plain HTTP.

LanguageBind Video-LLaVA-7B-hf online free

huggingface.co is an online trial and API platform that integrates Video-LLaVA-7B-hf's capabilities, including API services, and provides a free online trial of Video-LLaVA-7B-hf. You can try Video-LLaVA-7B-hf online for free at the link below.

LanguageBind Video-LLaVA-7B-hf free online trial URL on huggingface.co:

https://huggingface.co/LanguageBind/Video-LLaVA-7B-hf

Video-LLaVA-7B-hf install

Video-LLaVA-7B-hf is an open-source model whose code is available on GitHub, where any user can find and install it for free. huggingface.co also hosts the model, so users can try the installed Video-LLaVA-7B-hf directly on huggingface.co for debugging and evaluation, and it likewise supports free use through the API.

Video-LLaVA-7B-hf install URL on huggingface.co:

https://huggingface.co/LanguageBind/Video-LLaVA-7B-hf

Provider of Video-LLaVA-7B-hf on huggingface.co

LanguageBind (organization)
