mPLUG / mPLUG-Owl3-7B-240728

huggingface.co
Model last updated: September 29, 2024
image-text-to-text

Model Details of mPLUG-Owl3-7B-240728

mPLUG-Owl3

Introduction

mPLUG-Owl3 is a state-of-the-art multi-modal large language model designed to tackle the challenges of long image sequence understanding. We propose Hyper Attention, which boosts the speed of long visual sequence understanding in multimodal large language models by sixfold, allowing for processing of visual sequences that are eight times longer. Meanwhile, we maintain excellent performance on single-image, multi-image, and video tasks.

GitHub: mPLUG-Owl (https://github.com/X-PLUG/mPLUG-Owl)

Quickstart
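
The examples below assume a recent PyTorch, transformers, Pillow, and decord installation (a minimal sketch; the card does not pin exact versions):

pip install torch transformers pillow decord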

Load mPLUG-Owl3. Only attn_implementation values in ['sdpa', 'flash_attention_2'] are currently supported.

import torch
from transformers import AutoConfig, AutoModel

model_path = 'mPLUG/mPLUG-Owl3-7B-240728'

# mPLUGOwl3Config / mPLUGOwl3Model are defined in the repository's remote code,
# so they are loaded through the Auto classes with trust_remote_code=True.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
print(config)
# model = mPLUGOwl3Model(config).cuda().half()
model = AutoModel.from_pretrained(model_path, attn_implementation='sdpa',
                                  torch_dtype=torch.half, trust_remote_code=True)
model.eval().cuda()
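
Since both 'sdpa' and 'flash_attention_2' are supported, switching to FlashAttention-2 only changes the attn_implementation argument. A minimal sketch, assuming the flash-attn package is installed and half precision is used:

import torch
from transformers import AutoModel

model_path = 'mPLUG/mPLUG-Owl3-7B-240728'

# Same loading path as above, but with FlashAttention-2 kernels (requires `pip install flash-attn`).
model = AutoModel.from_pretrained(
    model_path,
    attn_implementation='flash_attention_2',
    torch_dtype=torch.half,
    trust_remote_code=True,
)
model.eval().cuda()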

Chat with images.

from PIL import Image
from transformers import AutoTokenizer

model_path = 'mPLUG/mPLUG-Owl3-7B-240728'
tokenizer = AutoTokenizer.from_pretrained(model_path)
# `model` is the instance loaded in the Quickstart section above.
processor = model.init_processor(tokenizer)

image = Image.new('RGB', (500, 500), color='red')

messages = [
    {"role": "user", "content": """<|image|>
Describe this image."""},
    {"role": "assistant", "content": ""}
]

inputs = processor(messages, images=image, videos=None)

inputs.to('cuda')
# Extra keys consumed by the model's generate(): the tokenizer for decoding,
# the generation length, and a flag to return decoded text instead of token ids.
inputs.update({
    'tokenizer': tokenizer,
    'max_new_tokens': 100,
    'decode_text': True,
})


g = model.generate(**inputs)
print(g)
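
The introduction notes that mPLUG-Owl3 also handles multi-image tasks. A minimal multi-image sketch, assuming the processor pairs each <|image|> placeholder with one entry of the images list, in order (the images and prompt here are purely illustrative):

from PIL import Image

# Two illustrative placeholder images; in practice these would be loaded from files.
image_1 = Image.new('RGB', (500, 500), color='red')
image_2 = Image.new('RGB', (500, 500), color='blue')

messages = [
    {"role": "user", "content": """<|image|>
<|image|>
What is the difference between these two images?"""},
    {"role": "assistant", "content": ""}
]

# Assumption: one <|image|> token per image, matched in order against the list below.
inputs = processor(messages, images=[image_1, image_2], videos=None)

inputs.to('cuda')
inputs.update({
    'tokenizer': tokenizer,
    'max_new_tokens': 100,
    'decode_text': True,
})

g = model.generate(**inputs)
print(g)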

Chat with a video.

from PIL import Image
from transformers import AutoTokenizer
from decord import VideoReader, cpu    # pip install decord

model_path = 'mPLUG/mPLUG-Owl3-7B-240728'
tokenizer = AutoTokenizer.from_pretrained(model_path)
# `model` is the instance loaded in the Quickstart section above.
processor = model.init_processor(tokenizer)


messages = [
    {"role": "user", "content": """<|video|>
Describe this video."""},
    {"role": "assistant", "content": ""}
]

videos = ['/nas-mmu-data/examples/car_room.mp4']

MAX_NUM_FRAMES=16

def encode_video(video_path):
    # Pick n indices spread evenly across the list l.
    def uniform_sample(l, n):
        gap = len(l) / n
        idxs = [int(i * gap + gap / 2) for i in range(n)]
        return [l[i] for i in idxs]

    vr = VideoReader(video_path, ctx=cpu(0))
    sample_fps = round(vr.get_avg_fps() / 1)  # sample roughly one frame per second
    frame_idx = [i for i in range(0, len(vr), sample_fps)]
    # Cap the number of frames, keeping an evenly spaced subset.
    if len(frame_idx) > MAX_NUM_FRAMES:
        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
    frames = vr.get_batch(frame_idx).asnumpy()
    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
    print('num frames:', len(frames))
    return frames

video_frames = [encode_video(path) for path in videos]
inputs = processor(messages, images=None, videos=video_frames)

inputs.to('cuda')
inputs.update({
    'tokenizer': tokenizer,
    'max_new_tokens':100,
    'decode_text':True,
})


g = model.generate(**inputs)
print(g)
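
As a quick sanity check of the sampling logic in encode_video: a 60-second clip at 30 fps has 1800 frames, sampling one frame per second leaves 60 candidate indices, and uniform_sample then thins those down to MAX_NUM_FRAMES evenly spaced frames. A standalone illustration of that arithmetic (not part of the model API):

# Standalone illustration of the frame-sampling math used in encode_video above.
def uniform_sample(l, n):
    gap = len(l) / n
    idxs = [int(i * gap + gap / 2) for i in range(n)]
    return [l[i] for i in idxs]

candidate_idx = list(range(0, 1800, 30))      # 60 one-per-second candidates from a 30 fps clip
kept_idx = uniform_sample(candidate_idx, 16)  # MAX_NUM_FRAMES = 16
print(len(kept_idx))                          # 16 evenly spaced frame indices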
Citation

If you find our work helpful, please consider citing it:

@misc{ye2024mplugowl3longimagesequenceunderstanding,
      title={mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models}, 
      author={Jiabo Ye and Haiyang Xu and Haowei Liu and Anwen Hu and Ming Yan and Qi Qian and Ji Zhang and Fei Huang and Jingren Zhou},
      year={2024},
      eprint={2408.04840},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.04840}, 
}

Runs of mPLUG mPLUG-Owl3-7B-240728 on huggingface.co

Total runs: 16.9K
24-hour runs: 83
3-day runs: 176
7-day runs: 2.6K
30-day runs: 14.9K
More Information About mPLUG-Owl3-7B-240728 huggingface.co Model

mPLUG-Owl3-7B-240728 is released under the Apache 2.0 license; see:

https://choosealicense.com/licenses/apache-2.0

mPLUG-Owl3-7B-240728 huggingface.co

mPLUG-Owl3-7B-240728 is an AI model hosted on huggingface.co, where it can be used directly. huggingface.co offers a free trial of the mPLUG-Owl3-7B-240728 model as well as paid usage, and the model can be called through an API from Node.js, Python, or HTTP.

mPLUG-Owl3-7B-240728 huggingface.co Url

https://huggingface.co/mPLUG/mPLUG-Owl3-7B-240728

mPLUG mPLUG-Owl3-7B-240728 online free

huggingface.co provides an online trial and API platform for mPLUG-Owl3-7B-240728, including API services and a free online trial of the model. You can try mPLUG-Owl3-7B-240728 online for free via the link below.

mPLUG mPLUG-Owl3-7B-240728 online free url in huggingface.co:

https://huggingface.co/mPLUG/mPLUG-Owl3-7B-240728

mPLUG-Owl3-7B-240728 install

mPLUG-Owl3-7B-240728 is an open-source model whose code is available on GitHub, and any user can install it from there. huggingface.co also hosts the model, so users can try and debug mPLUG-Owl3-7B-240728 directly on huggingface.co or access it through the API.

mPLUG-Owl3-7B-240728 install url in huggingface.co:

https://huggingface.co/mPLUG/mPLUG-Owl3-7B-240728

Url of mPLUG-Owl3-7B-240728

mPLUG-Owl3-7B-240728 huggingface.co Url:

https://huggingface.co/mPLUG/mPLUG-Owl3-7B-240728

Provider of mPLUG-Owl3-7B-240728 on huggingface.co

mPLUG (organization)

Other API from mPLUG

huggingface.co
Total runs: 998 | Run growth: -1.5K | Growth rate: -151.00% | Updated: September 27, 2024

huggingface.co
Total runs: 318 | Run growth: -10.2K | Growth rate: -3219.50% | Updated: April 26, 2024

huggingface.co
Total runs: 87 | Run growth: 60 | Growth rate: 68.97% | Updated: April 10, 2024

huggingface.co
Total runs: 65 | Run growth: 19 | Growth rate: 29.23% | Updated: April 10, 2024

huggingface.co
Total runs: 56 | Run growth: 19 | Growth rate: 33.93% | Updated: April 10, 2024