
Kandinsky Video 1.1 — a new text-to-video generation model

SoTA quality among open-source solutions on the EvalCrafter benchmark

This repository is the official implementation of the Kandinsky Video 1.1 model.

Hugging Face Spaces | Telegram-bot | Habr post | Our text-to-image model

Our previous model, Kandinsky Video 1.0, divides the video generation process into two stages: first generating keyframes at a low FPS, then creating interpolated frames between these keyframes to increase the FPS. In Kandinsky Video 1.1, we further break keyframe generation into two steps: first, generating the initial frame of the video from the textual prompt using the text-to-image Kandinsky 3.0 model, and then generating the subsequent keyframes conditioned on both the textual prompt and the previously generated first frame. This approach ensures more consistent content across frames and significantly enhances overall video quality. It also allows animating any input image as an additional feature.

Pipeline


In Kandinsky Video 1.0, the encoded text prompt enters the text-to-video U-Net3D keyframe generation model (a U-Net with temporal layers or blocks), and the sampled latent keyframes are then sent to the latent interpolation model, which predicts three interpolated frames between each pair of keyframes. An image MoVQ-GAN decoder produces the final video. In Kandinsky Video 1.1, the text-to-video U-Net3D is additionally conditioned on the text-to-image U-Net2D, which helps improve content quality, and a temporal MoVQ-GAN decoder is used to decode the final video.
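
To make the dataflow concrete, here is a minimal, hypothetical sketch of the stages described above. All function names, shapes, and frame counts are illustrative numpy stand-ins, not the repository's actual API; the real diffusion models would replace the stubs:

import numpy as np

def text_to_image_first_frame(prompt):
    # Stand-in for Kandinsky 3.0 (T2I): one latent "first frame" from the prompt.
    return np.zeros((64, 64, 4))

def generate_keyframes(prompt, first_frame, n=8):
    # Stand-in for the U-Net3D keyframe model, conditioned on the prompt
    # and the previously generated first frame.
    return np.stack([first_frame] * n)

def interpolate(keyframes):
    # Stand-in for the latent interpolation model: three frames predicted
    # between each pair of neighbouring keyframes.
    frames = []
    for a, b in zip(keyframes, keyframes[1:]):
        frames.append(a)
        for t in (0.25, 0.5, 0.75):
            frames.append((1 - t) * a + t * b)
    frames.append(keyframes[-1])
    return np.stack(frames)

def temporal_movq_decode(latents):
    # Stand-in for the temporal MoVQ-GAN decoder: latents -> RGB frames.
    return latents[..., :3]

first = text_to_image_first_frame("A cat at a pool.")
keyframes = generate_keyframes("A cat at a pool.", first)
video = temporal_movq_decode(interpolate(keyframes))
print(video.shape)  # (29, 64, 64, 3): 8 keyframes -> 29 frames after interpolation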

Architecture details

  • Text encoder (Flan-UL2) - 8.6B
  • Keyframe generation model (Latent Diffusion U-Net3D) - 4.15B
  • Interpolation model (Latent Diffusion U-Net3D) - 4.0B
  • Image MoVQ encoder/decoder - 256M
  • Video (temporal) MoVQ decoder - 556M
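
For reference, a quick back-of-the-envelope total of the listed component sizes (the stages run one after another, so peak memory depends on which components are loaded at once):

components = {
    'Flan-UL2 text encoder': 8.6,
    'U-Net3D keyframe model': 4.15,
    'U-Net3D interpolation model': 4.0,
    'Image MoVQ encoder/decoder': 0.256,
    'Temporal MoVQ decoder': 0.556,
}  # sizes in billions of parameters, as listed above
print(f"total: {sum(components.values()):.2f}B parameters")  # total: 17.56B parameters
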
How to use
1. text2video
from kandinsky_video import get_T2V_pipeline

# Load the text-to-video pipeline on a single GPU.
device_map = 'cuda:0'
t2v_pipe = get_T2V_pipeline(device_map)

prompt = "A cat wearing sunglasses and working as a lifeguard at a pool."

fps = 'medium'   # one of ['low', 'medium', 'high']
motion = 'high'  # one of ['low', 'medium', 'high']

video = t2v_pipe(
    prompt,
    width=512, height=512,
    fps=fps,
    motion=motion,
    key_frame_guidance_scale=5.0,
    guidance_weight_prompt=5.0,
    guidance_weight_image=3.0,
)

# Save the list of PIL frames as an animated GIF (~5.5 s total duration).
path_to_save = './__assets__/video.gif'
video[0].save(
    path_to_save,
    save_all=True, append_images=video[1:], duration=int(5500/len(video)), loop=0
)


Generated video
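
If an MP4 is preferable to a GIF, the same list of PIL frames can be written with imageio. This is a convenience sketch, not part of the kandinsky_video API; it assumes the imageio and imageio-ffmpeg packages are installed:

import imageio
import numpy as np

# Encode the PIL frames as an MP4; fps here is the playback frame rate.
imageio.mimsave('./__assets__/video.mp4', [np.asarray(frame) for frame in video], fps=8)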

2. image2video
from io import BytesIO

import requests
from PIL import Image

from kandinsky_video import get_T2V_pipeline

# Load the text-to-video pipeline on a single GPU.
device_map = 'cuda:0'
t2v_pipe = get_T2V_pipeline(device_map)

# Download the image to animate.
url = 'https://media.cnn.com/api/v1/images/stellar/prod/gettyimages-1961294831.jpg'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img.show()

prompt = "A panda climbs up a tree."

fps = 'medium'     # one of ['low', 'medium', 'high']
motion = 'medium'  # one of ['low', 'medium', 'high']

video = t2v_pipe(
    prompt,
    image=img,  # the first frame of the video is taken from this image
    width=640, height=384,
    fps=fps,
    motion=motion,
    key_frame_guidance_scale=5.0,
    guidance_weight_prompt=5.0,
    guidance_weight_image=3.0,
)

# Save the list of PIL frames as an animated GIF (~5.5 s total duration).
path_to_save = './__assets__/video2.gif'
video[0].save(
    path_to_save,
    save_all=True, append_images=video[1:], duration=int(5500/len(video)), loop=0
)


Input image.


Generated video.
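
A local file can be animated the same way. As a small illustrative variation (the file name is hypothetical, and t2v_pipe is the pipeline created above), sweeping the motion setting makes it easy to compare how strongly the scene animates:

from PIL import Image

img = Image.open('panda.jpg').convert('RGB')  # hypothetical local image

for motion in ['low', 'medium', 'high']:
    video = t2v_pipe(
        "A panda climbs up a tree.",
        image=img,
        width=640, height=384,
        fps='medium',
        motion=motion,
        key_frame_guidance_scale=5.0,
        guidance_weight_prompt=5.0,
        guidance_weight_image=3.0,
    )
    video[0].save(
        f'./__assets__/panda_{motion}.gif',
        save_all=True, append_images=video[1:],
        duration=int(5500/len(video)), loop=0,
    )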

Results


Kandinsky Video 1.1 achieves second place overall and is the best open-source model on the EvalCrafter text-to-video benchmark. VQ: visual quality, TVA: text-video alignment, MQ: motion quality, TC: temporal consistency, FAS: final average score.


Polygon-radar chart representing the performance of Kandinsky Video 1.1 on the EvalCrafter benchmark.


Human evaluation study results. The bars in the plot correspond to the percentage of “wins” in the side-by-side comparison of model generations. We compare our model with Video LDM.

Authors

Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov

BibTeX

If you use our work in your research, please cite our publication:

@article{arkhipkin2023fusionframes,
  title     = {FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline},
  author    = {Arkhipkin, Vladimir and Shaheen, Zein and Vasilev, Viacheslav and Dakhova, Elizaveta and Kuznetsov, Andrey and Dimitrov, Denis},
  journal   = {arXiv preprint arXiv:2311.13073},
  year      = {2023}, 
}


License

Apache 2.0: https://choosealicense.com/licenses/apache-2.0

Model page: https://huggingface.co/ai-forever/KandinskyVideo_1_1