Readme
Cog implementation of model: SG161222/Realistic_Vision_V5.1_noVAE
With recommended VAE and Schedulers
Give me a follow if you like my work! @lucataco93
Implementation of Realistic Vision v5.1 with VAE
Falcons.ai Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
Remove background from an image
FLUX.1-Dev LoRA Explorer
SDXL ControlNet - Canny
Juggernaut XL v9
Turn any image into a video
SDXL Inpainting developed by the HF Diffusers team
Segmind Stable Diffusion Model (SSD-1B) is a distilled 50% smaller version of SDXL, offering a 60% speedup while maintaining high-quality text-to-image generation capabilities
Hyper FLUX 8-step by ByteDance
CLIP Interrogator for SDXL optimizes text prompts to match a given image
A multimodal LLM-based AI assistant, which is trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.
FLUX.1-Dev Multi LoRA Explorer
SDXL v1.0 - A text-to-image generative AI model that creates beautiful images
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance
Latent Consistency Model (LCM): SDXL, distills the original model into a version that requires fewer steps (4 to 8 instead of the original 25 to 50)
FLUX.1-Schnell LoRA Explorer
Monster Labs QrCode ControlNet on top of SD Realistic Vision v5.1
Robust face restoration algorithm for old photos/AI-generated faces - (A40 GPU)
RealvisXL-v2.0 with LCM LoRA - requires fewer steps (4 to 8 instead of the original 40 to 50)
Implementation of SDXL RealVisXL_V2.0
Animate Your Personalized Text-to-Image Diffusion Models
😊 Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL
moondream2 is a small vision language model designed to run efficiently on edge devices
Practical face restoration algorithm for *old photos* or *AI-generated faces* (for larger images)
DreamShaper is a general-purpose SD model that aims to do everything well: photos, art, anime, and manga. It's designed to match Midjourney and DALL-E.
Realistic Vision v5.0 Image 2 Image
A unique fusion that showcases exceptional prompt adherence and semantic understanding. It seems to be a step above base SDXL and a step closer to DALL-E 3 in terms of prompt comprehension
CLIP Interrogator (for faster inference)
Real-ESRGAN Video Upscaler
dreamshaper-xl-lightning is a Stable Diffusion model that has been fine-tuned on SDXL
Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets
Realistic Vision V4.0
SDXL_Niji_Special Edition
Dreamshaper-7 img2img with LCM LoRA for faster inference
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
PixArt-Alpha 1024px is a transformer-based text-to-image diffusion system trained on text embeddings from T5
Implementation of SDXL RealVisXL_V1.0
SDXL Image Blending
Ostris AI-Toolkit for Flux LoRA Training (Proof of Concept). Please use the official trainer at: ostris/flux-dev-lora-trainer
BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture
lmsys/vicuna-13b-v1.3
(Academic and Non-commercial use only) Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
Mistral-7B-v0.1 fine-tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
Gemma2 2b by Google
Real-ESRGAN with optional face correction and adjustable upscale (for larger images)
Realistic Vision v5.0 Inpainting
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate SDXL images with an image prompt
SDXL ControlNet - Depth
lmsys/vicuna-7b-v1.3
(Research only) IP-Adapter-FaceID can generate various style images conditioned on a face with only text prompts
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Realistic Vision v5.0 with VAE
SDXL ControlNet - OpenPose
Meta's Llama 2 7b Chat - GPTQ
AI-driven audio enhancement for your audio files, powered by Resemble AI
sdxs-512-0.9 can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching
Meta's Llama 2 13b Chat - GPTQ
Stylized Audio-Driven Single Image Talking Face Animation
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
ThinkDiffusionXL is a go-to model capable of amazing photorealism that's also versatile enough to generate high-quality images across a variety of styles and subjects without needing to be a prompting genius
This is wizard-vicuna-13b trained with a subset of the dataset - responses that contained alignment / moralizing were removed
Hyper FLUX 16-step by ByteDance
InterpAny-Clearer: Clearer anytime frame interpolation & Manipulated interpolation
Segments an audio recording based on who is speaking (on A100)
(Research only) Moondream1 is a vision language model that performs on par with models twice its size
Image to Image enhancer using DemoFusion
Open diffusion model for high-quality video generation
Image-to-video - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
DemoFusion: Democratising High-Resolution Image Generation With No 💰
Ollama Nemotron 70b
Implementation of SDXL RealVisXL_V2.0 img2img
Auto fuse a user's face onto the template image, with a similar appearance to the user
Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets
SDXL lightning multi-controlnet, img2img & inpainting
Stable Diffusion x4 upscaler model
360 Panorama SDXL image with inpainted wrapping seam
Segment Anything 2 (SAM2) by Meta - Automatic mask generation
Projection module trained to add vision capabilities to Llama 3 using SigLIP
Convert your videos to DensePose and use it with MagicAnimate
Realistic Vision V5 with OpenPose
Realistic Vision V3.0 with VAE
Fuyu-8B is a multi-modal text and image transformer trained by Adept AI
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
Controlnet v1.1 - Tile Version
SDXL using DeepCache
Playground v2 is a diffusion-based text-to-image generative model trained from scratch. Try out all 3 models here
nomic-embed-text-v1 is an 8192-context-length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short and long context tasks
Falcons.ai NSFW detection model, extended for videos
Segmind Stable Diffusion Model (SSD-1B) img2img
Implementation of SDXL RealVisXL_V1.0 img2img
Upstage/Llama-2-70B-instruct-v2 - GPTQ
POC to run inference on Realvisxl2 LoRAs
A combination of ip_adapter SDv1.5 and mediapipe-face to inpaint a face
Phi-2 by Microsoft
llava-phi-3-mini is a LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct
POC to run inference on SSD-1B LoRAs