Readme
Implementation of SDXL image blending via Compel's weighted prompt blending
Inspired by the ComfyUI ReVision tutorial here
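As a rough illustration of the weighted prompt blending mentioned above, the sketch below builds a prompt string in Compel's documented `("a", "b").blend(w1, w2)` syntax. The helper name `build_blend_prompt` and its validation rules are hypothetical, and how this model actually turns its input images into blended conditioning is not described here, so treat this as a sketch of the prompt syntax only.

```python
from typing import Sequence


def build_blend_prompt(prompts: Sequence[str], weights: Sequence[float]) -> str:
    """Build a Compel-style blend expression from prompts and weights.

    Hypothetical helper for illustration; it only formats a string and
    does not call Compel or load any model.
    """
    if len(prompts) != len(weights):
        raise ValueError("prompts and weights must have the same length")
    if len(prompts) < 2:
        raise ValueError("blending needs at least two prompts")
    for prompt in prompts:
        # Assumption: keep the sketch simple by rejecting embedded quotes
        # instead of guessing at an escaping rule.
        if '"' in prompt:
            raise ValueError("prompts must not contain double quotes")

    quoted = ", ".join(f'"{p}"' for p in prompts)
    weight_list = ", ".join(str(w) for w in weights)
    return f"({quoted}).blend({weight_list})"


if __name__ == "__main__":
    print(build_blend_prompt(["a watercolor fox", "a neon city"], [0.7, 0.3]))
```

The resulting string would typically be passed to a Compel instance set up with the SDXL tokenizers and text encoders, which returns the conditioning tensors for the pipeline.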
Changelog
10/11/23 - Updated Cog to use pget to download weights
SDXL Image Blending
Remove background from an image
Falcons.ai Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
Implementation of Realistic Vision v5.1 with VAE
FLUX.1-Dev LoRA Explorer
SDXL ControlNet - Canny
SDXL Inpainting by the HF Diffusers team
Juggernaut XL v9
Turn any image into a video
Segmind Stable Diffusion Model (SSD-1B) is a distilled 50% smaller version of SDXL, offering a 60% speedup while maintaining high-quality text-to-image generation capabilities
Hyper FLUX 8-step by ByteDance
CLIP Interrogator for SDXL optimizes text prompts to match a given image
A multimodal LLM-based AI assistant, which is trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.
FLUX.1-Dev Multi LoRA Explorer
Robust face restoration algorithm for old photos/AI-generated faces
FLUX.1-Schnell LoRA Explorer
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
SDXL v1.0 - A text-to-image generative AI model that creates beautiful images
😊 Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL
snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance
Latent Consistency Model (LCM): SDXL, distills the original model into a version that requires fewer steps (4 to 8 instead of the original 25 to 50)
Monster Labs QrCode ControlNet on top of SD Realistic Vision v5.1
RealvisXL-v2.0 with LCM LoRA - requires fewer steps (4 to 8 instead of the original 40 to 50)
Implementation of SDXL RealVisXL_V2.0
Animate Your Personalized Text-to-Image Diffusion Models
moondream2 is a small vision language model designed to run efficiently on edge devices
Practical face restoration algorithm for *old photos* or *AI-generated faces* (for larger images)
DreamShaper is a general purpose SD model that aims at doing everything well, photos, art, anime, manga. It's designed to match Midjourney and DALL-E.
Realistic Vision v5.0 Image 2 Image
Real-ESRGAN Video Upscaler
A unique fusion that showcases exceptional prompt adherence and semantic understanding. It seems to be a step above base SDXL and a step closer to DALLE-3 in terms of prompt comprehension
CLIP Interrogator (for faster inference)
dreamshaper-xl-lightning is a Stable Diffusion model that has been fine-tuned on SDXL
Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets
Realistic Vision V4.0
SDXL_Niji_Special Edition
PixArt-Alpha 1024px is a transformer-based text-to-image diffusion system trained on text embeddings from T5
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Dreamshaper-7 img2img with LCM LoRA for faster inference
AI-driven audio enhancement for your audio files, powered by Resemble AI
Ostris AI-Toolkit for Flux LoRA Training (DEPRECATED. Please use: ostris/flux-dev-lora-trainer)
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Implementation of SDXL RealVisXL_V1.0
(Academic and Non-commercial use only) Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture
lmsys/vicuna-13b-v1.3
Mistral-7B-v0.1 fine tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
Real-ESRGAN with optional face correction and adjustable upscale (for larger images)
Realistic Vision v5.0 Inpainting
Gemma2 2b by Google
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate SDXL images with an image prompt
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
SDXL ControlNet - Depth
Realistic Vision v5.0 with VAE
lmsys/vicuna-7b-v1.3
(Research only) IP-Adapter-FaceID can generate various style images conditioned on a face with only text prompts
SDXL ControlNet - OpenPose
Meta's Llama 2 7b Chat - GPTQ
sdxs-512-0.9 can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching
Stylized Audio-Driven Single Image Talking Face Animation
Meta's Llama 2 13b Chat - GPTQ
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
ThinkDiffusionXL is a go-to model capable of amazing photorealism that's also versatile enough to generate high-quality images across a variety of styles and subjects without needing to be a prompting genius
This is wizard-vicuna-13b trained with a subset of the dataset - responses that contained alignment / moralizing were removed
Hyper FLUX 16-step by ByteDance
Mistral-7B-v0.1 fine tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
Falcons.ai NSFW detection model, extended for videos
Image-to-video - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
InterpAny-Clearer: Clearer anytime frame interpolation & Manipulated interpolation
Segments an audio recording based on who is speaking (on A100)
Latest model in the Qwen family for chatting with video and image models
(Research only) Moondream1 is a vision language model that performs on par with models twice its size
Image to Image enhancer using DemoFusion
Open diffusion model for high-quality video generation
Auto fuse a user's face onto the template image, with a similar appearance to the user
DemoFusion: Democratising High-Resolution Image Generation With No 💰
SDXL lightning multi-controlnet, img2img & inpainting
Ollama Nemotron 70b
Implementation of SDXL RealVisXL_V2.0 img2img
Segment Anything 2 (SAM2) by Meta - Automatic mask generation
Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets
Stable Diffusion x4 upscaler model
Cog wrapper for Ollama llama3:70b
360 Panorama SDXL image with inpainted wrapping seam
Convert your videos to DensePose and use it with MagicAnimate
Projection module trained to add vision capabilities to Llama 3 using SigLIP
Realistic Vision V5 with OpenPose
Realistic Vision V3.0 with VAE
Fuyu-8B is a multi-modal text and image transformer trained by Adept AI
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
Controlnet v1.1 - Tile Version
SDXL using DeepCache
Playground v2 is a diffusion-based text-to-image generative model trained from scratch. Try out all 3 models here
nomic-embed-text-v1 is an 8192-context-length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short and long context tasks
Segmind Stable Diffusion Model (SSD-1B) img2img
A combination of ip_adapter SDv1.5 and mediapipe-face to inpaint a face
Phi-2 by Microsoft
Implementation of SDXL RealVisXL_V1.0 img2img
Upstage/Llama-2-70B-instruct-v2 - GPTQ
POC to run inference on Realvisxl2 LoRAs
A Flux LoRA trained on watercolor style photos