Readme
Proof of concept Cog wrapper for Ollama model: Llama3 70b
Cog wrapper for Ollama llama3:70b
Remove background from an image
Falcons.ai Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
Implementation of Realistic Vision v5.1 with VAE
FLUX.1-Dev LoRA Explorer
SDXL ControlNet - Canny
SDXL Inpainting by the HF Diffusers team
Juggernaut XL v9
Turn any image into a video
Segmind Stable Diffusion Model (SSD-1B) is a distilled 50% smaller version of SDXL, offering a 60% speedup while maintaining high-quality text-to-image generation capabilities
Hyper FLUX 8-step by ByteDance
CLIP Interrogator for SDXL optimizes text prompts to match a given image
FLUX.1-Dev Multi LoRA Explorer
A multimodal LLM-based AI assistant, which is trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.
Robust face restoration algorithm for old photos/AI-generated faces
FLUX.1-Schnell LoRA Explorer
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
SDXL v1.0 - A text-to-image generative AI model that creates beautiful images
😊 Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL
snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance
Latent Consistency Model (LCM): SDXL, distills the original model into a version that requires fewer steps (4 to 8 instead of the original 25 to 50)
Monster Labs QrCode ControlNet on top of SD Realistic Vision v5.1
RealvisXL-v2.0 with LCM LoRA - requires fewer steps (4 to 8 instead of the original 40 to 50)
moondream2 is a small vision language model designed to run efficiently on edge devices
Implementation of SDXL RealVisXL_V2.0
Animate Your Personalized Text-to-Image Diffusion Models
Practical face restoration algorithm for *old photos* or *AI-generated faces* (for larger images)
DreamShaper is a general-purpose SD model that aims to do everything well: photos, art, anime, and manga. It's designed to match Midjourney and DALL-E.
Realistic Vision v5.0 Image 2 Image
Real-ESRGAN Video Upscaler
A unique fusion that showcases exceptional prompt adherence and semantic understanding. It seems to be a step above base SDXL and a step closer to DALL-E 3 in terms of prompt comprehension
CLIP Interrogator (for faster inference)
dreamshaper-xl-lightning is a Stable Diffusion model that has been fine-tuned on SDXL
Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets
Realistic Vision V4.0
SDXL_Niji_Special Edition
PixArt-Alpha 1024px is a transformer-based text-to-image diffusion system trained on text embeddings from T5
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Dreamshaper-7 img2img with LCM LoRA for faster inference
AI-driven audio enhancement for your audio files, powered by Resemble AI
Ostris AI-Toolkit for Flux LoRA Training (DEPRECATED. Please use: ostris/flux-dev-lora-trainer)
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Implementation of SDXL RealVisXL_V1.0
SDXL Image Blending
(Academic and Non-commercial use only) Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture
lmsys/vicuna-13b-v1.3
Mistral-7B-v0.1 fine tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
Real-ESRGAN with optional face correction and adjustable upscale (for larger images)
Realistic Vision v5.0 Inpainting
Gemma2 2b by Google
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate SDXL images with an image prompt
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
SDXL ControlNet - Depth
Realistic Vision v5.0 with VAE
lmsys/vicuna-7b-v1.3
(Research only) IP-Adapter-FaceID can generate various style images conditioned on a face with only text prompts
SDXL ControlNet - OpenPose
Meta's Llama 2 7b Chat - GPTQ
sdxs-512-0.9 can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching
Stylized Audio-Driven Single Image Talking Face Animation
Meta's Llama 2 13b Chat - GPTQ
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
ThinkDiffusionXL is a go-to model capable of amazing photorealism that's also versatile enough to generate high-quality images across a variety of styles and subjects without needing to be a prompting genius
Falcons.ai NSFW detection model, extended for videos
This is wizard-vicuna-13b trained with a subset of the dataset - responses that contained alignment / moralizing were removed
Hyper FLUX 16-step by ByteDance
Mistral-7B-v0.1 fine tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
Image-to-video - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
InterpAny-Clearer: Clearer anytime frame interpolation & Manipulated interpolation
Segments an audio recording based on who is speaking (on A100)
Latest model in the Qwen family for chatting about videos and images
(Research only) Moondream1 is a vision language model that performs on par with models twice its size
Image to Image enhancer using DemoFusion
Open diffusion model for high-quality video generation
Auto fuse a user's face onto the template image, with a similar appearance to the user
SDXL lightning multi-controlnet, img2img & inpainting
DemoFusion: Democratising High-Resolution Image Generation With No 💰
Segment Anything 2 (SAM2) by Meta - Automatic mask generation
Ollama Nemotron 70b
Implementation of SDXL RealVisXL_V2.0 img2img
Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets
Stable Diffusion x4 upscaler model
360 Panorama SDXL image with inpainted wrapping seam
Convert your videos to DensePose and use it with MagicAnimate
Projection module trained to add vision capabilities to Llama 3 using SigLIP
Realistic Vision V5 with OpenPose
Realistic Vision V3.0 with VAE
Fuyu-8B is a multi-modal text and image transformer trained by Adept AI
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
ControlNet v1.1 - Tile Version
SDXL using DeepCache
Playground v2 is a diffusion-based text-to-image generative model trained from scratch.
nomic-embed-text-v1 is an 8192-context-length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short- and long-context tasks
Segmind Stable Diffusion Model (SSD-1B) img2img
A combination of ip_adapter SDv1.5 and mediapipe-face to inpaint a face
Phi-2 by Microsoft
Implementation of SDXL RealVisXL_V1.0 img2img
Upstage/Llama-2-70B-instruct-v2 - GPTQ
POC to run inference on Realvisxl2 LoRAs
A Flux LoRA trained on watercolor style photos