Cog implementation of moondream1, based on code from the model's Hugging Face Space.
moondream1
1.6B parameter model built by @vikhyatk using SigLIP, Phi-1.5, and the LLaVA training dataset. The model is released for research purposes only; commercial use is not allowed.
Try it out on Hugging Face Spaces, or check out the moondream repository on GitHub for inference code and other details.
Model | Parameters | VQAv2 | GQA | VizWiz | TextVQA |
---|---|---|---|---|---|
LLaVA-1.5 | 13.3B | 80.0 | 63.3 | 53.6 | 61.3 |
LLaVA-1.5 | 7.3B | 78.5 | 62.0 | 50.0 | 58.2 |
MC-LLaVA-3B | 3B | 64.2 | 49.6 | 24.9 | 38.6 |
LLaVA-Phi | 3B | 71.4 | - | 35.9 | 48.6 |
moondream1 | 1.6B | 74.3 | 56.3 | 30.3 | 39.8 |
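
To put the table in perspective, the short sketch below expresses one model's scores as a percentage of another's. It is only an illustration: the scores are copied from the table above, while the names `BENCHMARKS`, `SCORES`, and `relative_scores` are hypothetical helpers made up for this example and are not part of the moondream repository or this Cog implementation.

```python
# Benchmark scores copied from the table above; None marks a missing entry ("-").
# All names here are illustrative and not part of the moondream codebase.
BENCHMARKS = ("VQAv2", "GQA", "VizWiz", "TextVQA")
SCORES = {
    "LLaVA-1.5 (13.3B)": (80.0, 63.3, 53.6, 61.3),
    "LLaVA-1.5 (7.3B)": (78.5, 62.0, 50.0, 58.2),
    "MC-LLaVA-3B": (64.2, 49.6, 24.9, 38.6),
    "LLaVA-Phi": (71.4, None, 35.9, 48.6),
    "moondream1": (74.3, 56.3, 30.3, 39.8),
}


def relative_scores(model, baseline):
    """Return {benchmark: model score as a percentage of the baseline score}."""
    result = {}
    for name, ours, theirs in zip(BENCHMARKS, SCORES[model], SCORES[baseline]):
        if ours is None or theirs is None:
            continue  # skip benchmarks that either model did not report
        result[name] = round(100 * ours / theirs, 1)
    return result


if __name__ == "__main__":
    comparison = relative_scores("moondream1", "LLaVA-1.5 (13.3B)")
    for benchmark, pct in comparison.items():
        print(f"{benchmark}: {pct}% of the LLaVA-1.5 13.3B score")
```

Benchmarks with a missing entry, such as GQA for LLaVA-Phi, are skipped rather than treated as zero, so a comparison only covers scores that both models report.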