ViSFT

This is the official repo for the paper "Supervised Fine-tuning in turn Improves Visual Foundation Models".

📃 Paper (ArXiv) | Code | 🤗 Huggingface

News

[2024/01/19] We open-sourced ViSFT, including training scripts and weights. Evaluation code will be released soon.

Introduction

Image-text training like CLIP has dominated the pretraining of vision foundation models in recent years. Subsequent efforts have been made to introduce region-level visual learning into CLIP’s pretraining, but they face scalability challenges due to the lack of large-scale region-level datasets. Drawing inspiration from supervised fine-tuning (SFT) in natural language processing, such as instruction tuning, we explore the potential of fine-grained SFT in enhancing the generation of vision foundation models after their pretraining. We therefore propose ViSFT (Vision SFT), a two-stage method that unleashes the fine-grained knowledge of vision foundation models. In ViSFT, the vision foundation model is enhanced by performing visual joint learning on some in-domain tasks and is then tested on out-of-domain benchmarks. After being updated with ViSFT on 8 V100 GPUs in less than 2 days, a vision transformer with over 4.4B parameters shows improvements across various out-of-domain benchmarks, including vision and vision-linguistic scenarios.

Installation

Creating a conda environment

conda create -n ViSFT python=3.8

conda activate ViSFT

Install PyTorch

We use torch 1.12 with CUDA 11.3 on 8 NVIDIA Volta V100-SXM2-32GB GPUs.

pip install --extra-index-url https://download.pytorch.org/whl/cu113 torch==1.12.0

pip install --extra-index-url https://download.pytorch.org/whl/cu113 torchvision==0.13.0

pip install --extra-index-url https://download.pytorch.org/whl/cu113 torchaudio==0.12.0
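After the three installs above, it can be useful to confirm that the expected torch build is the one actually picked up. The following is a minimal sketch, not part of the ViSFT repo: the helper names `matches_release` and `check_torch` are hypothetical, and the only torch attributes used are `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib


def matches_release(installed: str, expected: str) -> bool:
    """Return True if `installed` (e.g. '1.12.0+cu113') belongs to release `expected` (e.g. '1.12')."""
    release = installed.split("+", 1)[0]  # drop a local build tag such as '+cu113'
    installed_parts = release.split(".")
    expected_parts = expected.split(".")
    return installed_parts[: len(expected_parts)] == expected_parts


def check_torch(expected: str = "1.12") -> str:
    """Describe the installed torch version and whether CUDA is visible."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch is not installed"
    status = "ok" if matches_release(torch.__version__, expected) else "unexpected version"
    return f"torch {torch.__version__} ({status}), CUDA available: {torch.cuda.is_available()}"


if __name__ == "__main__":
    print(check_torch())
```

Comparing version components rather than raw string prefixes avoids treating a version such as `1.1` as a match for `1.12`.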

xformers installation

Flash attention is required for running EVA-ViT-E. Please refer to xformers for installation instructions.

loralib installation

pip install --user git+https://github.com/microsoft/LoRA

Compile MSDeform for the Mask2Former head

cd ./mmf/models/visft/ops
sudo sh make.sh
# back to root dir
cd ../../../../

Other packages installation

pip install -r requirements.txt
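Once every installation step has finished, a quick import-path check can catch a missing package before a long training run starts. This is a sketch under stated assumptions: the list holds the usual import names of the packages installed above, the helper `missing_modules` is hypothetical, and the compiled MSDeform op is left out because its module name is not given here.

```python
from importlib.util import find_spec

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "xformers", "loralib"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules found.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for large packages.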

Dataset Preparation

export DATA_PATH=your_data_path

Image caption

Generate the hdf5 files for image captioning by following hdf5.

File structure:

DATA_PATH/
└── processed_datasets/
    └── coco_caption_hdf5_files/
        ├── TEST_CAPLENS_coco_5_cap_per_img_5_min_word_freq.json
        ├── TEST_CAPTIONS_coco_5_cap_per_img_5_min_word_freq.json
        ├── TEST_IMAGES_coco_5_cap_per_img_5_min_word_freq.hdf5
        ├── TRAIN_CAPLENS_coco_5_cap_per_img_5_min_word_freq.json
        ├── TRAIN_CAPTIONS_coco_5_cap_per_img_5_min_word_freq.json
        ├── TRAIN_IMAGES_coco_5_cap_per_img_5_min_word_freq.hdf5
        ├── VAL_CAPLENS_coco_5_cap_per_img_5_min_word_freq.json
        ├── VAL_CAPTIONS_coco_5_cap_per_img_5_min_word_freq.json
        ├── VAL_IMAGES_coco_5_cap_per_img_5_min_word_freq.hdf5
        └── WORDMAP_coco_5_cap_per_img_5_min_word_freq.json
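The ten caption files follow a regular naming pattern (split, kind, shared suffix), so their presence can be checked programmatically. The sketch below is illustrative only: the function names are hypothetical, and the directory and file names are copied from the tree above.

```python
import os
from pathlib import Path

SUFFIX = "coco_5_cap_per_img_5_min_word_freq"


def expected_caption_files(suffix: str = SUFFIX):
    """List the file names shown in the tree above, built from the naming pattern."""
    names = []
    for split in ("TRAIN", "VAL", "TEST"):
        names.append(f"{split}_CAPLENS_{suffix}.json")
        names.append(f"{split}_CAPTIONS_{suffix}.json")
        names.append(f"{split}_IMAGES_{suffix}.hdf5")
    names.append(f"WORDMAP_{suffix}.json")
    return names


def missing_caption_files(data_path):
    """Return the expected caption files that are absent under `data_path`."""
    folder = Path(data_path) / "processed_datasets" / "coco_caption_hdf5_files"
    return [name for name in expected_caption_files() if not (folder / name).is_file()]


if __name__ == "__main__":
    # DATA_PATH is the environment variable exported in the dataset preparation step.
    for name in missing_caption_files(os.environ.get("DATA_PATH", ".")):
        print("missing:", name)
```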

Detection & Segmentation

File structure:

DATA_PATH/
└── public_datasets/
    └── coco/
        ├── train2017
        ├── val2017
        ├── test2017
        └── annotations/
            ├── instances_train2017.json
            ├── instances_val2017.json
            └── image_info_test-dev2017.json
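A truncated or mismatched annotation download is a common cause of late failures, so a quick look at the annotation files can help. The sketch below assumes the standard COCO instances layout, with top-level `images`, `annotations`, and `categories` lists; the function names are hypothetical. It applies to the `instances_*.json` files only, since the test-dev image info file carries no annotations.

```python
import json
import os
from pathlib import Path

REQUIRED_KEYS = ("images", "annotations", "categories")


def summarize_instances(data: dict) -> dict:
    """Count the entries in the top-level lists of a COCO instances annotation dict."""
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError(f"not a COCO instances file, missing keys: {missing}")
    return {key: len(data[key]) for key in REQUIRED_KEYS}


def summarize_file(path) -> dict:
    """Load an annotation file from disk and summarize it."""
    with open(path, "r", encoding="utf-8") as handle:
        return summarize_instances(json.load(handle))


if __name__ == "__main__":
    root = Path(os.environ.get("DATA_PATH", "."))
    path = root / "public_datasets" / "coco" / "annotations" / "instances_val2017.json"
    if path.is_file():
        print(summarize_file(path))
    else:
        print(f"{path} not found")
```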

Training

Stage1

Stage 1 trains compatible in-domain task heads, using 8 NVIDIA Volta V100-SXM2-32GB GPUs for every in-domain task head.

For eva-vit-g

Preparing weights from LAVIS

wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth

Add your weights path to the configs under ./projects/visft/configs/stage1/eva_g/:

backbone_dir: path/eva_vit_g.pth
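Because the same `backbone_dir` entry has to be set in several config files, the edit can be scripted. This is a minimal sketch and not a script from the repo: it assumes the configs are `*.yaml` files with `backbone_dir` on a line of its own, it rewrites the text directly instead of parsing YAML, and both function names are hypothetical.

```python
import re
from pathlib import Path

# Matches a whole `backbone_dir: ...` line at any indentation (commented lines are skipped).
BACKBONE_LINE = re.compile(r"^([ \t]*backbone_dir)[ \t]*:.*$", re.MULTILINE)


def set_backbone_dir(config_text: str, weights_path: str) -> str:
    """Return `config_text` with every `backbone_dir` value replaced by `weights_path`."""
    return BACKBONE_LINE.sub(lambda match: f"{match.group(1)}: {weights_path}", config_text)


def update_configs(config_dir, weights_path: str) -> list:
    """Rewrite all *.yaml files in `config_dir` in place and return the paths that changed."""
    changed = []
    for path in sorted(Path(config_dir).glob("*.yaml")):
        old_text = path.read_text(encoding="utf-8")
        new_text = set_backbone_dir(old_text, weights_path)
        if new_text != old_text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `update_configs("./projects/visft/configs/stage1/eva_g/", "/data/weights/eva_vit_g.pth")` would update every config in that directory; the weights location here is a placeholder.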

Run training

bash ./scripts/stage1_train/eva_g/caption.sh
bash ./scripts/stage1_train/eva_g/detection.sh
bash ./scripts/stage1_train/eva_g/segment.sh

For eva-vit-e

Preparing EVA-CLIP weights from EVA

Extract ViT weights

python ./scripts/preprocess/extract_eva_e_vit.py

Add your weights path to the configs under ./projects/visft/configs/stage1/eva_e/:

backbone_dir: path/EVA02_CLIP_E_psz14_plus_s9B_Visual.pt

Run training

# can be executed in parallel
bash ./scripts/stage1_train/eva_e/caption.sh
bash ./scripts/stage1_train/eva_e/detection.sh
bash ./scripts/stage1_train/eva_e/segment.sh

Or you can use the weights we provided.

In-domain Heads
	EVA-G	EVA-E
Caption Head	weights	weights
Segment Head	weights	weights
Detection Head	weights	weights
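For those training the heads themselves, the three Stage 1 scripts for one backbone are independent of each other and can be launched side by side. The sketch below shows one way to do that with the standard library; `run_in_parallel` is a hypothetical helper. Since each head is trained on 8 GPUs, running all three at once is only sensible when enough hardware is available, which this sketch does not check.

```python
import subprocess
import sys
from pathlib import Path

STAGE1_SCRIPTS = [
    "./scripts/stage1_train/eva_e/caption.sh",
    "./scripts/stage1_train/eva_e/detection.sh",
    "./scripts/stage1_train/eva_e/segment.sh",
]


def run_in_parallel(commands):
    """Start every command at once, wait for all of them, and return exit codes in order."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    if all(Path(script).is_file() for script in STAGE1_SCRIPTS):
        codes = run_in_parallel([["bash", script] for script in STAGE1_SCRIPTS])
        for script, code in zip(STAGE1_SCRIPTS, codes):
            print(f"{script}: exit code {code}")
    else:
        print("Run this from the repository root so the scripts can be found.", file=sys.stderr)
```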

Stage2

For eva-vit-g

Add your weights paths to the config ./projects/visft/configs/stage2/eva_g/stage2.yaml:

backbone_dir: path/eva_vit_g.pth
caption_ckpt_path: 'path/eva_g_caption_heads.ckpt'
segment_ckpt_path: 'path/eva_g_segment_heads.ckpt'
detection_ckpt_path: 'path/eva_g_detection_heads.ckpt'
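Stage 2 depends on four files at once, so a pre-flight check of the configured paths can save a failed launch. The following is a sketch under stated assumptions: it reads simple `key: value` lines with a regular expression instead of a YAML parser, it ignores any nesting in the real config, and the function names are hypothetical.

```python
import re
from pathlib import Path

PATH_KEYS = ("backbone_dir", "caption_ckpt_path", "segment_ckpt_path", "detection_ckpt_path")
ENTRY = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def read_paths(config_text: str) -> dict:
    """Collect the four path entries from `key: value` lines, stripping surrounding quotes."""
    found = {}
    for key, value in ENTRY.findall(config_text):
        if key in PATH_KEYS:
            found[key] = value.strip("'\"")
    return found


def missing_paths(config_text: str) -> list:
    """Return the keys that are absent from the config or point to a file that does not exist."""
    paths = read_paths(config_text)
    return [key for key in PATH_KEYS if key not in paths or not Path(paths[key]).is_file()]
```

Calling `missing_paths(Path("./projects/visft/configs/stage2/eva_g/stage2.yaml").read_text())` from the repository root returns an empty list when all four files are in place.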

Run training

bash ./scripts/stage2_train/eva_g/stage2.sh

For eva-vit-e

Add your weights paths to the config ./projects/visft/configs/stage2/eva_e/stage2.yaml:

backbone_dir: path/EVA02_CLIP_E_psz14_plus_s9B_Visual.pt
caption_ckpt_path: 'path/eva_e_caption_heads.ckpt'
segment_ckpt_path: 'path/eva_e_segment_heads.ckpt'
detection_ckpt_path: 'path/eva_e_detection_heads.ckpt'

Run training

bash ./scripts/stage2_train/eva_e/stage2.sh

Get LoRA Weights

You can extract the LoRA weights by running:

python ./scripts/postprocess/extract_lora_weights.py

Or use the LoRA weights we provide:

LoRA weights
Iters	EVA-G	EVA-E
5k	weights	weights
10k	weights	weights
15k	weights	weights
20k	weights	weights
50k	weights	weights
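For orientation, extracting LoRA weights usually means keeping only the low-rank parameters from a full checkpoint. The sketch below illustrates that idea and is not the repo's `extract_lora_weights.py`: it relies on loralib naming its parameters `lora_A` and `lora_B`, and it assumes the checkpoint stores weights either at the top level or under a `model` key, which is a guess about the file layout. Both function names are hypothetical.

```python
def select_lora_entries(state_dict: dict) -> dict:
    """Keep only the entries whose parameter name contains 'lora_'."""
    return {name: value for name, value in state_dict.items() if "lora_" in name}


def extract_to_file(checkpoint_path: str, output_path: str) -> int:
    """Save the LoRA entries of a checkpoint to `output_path` and return how many were kept."""
    import torch  # imported lazily so the helper above can be used without torch

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Assumption: weights sit at the top level or under a "model" key.
    state_dict = checkpoint.get("model", checkpoint)
    lora_weights = select_lora_entries(state_dict)
    torch.save(lora_weights, output_path)
    return len(lora_weights)
```

A file produced this way holds only part of the model, so it would typically be loaded with `model.load_state_dict(weights, strict=False)`.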

Evaluation Benchmarks

[] Zero-shot Image Classification
[] Zero-shot Image-text Retrieval
[] OCR
[] Grounded Object Identification
[] VQA
[] Image Captioning on NoCaps

Acknowledgement

The code of ViSFT is based on the official implementations of mmf, EVA and LAVIS.

Citation

If you find our work valuable, please cite:

@misc{jiang2024supervised,
      title={Supervised Fine-tuning in turn Improves Visual Foundation Models}, 
      author={Xiaohu Jiang and Yixiao Ge and Yuying Ge and Chun Yuan and Ying Shan},
      year={2024},
      eprint={2401.10222},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


License

Apache 2.0: https://choosealicense.com/licenses/apache-2.0

Model page

https://huggingface.co/TencentARC/ViSFT
