TencentARC / ViSFT


This is the official repo for the paper Supervised Fine-tuning in turn Improves Visual Foundation Models.

News
  • [2024/01/19] We open-source ViSFT, including training scripts and weights. Evaluation code will be released soon.
Introduction

Image-text training like CLIP has dominated the pretraining of vision foundation models in recent years. Subsequent efforts have introduced region-level visual learning into CLIP's pretraining, but they face scalability challenges due to the lack of large-scale region-level datasets. Drawing inspiration from supervised fine-tuning (SFT) in natural language processing, such as instruction tuning, we explore the potential of fine-grained SFT in enhancing vision foundation models after their pretraining. Thus a two-stage method, ViSFT (Vision SFT), is proposed to unleash the fine-grained knowledge of vision foundation models. In ViSFT, the vision foundation model is enhanced by performing visual joint learning on several in-domain tasks and is then tested on out-of-domain benchmarks. With ViSFT updates taking less than 2 days on 8 V100 GPUs, a vision transformer with over 4.4B parameters shows improvements across various out-of-domain benchmarks, including vision and vision-linguistic scenarios.

Installation
Creating a conda environment
conda create -n ViSFT python=3.8

conda activate ViSFT
Install pytorch

We use torch 1.12 with CUDA 11.3 on 8 NVIDIA Volta V100-SXM2-32GB GPUs.

pip install --extra-index-url https://download.pytorch.org/whl/cu113 torch==1.12.0

pip install --extra-index-url https://download.pytorch.org/whl/cu113 torchvision==0.13.0

pip install --extra-index-url https://download.pytorch.org/whl/cu113 torchaudio==0.12.0 
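As a quick sanity check (our own addition, not part of the original instructions), you can confirm that the installed build matches the expected torch 1.12 / CUDA 11.3 setup:

import torch

# Verify the PyTorch build and that the GPUs are visible; expect 1.12.0+cu113, 11.3, True.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())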
xformers installation

Flash attention is required for running EVA-ViT-E; please refer to xformers for installation.
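A minimal import check (our own sketch, not from the repo) to confirm that xformers and its memory-efficient attention op are available after installation:

import xformers
import xformers.ops

# The memory-efficient / flash attention kernels live under xformers.ops.
print(xformers.__version__)
print(hasattr(xformers.ops, "memory_efficient_attention"))  # expect True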

loralib installation
pip install --user git+https://github.com/microsoft/LoRA
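For reference, a minimal loralib sketch (our own illustration with made-up layer sizes and rank; the actual LoRA placement is defined in the repo's model code) showing how LoRA layers become the only trainable parameters:

import torch
import loralib as lora

# A LoRA-augmented projection; the width 768 and rank r=16 are illustrative values only.
model = torch.nn.Sequential(lora.Linear(768, 768, r=16))

# Freeze everything except the LoRA A/B matrices before fine-tuning.
lora.mark_only_lora_as_trainable(model)
print([n for n, p in model.named_parameters() if p.requires_grad])  # only lora_A / lora_B remain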
Compile the MSDeform ops for the Mask2Former head
cd ./mmf/models/visft/ops
sudo sh make.sh
# back to root dir
cd ../../../../
Install other packages
pip install -r requirements.txt
Dataset Preparation

export DATA_PATH=your_data_path

Image Caption

Generate HDF5 files for image captioning following hdf5.

File structure:

DATA_PATH/
└── processed_datasets/
    └─── coco_caption_hdf5_files
        ├──TEST_CAPLENS_coco_5_cap_per_img_5_min_word_freq.json
        ├──TEST_CAPTIONS_coco_5_cap_per_img_5_min_word_freq.json
        ├──TEST_IMAGES_coco_5_cap_per_img_5_min_word_freq.hdf5
        ├──TRAIN_CAPLENS_coco_5_cap_per_img_5_min_word_freq.json
        ├──TRAIN_CAPTIONS_coco_5_cap_per_img_5_min_word_freq.json
        ├──TRAIN_IMAGES_coco_5_cap_per_img_5_min_word_freq.hdf5
        ├──VAL_CAPLENS_coco_5_cap_per_img_5_min_word_freq.json
        ├──VAL_CAPTIONS_coco_5_cap_per_img_5_min_word_freq.json
        ├──VAL_IMAGES_coco_5_cap_per_img_5_min_word_freq.hdf5
        └───WORDMAP_coco_5_cap_per_img_5_min_word_freq.json
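A small inspection script (our own sketch; it only lists what is stored, without assuming the internal dataset names) can confirm the generated caption files are readable:

import json
import os

import h5py

root = os.path.join(os.environ["DATA_PATH"], "processed_datasets", "coco_caption_hdf5_files")

# List the datasets stored in the training HDF5 file.
with h5py.File(os.path.join(root, "TRAIN_IMAGES_coco_5_cap_per_img_5_min_word_freq.hdf5"), "r") as f:
    print(list(f.keys()))

# Count encoded captions and vocabulary entries.
with open(os.path.join(root, "TRAIN_CAPTIONS_coco_5_cap_per_img_5_min_word_freq.json")) as f:
    print(len(json.load(f)), "encoded training captions")
with open(os.path.join(root, "WORDMAP_coco_5_cap_per_img_5_min_word_freq.json")) as f:
    print(len(json.load(f)), "words in the word map")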
Detection & Segmentation

File structure:

DATA_PATH/
└── public_datasets/
    └─── coco
        ├──train2017
        ├──val2017
        ├──test2017
        └───annotations
            ├──instances_train2017.json
            ├──instances_val2017.json
            └───image_info_test-dev2017.json
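To confirm the COCO layout is in place, a quick pycocotools check (our own sketch) can index the validation annotations and resolve one image path:

import os

from pycocotools.coco import COCO

coco_root = os.path.join(os.environ["DATA_PATH"], "public_datasets", "coco")
coco = COCO(os.path.join(coco_root, "annotations", "instances_val2017.json"))

img_ids = coco.getImgIds()
print(len(img_ids), "val2017 images indexed")  # expect 5000

info = coco.loadImgs(img_ids[0])[0]
img_path = os.path.join(coco_root, "val2017", info["file_name"])
print(img_path, os.path.exists(img_path))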
Training
Stage1

Stage 1 trains compatible in-domain task heads. We use 8 NVIDIA Volta V100-SXM2-32GB GPUs for every in-domain task head.

For eva-vit-g

Preparing weights from LAVIS

wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth

Adding your weights path to the configs under ./projects/visft/configs/stage1/eva_g/:

backbone_dir: path/eva_vit_g.pth
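To confirm the download, a minimal check (our own sketch, assuming the file is a plain state dict as LAVIS backbone checkpoints typically are) that the weights load and look like a ViT:

import torch

ckpt = torch.load("path/eva_vit_g.pth", map_location="cpu")
# Some checkpoints wrap the state dict under a "model" key; unwrap if present.
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

print(len(state), "tensors")
for name in list(state)[:5]:
    print(name, tuple(state[name].shape))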

Launching training

bash ./scripts/stage1_train/eva_g/caption.sh
bash ./scripts/stage1_train/eva_g/detection.sh
bash ./scripts/stage1_train/eva_g/segment.sh

For eva-vit-e

Preparing EVA-CLIP weights from EVA

Extract ViT weights

python ./scripts/preprocess/extract_eva_e_vit.py
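The repo ships the extraction script above; conceptually it keeps only the vision-tower weights of the EVA-CLIP checkpoint. A hypothetical sketch of that idea (the input filename, wrapper key, and "visual." prefix are assumptions, not the script's actual code):

import torch

ckpt = torch.load("path/EVA02_CLIP_E_psz14_plus_s9B.pt", map_location="cpu")  # assumed EVA-CLIP file
state = ckpt.get("module", ckpt) if isinstance(ckpt, dict) else ckpt

# Keep only the vision-tower tensors and strip the assumed "visual." prefix.
visual_state = {k[len("visual."):]: v for k, v in state.items() if k.startswith("visual.")}
torch.save(visual_state, "path/EVA02_CLIP_E_psz14_plus_s9B_Visual.pt")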

Adding your weights path to the configs under ./projects/visft/configs/stage1/eva_e/:

backbone_dir: path/EVA02_CLIP_E_psz14_plus_s9B_Visual.pt

Launching training

# can be executed in parallel
bash ./scripts/stage1_train/eva_e/caption.sh
bash ./scripts/stage1_train/eva_e/detection.sh
bash ./scripts/stage1_train/eva_e/segment.sh

Or you can use the weights we provided.

In-domain Heads    EVA-G      EVA-E
Caption Head       weights    weights
Segment Head       weights    weights
Detection Head     weights    weights
Stage2

For eva-vit-g

Adding your weights path to the config ./projects/visft/configs/stage2/eva_g/stage2.yaml:

backbone_dir: path/eva_vit_g.pth
caption_ckpt_path: 'path/eva_g_caption_heads.ckpt'
segment_ckpt_path: 'path/eva_g_segment_heads.ckpt'
detection_ckpt_path: 'path/eva_g_detection_heads.ckpt'

Launching training

bash ./scripts/stage2_train/eva_g/stage2.sh

For eva-vit-e

Adding your weights path to the config ./projects/visft/configs/stage2/eva_e/stage2.yaml:

backbone_dir: path/EVA02_CLIP_E_psz14_plus_s9B_Visual.pt
caption_ckpt_path: 'path/eva_e_caption_heads.ckpt'
segment_ckpt_path: 'path/eva_e_segment_heads.ckpt'
detection_ckpt_path: 'path/eva_e_detection_heads.ckpt'

Launching training

bash ./scripts/stage2_train/eva_e/stage2.sh
Get LoRA Weights

You can extract the expected LoRA weights by running

python ./scripts/postprocess/extract_lora_weights.py
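The script above does this for the ViSFT checkpoints; as an illustration only (a hypothetical sketch, not the script itself), keeping the LoRA parameters from a full stage-2 checkpoint amounts to filtering on loralib's lora_A / lora_B names:

import torch

full = torch.load("path/stage2_checkpoint.ckpt", map_location="cpu")  # hypothetical path
state = full.get("state_dict", full) if isinstance(full, dict) else full

# loralib stores the low-rank factors as lora_A / lora_B parameters.
lora_state = {k: v for k, v in state.items() if "lora_" in k}
print(len(lora_state), "LoRA tensors kept")
torch.save(lora_state, "path/extracted_lora_weights.ckpt")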

Or use the LoRA weights we provide:

Evaluation Benchmarks
  • [ ] Zero-shot Image Classification
  • [ ] Zero-shot Image-text Retrieval
  • [ ] OCR
  • [ ] Grounded Object Identification
  • [ ] VQA
  • [ ] Image Captioning on NoCaps
Acknowledgement

The code of ViSFT is based on the official implementations of mmf, EVA, and LAVIS.

Citation

If you find our work valuable, please cite:

@misc{jiang2024supervised,
      title={Supervised Fine-tuning in turn Improves Visual Foundation Models}, 
      author={Xiaohu Jiang and Yixiao Ge and Yuying Ge and Chun Yuan and Ying Shan},
      year={2024},
      eprint={2401.10222},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
