Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art visionlanguage models do not perform well on these data. We propose MATCHA (Math reasoning and Chart derendering pretraining) to enhance visual language models’ capabilities jointly modeling charts/plots and language data. Specifically we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling. We perform the MATCHA pretraining starting from Pix2Struct, a recently proposed imageto-text visual language model. On standard benchmarks such as PlotQA and ChartQA, MATCHA model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MATCHA pretraining transfers to domains such as screenshot, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MATCHA pretraining on broader visual language tasks.
Using the model
You should ask specific questions to the model in order to get consistent generations. Here we are asking the model whether the sum of values that are in a chart are greater than the largest value.
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration
import requests
from PIL import Image
processor = Pix2StructProcessor.from_pretrained('google/matcha-chartqa')
model = Pix2StructForConditionalGeneration.from_pretrained('google/matcha-chartqa')
url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/20294671002019.png"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, text="Is the sum of all 4 places greater than Laos?", return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(predictions[0], skip_special_tokens=True))
>>> No
To run the predictions on GPU, simply add
.to(0)
when creating the model and when getting the inputs (
inputs = inputs.to(0)
)
Once saved, you can push your converted model with the following snippet:
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
model = Pix2StructForConditionalGeneration.from_pretrained(PATH_TO_SAVE)
processor = Pix2StructProcessor.from_pretrained(PATH_TO_SAVE)
model.push_to_hub("USERNAME/MODEL_NAME")
processor.push_to_hub("USERNAME/MODEL_NAME")
Contribution
This model was originally contributed by Fangyu Liu, Francesco Piccinno et al. and added to the Hugging Face ecosystem by
Younes Belkada
.
Citation
If you want to cite this work, please consider citing the original paper:
@misc{liu2022matcha,
title={MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering},
author={Fangyu Liu and Francesco Piccinno and Syrine Krichene and Chenxi Pang and Kenton Lee and Mandar Joshi and Yasemin Altun and Nigel Collier and Julian Martin Eisenschlos},
year={2022},
eprint={2212.09662},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Runs of google matcha-chartqa on huggingface.co
435
Total runs
124
24-hour runs
26
3-day runs
66
7-day runs
-245
30-day runs
More Information About matcha-chartqa huggingface.co Model
matcha-chartqa huggingface.co is an AI model on huggingface.co that provides matcha-chartqa's model effect (), which can be used instantly with this google matcha-chartqa model. huggingface.co supports a free trial of the matcha-chartqa model, and also provides paid use of the matcha-chartqa. Support call matcha-chartqa model through api, including Node.js, Python, http.
matcha-chartqa huggingface.co is an online trial and call api platform, which integrates matcha-chartqa's modeling effects, including api services, and provides a free online trial of matcha-chartqa, you can try matcha-chartqa online for free by clicking the link below.
google matcha-chartqa online free url in huggingface.co:
matcha-chartqa is an open source model from GitHub that offers a free installation service, and any user can find matcha-chartqa on GitHub to install. At the same time, huggingface.co provides the effect of matcha-chartqa install, users can directly use matcha-chartqa installed effect in huggingface.co for debugging and trial. It also supports api for free installation.