Qwen2.5-Math-RM-72B is specifically designed to guide the Qwen2.5-Math model throughout the training process by offering more granular feedback on the quality of reasoning and intermediate steps, ultimately facilitating more robust model improvements.
Key Highlights:
Multilingual and Multi-Modal Support: Offers preference signals across two languages (Chinese and English) and in dual modes (Chain-of-Thought and Tool-integrated Reasoning), enhancing versatility.
Model Training Guide:
Training Data Enhancement: Employs a data selection process via reward model scoring combined with Rejection Sampling to incrementally enhance the quality of responses.
Reinforcement Learning Training: Integrates seamlessly into reinforcement learning training and provides an effective reward signal, further improving model performance.
Inference Boosting:
Best of N: By combining response sampling with a Best-of-N strategy, we select the top-scoring response as judged by the reward model, trading additional inference time for better results; a minimal sketch contrasting this with majority voting follows this list. For example, Qwen2.5-Math-1.5B-Instruct obtains 83.9 on MATH in the RM@8 setting, even surpassing the 83.6 that Qwen2.5-Math-7B-Instruct achieves with greedy decoding.
Comparison with majority voting (Maj@N): RM@N scores are substantially better than Maj@N scores across almost all benchmarks and models.
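To make the RM@N vs. Maj@N comparison concrete, here is a minimal sketch in plain Python. It assumes you already have N sampled responses, each carrying an extracted final answer and a reward-model score; the candidates structure and its values are hypothetical, purely for illustration.

from collections import Counter

# Hypothetical candidates: each pairs an extracted final answer with the
# reward-model score of the full response (values are illustrative only).
candidates = [
    {"answer": "18", "score": 3.75},
    {"answer": "14", "score": 1.20},
    {"answer": "18", "score": 3.10},
    {"answer": "20", "score": 0.85},
]

# RM@N (Best-of-N): keep the single response the reward model scores highest.
rm_pick = max(candidates, key=lambda c: c["score"])["answer"]

# Maj@N (majority voting): keep the most frequent final answer, ignoring scores.
maj_pick = Counter(c["answer"] for c in candidates).most_common(1)[0][0]

print(f"RM@N pick: {rm_pick}, Maj@N pick: {maj_pick}")

The two strategies coincide when the top-scored response also carries the majority answer; they diverge when a minority answer receives the highest reward, which is where RM@N's advantage shows up.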
`transformers>=4.40.0` for Qwen2.5-Math models; the latest version is recommended.
🚨 This is a must because `transformers` has integrated Qwen2.5 code since `4.37.0`.
For requirements on GPU memory and the respective throughput, see the similar results for Qwen2 here.
Quick Start
Qwen2.5-Math-RM-72B is a reward model typically used to provide feedback on the quality of reasoning and intermediate steps; it serves in Rejection Sampling, reinforcement learning training, and RM@N.
🤗 Hugging Face Transformers
Here we show a code snippet demonstrating how to use Qwen2.5-Math-RM-72B with `transformers`:
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-RM-72B"
device = "auto"  # the device to load the model onto

model = AutoModel.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

chat = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"},
    {"role": "assistant", "content": "To determine how much Janet makes from selling the duck eggs at the farmers' market, we need to follow these steps:\n\n1. Calculate the total number of eggs laid by the ducks each day.\n2. Determine how many eggs Janet eats and bakes for herself each day.\n3. Find out how many eggs are left to be sold.\n4. Calculate the revenue from selling the remaining eggs at $2 per egg.\n\nLet's start with the first step:\n\n1. Janet's ducks lay 16 eggs per day.\n\nNext, we calculate how many eggs Janet eats and bakes for herself each day:\n\n2. Janet eats 3 eggs for breakfast every morning.\n3. Janet bakes 4 eggs for her friends every day.\n\nSo, the total number of eggs Janet eats and bakes for herself each day is:\n\\[ 3 + 4 = 7 \\text{ eggs} \\]\n\nNow, we find out how many eggs are left to be sold:\n\\[ 16 - 7 = 9 \\text{ eggs} \\]\n\nFinally, we calculate the revenue from selling the remaining eggs at $2 per egg:\n\\[ 9 \\times 2 = 18 \\text{ dollars} \\]\n\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."}
]  # reward score for this conversation: 3.75

# Render the conversation with the chat template; no generation prompt is added
# because the assistant response to be scored is already present.
conversation_str = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    add_generation_prompt=False
)

input_ids = tokenizer.encode(
    conversation_str,
    return_tensors="pt",
    add_special_tokens=False
).to(model.device)

# The model outputs a scalar reward score for the conversation.
outputs = model(input_ids=input_ids)
print(outputs[0])
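Building on the snippet above, the same forward pass can rank several candidate responses to one question, which is how the reward model serves Rejection Sampling and RM@N. The helper below is a sketch under two assumptions: `model` and `tokenizer` are already loaded as shown, and indexing the output as `outputs[0]` yields the scalar reward, as in the example; `score_response` and the placeholder strings are hypothetical, not part of the Qwen API.

def score_response(question: str, response: str) -> float:
    # Sketch: assumes `model`/`tokenizer` from the snippet above and that
    # outputs[0] holds the scalar reward for the whole conversation.
    chat = [
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": response},
    ]
    conversation_str = tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    )
    input_ids = tokenizer.encode(
        conversation_str, return_tensors="pt", add_special_tokens=False
    ).to(model.device)
    with torch.no_grad():
        outputs = model(input_ids=input_ids)
    return outputs[0].item()

# Score N sampled solutions for one question and keep the best (RM@N).
responses = ["<candidate solution 1>", "<candidate solution 2>"]  # placeholders
best = max(responses, key=lambda r: score_response("<question>", r))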
Citation
If you find our work helpful, feel free to give us a citation.
@article{yang2024qwen2,
  title={Qwen2 technical report},
  author={Yang, An and Yang, Baosong and Hui, Binyuan and Zheng, Bo and Yu, Bowen and Zhou, Chang and Li, Chengpeng and Li, Chengyuan and Liu, Dayiheng and Huang, Fei and others},
  journal={arXiv preprint arXiv:2407.10671},
  year={2024}
}