Austin Deep Learning Meetup: Scaling Test-Time Compute + LLMs with Function Calling
This will be a journal club event
Two Talks:
1. Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (link to paper)
2. LLMs + Function Calling
Speakers
Matthew Gunton of Amazon
Orlando Kalossakas of Toolhouse.ai
Abstract
1. Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. In this paper, we study the scaling of inference-time computation in LLMs, with a focus on answering the question: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? Answering this question has implications not only on the achievable performance of LLMs, but also on the future of LLM pretraining and how one should trade off inference-time and pre-training compute. Despite its importance, little research has attempted to understand the scaling behaviors of various test-time inference methods. Moreover, current work largely provides negative results for a number of these strategies. In this work, we analyze two primary mechanisms to scale test-time computation: (1) searching against dense, process-based verifier reward models; and (2) updating the model's distribution over a response adaptively, given the prompt at test time. We find that in both cases, the effectiveness of different approaches to scaling test-time compute critically varies depending on the difficulty of the prompt. This observation motivates applying a "compute-optimal" scaling strategy, which acts to most effectively allocate test-time compute adaptively per prompt. Using this compute-optimal strategy, we can improve the efficiency of test-time compute scaling by more than 4x compared to a best-of-N baseline. Additionally, in a FLOPs-matched evaluation, we find that on problems where a smaller base model attains somewhat non-trivial success rates, test-time compute can be used to outperform a 14x larger model.
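To make the baseline concrete: best-of-N samples N candidate responses and keeps the one a verifier (reward model) scores highest. Below is a minimal, hypothetical sketch in Python; the generate and verifier_score callables are placeholders standing in for an LLM sampling call and a learned verifier, not the paper's actual implementation.

```python
# Best-of-N sketch (illustrative only): sample N candidate responses for a
# prompt and return the one the verifier scores highest. `generate` and
# `verifier_score` are hypothetical stand-ins for an LLM sampling call and
# a learned reward model.
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],               # samples one candidate response
    verifier_score: Callable[[str, str], float],  # scores a (prompt, response) pair
    n: int = 16,
) -> str:
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [verifier_score(prompt, c) for c in candidates]
    # Keep the highest-scoring candidate; ties resolve to the first one found.
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```

The compute-optimal strategy described in the abstract differs in that it allocates test-time compute adaptively, for example adjusting how many samples are drawn or which search/revision method is used depending on the estimated difficulty of each prompt, rather than using a fixed N for everything.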
Info
Austin Deep Learning Journal Club is a group for committed machine learning practitioners and researchers alike. The group typically meets on the first Tuesday of each month to discuss research publications. The publications are usually ones that laid the foundation for ML/DL or that explore novel, promising ideas, and they are selected by a vote. Participants are expected to read the publications so they can contribute to the discussion and learn from others. This is also a great opportunity to showcase your implementations and get feedback from other experts.
Sponsors:
Capital Factory (Austin, Texas)
Antler
Social Media Listening
Overcoming the Challenges of Building Agentic AI
Join Daniele Bernardi, CEO of Toolhouse AI, and Dan Loman, Data and AI Practitioner at Groq, for a technical webinar on the challenges of building agentic AI. They'll provide a high-level overview of agentic architecture, highlight the current challenges of building agents (including building tools, prompting, and latency), explain how you can overcome them with Toolhouse AI and Groq, and share a demo.
Using Toolhouse for Tool Use with Groq API: https://github.com/groq/groq-api-cookbook/blob/main/toolhouse-for-tool-use-with-groq-api/Groq%20%3C%3E%20Toolhouse.ipynb
Groq API Cookbook: https://github.com/groq/groq-api-cookbook
To get access and $150 worth of credits: https://join.toolhouse.ai
Toolhouse generates an API key directly in the onboarding experience. Users can generate new API keys here: https://app.toolhouse.ai/settings/api-keys
Direct link to install Code Interpreter: https://app.toolhouse.ai/store/code_interpreter
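For readers who want to experiment before the webinar, here is a minimal sketch of OpenAI-style function calling through the Groq Python SDK. The get_weather tool, its JSON schema, and the model id are illustrative assumptions rather than webinar material; the cookbook notebook linked above shows the equivalent flow with Toolhouse-managed tools such as the Code Interpreter.

```python
# Minimal function-calling sketch using the Groq Python SDK (pip install groq).
# The get_weather tool, its schema, and the model id are illustrative assumptions;
# see the cookbook notebook linked above for the Toolhouse-managed version.
import json
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])
MODEL = "llama-3.3-70b-versatile"  # assumed model id; check the Groq console for current models


def get_weather(city: str) -> str:
    # Placeholder implementation; a real tool would call a weather API.
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 31})


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Austin right now?"}]
response = client.chat.completions.create(
    model=MODEL, messages=messages, tools=tools, tool_choice="auto"
)
msg = response.choices[0].message

if msg.tool_calls:
    # The model requested a tool call: run the tool, return its result,
    # and let the model produce the final answer.
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model=MODEL, messages=messages)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```

Hosted tool platforms like Toolhouse plug into this same loop by supplying the tool schemas and executing the calls on the application's behalf, which is the workflow the webinar and cookbook walk through.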
Groq Speed Read Newsletter (Video Edition)
Welcome to the video version of our Speed Read newsletter!
0:00 Intro
Subscribe to the email version of the newsletter here: https://share.hsforms.com/1DlcL2281QmeRPq1FHAL93wqedzp
Here are all those links as foretold in the video:
0:13 Flex Tier Beta: https://console.groq.com/docs/service-tiers
1:07 Sourcetable: https://www.youtube.com/watch?v=Ebr00lZr3JU | www.sourcetable.com | https://groq.com/groq-customer-use-case-sourcetable-2/?utm_campaign=APEX%20Weekly%20Newsletter&utm_medium=email&_hsenc=p2ANqtz-8uvg2fzqN8E7NUkuO5tgXakvCR5lcEEUKAfVzxYalFgSlVneBHvz3EW-kAqJRT9jqDpDSsoZqNZJCKsEE0bUK520tj9Q&_hsmi=342545649&utm_content=342545649&utm_source=hs_email
1:43 LottieFiles Motion Copilot: https://www.youtube.com/watch?v=AhUDhl-25Ao&pp=ygUbbG90dGllIGZpbGVzIG1vdGlvbiBjb3BpbG90 | https://lottiefiles.com/groq
2:21 Toolhouse: https://console.groq.com/docs/toolhouse | https://toolhouse.ai
3:00 Hemanth HM: https://github.com/hemanth/notebooks/blob/main/notebooks/crewai_code_review_agents.ipynb
3:31 Groq Dev Meetup (Feb 13, 2025): https://lu.ma/6pk5espk
3:44 Getting Started on Groq Console: https://youtu.be/Ig7esRBhFPY
Join our Discord: https://discord.gg/invite/groq