
Friday Feb 28, 2025
Testing Large Language Models Using Multi-Agents? Talking Robots EP5
Today on Robots Talking: this paper introduces Multi-Agent Verification (MAV), a method for improving large language model performance at test time by using multiple verifiers to evaluate candidate outputs. The authors propose Aspect Verifiers (AVs), off-the-shelf LLMs that each check a different aspect of an output, as a practical way to implement MAV. Their algorithm, BoN-MAV, combines best-of-n sampling with these AVs: it samples n candidate outputs and selects the one that receives the most approvals from the verifiers. Experiments show that MAV improves performance across a range of tasks and models, and that it scales when either the number of candidate outputs or the number of verifiers is increased. The study also demonstrates that MAV enables weak-to-strong generalization, where smaller, weaker models verify the outputs of stronger LLMs, and even self-improvement, where the same model is used for both generation and verification.
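For listeners who want to see the selection step concretely, here is a minimal Python sketch of the idea described above: score each candidate by how many verifiers approve it, then keep the top scorer. It is an illustration under stated assumptions, not the authors' implementation. The function name `bon_mav`, the `Verifier` signature, and the two toy verifiers are hypothetical; in the paper the verifiers are LLMs, whereas here they are simple string checks so the example runs on its own. Ties go to the earliest candidate, which is also an assumption.

```python
from typing import Callable, Sequence

# Hypothetical verifier interface: takes (question, candidate) and
# returns True to approve the candidate or False to reject it.
Verifier = Callable[[str, str], bool]


def bon_mav(question: str,
            candidates: Sequence[str],
            verifiers: Sequence[Verifier]) -> str:
    """Return the candidate approved by the most verifiers.

    Ties are broken in favor of the earliest candidate (an assumption).
    """
    if not candidates:
        raise ValueError("need at least one candidate output")

    def approvals(candidate: str) -> int:
        # Count how many verifiers approve this candidate.
        return sum(1 for verify in verifiers if verify(question, candidate))

    # max() returns the first maximal element, so earlier candidates win ties.
    return max(candidates, key=approvals)


# Toy stand-ins for aspect verifiers (a real setup would prompt an LLM).
def has_digit(question: str, candidate: str) -> bool:
    return any(ch.isdigit() for ch in candidate)


def is_full_sentence(question: str, candidate: str) -> bool:
    return len(candidate.split()) >= 3


if __name__ == "__main__":
    # In best-of-n sampling these would be n samples from a generator LLM.
    samples = ["The answer is 12", "I think it is twelve", "12"]
    print(bon_mav("What is 7 + 5?", samples, [has_digit, is_full_sentence]))
```

In this toy run only the first sample is approved by both verifiers, so it is the one selected. Adding more samples or more verifiers corresponds to the two scaling directions mentioned in the episode description.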