Science Discoveries

MIT Develops AI Training Method to Improve Confidence Calibration

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have created a training method that enables artificial intelligence models to produce calibrated confidence estimates along with their answers, addressing a common problem of AI overconfidence. The new technique, named Reinforcement Learning with Calibration Rewards (RLCR), improves how AI systems express uncertainty without compromising their accuracy.

Current state-of-the-art AI reasoning models often present answers with high certainty whether or not that certainty is warranted. This overconfidence stems from the reinforcement learning methods used during training, which reward correct answers and penalize incorrect ones but offer no incentive to express uncertainty. As a result, AI systems learn to respond with absolute confidence even when they are guessing.

RLCR modifies this training by incorporating the Brier score into the reward function. The Brier score penalizes the gap between a model's stated confidence and its actual correctness, pushing the AI toward both accurate answers and reliable confidence estimates. During training, models are rewarded not only for correctness but also for well-calibrated confidence, discouraging both confidently wrong answers and needlessly hedged correct ones.
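
The article describes the reward only at a high level; the following is a minimal Python sketch of one way a binary correctness signal could be combined with a Brier-score penalty. The function name, the exact weighting, and the assumption that the model emits a confidence value in [0, 1] are illustrative, not drawn from the paper.

```python
def rlcr_style_reward(is_correct: bool, confidence: float) -> float:
    """Illustrative calibration-aware reward (not the paper's exact formula).

    Combines a binary correctness signal with a Brier-score penalty: the
    squared gap between the model's stated confidence and what actually
    happened. Confidence is assumed to be a probability in [0, 1].
    """
    outcome = 1.0 if is_correct else 0.0   # ground-truth outcome y
    brier = (confidence - outcome) ** 2    # Brier score for this one prediction
    return outcome - brier                 # reward correctness, penalize miscalibration
```

Under this sketch, a wrong answer stated with 0.9 confidence scores -0.81, while the same wrong answer stated with 0.1 confidence scores only -0.01, so the model earns more by hedging honestly when it is unsure; a correct answer with 0.9 confidence still scores 0.99.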

Experimental results on a 7-billion-parameter model tested across multiple question-answering and mathematics benchmarks show that RLCR reduced calibration errors by up to 90 percent while maintaining or improving task accuracy. These improvements held even on six datasets the model had never encountered during training. The approach also outperformed post-hoc calibration methods that assign confidence scores after the fact.

The researchers further demonstrated that using the model's confidence scores at inference time improves both accuracy and calibration. For example, selecting the answer with the highest self-reported confidence, or applying confidence-weighted majority voting across multiple sampled answers (sketched below), improves overall system reliability. In addition, the explicit uncertainty reasoning embedded in the model's outputs helped external classifiers assess its answers, especially for smaller models.
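
To make the voting idea concrete, here is a minimal Python sketch of confidence-weighted majority voting over multiple sampled responses. The assumption that each sample arrives as an (answer, confidence) pair is a simplification; the paper's exact procedure may differ.

```python
from collections import defaultdict

def confidence_weighted_vote(samples: list[tuple[str, float]]) -> str:
    """Choose a final answer from sampled (answer, confidence) pairs.

    Each candidate answer accumulates the self-reported confidence of
    the samples that produced it; the highest total wins. One plausible
    reading of confidence-weighted voting, not the paper's exact code.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for answer, confidence in samples:
        totals[answer] += confidence
    return max(totals, key=totals.get)

# Three moderately confident votes for "42" outweigh one confident "7".
print(confidence_weighted_vote([("42", 0.6), ("42", 0.5), ("42", 0.4), ("7", 0.9)]))
# -> 42  (total weight 1.5 vs. 0.9)
```

Plain majority voting would also pick "42" in this example, but the confidence weights matter when a small minority of samples is far more certain than the rest.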

Mehul Damani and Isha Puri, PhD students and co-lead authors, highlighted that standard reinforcement learning not only fails to promote proper confidence calibration but actively worsens it: models trained this way become more capable yet more overconfident. The new RLCR method reverses this trend, supporting safer and more trustworthy AI predictions.

Why it matters

AI systems are increasingly used in critical areas such as healthcare, law, and finance, where decisions rely heavily on automated outputs. Overconfident AI that cannot reliably express uncertainty poses risks because users may not recognize when to seek further verification. RLCR’s ability to produce well-calibrated confidence estimates enhances the transparency and safety of AI-assisted decision-making, potentially reducing the reliance on misleadingly confident but incorrect answers.

Background

Reinforcement learning has driven recent AI breakthroughs by incentivizing correct task completion, but it traditionally neglected confidence calibration. Previous attempts to fix overconfidence involved post-hoc techniques that assess uncertainty after the model produces an answer. RLCR represents a novel approach by integrating confidence calibration directly into the training objective, aligning accuracy with reliable uncertainty assessment in AI reasoning models.

The findings will be presented at the upcoming International Conference on Learning Representations, marking a significant advance in AI reliability and interpretability research led by a team that includes Jacob Andreas and Yoon Kim as senior authors.

About the author

Giorgio Kajaia

Giorgio Kajaia is a writer at Goka World News covering world news, U.S. news, politics, business, climate, science, technology, health, security, and public-interest stories. He focuses on clear, factual, reader-first reporting grounded in credible sources, official statements, publicly available information, and relevant source material.
