Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Harvard University’s School of Engineering and Applied Sciences have developed a method that helps artificial intelligence (AI) agents ask more effective questions, using a “Collaborative Battleship” game as the testbed. The method lets AI models search uncertain environments more efficiently, a crucial step toward applying AI in high-stakes fields like medical diagnosis and scientific research.
What happened
The research team adapted the classic game Battleship into a collaborative task where one participant, the “captain,” asks yes-no questions to locate hidden ships, while a “spotter” answers in real time. Using this framework, the researchers collected a dataset of human-generated questions and answers (“BattleshipQA”), setting a benchmark for evaluating AI question-asking strategies.
They tested state-of-the-art language models (LMs), including GPT-5 and smaller models like Llama 4 Scout, on the task. Initially, larger models outperformed humans by completing the game with fewer guesses, while smaller models struggled due to less effective questioning.
To address this, the researchers implemented a Monte Carlo inference strategy that treats potential guesses as probabilistic particles, dynamically adjusted with each answer. This approach helped AI models ask more informative questions, substantially improving performance.
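To make the particle idea concrete, here is a minimal sketch and not the researchers' implementation. It assumes a hypothetical 4x4 board with a single two-cell ship, noise-free yes-no answers, and equally weighted particles; the names sample_board, expected_info_gain, and update, and the three example questions, are illustrative only. Each particle is one guess at the hidden board. The sketch scores a question by how evenly it splits the particles (the entropy of the predicted answer) and then discards particles that disagree with the answer received.

```python
import math
import random

SIZE = 4  # assumption: a tiny 4x4 board with a single ship of length 2


def sample_board(rng):
    """Sample one particle: the set of cells covered by a length-2 ship."""
    if rng.random() < 0.5:  # horizontal placement
        r, c = rng.randrange(SIZE), rng.randrange(SIZE - 1)
        return frozenset({(r, c), (r, c + 1)})
    r, c = rng.randrange(SIZE - 1), rng.randrange(SIZE)  # vertical placement
    return frozenset({(r, c), (r + 1, c)})


def entropy(p):
    """Binary entropy, in bits, of a yes-probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_info_gain(question, particles):
    """With noise-free answers, the expected information gain of a yes-no
    question equals the entropy of its answer under the current particles."""
    p_yes = sum(question(b) for b in particles) / len(particles)
    return entropy(p_yes)


def update(particles, question, answer):
    """Keep only the particles that agree with the observed answer."""
    return [b for b in particles if question(b) == answer]


# Hypothetical candidate questions, each written as a predicate over a board.
questions = {
    "Is any part of the ship in the top half?":
        lambda b: any(r < SIZE // 2 for r, _ in b),
    "Does the ship touch row 0?":
        lambda b: any(r == 0 for r, _ in b),
    "Is the ship on the board at all?":
        lambda b: len(b) > 0,  # always yes, so it carries no information
}

rng = random.Random(0)
particles = [sample_board(rng) for _ in range(500)]
best = max(questions, key=lambda q: expected_info_gain(questions[q], particles))
print("Most informative question:", best)
```

A question that every particle answers the same way scores zero, while one that splits the particles roughly in half scores close to one bit, which is why the always-yes question above can never be picked over the others.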
Notably, Llama 4 Scout’s win rate increased from 8% to 82% against human players after integrating these techniques. The model also surpassed GPT-5’s performance at a fraction of the computational cost. Additionally, converting questions into executable Python code allowed AI spotter models to verify answers more accurately, boosting answer accuracy by an average of 15%.
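The question-to-code step can be illustrated with a small sketch under stated assumptions. The board layout, the GENERATED_CODE string, and the run_question helper are all hypothetical: in the setup the article describes, a language model would write the snippet from the captain's question, whereas here it is hard-coded for the question "Is any part of a ship in the top row?". The point is only that once a question is expressed as code, the answer comes from running it against the true board rather than from the model's guess.

```python
# Hypothetical 4x4 board seen only by the spotter: "." is water, letters are ships.
BOARD = [
    list("..R."),
    list("..R."),
    list("...."),
    list("GG.."),
]

# Stand-in for model output (assumption): a function named `answer` that
# takes the board and returns True or False.
GENERATED_CODE = '''
def answer(board):
    return any(cell != "." for cell in board[0])
'''


def run_question(code, board):
    """Execute question code and return its yes/no result.

    Note: exec runs arbitrary code, so a real system would need sandboxing.
    """
    namespace = {}
    exec(code, namespace)
    result = namespace["answer"](board)
    if not isinstance(result, bool):
        raise TypeError("question code must return True or False")
    return result


print("yes" if run_question(GENERATED_CODE, BOARD) else "no")  # prints "yes"
```

The same predicate form also fits the captain's side: a question written as a function over boards can be evaluated against every hypothesized board, not just the true one.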
The team further validated their approach on the game “Guess Who?”, improving the smaller LMs’ success rates from 30% to over 72%, and GPT-4o’s from 62% to 90%, demonstrating the method’s generalizability.
Why it matters
This research advances AI’s capacity for strategic inquiry and information gathering, which is essential in complex, uncertain environments where a solution must be found among many possibilities, as in scientific discovery and diagnosis. Better questions let AI reduce guesswork and solve problems faster while working with limited data.
The methodology also offers a cost-effective way to enhance smaller AI models, making sophisticated cognitive capabilities more accessible where computational resources are constrained.
Background
Language models, which drive today’s AI applications in tasks like customer support and coding, typically focus on answering predetermined questions rather than generating effective questions themselves. Effective inquiry is vital in uncertain settings where answers are not directly available, requiring AI to explore and eliminate possibilities efficiently.
Previous work has shown benefits from “auto-formalization,” where AI generates code to verify answers. This study extends that concept to improve how AI asks questions by integrating probabilistic reasoning and executable code validation into the questioning process.
The research, supported by several institutions including MIT’s Siegel Family Quest for Intelligence and the MIT-IBM Watson AI Lab, was presented at the International Conference on Learning Representations (ICLR) in April 2026.
