AI Regulation

Federal Agencies’ Choice of AI Vendors Affects Policy Interpretation

Federal agencies adopting large language models (LLMs) to analyze policy documents may encounter significant variations in how these AI systems interpret and prioritize key policy dimensions, according to recent research. The differences highlight risks in relying on AI for administrative and high-impact government functions.

The study compared commercial AI models, including OpenAI's ChatGPT, Anthropic's Claude, and xAI's Grok, by applying an expert-informed analytical framework across multiple policy domains such as national security, civil rights, and economic resilience. Using 91 policy documents plus nearly 900 additional governance texts, researchers found consistent differences in how each model weighted various policy aspects.
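
The article does not reproduce the study's framework or prompts, but the general setup can be sketched. The sketch below assumes a hypothetical dimension list, prompt template, and a reader-supplied `query_model(model_name, prompt)` call standing in for whatever vendor SDK (OpenAI, Anthropic, xAI, etc.) is used:

```python
import json

# Hypothetical dimensions; the study's expert-informed framework is not
# public here, so these names are illustrative only.
DIMENSIONS = [
    "national_security", "civil_rights", "economic_resilience",
    "cybersecurity", "compliance_audits", "export_controls",
]

PROMPT_TEMPLATE = (
    "Assign each dimension a weight between 0 and 1 reflecting its "
    "importance in the policy document below. Weights must sum to 1. "
    "Respond with only a JSON object keyed by dimension.\n\n"
    "Dimensions: {dims}\n\nDocument:\n{text}"
)

def score_document(query_model, model_name, text):
    """Ask one model to weight the policy dimensions in a document.

    query_model(model_name, prompt) -> str is a placeholder for the
    actual vendor API call.
    """
    prompt = PROMPT_TEMPLATE.format(dims=", ".join(DIMENSIONS), text=text)
    weights = json.loads(query_model(model_name, prompt))
    total = sum(weights.values()) or 1.0  # renormalize defensively
    return {d: weights.get(d, 0.0) / total for d in DIMENSIONS}
```

Running the same documents through several models this way yields comparable per-dimension weight profiles, which is the kind of output the reported comparisons presuppose.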

For example, when analyzing the Biden administration’s Framework for Artificial Intelligence Diffusion—a national security policy regulating foreign access to advanced AI systems—ChatGPT and Claude identified multiple embedded dimensions like cybersecurity safeguards and compliance audits. Grok, however, primarily focused on export controls and often failed to recognize these secondary safety measures.

The study noted that Grok concentrated more than 70% of its analysis on a single policy dimension in nearly two-thirds of the documents, frequently overlooking the multifaceted nature of the policies. In contrast, ChatGPT and Claude more often acknowledged that policies pursue multiple goals simultaneously.
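
The article does not spell out the study's exact metric, but one plausible way to operationalize "over 70% of its analysis on a single policy dimension" is a simple concentration check over the normalized weight profiles produced above:

```python
def top_dimension_share(weights):
    """Fraction of total weight the single largest dimension receives."""
    total = sum(weights.values())
    return max(weights.values()) / total if total else 0.0

def concentration_rate(per_document_weights, threshold=0.7):
    """Share of documents in which one dimension exceeds `threshold`."""
    flagged = sum(
        1 for w in per_document_weights if top_dimension_share(w) > threshold
    )
    return flagged / len(per_document_weights) if per_document_weights else 0.0
```

Under this reading, the pattern attributed to Grok would correspond to a concentration rate near two-thirds at a 0.7 threshold, while more pluralistic models would score lower.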

Similar patterns appeared in LLMs developed outside the U.S., such as China’s DeepSeek and Kimi, which also tended to assign zero significance to policy dimensions they considered irrelevant. This suggests that model-specific variation is widespread and not limited to a particular country or vendor.

Researchers emphasized that AI outputs can vary with changes in prompting, settings, and model updates. Thus, the documented differences may evolve over time, complicating efforts to treat any single AI system as a definitive analytic baseline.

Why it matters

As federal agencies increasingly adopt AI for tasks ranging from document summarization to complex policy analysis and enforcement, these findings reveal that vendor and model choice can materially affect government decision-making. Misinterpretation or oversimplification of policy content poses risks for regulatory compliance, civil rights protections, and national security.

The research underscores the importance of agencies documenting the exact model and version used and the prompts issued, and of performing ongoing behavioral testing to ensure reliable AI-assisted analysis. Such practices are vital to understanding what the AI "sees" or overlooks in policy texts and to maintaining accountability in government AI use.
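
In practice, that documentation can be as simple as appending a provenance record for every model call. A minimal sketch follows; the field names and log format are illustrative assumptions, not a government standard:

```python
import datetime
import hashlib
import json

def log_analysis_run(model, model_version, prompt, params, output,
                     path="ai_audit_log.jsonl"):
    """Append one provenance record per model call so results can be
    audited and re-tested after model updates."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,                  # vendor model name
        "model_version": model_version,  # exact version or snapshot date
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt": prompt,
        "params": params,                # e.g. temperature, max_tokens
        "output": output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Re-running logged prompts against the same (and updated) model versions is the ongoing behavioral testing the researchers recommend: drift in outputs then shows up as a diff against the audit log rather than a surprise in a decision.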

While public protests have drawn attention to ethical concerns over military applications of AI, much of government AI deployment is less visible yet equally consequential. Managing vendor choices and model behavior in these contexts is a practical step toward mitigating unintended consequences.

Sources

This article is based on reporting and publicly available information.


About the author

Giorgio Kajaia

Giorgio Kajaia is a writer at Goka World News covering world news, U.S. news, politics, business, climate, science, technology, health, security, and public-interest stories. He focuses on clear, factual, reader-first coverage grounded in credible sources, official statements, publicly available information, and relevant source material.
