At the 2026 India AI Impact Summit in New Delhi, a landmark initiative named the New Delhi Frontier Model Voluntary Commitments was launched, calling on AI model providers to perform multilingual evaluations of their systems. This nonbinding commitment marks a significant step toward addressing the uneven performance of large language models (LLMs) across the world’s approximately 7,000 languages.
Multilingual evaluations aim to ensure AI systems, especially LLMs used in chatbots, healthcare, information services, and decision-making tools, operate effectively beyond primarily English-language contexts. Current evaluation practice often overlooks the linguistic and contextual diversity of global users, primarily benefiting English speakers while leaving millions underserved.
Unequal AI Performance Across Languages
The International AI Safety Report highlights that AI advancements are “jagged,” with markedly poorer model performance in languages other than English. Existing evaluation benchmarks tend to focus heavily on English, with many widely spoken languages either underrepresented or evaluated with insufficient rigor. Many multilingual evaluations lack domain specificity, failing to consider the cultural and contextual aspects that affect how users interact with AI systems in different languages and regions.
For example, a model designed to provide reproductive health information in Tagalog would need testing tailored to local sociocultural factors in the Philippines rather than generic language proficiency tests. Similarly, health-related AI in Indian languages must consider region-specific issues such as high tuberculosis incidence, which might not be captured by translations of Western-centric evaluation tools.
Calls for Contextual and Inclusive Evaluation Approaches
Experts and advocates stress that evaluations should involve local language speakers, subject matter experts, and prospective users to ensure multilingual AI systems are contextually relevant and culturally accurate. Participation from communities in designing and conducting evaluations enhances the alignment of AI outputs with real-world needs.
Initiatives like Microsoft’s collaboration with Accredited Social Health Activists (ASHAs) in India exemplify this approach. Their ‘Samiksha’ evaluation suite incorporates localized health terminology and user preferences, improving model assessment in healthcare contexts.
Challenges and Next Steps for AI Transparency and Inclusivity
The voluntary commitments at the summit face challenges due to the limited participation of independent evaluators and the reliance of AI companies on internal testing. Current policies often restrict independent research, and companies tend to select their own evaluators, which could limit transparency and accountability.
To advance equitable AI, experts urge the establishment of independent, arm’s-length evaluation processes involving diverse stakeholders including public interest technologists, civil society, human rights organizations, and regulators. They also call for public disclosure of evaluation outcomes and clear feedback loops from AI labs to affected communities.
Why it matters
As AI systems become increasingly embedded in essential services globally, ensuring they perform effectively across all languages is vital for equitable access. Without robust multilingual evaluations and inclusive participation in AI development, there is a risk of exacerbating the digital language divide, leaving vast populations with inferior or biased AI tools.
Background
Large language models have traditionally been trained on vast datasets dominated by English and a few other major languages, resulting in uneven language coverage. Recent research and advocacy have highlighted the need for multilingual and culturally informed evaluation methods. The 2026 New Delhi commitments represent the first coordinated effort encouraging AI providers to acknowledge and address this issue in voluntary frameworks.