As regulators on both sides of the Atlantic push Google to share anonymized search data with competitors, DuckDuckGo has put forward a detailed counterproposal to Google’s current anonymization approach, arguing that Google’s method unnecessarily excludes most queries and severely limits the data’s utility.
Google’s Current Anonymization Limits Competitor Access
Regulators in the European Union and the United States require Google to share anonymized search data to help rival search engines improve their ranking algorithms. Under the European Union’s Digital Markets Act, Google must provide anonymous “ranking, query, click and view data,” while a U.S. court has ordered it to share raw “user-side data.”
Google’s response has been criticized as overly restrictive. Its European Search Dataset Licensing Program excludes queries unless at least 30 signed-in users worldwide have made the query in the prior 13 months and the same query-result-device-country combination has been observed from at least five unique users in a quarter. This policy eliminates an estimated 99% of distinct queries and roughly 42% of total query volume, retaining only the most common searches that competitors already observe.
An expert witness for the U.S. Department of Justice described Google’s method as one designed “if you didn’t want to release high utility data.” Google’s assumption that rarity equates to identifiability leads to removing many rare but non-identifying queries, limiting the dataset’s usefulness.
DuckDuckGo’s Alternative: Targeted and Proportionate Anonymization
DuckDuckGo proposes a more nuanced pseudonymization method combined with three specific filtering steps, resulting in a dataset that excludes fewer than 5% of distinct queries in DuckDuckGo’s sample data. The method pseudonymizes user identifiers and rounds timestamps to a 24-hour window, which blocks known reidentification attacks.
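The two pseudonymization steps described above can be sketched as follows. This is an illustrative sketch, not DuckDuckGo’s actual implementation: the keyed hash, the `SECRET_KEY`, and the function names are assumptions, since the proposal does not specify key management or hashing details.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical secret key; the proposal does not specify how a key
# would be generated, stored, or rotated.
SECRET_KEY = b"example-secret-rotate-regularly"

def pseudonymize_user(user_id: str) -> str:
    """Replace a raw user identifier with a keyed hash (a pseudonym).

    The same user always maps to the same pseudonym, so sessions can
    still be grouped, but the raw identifier is never released.
    """
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()

def round_timestamp(ts: datetime) -> str:
    """Coarsen an event timestamp to a 24-hour window (date only)."""
    return ts.astimezone(timezone.utc).date().isoformat()

# Example record after both steps are applied:
record = {
    "user": pseudonymize_user("user-12345"),
    "query": "weather berlin",
    "when": round_timestamp(datetime(2025, 3, 14, 9, 27, tzinfo=timezone.utc)),
}
```

A keyed hash (rather than a plain hash) is used here so that an attacker who knows a user ID cannot simply hash it and look up the pseudonym; this design choice is the sketch’s assumption, not a detail from the proposal.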
The first filter targets personal information such as email addresses, phone numbers, and credit card details, which appear in only about 1% of queries. The second filter removes queries containing rare or unknown words by tokenizing each query string and cross-checking the tokens against an extensive search dictionary, thus preserving rare queries composed of common words. The third filter applies a conservative k-anonymity threshold (k = 1,000 users) to metadata such as device type and location, generalizing those attributes until each value meets the minimum user count.
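The three filters above can be sketched in miniature. This is a toy illustration under stated assumptions: the regex patterns, the tiny stand-in `DICTIONARY`, and the bucketing strategy in `generalize_metadata` are all hypothetical simplifications, whereas the real proposal assumes an extensive search dictionary and a production-grade PII detector.

```python
import re

# Filter 1: drop queries containing obvious personal information.
# Illustrative patterns only; real PII detection is far more thorough.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),        # email addresses
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),         # credit-card-like digit runs
    re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),  # phone-number-like digits
]

# Filter 2: a tiny stand-in vocabulary; the proposal assumes a large
# search dictionary, so rare queries made of common words survive.
DICTIONARY = {"weather", "berlin", "cheap", "flights", "to", "lisbon"}

def passes_pii_filter(query: str) -> bool:
    """Keep the query only if no PII pattern matches."""
    return not any(p.search(query) for p in PII_PATTERNS)

def passes_dictionary_filter(query: str) -> bool:
    """Keep the query only if every token is a known dictionary word."""
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    return bool(tokens) and all(t in DICTIONARY for t in tokens)

# Filter 3: generalize a metadata value (e.g. a city) into a broader
# bucket unless at least k users already share that exact value.
def generalize_metadata(value: str, user_counts: dict, k: int = 1000) -> str:
    return value if user_counts.get(value, 0) >= k else "other"

queries = ["weather berlin", "cheap flights to lisbon", "email jane@example.com"]
kept = [q for q in queries
        if passes_pii_filter(q) and passes_dictionary_filter(q)]
# The query containing an email address is dropped; the others pass.
```

Note how the second filter operates on words rather than whole queries: “cheap flights to lisbon” passes even if that exact query is rare, which is precisely the utility Google’s 30-user volume threshold discards.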
When applied together, these steps maintain much higher data utility while addressing privacy risks. DuckDuckGo estimates that at Google’s query volume, only about 2% of queries would be filtered under its method.
Regulatory Context and Challenges
European Commission and European Data Protection Board guidelines emphasize that anonymization should remove queries containing identifiable personal data while preserving those referencing public information. Effective anonymization is viewed as a risk-management framework combining technical, legal, and organizational controls, rather than purely a technical filtering exercise.
Techniques such as differential privacy, though promising in some contexts, have proven less suited to complex search-query data because of the trade-off between utility and privacy guarantees, as illustrated by the high epsilon values used in recent U.S. Census Bureau releases.
Why it matters
Google’s restrictive anonymization could stifle competition by withholding valuable search data that smaller rivals need to refine their algorithms. DuckDuckGo’s approach shows a feasible middle ground that maintains user privacy while enabling data sharing crucial for competitive innovation. As regulators continue reviewing Google’s compliance with data sharing mandates under the Digital Markets Act and U.S. court orders, this debate over anonymization standards will shape the future of search market fairness and user privacy protections.