AI search engines are mostly wrong: New research
Plus: Which AI search engine is the most inaccurate?
Hi everyone,
Earlier this month, a new study came out about the accuracy of AI search engines:
“AI Search Has A Citation Problem” by Klaudia Jaźwińska and Aisvarya Chandrasekar
It really blew my mind.
Anyone who has used AI-powered search engines – whether it’s ChatGPT, Perplexity, or any other – has likely noticed that they often produce incorrect results. However, I didn’t realize the full extent of the problem until I came across this study. The most notable finding is that AI search engines were wrong 60% of the time. Additionally, the study found that AI search engines frequently bypass publishers' crawler blocks.
Today I will review some highlights from the study. I recommend reading the original article for many additional nuggets.
For dessert, an AI-generated take on this post!
Methodology
The goal of the study was to determine how effectively chatbots can identify the source of a quote. To that end, the researchers selected quotes from published articles and asked various AI search tools to find the source:
The prompt they used: "Based on this excerpt, what is the headline, original publisher, publication date, and URL of the article it's from?"
They used 200 excerpts from 20 different publishers. They deliberately chose excerpts that, when pasted into traditional Google search, would return the correct source within the first three results.
They experimented with eight different search engines – ChatGPT, Perplexity free version, Perplexity paid version, Microsoft Copilot, Google's Gemini, DeepSeek, Grok 2, and Grok 3.
Then, the researchers manually evaluated each response against four key criteria:
Did it identify the correct article?
Did it identify the correct publisher?
Did it provide the correct URL?
How confident is the chatbot’s answer?
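As an illustration of how that manual scoring might be tallied, here is a minimal sketch. The field names and the sample data are hypothetical, not from the study itself; it only shows the kind of per-response bookkeeping the four criteria imply.

```python
# Hypothetical sketch of tallying manually scored responses.
# Field names and sample data are illustrative, not from the study.

responses = [
    {"article_ok": True,  "publisher_ok": True,  "url_ok": True,  "confident": True},
    {"article_ok": False, "publisher_ok": True,  "url_ok": False, "confident": True},
    {"article_ok": False, "publisher_ok": False, "url_ok": False, "confident": True},
]

def error_rate(scored):
    """Share of responses that got at least one attribution field wrong."""
    wrong = sum(
        1 for r in scored
        if not (r["article_ok"] and r["publisher_ok"] and r["url_ok"])
    )
    return wrong / len(scored)

print(f"error rate: {error_rate(responses):.0%}")  # 2 of the 3 sample responses are wrong
```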
Key Finding #1: Widespread Inaccuracy
Overall, AI chatbots provided incorrect answers to more than 60% of queries. The error rates varied considerably across platforms. For example:
Perplexity was wrong on 37% of queries
Grok 3 was wrong on 94% of queries
Moreover, premium models, such as Perplexity Pro ($20/month) or Grok 3 ($40/month), were more likely to provide confidently incorrect answers than their free counterparts.
Key Finding #2: Crawler Blocking Violations
Websites can prevent chatbots from accessing information by blocking crawlers, which are automated bots that collect data from websites. The researchers expected chatbots would correctly answer queries about publishers that permitted crawler access, and decline to answer queries about websites that had blocked them. In practice, this didn't happen.
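For context: crawler blocking is usually done through a site's robots.txt file. A publisher that wants to keep AI crawlers out might list entries like these (GPTBot and PerplexityBot are real AI crawler user agents; note that compliance with robots.txt is voluntary, which is exactly what the study probes):

```text
# robots.txt — asks the named AI crawlers to stay off the entire site
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```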
AI search engines frequently retrieved information from publishers that had explicitly blocked their crawlers. For instance, Perplexity Pro correctly identified nearly a third of the ninety excerpts from articles it shouldn't have had access to. Perplexity's free version somehow correctly identified all ten excerpts from paywalled National Geographic articles, despite having no formal relationship with the publisher and theoretically being blocked from crawling their content.
This raises serious questions about whether these AI companies are respecting publisher preferences and intellectual property rights.
How Should We Use AI Search?
I use AI searches regularly and don't plan to stop. However, this research makes it clearer than ever that we must be extremely cautious about how we use them.
Choose your AI search engine carefully
Never take AI outputs for granted
Always, always, always double-check
Dessert
An AI-generated take on this post!
Ready for more?
Read the full study:
“AI Search Has A Citation Problem” by Klaudia Jaźwińska and Aisvarya Chandrasekar