Google’s AI Overviews is still wrong 10% of the time, and that’s a lot of lies

Remember when Google launched AI Overviews in 2024 and it told people to eat glue on pizza? That was a rough launch. Since then, Google has been quietly patching things up, and by most accounts, the system has gotten better. But “better” is a low bar when your starting point is “occasionally dangerous nonsense.”

A new analysis from The New York Times, done with help from a startup called Oumi, tried to put a number on how accurate AI Overviews actually is. The result: about 90 percent correct. That sounds decent until you consider the scale. Google handles billions of searches a day, so a 10 percent error rate adds up to a staggering number of lies.

The test used the SimpleQA benchmark, which OpenAI released back in 2024. It’s basically a list of over 4,000 questions with clear, verifiable answers. Oumi started running these tests when Gemini 2.5 was still Google’s flagship model, and the accuracy came in at 85 percent. After the Gemini 3 update, that number climbed to 91 percent.

So Google is moving in the right direction. But let’s be real: even 91 percent accuracy means nearly 1 in 10 answers is wrong. If you extrapolate that across all Google searches, you’re looking at tens of millions of incorrect answers per day. Hundreds of thousands per hour. That’s not a rounding error; that’s a firehose of misinformation.
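To make that extrapolation concrete, here is a rough back-of-the-envelope sketch. The daily volume is purely an assumption for illustration: neither Google nor the Times analysis gives a figure for how many searches actually show an AI Overview, so `ASSUMED_OVERVIEWS_PER_DAY` and the `wrong_answers` helper are hypothetical. The two accuracy figures are the ones Oumi measured.

```python
def wrong_answers(accuracy_pct: float, answers_per_day: int) -> tuple[int, int]:
    """Return (wrong answers per day, wrong answers per hour), rounded."""
    error_rate = (100 - accuracy_pct) / 100
    per_day = round(error_rate * answers_per_day)
    per_hour = round(per_day / 24)
    return per_day, per_hour


# ASSUMPTION: a made-up volume of AI Overviews shown per day.
# It is not a figure from Google, Oumi, or the Times.
ASSUMED_OVERVIEWS_PER_DAY = 200_000_000

# Accuracy figures reported for each model generation on SimpleQA.
for label, accuracy in [("Gemini 2.5", 85), ("Gemini 3", 91)]:
    per_day, per_hour = wrong_answers(accuracy, ASSUMED_OVERVIEWS_PER_DAY)
    print(f"{label}: {per_day:,} wrong per day, {per_hour:,} per hour")
```

Under that assumed volume, 91 percent accuracy works out to 18 million wrong answers a day, or 750,000 an hour. Change the volume and the totals scale linearly, but the error rate is what drives them.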

I’ve been watching this space for years, and what bothers me isn’t that AI models make mistakes. They all do. It’s that Google has positioned AI Overviews as the default experience for hundreds of millions of users. You can’t opt out easily. And when the system is wrong, it’s often confident and detailed, which makes it harder to spot the error.

Oumi is an interesting choice for this analysis. They’re not exactly neutral, since they build AI models themselves, but the SimpleQA test is standardized enough that the methodology holds up. The Times also had access to the raw data, so this isn’t just a vendor’s marketing claim.

What I’d really like to see is a breakdown of what kinds of questions AI Overviews gets wrong. Is it bad at niche topics? Recent events? Subjective stuff? The Times analysis didn’t go that deep, but it’s the obvious next step. Until then, I’ll keep double-checking everything AI Overviews tells me, especially if it involves pizza toppings.
