Technology

Is ChatGPT having “robot dreams”? AI is hallucinating, producing incorrect information, and experts don’t know why

You may want to rethink relying too heavily on AI to answer questions or summarize information, as newer systems are “hallucinating” more.

Greg Heilman

The beauty of computers, depending on what they are doing, is that they work 24/7. But perhaps just like their human counterparts, the lack of rest is having an adverse effect on their effectiveness.

The New York Times reports that as artificial intelligence becomes more powerful, these machines’ hallucinations, the false information they confidently present as fact, are becoming more frequent. The creators of powerful AI tools like ChatGPT, and the companies trying to help them get to the root cause of the hallucinations, don’t know why it is happening.

ChatGPT dreaming of more and more electric sheep

The problem has been known since OpenAI, creator of ChatGPT, and other companies unleashed their AI systems into the wild. “Despite our best efforts, they will always hallucinate. That will never go away,” Amr Awadallah, a former Google executive and CEO of Vectara, a start-up that builds AI tools for businesses, told the Times.

Awadallah’s company has been tracking how often chatbots invent their own truth since 2023. Vectara found that when asked to summarize specific news articles, chatbots pulled information out of thin air at least 3% of the time, and sometimes as often as 27% of the time.

Companies like Google and OpenAI have been able to get their numbers down to between 1% and 2%, but others have had less luck getting into that range. What’s problematic is that the newer “reasoning systems” are pumping out more errors than earlier systems.

OpenAI’s own research found that o3, currently its most powerful system, hallucinated 33% of the time, twice as often as the previous system, on the company’s PersonQA benchmark. That test asks the chatbot to answer questions about public figures.

The new o4-mini did even worse, hallucinating 48% of the time. On the more general-question SimpleQA test, o3 and o4-mini hallucinated 51% and 79% of the time, respectively, compared with 44% for the previous system, o1.
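For readers curious what a figure like “33% of the time” means in practice, here is a minimal illustrative sketch, not OpenAI’s actual evaluation code, of how a hallucination rate on a question-answering benchmark is typically computed: each model answer is graded against a reference answer, and the rate is simply the share of answers marked incorrect. The grading function, benchmark item, and stand-in model below are hypothetical.

```python
# Illustrative only: a toy hallucination-rate calculation for a QA benchmark.
# The grader, benchmark item, and model call are hypothetical stand-ins,
# not OpenAI's PersonQA or SimpleQA tooling.

def grade(model_answer: str, reference: str) -> bool:
    """Hypothetical grader: counts an answer as correct only if it contains
    the reference string (real evaluations use far more careful matching)."""
    return reference.lower() in model_answer.lower()

def hallucination_rate(items, ask_model) -> float:
    """Fraction of benchmark questions the model answers incorrectly."""
    wrong = sum(0 if grade(ask_model(q), ref) else 1 for q, ref in items)
    return wrong / len(items)

# Example with a canned answer standing in for a chatbot:
benchmark = [("Who wrote 'Do Androids Dream of Electric Sheep?'", "Philip K. Dick")]
fake_model = lambda q: "It was written by Isaac Asimov."  # an invented answer
print(f"Hallucination rate: {hallucination_rate(benchmark, fake_model):.0%}")  # 100%
```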

“Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini,” OpenAI spokeswoman Gaby Raila said. “We’ll continue our research on hallucinations across all models to improve accuracy and reliability.”
