Having analyzed about 200 quotes from 20 different publications, Columbia’s Tow Center for Digital Journalism discovered that ChatGPT search struggles to attribute sources correctly.
Researchers asked ChatGPT search to identify the source of each quote. The study found that some responses were attributed to the correct online sources, but others were plagued with inaccuracies. Concretely, in more than a third of queries the chatbot fabricated source material instead of admitting that it couldn’t find the correct source or that a robots.txt file had blocked it from retrieving the page. Making matters worse, ChatGPT misattributed stories both from partner publications and from those with no licensing deals with OpenAI, at times even plagiarizing news content.
The results don’t come as a surprise to anyone who has spent some time with chatbots like ChatGPT, which continue to hallucinate and serve a combination of fact and misinformation. Launched in October, ChatGPT search promises to let you search the web “in a much better way than before,” providing “fast, timely answers with links to relevant web sources, which you would have previously needed to go to a search engine for.”
A spokesperson for OpenAI has downplayed the report, taking issue with the testing methods. “We’ve collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt,” the spokesperson said. “We’ll keep enhancing search results.”
OpenAI’s ChatGPT search function gathers data in the same way Google or any classic search engine does. Its crawlers download and index content from all over the web, ignoring sites that block crawlers with robots.txt files. In search results, ChatGPT provides links to relevant web sources, such as news articles and blog posts, for additional context.
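To make the robots.txt mechanism concrete, here is a minimal sketch in Python of how a well-behaved crawler might decide whether it may fetch a page. The `OAI-SearchBot` user agent comes from OpenAI’s statement; the robots.txt rules, the `ExampleBot` name, the `is_allowed` helper, and the example.com URL are all hypothetical. This only illustrates the general convention and says nothing about how OpenAI’s crawler is actually built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of ChatGPT search
# but leaves the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"  # placeholder URL
    for agent in ("OAI-SearchBot", "ExampleBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, story) else "blocked"
        print(f"{agent}: {verdict}")
```

Note that robots.txt is a request, not an access control: it only works if the crawler chooses to check it before downloading a page.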
Some publications, like The New York Times (which dragged OpenAI and Microsoft to court over copyright violations), have opted out. Many others, including the European media giant Axel Springer (which owns Politico), have signed commercial licensing agreements with OpenAI that permit its web crawlers to sift through their massive journalistic archives.
If you had any illusions that ChatGPT results were 100% trustworthy or that ChatGPT could replace traditional web search, this is your wake-up call. I’m not saying chatbots don’t have utility. They can be helpful, but only as an additional tool in your arsenal. What you shouldn’t do is trust AI-powered search to attribute sources or serve factual information. If you must use such a tool in your next project, be sure to verify every answer yourself (which defeats the primary allure of AI-driven search).