The BBC recently conducted extensive research on how well AI-powered chatbots, including Microsoft Copilot, OpenAI's ChatGPT, Google's Gemini, and Perplexity, summarize news stories. While safety and security remain major deterrents to the progress of generative AI, the tendency of AI tools to generate inaccurate or outright wrong responses to queries continues to trouble users.
The outlet used the listed AI tools to summarize its news stories and then asked them questions based on the content of those articles. The research found that the resulting answers contained "significant inaccuracies" and distortions. According to BBC News and Current Affairs CEO Deborah Turness:
“The team found ‘significant issues’ with just over half of the answers generated by the assistants. The AI assistants introduced clear factual errors into around a fifth of answers they said had come from BBC material.
And where AI assistants included ‘quotations’ from BBC articles, more than one in ten had either been altered, or didn’t exist in the article.
Part of the problem appears to be that AI assistants do not discern between facts and opinion in news coverage; do not make a distinction between current and archive material; and tend to inject opinions into their answers.
The results they deliver can be a confused cocktail of all of these – a world away from the verified facts and clarity that we know consumers crave and deserve.”
For context, the BBC's study tasked the AI chatbots with summarizing 100 news stories published by the outlet. The publication then had seasoned reporters and journalists with expertise across a wide range of topics rate the AI-generated answers for accuracy.
Perhaps more concerning, 51% of the AI-generated answers featured "significant issues of some form." The study also revealed that 19% of the answers citing BBC content introduced factual errors, including incorrect figures, statements, and dates.
Speaking to the BBC about the findings, an OpenAI spokesperson said:
“We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution.”
Can I trust AI-generated news summaries? Apple already pulled the plug
"We live in troubled times, and how long will it be before an AI-distorted headline causes significant real-world harm?" BBC's Turness asked rhetorically.
The BBC's research further revealed that Copilot and Gemini exhibited more significant issues than ChatGPT and Perplexity, noting that the tools "struggled to differentiate between opinion and fact, editorialized, and often failed to include essential context."
It's worth noting that the inaccuracies highlighted in the report extend beyond the listed chatbots. As you may know, Apple recently and temporarily pulled the plug on Apple Intelligence notification summaries after the feature was spotted generating erroneous headlines, prompting backlash from news organizations and press freedom groups.
In conclusion, the BBC recommends that AI companies "pull back" their AI news summaries pending a much-needed conversation with the industry. As Turness put it: "We can work together in partnership to find solutions."