AI Chatbots Are Still Bad at Facts, Says BBC Study



A recent study by the BBC found major inaccuracies in the news answers provided by AI assistants. Researchers tested four well-known AI assistants—ChatGPT, Copilot, Gemini, and Perplexity—giving them access to the BBC website to use as a source for their responses.

The study revealed that these popular AI assistants often gave incorrect information and distorted facts in their replies. The researchers posed 100 questions to the assistants about trending news topics. BBC journalists then evaluated the answers against seven criteria: accuracy, proper sourcing, impartiality, distinguishing opinion from fact, avoiding the insertion of personal viewpoints, providing context, and including BBC content appropriately.

The evaluation found that over half (51%) of the responses had significant problems in at least one of these areas, and 91% contained at least some inaccuracies. With news, even small errors are a serious problem.

Many of the mistakes involved incorrect facts. About 19% of answers that cited BBC content included errors such as wrong statements, numbers, or dates, and 13% of quotes supposedly taken from BBC articles were either altered or absent from the cited source. Examples include assistants incorrectly stating the status of former politicians, misrepresenting NHS advice on vaping, and misreporting conflicts in the Middle East.

The study also pointed out issues with sourcing and context. The assistants often drew on outdated articles or current live web pages, which introduced inaccuracies, and in some cases correct information was wrongly attributed to the BBC. The assistants also frequently failed to provide enough context in their responses, which led to misunderstandings.

The assistants also had trouble telling the difference between opinion and fact. They often presented opinions as facts and left out important context, resulting in biased or incomplete answers.

Notably, the research found that the different assistants fail in different ways. Gemini had the most accuracy issues and also struggled to provide reliable sources, while both Copilot and Perplexity had difficulty representing BBC content accurately. This underscores that AI assistants from different companies are not interchangeable: one may be better than another, yet none is as reliable as a human journalist.

The study also raised concerns about how easily this misinformation can spread on social media. The BBC's research shows that AI assistants are not currently reliable sources of accurate news. Although many of them warn users that mistakes are possible, they have no system for correcting errors the way traditional news outlets do.

The BBC is calling for greater control over how AI companies use its content, more transparency about how these systems work, and a better understanding of the inaccuracies they can produce. The BBC plans to repeat the study in the future to see whether things improve and may expand it to include other publishers and media organizations.

Source: BBC



