Large language models (LLMs) are capable of generating text that is grammatically flawless, stylistically convincing and semantically rich. While this technological leap has brought efficiency gains to journalism, education and business communication, it has also complicated the detection of misinformation. How do you identify fake news when even experts struggle to distinguish artificial intelligence (AI)-generated content from human-authored text? 

This question was central to a recent symposium in Amsterdam on disinformation and LLMs, hosted by CWI, the research institute for mathematics and computer science in the Netherlands, and co-organised with Utrecht University and the University of Groningen. International researchers gathered to explore how misinformation is evolving and what new tools and approaches are needed to counter it. 

Among the organisers was CWI researcher Davide Ceolin, whose work focuses on information quality, bias in AI models and the explainability of automated assessments. The warning signs that once helped identify misinformation – grammatical errors, awkward phrasing and linguistic inconsistencies – are rapidly becoming obsolete as AI-generated content becomes indistinguishable from human writing.  

This evolution represents more than just a technical challenge. The World Economic Forum has identified misinformation as the most significant short-term risk globally for the second consecutive year, with the Netherlands ranking it among its top five concerns through 2027. The sophistication of AI-generated content is a key factor driving this heightened concern, presenting a fundamental challenge for organisations and individuals alike.

For years, Ceolin’s team developed tools and methods to identify fake news through linguistic and reputation patterns, detecting the telltale signs that characterised much of the early misinformation.

Their methods combine natural language processing (NLP), developed with colleagues from the Vrije Universiteit Amsterdam; logical reasoning, with colleagues from the University of Milan; and human computation (crowdsourcing), with colleagues from the University of Udine, the University of Queensland and the Royal Melbourne Institute of Technology. Together, these approaches help identify suspicious pieces of text and check their veracity.

Game changer

The game has fundamentally changed. “LLMs are starting to write more linguistically correct texts,” said Ceolin. “The credibility and factuality are not necessarily aligned – that’s the issue.”

Traditional markers of deception are disappearing just as the volume, sophistication and personalisation of generated content increase exponentially.  

Tommy van Steen, a university lecturer in cyber security at Leiden University, explained the broader challenge facing researchers. At a recent interdisciplinary event organised by Leiden University – the Night of Digital Security, which brought together experts from law, criminology, technology and public administration – he noted: “Fake news as a theme or word really comes from Trump around the 2016 elections. Everything he disagreed with, he simply called fake news.” 

However, Van Steen said the problem extends far beyond blatant fabrications. “It’s important to distinguish between misinformation and disinformation,” he said. “Both involve sharing information that isn’t correct, but with misinformation, it’s accidental; with disinformation, it’s intentional.” 

Beyond linguistic analysis

For researchers like Ceolin, the implications of AI-generated content extend far beyond simple text generation. Recent research from his team (in collaboration with INRIA, CWI’s sister institute in France) – accepted to the Findings of ACL, the flagship computational linguistics conference – revealed how LLMs exhibit different political biases depending on the language they’re prompted in and the nationality they’re assigned. When the same model answered identical political compass questions in different languages, or while embodying different national personas, the results varied significantly.
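
The basic setup behind such experiments can be illustrated with a short sketch. The snippet below is not the code from the ACL study; it assumes access to OpenAI's chat-completion client, and the model name, statements, languages and personas are placeholders chosen purely for illustration. The idea is simply to pose the same political-compass-style statement under different language and persona conditions and compare the answers.

```python
# Sketch: probe whether answers to the same political-compass-style statement
# shift with prompt language or assigned persona. Illustrative only -- the
# model name, statements and personas below are placeholders, not those used
# in the CWI/INRIA study.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

STATEMENTS = {
    "en": "The state should play a larger role in regulating the economy. "
          "Answer with one of: strongly disagree, disagree, agree, strongly agree.",
    "nl": "De overheid zou een grotere rol moeten spelen in het reguleren van de "
          "economie. Antwoord met: sterk oneens, oneens, eens, sterk eens.",
}

PERSONAS = {
    "none": "You are a helpful assistant.",
    "dutch": "You are a typical Dutch citizen answering a survey.",
    "american": "You are a typical American citizen answering a survey.",
}

def ask(system_prompt: str, user_prompt: str) -> str:
    """Send one statement to the model and return its raw answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0,  # make runs comparable
    )
    return response.choices[0].message.content.strip()

# Collect answers for every (language, persona) condition and print them side
# by side; systematic differences across conditions hint at bias.
for lang, statement in STATEMENTS.items():
    for persona, system_prompt in PERSONAS.items():
        print(f"{lang:>2} | {persona:<8} | {ask(system_prompt, statement)}")
```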

Van Steen’s work highlights that misinformation isn’t simply a binary of true versus false content. He employs a seven-category framework ranging from satire and parody through to completely fabricated content.

“It’s not just about complete nonsense or complete truth – there’s actually quite a lot in-between, and that can be at least as harmful, maybe even more harmful,” he said.

However, Ceolin argued that technological solutions alone are insufficient. “I think it’s a dual effort,” he said. “Users should cooperate with the machine and with other users to foster identification of misinformation.”  

The approach represents a significant shift from purely automated detection to what Ceolin called “transparent” systems, which provide users with the reasoning behind their assessments. Rather than black-box algorithms delivering binary verdicts, the new generation of tools aims to educate and empower users by explaining their decision-making process. 
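
A minimal illustration of that difference: instead of returning only a label, a transparent classifier can also return the features that drove the verdict. The sketch below is not one of the tools discussed at the symposium; it trains a toy logistic-regression model on a handful of made-up headlines and reports which words pushed a new text towards “suspicious” or “credible”.

```python
# Sketch of a "transparent" detector: a linear model whose verdict can be
# traced back to individual words. Toy data, purely illustrative -- not the
# tooling described by the researchers.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set: 1 = suspicious, 0 = credible.
texts = [
    "SHOCKING cure that doctors do not want you to know about",
    "You will not believe this one weird trick to get rich fast",
    "Secret plot exposed, share before it gets deleted",
    "Central bank raises interest rates by a quarter point",
    "City council approves new budget for public transport",
    "Researchers publish peer-reviewed study on vaccine safety",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer(lowercase=True)
X = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(X, labels)

def explain(text: str, top_k: int = 5) -> None:
    """Print the verdict plus the words that contributed most to it."""
    x = vectorizer.transform([text]).toarray()[0]
    contributions = x * model.coef_[0]  # per-word push towards 'suspicious'
    verdict = "suspicious" if model.predict([x])[0] == 1 else "credible"
    print(f"Verdict: {verdict}")
    words = vectorizer.get_feature_names_out()
    for i in np.argsort(np.abs(contributions))[::-1][:top_k]:
        if x[i] > 0:
            print(f"  {words[i]:<12} contribution {contributions[i]:+.3f}")

explain("Shocking secret the government does not want you to know")
```

The point of the sketch is the `explain` step: a user sees not only the verdict but the evidence behind it, and can push back when that evidence looks weak.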

Content farming and micro-targeting concerns

The symposium at CWI highlighted three escalation levels of AI-driven misinformation: content farming, LLM vulnerabilities and micro-targeting.

Ceolin identified content farming as the most concerning. “It’s very easy to generate content, including content with negative intentions, but it’s much harder for humans to detect fake generated content,” he said.  

Van Steen highlighted a fundamental asymmetry that makes detection increasingly challenging. “One of the biggest problems with fake news is this disconnect – how easy it is to create versus how difficult and time-consuming it is to verify,” he noted. “You’re never going to balance that equation easily.”

The challenge intensifies when sophisticated content generation combines with precision targeting. “If bad AI-generated content effectively targets a specific group of users, it’s even harder to spot and detect,” said Ceolin.  

Tackling this new generation of sophisticated misinformation requires a fundamental rethinking of detection methodologies. Ceolin advocates for explainable AI systems that prioritise transparency over pure accuracy metrics. When asked to justify choosing an 85% accurate but explainable system over a 99% accurate black box, he poses a crucial counter-question: “Can you really trust the 99% black box model 99% of the time?” 

The 1% inaccuracy in black box models could reflect systematic bias rather than random error, and without transparency, organisations cannot identify or address these weaknesses. “In the transparent model, you can identify areas where the model could be deficient and target specific aspects for improvement,” said Ceolin.

This philosophy extends to the broader challenge of assessing AI bias. “We are now looking at whether we can benchmark and measure the bias of these models so that we can help users understand the quality of information they receive from them,” he said. 

Preparing for an uncertain future

For organisations grappling with the new landscape, Ceolin’s advice emphasised the fundamentals. “We shouldn’t forget that all the technology we’ve developed so far can still play a big role,” he said.

Even as LLMs become more sophisticated, traditional verification approaches remain relevant. 

“These LLMs, in several cases, also show the sources they use for their answers,” said Ceolin. “We should teach users to look beyond the text they receive as a response to check that these really are the sources used, and then check the reputation, reliability and credibility of those sources.” 
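
Even a very small script can support that habit. The sketch below assumes the model’s answer is plain text containing URLs; it pulls them out with a regular expression and checks that each one actually resolves. That is only the first step – judging the reputation and credibility of the domains still requires human judgment or curated source lists.

```python
# Sketch: extract the URLs an LLM cites in its answer and check that they
# actually resolve. This only verifies existence; the reputation and
# credibility of the sources still have to be judged separately.
import re
from urllib.parse import urlparse

import requests

URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")

def check_cited_sources(answer: str, timeout: float = 5.0) -> None:
    """List every URL in the answer, whether it resolves, and its domain."""
    for url in URL_PATTERN.findall(answer):
        domain = urlparse(url).netloc
        try:
            response = requests.head(url, timeout=timeout, allow_redirects=True)
            status = response.status_code
        except requests.RequestException as error:
            status = f"unreachable ({error.__class__.__name__})"
        print(f"{domain:<30} {status}  {url}")

# Example: run the check on a hypothetical model answer.
llm_answer = (
    "According to https://www.cwi.nl/en/ and a report at "
    "https://example.org/misinformation-study, the volume of AI-generated "
    "content is rising."
)
check_cited_sources(llm_answer)
```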

The future requires what the CWI researcher describes as a “joint effort” involving companies, citizens and institutions. “We as researchers are highlighting the issues and risks, and proposing solutions,” he said.

“It will be fundamental for us to help citizens understand the benefits but also the limitations of these models. The last judgement should come from users – but informed users, supported by transparent tools that help them understand not just what they’re reading, but why they should trust it.”


