Oslo-based startup Iris.ai has developed an AI Chat feature for its Researcher Workspace platform, which it says can reduce ‘AI hallucinations’ to single-figure percentages.
What Are AI Hallucinations?
AI hallucinations (sometimes called ‘confabulations’) occur when AI systems generate or disseminate information that is inaccurate, misleading, or simply false. Because the output appears convincing and authoritative despite lacking any factual basis, it can create problems for companies that use the information without verifying it.
Examples
Two high-profile examples of AI hallucinations are:
– When Meta (Facebook) demonstrated its Galactica LLM (designed for science researchers and students) and asked it to draft a paper about creating avatars, the model cited a fake paper attributed to a genuine author working in that field.
– Back in February, when Google demonstrated its Bard chatbot in a promotional video, Bard gave incorrect information about which telescope first took pictures of a planet outside the Earth’s solar system. The error, surfacing just before a Google presentation, was widely reported, and Alphabet Inc lost $100 billion in market value as a result.
Why Do AI Hallucinations Occur?
There are a number of reasons why chatbots (e.g. ChatGPT) generate AI hallucinations, including:
– Generalisation issues. AI models generalise from their training data, and this can sometimes result in inaccuracies, such as predicting incorrect years due to over-generalisation.
– No ground truth. LLMs don’t have a set “correct” output during training, differing from supervised learning. As a result, they might produce answers that seem right but aren’t.
– Model limitations and optimisation targets. Despite advances, no model is perfect. They’re trained to predict likely next words based on statistics, not always ensuring factual accuracy. Also, there has to be a trade-off between a model’s size, the amount of data it’s been trained on, its speed, and its accuracy.
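The last point is the crux of why hallucinations happen: a language model selects whatever continuation is statistically most likely, with no built-in notion of truth. A deliberately tiny sketch (a toy bigram model, not any real LLM, over an invented corpus) makes the mechanism concrete:

```python
# Toy illustration: a language model picks the statistically most
# likely next word, with no notion of whether that word is factual.
from collections import Counter

# Hypothetical training corpus in which a false claim happens to be
# more frequent than the true one.
corpus = (
    "the first exoplanet photo was taken by jwst . "
    "the first exoplanet photo was taken by jwst . "
    "the first exoplanet photo was taken by the vlt ."
).split()

# Build bigram counts: how often each word follows a given word.
bigrams = Counter(zip(corpus, corpus[1:]))

def most_likely_next(word):
    """Return the highest-frequency continuation seen in training."""
    candidates = {b: c for (a, b), c in bigrams.items() if a == word}
    return max(candidates, key=candidates.get)

# The model confidently emits 'jwst' because it is most frequent in
# the corpus, regardless of which answer is actually correct.
print(most_likely_next("by"))  # → jwst
```

The model is behaving exactly as trained; the problem is that frequency in the training data, not factual accuracy, determines the output.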
What Problems Can AI Hallucinations Cause?
Using the information from AI hallucinations can have many negative consequences for individuals and businesses. For example:
– Reputational damage and financial consequences (as in the case of Google and Bard’s mistake in the video).
– Potential harm to individuals or businesses, e.g. through taking and using incorrect medical, business, or legal advice (although ChatGPT passed the Bar Examination and business school exams early this year).
– Legal consequences, e.g. through publishing incorrect information obtained from an AI chatbot.
– Adding to time and workloads in research, i.e. through trying to verify information.
– Hampering trust in AI and AI’s value in research. For example, an Iris.ai survey of 500 corporate R&D workers showed that although 84 per cent of workers use ChatGPT as their primary AI research support tool, only 22 per cent of them said they trust it and systems like it.
Iris.ai’s Answer
Iris.ai has therefore attempted to address these factuality concerns by creating a new system built around an AI engine for understanding scientific text. The company developed it primarily for its Researcher Workspace platform (to which it has been added as a chat feature) so that its mainly large clients, such as the Finnish Food Authority, can use it confidently in research.
Iris.ai has reported that the system has been used to accelerate research on a potential avian flu crisis and that it can save as much as 75 per cent of a researcher’s time (by removing the need to verify whether information is correct or made up).
How Does The Iris.ai System Reduce AI Hallucinations?
Iris.ai says its system is able to address the factuality concerns of AI using a “multi-pronged approach that intertwines technological innovation, ethical considerations, and ongoing learning.” This means using:
– Robust training data. Iris.ai says that it has meticulously curated training data from diverse, reputable sources to ensure accuracy and reduce the risk of spreading misinformation.
– Transparency and explainability. Iris.ai says using advanced NLP techniques, it can provide explainability for model outputs. Tools like the ‘Extract’ feature, for example, show confidence scores, allowing researchers to cross-check uncertain data points.
– The use of knowledge graphs. Iris.ai says it incorporates knowledge graphs from scientific texts, directing language models towards factual information and reducing the chance of hallucinations. The company says this is because this kind of guidance is more precise than merely predicting the next word based on probabilities.
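Iris.ai has not published the internals of its knowledge-graph approach, but the general idea can be sketched: store curated facts as subject–predicate–object triples and check a model’s candidate statement against them before it reaches the user. All names and triples below are invented for illustration.

```python
# Minimal sketch (an assumption, not Iris.ai's actual implementation):
# a knowledge graph stored as subject-predicate-object triples, used
# to vet a model's candidate claim before it is shown to the user.

# Hypothetical triples, as might be extracted from scientific texts.
knowledge_graph = {
    ("H5N1", "is_a", "avian influenza virus"),
    ("H5N1", "infects", "poultry"),
    ("oseltamivir", "treats", "influenza"),
}

def is_supported(subject, predicate, obj):
    """Return True only if the triple exists in the curated graph."""
    return (subject, predicate, obj) in knowledge_graph

# A supported claim passes; an invented one is flagged for review.
print(is_supported("H5N1", "infects", "poultry"))   # → True
print(is_supported("H5N1", "treats", "influenza"))  # → False
```

The point of such guidance is that a lookup against curated facts is binary and precise, whereas next-word probabilities are merely plausible.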
Improving Factual Accuracy
Iris.ai’s techniques for improving factual accuracy in AI outputs, therefore, hinge upon using:
– Knowledge mapping, i.e. Iris.ai maps key knowledge concepts expected in a correct answer, ensuring the AI’s response contains those facts from trustworthy sources.
– Comparison to ground truth. The AI outputs are compared to a verified “ground truth.” Using the WISDM metric, semantic similarity is assessed, including checks on topics, structure, and vital information.
– Coherence examination. Iris.ai’s new system reviews the output’s coherence, ensuring it includes relevant subjects, data, and sources pertinent to the question.
These combined techniques set a standard for factual accuracy and the company says its aim has been to create a system that generates responses that align closely with what a human expert would provide.
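WISDM itself is Iris.ai’s proprietary metric, so as a stand-in the ground-truth comparison step can be sketched with a crude word-overlap (Jaccard) score: compare the AI’s answer against a verified reference and flag answers that fall below a threshold. The texts and the cut-off value below are invented for illustration.

```python
# Hedged sketch: a simple Jaccard word-overlap score as a stand-in
# for a proper semantic-similarity metric such as Iris.ai's WISDM.

def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Crude similarity proxy: shared words / all distinct words."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b)

ground_truth = "h5n1 is a highly pathogenic avian influenza virus"
answer = "h5n1 is a highly pathogenic avian influenza virus found in poultry"

score = jaccard_similarity(ground_truth, answer)
THRESHOLD = 0.6  # arbitrary cut-off for this illustration

if score >= THRESHOLD:
    print(f"accepted (similarity {score:.2f})")
else:
    print(f"flagged for verification (similarity {score:.2f})")
```

A production metric would also weigh topic coverage, structure, and cited sources, as the article describes; word overlap alone is far too blunt, but it shows where a “compare to ground truth, then gate on a score” pipeline gets its leverage.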
What Does This Mean For Your Business?
It’s widely accepted (and publicly admitted by AI companies themselves) that AI hallucinations are an issue that can be a threat for companies (and individuals) who use the output of generative AI chatbots without verification. Producing false but convincing information highlights both one of the strengths of AI chatbots, i.e. their ability to present information persuasively, and one of their key weaknesses.
As Iris.ai’s own research shows, although most companies are now likely to be using AI chatbots in their R&D, they are aware that they may not be able to fully trust all outputs, losing some of the potential time savings to verification work and facing many potentially costly risks. Although Iris.ai’s new system was developed specifically for understanding scientific text, with a view to offering it as a useful tool for researchers on its own platform, the fact that it can reduce AI hallucinations to single-figure percentages is impressive. Its methodology may, therefore, have gone a long way toward solving one of the big drawbacks of generative AI chatbots and, were it not so difficult to scale up for popular LLMs, it might already have been more widely adopted.
As good as it appears to be, Iris.ai’s new system still cannot solve the issue of people simply misinterpreting the results they receive.
Looking ahead, some tech commentators have suggested that methods such as training on coding languages rather than a diverse range of data sources, and collaborations with LLM-makers to build larger datasets, may bring further reductions in AI hallucinations. For most businesses now, it’s a case of striking a balance: using generative AI outputs to save time and increase productivity while remaining aware that those results can’t always be fully trusted, and conducting verification checks where appropriate and possible.