LLMs are a powerful technology. They can do a lot – but they also make a lot of mistakes and repeatedly provide incorrect information.
The origin of the problem is now well known. Language models are statistical models: they represent language, not facts. Factual accuracy is only a side effect of statistical frequency, a side effect that is laboriously nurtured into a feature through a lot of manual feedback.
This makes certain tasks more difficult for a language model than others. As long as general contexts are involved, the risk is manageable. For example, if we want to know how a market launch typically proceeds, it is helpful for the model to ‘collect’ many cases and show different possibilities. However, it becomes critical when it comes to concrete facts. If we ask about the market launch of ACME Ltd.'s Supersoft detergent in France in 2023, for example, we don't want it to be mixed up with other cases and we don't want any ‘may’ or ‘possibly’ – but this is precisely what language models find particularly difficult. This results in the well-known mishaps with fabricated case studies in court or our everyday problems with AI-supported research.
But it's not just specific questions that challenge an LLM. We also make life difficult for the language model with differentiated tasks – and we can already see this in our instructions.
Here are a few typical examples of complex, differentiated instructions that lead to an accumulation of errors in our AI applications:
It is no surprise that these types of instructions are problematic: they almost always involve what linguists call pragmatics, i.e. understanding the conversational situation and the effect of linguistic utterances. This is far beyond the capabilities of LLMs, not only in terms of generating their output. It starts with the input. There is no real distinction between the model's own instructions, the information made available by RAG, and what the conversation partner, the user, has said. Everything is part of a text input for which the most suitable output is then sought.
This means that language models cannot make a clear distinction between reliable and unreliable information. This not only makes control more difficult, but also makes them susceptible to abuse and targeted attacks (a separate issue).
Some of these risks can be mitigated in the design of the application, others cannot. Often, a solution in one area leads to new problems in another. Some approaches are only theoretically possible, but too costly or unattractive in practice, e.g. if the actual benefit of LLMs is lost in the process – for example, if all confidential information has to be manually removed in advance or the data first has to be laboriously structured in order to be selectively released. Or if we dynamically compile the prompts valid for a specific situation from a larger pool and, with this selection process, ultimately programme the intelligence that the language model should actually provide.
And some things are simply not feasible, especially where strict limits on authority must be observed.
Simply hoping that the model will follow the instructions is not enough. We need a systematic approach to dealing with these risks. This starts with recognising when instructions are repeatedly violated: during the development phase, in order to debug in a targeted manner; during runtime, in order to immediately detect critical authority violations, for example; and finally in monitoring, in order to continuously improve the application.
In doing so, we must not rely solely on additional language models that we use as a monitoring authority. Apart from higher costs and longer latency, this would only expose us to the same problems again. Instead, we rely on techniques that allow us to look inside the black box – and thus make it more transparent how the model actually handles our specifications.
LLMs make mistakes primarily when instructions are complex, contradictory or only valid under certain conditions, when information cannot always be used or can only be used for certain users. This is where instructions first need to be specified with knowledge (‘Who is a competitor?’) and where users repeatedly push us to the limits of our authority.
To spot the resulting mistakes is our first challenge. To eliminate these mistakes is only partially a question of better prompting. We need techniques that monitor LLMs, look into the black box and warn us of their errors.