How to make it difficult for the AI

Some tasks are more difficult than others for LLMs and lead to an accumilation of errors. We will discuss a couple of risk factors and possible mitigation actions.

LLMs are a powerful technology. They can do a lot – but they also make a lot of mistakes and repeatedly provide incorrect information.

The origin of the problem is now well known. Language models are statistical models: they represent language, not facts. Factual accuracy is only a side effect of statistical frequency, a side effect that is laboriously nurtured into a feature through a lot of manual feedback.

This makes certain tasks more difficult for a language model than others. As long as general contexts are involved, the risk is manageable. For example, if we want to know how a market launch typically proceeds, it is helpful for the model to ‘collect’ many cases and show different possibilities. However, it becomes critical when it comes to concrete facts. If we ask about the market launch of ACME Ltd.'s Supersoft detergent in France in 2023, for example, we don't want it to be mixed up with other cases and we don't want any ‘may’ or ‘possibly’ – but this is precisely what language models find particularly difficult. This results in the well-known mishaps with fabricated case studies in court or our everyday problems with AI-supported research.

But it's not just specific questions that challenge an LLM. We also make life difficult for the language model with differentiated tasks – and we can already see this in our instructions.

Typical risk factors

Here are a few typical examples of complex, differentiated instructions that lead to an accumulation of errors in our AI applications:

  • Risk factor conditional RAG - Retrieval Augmented Generation (RAG) has established itself as a technique when we want to get the LLM to stick to facts, perhaps even company-specific facts. Now, it may be that the content accessed by the retrieval also contains confidential data that should only be accessible to selected users, or technical details that are not intended for every target group. In this case, we can instruct the language model accordingly, but this is highly error-prone: the model will tend to communicate the information provided to it, even in situations where we do not want it to.
  • Risk factor conditional, situation-dependent instructions – We often want to control the language model's conversational behaviour using instructions. Let's take a technical support application as an example. Here, it makes sense to instruct the LLM not to jump to conclusions about the cause of an error. This prevents the application from making unfounded diagnoses when a symptom is first reported. But there are exceptions, of course: for example, when users come up with a clear error code. However, it is difficult for an LLM to recognise when the cause of an error is obvious; often, a kind of world knowledge is required that it does not have. We often find this type of conditional instruction where the language model itself is supposed to take over the conversation, e.g. when it is supposed to conduct a sales pitch, offer coaching or training, carry out recruiting, or handle routine communication for the user, such as making bookings. In all these cases, we expect the AI application to follow a plan, but not in a mindless way.
  • Risk factor contradictory instructions - Even more serious are direct contradictions. At the corporate level, the rule may be: ‘No statements about internal projects.’ At the same time, however, a recruiting AI is supposed to provide applicants with information about precisely these projects, because they may be expected to work on them. Such conflicts can only be avoided if the entire instruction stack is under control.
  • Risk factor Implication of instructions – Let's take a sales bot that is not supposed to talk about competitors. The difficulty: the model would have to recognise which products are competitors itself – a task for which it is hardly equipped to perform reliably. If we explicitly specify the competitors to the LLM, we are back to a variant of conditional RAG (‘use this information only to *not* talk about them’) with the additional problem that LLMs are notoriously weak at handling negative instructions.
    Risk factor limited authority - Even tasks with clear boundaries can be risky. Example: a medical assistant AI is allowed to book appointments or explain findings, but under no circumstances is it allowed to make diagnoses. It is very difficult for LLMs to consistently adhere to this boundary, especially because users will repeatedly push them to the limits of their authority, usually completely unintentionally.

The limits of language models

It is no surprise that these types of instructions are problematic: they almost always involve what linguists call pragmatics, i.e. understanding the conversational situation and the effect of linguistic utterances. This is far beyond the capabilities of LLMs, not only in terms of generating their output. It starts with the input. There is no real distinction between the model's own instructions, the information made available by RAG, and what the conversation partner, the user, has said. Everything is part of a text input for which the most suitable output is then sought.

This means that language models cannot make a clear distinction between reliable and unreliable information. This not only makes control more difficult, but also makes them susceptible to abuse and targeted attacks (a separate issue).

Some of these risks can be mitigated in the design of the application, others cannot. Often, a solution in one area leads to new problems in another. Some approaches are only theoretically possible, but too costly or unattractive in practice, e.g. if the actual benefit of LLMs is lost in the process – for example, if all confidential information has to be manually removed in advance or the data first has to be laboriously structured in order to be selectively released. Or if we dynamically compile the prompts valid for a specific situation from a larger pool and, with this selection process, ultimately programme the intelligence that the language model should actually provide.

And some things are simply not feasible, especially where strict limits on authority must be observed.

Effort is not enough – LLMs need help

Simply hoping that the model will follow the instructions is not enough. We need a systematic approach to dealing with these risks. This starts with recognising when instructions are repeatedly violated: during the development phase, in order to debug in a targeted manner; during runtime, in order to immediately detect critical authority violations, for example; and finally in monitoring, in order to continuously improve the application.

In doing so, we must not rely solely on additional language models that we use as a monitoring authority. Apart from higher costs and longer latency, this would only expose us to the same problems again. Instead, we rely on techniques that allow us to look inside the black box – and thus make it more transparent how the model actually handles our specifications.

Summary

LLMs make mistakes primarily when instructions are complex, contradictory or only valid under certain conditions, when information cannot always be used or can only be used for certain users. This is where instructions first need to be specified with knowledge (‘Who is a competitor?’) and where users repeatedly push us to the limits of our authority.

To spot the resulting mistakes is our first challenge. To eliminate these mistakes is only partially a question of better prompting. We need techniques that monitor LLMs, look into the black box and warn us of their errors.