AI potential: can we release the handbrake?

A large proportion of companies have experimented with AI techniques such as Large Language Models (LLMs) in recent years, and the first results and applications are there (see sources). But they often remain half-hearted. It seems that companies are afraid to utilise the full potential of AI. Why is this the case, and how can we release the handbrake?

The following patterns can be seen in the current use of AI in companies:

  • Internal use instead of external applications
    Many AI applications remain behind the scenes and are not used visibly for customers.
  • The AI is not allowed to decide anything itself
    Companies are reluctant to give AI real decision-making powers. Instead, every AI output is checked and validated by humans. Of course, true automation is not possible in this way.
  • The AI goes external, but...
    If AI is made accessible to customers after all, users often experience disappointing interactions. Chatbots respond with sentences such as ‘I can't say anything about that’ or ‘I didn't understand your question. Could you rephrase it?’ Or the bots only use the capabilities of the language models to process the customer question and then immediately switch to a classic chatbot interaction pattern with predefined options - a real dialogue that allows questions and clarification of needs does not take place at all.

Why this restraint?

There are well-founded concerns behind companies' reticence. These can be summarised in three key words: Quality, transparency and security.

On the one hand, this is about the factual correctness of what the AI says. Here, Retrieval Augmented Generation (RAG) has established itself as a technology. The LLM is told which content it can draw on for its answers, whereby the trade-off mentioned above also applies here: the more narrowly we restrict the LLM to the content found, the less we utilise the generative potential of the technology.

However, correct behaviour is more than factual correctness. We often see that AI does things that it should not do, completely independently of factual correctness - either of its own accord (misalignment) or because it is forced to do so by external attacks and manipulation.

  • ‘Misalignment’ - unexpected and undesirable behaviour of the AI:
    AI can make statements that run counter to our interests as users or operators or ‘exceed its powers’ and thus get the company into legal trouble: the insurance chatbot that is supposed to help customers submit claims and suddenly makes medical diagnoses, the travel chatbot that offers trips to crisis regions, the service bot that makes life-threatening repair suggestions.
  • Manipulation and attacks:
    AI can reveal sensitive user data or be abused through targeted attacks: The recruiting bot, which is simply instructed in the application letter to rank this application at the top against all selection criteria, the sales bot, which persuades customers to absurd discounts or the personal assistant, that fraudulent websites trick into giving them their credit card details.

All das kann nicht nur passieren, es kann unbemerkt passieren. Und selbst wenn wir uns eines Fehlverhaltens der KI bewusst werden, können wir das nicht einfach abstellen. Dazu ist das, was innerhalb des LLM passiert, zu intransparent. Wir haben nicht einfach eine fehlerhafte Stelle im Code, die wir korrigieren können.

We need to establish quality, security and transparency in our AI systems. Otherwise, we will be denied precisely those applications that offer the greatest added value: bots that actually support customers with product selection or technical questions. Or AI solutions that automate business processes - without every output having to be checked again by a human.

How can we make our AI applications more reliable?

To a certain extent, we can adhere to proven quality assurance and cybersecurity techniques to secure our AI applications: As with any IT solution, robust security measures and tests should be integrated from the outset; concepts such as attacker modelling, ZeroTrust etc. also apply here.

But we need to supplement these traditional methods with ideas and techniques from the field of GenAI. This starts with understanding the risks. LLMs work differently to traditional software - they don't always do exactly what a developer tells them to do. Sometimes there is a problem with ‘exactly’: even small formulation variants can lead to very different results. However, it is often also a question of whether the system developer's instructions are followed at all. With LLMs, these instructions are not fundamentally separated from the processed information and user input. This means that every chat input and every processing of external documents also opens up the potential for misalignment, manipulation and cyber attacks.

As a first step, it is important to determine the specific risk profile of our AI application: what are possible attack vectors, what do we need to protect ourselves against, what could happen in the worst-case scenario if the LLM disregards our instructions or provides incorrect information? (You can get a quick initial assessment for your application here).

And then we should utilise the modern methods that have been specifically developed in recent years to eliminate the weaknesses of large language models (LLM) and enable broader and more reliable application in various areas. They make it possible to make the AI's decisions more transparent and to check whether it is adhering to predefined rules. In this way, we can ensure that the LLM does not deviate from the specified path on its own, nor that users or attackers override the system instructions with their own manipulative or malicious instructions.

Finally, we also need to supplement traditional testing topics with new approaches if we want to ensure the trustworthiness of our LLM application. In many scenarios, for example, we will not be able to track every single error. Instead, as is so often the case when machine learning is involved, we need a large amount of data on which we can measure the accumulation of errors, the development of false-negative and false-positive rates, F1 scores, etc. in order to optimise our LLM application and develop it further in the right direction.

Conclusion: We have experimented, we have developed a feeling for what AI and LLMs can do. Now it's time to harden and secure the systems so that we can rely on them. The tools exist. Let's use them to finally release the handbrake and realise the full potential of AI.

Heben wir wirklich ab mit KI – oder spielen wir doch nur herum?