AI gets creative, we do the mindless work.
Does this sound familiar? You've asked ChatGPT or some other LLM-based chatbot to create marketing copy or product descriptions. In no time at all, it has generated hundreds of texts and images from a few keywords. At first glance, they seem perfectly usable. But how do you get this content into your database, onto your website or onto your social media channels?
Quite often: by hand, namely our own, using copy and paste. And so we find ourselves in the curious situation of mechanically, mindlessly transferring texts from one window or application to another, while the AI gets to express itself creatively.
Does it have to be like that?
This is not an isolated case. Compiling the content for a RAG setup, making the figures from LLM-based market research comparable with your own BI data: between the generative output of AI and a usable result lies a lot of drudgery, with gaps between systems and formats that have to be bridged by hand. And this work is left to us.
It stands to reason that we want to move away from this division of labour, but it is not that simple. For one thing, ‘repackaging’ information is not exclusively mechanical and mindless. While we copy the text, we skim it once more to check that it fits, notice that an error has crept in because an instruction to the LLM was incomplete, or that the goal was stated too vaguely. In short, copy and paste doubles as quality control and iterative improvement.
The big challenge: integration
The other obstacle is even more obvious: as a rule, the LLM-based application we use to generate texts cannot access the systems in which we need those texts (the product database, the content management system, and so on). This brings us to the question of how we can enable LLM-based applications to interact productively with our IT landscape: a key issue if generative AI is to deliver real added value, and an exciting topic that raises further questions.
On the one hand, there is the question of reliable technical communication: we have to persuade the LLM to convert its output into the JSON, CSV or XML format we need, syntactically correct every time. This is not without irony. For the first time, we have technical systems with which we can communicate in natural language, and now we go to great lengths to get them to formulate formal commands for old-style IT systems or squeeze data into predefined schemas. But as long as entire information production and communication chains are not taken over by agent systems, we will probably have to put up with this.
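In practice this usually means validating the model's output against a schema and re-prompting on failure. A minimal sketch in Python, assuming Pydantic for validation and a generic call_llm function standing in for whatever chat API is actually in use:

from typing import Callable
from pydantic import BaseModel, ValidationError

class ProductDescription(BaseModel):
    """The schema the downstream system expects (illustrative)."""
    sku: str
    title: str
    body: str

def generate_validated(
    prompt: str,
    call_llm: Callable[[str], str],  # hypothetical wrapper around your chat API
    max_retries: int = 3,
) -> ProductDescription:
    """Ask the LLM for JSON and re-prompt until it parses and validates."""
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            # Pydantic v2 parses the JSON string and checks it against the schema.
            return ProductDescription.model_validate_json(raw)
        except ValidationError as err:
            prompt += f"\n\nYour previous answer was invalid ({err}). Return only valid JSON."
    raise RuntimeError("LLM did not produce schema-conformant JSON")

The retry loop is exactly the quality control we otherwise perform by hand, just moved into code.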
A matter of trust
But the much more critical question is: do we want to give LLM-based applications write access to our systems? Do we trust them that far? What if the LLM accidentally overwrites other information with its generated texts? Or deletes the entire database? What if the LLM also interacts with external users or external information and is tricked, via prompt injection, into deleting our data? Or into spreading phishing messages en masse?
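A common line of defence, sketched here with hypothetical names, is never to hand the model raw credentials: every write passes through a narrow, auditable wrapper that exposes only the operations we explicitly allow and nothing destructive.

import sqlite3
from dataclasses import dataclass

# The model may only request these verbs; everything else is rejected.
ALLOWED_ACTIONS = {"upsert_description"}

@dataclass
class WriteRequest:
    action: str
    sku: str
    text: str

def execute(request: WriteRequest, conn: sqlite3.Connection) -> None:
    """Deterministic gatekeeper between the LLM and the database."""
    if request.action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {request.action!r} is not permitted")
    # Scoped statement: updates one field of one row, can never delete anything.
    conn.execute(
        "UPDATE products SET description = ? WHERE sku = ?",
        (request.text, request.sku),
    )
    conn.commit()

However the model is prompted or tricked, the blast radius is limited to what this wrapper permits.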
How can we hand over the tedious work to AI?
We need tools for integration. Tools that understand and take seriously the world of enterprise data and systems, but at the same time make it easy to connect to them.
We need a quality of output from LLMs that makes human-in-the-loop review redundant.
And finally, we need a level of trust in the capabilities of AI that we do not yet have at this point.
No one expects AI to manage this on its own; it is accepted that we have to give clear instructions, especially in IT and in communication with IT systems. The problem is reliability in following those instructions: language models occasionally ignore even very clear instructions without us noticing, and we have no binding way to tell an LLM which of two conflicting instructions takes precedence and must be followed in any case. These are the weaknesses of the technology that we must overcome before we can entrust AI agents with the key to our server room.
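Until that reliability exists, one pragmatic pattern is to enforce the non-negotiable rules outside the model: the agent may propose actions, but a deterministic policy layer decides what actually runs. A sketch with assumed tool names:

from enum import Enum

class Risk(Enum):
    READ = 0
    WRITE = 1
    DESTRUCTIVE = 2

# Hypothetical classification of the tools an agent may propose.
TOOL_RISK = {
    "search_products": Risk.READ,
    "update_description": Risk.WRITE,
    "drop_table": Risk.DESTRUCTIVE,
}

def approve(tool_name: str, human_confirmed: bool = False) -> bool:
    """The model may ask for anything; only this code decides."""
    risk = TOOL_RISK.get(tool_name, Risk.DESTRUCTIVE)  # unknown tools = worst case
    if risk is Risk.READ:
        return True
    if risk is Risk.WRITE:
        return human_confirmed  # still human-in-the-loop, for now
    return False  # destructive actions never run, whatever the prompt says

The precedence of instructions is thus decided in code we control, not in a prompt the model may or may not respect.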
Conclusion
Currently, working with generative AI often feels wrong – we are constantly busy controlling and channelling its output. We need to find a way to make its behaviour binding and reliable. After all, we shouldn't be babysitting the tireless but uncontrolled creativity of LLMs; we should be using them to enhance our own creativity.
Klaus Reichenberger

