Challenges to succeeding with LLMs

November 27, 2023 - Nicolás Andrés Morandi

You have probably read volumes about what Generative AI and Large Language Models (LLMs) are, the value they bring to business, and some interesting use cases. However, as you may have experienced in the concept phase or after developing a proof of concept, there are several challenges to bringing one of these use cases into the real world.  

Let’s dive into the main challenges your organization will need to tackle to use LLMs successfully, broken into four categories: 

User-centric 

The user is at the core of every successful LLM implementation. First, we need to understand how quickly users are going to adopt the technology and how willing they are to do so, challenges closely linked to change management. Generative AI promises a significant increase in productivity, which you will want to capture as soon as possible. 

At the core of user adoption lies the correctness, truthfulness, completeness, and explainability of the result. It is no secret that LLMs tend to hallucinate, providing answers that sound realistic but are factually incorrect. This phenomenon can greatly undermine users’ confidence. 

Another challenge to credibility relates to training cut-offs, i.e. the fact that models are trained on data up to a certain point in time and know nothing about the world after that point. To ensure trustworthy results, we need to augment their knowledge and keep it up to date. 
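One common way to do this is retrieval-augmented generation: fetch up-to-date documents relevant to the question and pass them to the model alongside the prompt. Below is a minimal sketch of the idea; the knowledge base entries are illustrative, the keyword retrieval is a stand-in for a real vector index, and `call_llm` is a placeholder for whichever LLM API you use.

```python
# Minimal retrieval-augmented generation sketch.
# `call_llm` is a placeholder for your provider's completion call.

KNOWLEDGE_BASE = [
    {"title": "Q3 pricing update", "text": "Prices for plan X changed on 2023-10-01 ..."},
    {"title": "Support policy", "text": "Support tickets are answered within 24 hours ..."},
]

def retrieve(question: str, top_k: int = 2) -> list[dict]:
    """Naive keyword retrieval; in practice you would query a vector index."""
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: sum(word in doc["text"].lower() for word in question.lower().split()),
        reverse=True,
    )
    return scored[:top_k]

def answer_with_context(question: str) -> str:
    """Ground the model in fresh documents instead of relying on stale training data."""
    context = "\n\n".join(f"{d['title']}:\n{d['text']}" for d in retrieve(question))
    prompt = (
        "Answer using only the context below. If it is insufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)  # placeholder LLM call
```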

Technical 

Many of the technical challenges to implementing Generative AI solutions are similar to those experienced with standard AI tools, but the new tools also bring new challenges along several dimensions. Until recently, we had large models with millions of parameters; now we are rapidly going from billions (GPT-2, PaLM) to potentially trillions of parameters (GPT-4). This raises the complexity of training a model from scratch by one or more orders of magnitude, shrinking the pool of organizations capable of doing so. 

For those of us outside these tech giants, there are thousands of foundation models with impressive capabilities that we can use out of the box or fine-tune for the task at hand, allowing us to benefit from decades’ worth of research and development. When working with these models, you will need a team able to understand and implement several techniques, such as prompt engineering, prompt tuning, fine-tuning, reinforcement learning from human feedback, and optimization. The last is especially relevant when hardware is constrained, whether because of cost or lack of availability (NY Times). 
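To make the first of those techniques concrete, prompt engineering often comes down to structuring instructions and a few worked examples around the user’s input. A minimal few-shot sketch follows; the example tickets are invented for illustration and `call_llm` is again a placeholder for your provider’s API.

```python
# Few-shot prompting sketch: show the model the expected input/output format.
FEW_SHOT_EXAMPLES = [
    ("Refund for order #123 not received", "Billing"),
    ("App crashes when I open settings", "Technical issue"),
]

def classify_ticket(ticket: str) -> str:
    """Classify a support ticket by prepending instructions and examples."""
    examples = "\n".join(f"Ticket: {t}\nCategory: {c}" for t, c in FEW_SHOT_EXAMPLES)
    prompt = (
        "Classify each support ticket as Billing, Technical issue, or Other.\n\n"
        f"{examples}\nTicket: {ticket}\nCategory:"
    )
    return call_llm(prompt)  # placeholder LLM call
```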

You will also need a framework to evaluate the model along several dimensions, such as correctness, completeness, and truthfulness, together with guardrails against unintended behavior. 
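One way to make such a framework concrete is to score every answer along the dimensions you care about and aggregate the scores over a test set. In the sketch below, `judge_dimension` is a placeholder for whatever rater you use (human reviewers, heuristics, or another model acting as a judge), and `call_llm` is the placeholder model call.

```python
from statistics import mean

DIMENSIONS = ["correctness", "completeness", "truthfulness", "safety"]

def evaluate_model(test_cases: list[dict]) -> dict[str, float]:
    """Score each (question, reference) pair on every dimension, 0 to 1, and average."""
    scores = {dim: [] for dim in DIMENSIONS}
    for case in test_cases:
        answer = call_llm(case["question"])  # placeholder LLM call
        for dim in DIMENSIONS:
            # judge_dimension is a placeholder: a human rating, a rule, or an LLM judge.
            scores[dim].append(judge_dimension(dim, case, answer))
    return {dim: mean(values) for dim, values in scores.items()}
```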

Lastly, operationalizing such models brings challenges not only in terms of computing resources but also in terms of scalability, deployment orchestration, monitoring, and maintenance, which are giving rise to an LLMOps discipline analogous to the existing MLOps practice. 
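Part of that LLMOps discipline is simply instrumenting every model call so that latency, volume, and failures can be monitored over time. A minimal sketch of such a wrapper, with `call_llm` again standing in for the real API:

```python
import logging
import time

logger = logging.getLogger("llm_ops")

def monitored_call(prompt: str) -> str:
    """Wrap every LLM call with basic operational telemetry."""
    start = time.perf_counter()
    try:
        return call_llm(prompt)  # placeholder LLM call
    except Exception:
        logger.exception("LLM call failed")
        raise
    finally:
        latency_ms = (time.perf_counter() - start) * 1000
        logger.info("llm_call prompt_chars=%d latency_ms=%.1f", len(prompt), latency_ms)
```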

Data challenges

Just as with traditional AI, the data we feed to the model directly determines the type and quality of output the user receives. When working with LLMs, it is almost impossible to know what data they were trained on. The training data might contain biased, dangerous, or ethically questionable content. Thus, we need to build guardrails that prevent toxic language and aggressive responses, tackle bias and unfairness in the data, and, most importantly, keep the models from providing dangerous information. 
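At its simplest, a guardrail is a check that runs before an answer reaches the user; real systems combine such checks with dedicated moderation models. A sketch, where the blocked topics and the keyword screen are purely illustrative:

```python
BLOCKED_TOPICS = ["weapon manufacturing", "self-harm instructions"]  # illustrative only

def guarded_answer(question: str) -> str:
    """Screen the model's output before returning it to the user."""
    raw_answer = call_llm(question)  # placeholder LLM call
    lowered = raw_answer.lower()
    # Naive keyword screen; production systems use moderation classifiers instead.
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I can't help with that request."
    return raw_answer
```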

If you need to fine-tune an LLM, you still need to build training, validation, and test sets, which can be more time consuming than before, and these are steps that cannot be skipped. You also need to create the right evaluation tools, which might not be straightforward. When building a model to estimate something like housing prices, it is easy to measure the error between the true and predicted value. When evaluating pieces of text, however, it is much harder for a computer to judge whether they have the same meaning, whether the result contains the right information, and whether it is delivered with the right tone and vocabulary. 
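The contrast is easy to see in code: for a numeric prediction the error is a one-liner, while judging whether two pieces of text mean the same thing needs a model of its own, for example by comparing sentence embeddings. In the sketch below, `embed` is a placeholder for a sentence-embedding model that maps text to a vector.

```python
import math

def price_error(true_price: float, predicted_price: float) -> float:
    """Regression: the quality of a prediction is a single number."""
    return abs(true_price - predicted_price)

def semantic_similarity(reference: str, generated: str) -> float:
    """Text: compare meanings via embeddings (cosine similarity in [-1, 1])."""
    a, b = embed(reference), embed(generated)  # `embed` is a placeholder embedding model
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```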

Compliance & security 

There are two main challenges in this area: first, compliance with laws, regulations, copyright, and intellectual property (IP); and second, private data security. 

Compliance with laws and regulations; copyright & IP 

Regulations such as the EU AI Act are already triggering the need to review the AI lifecycle and reassess existing AI products. 

There are also concerns about AI generating content that is not novel and instead replicates the data on which it was trained, potentially breaking copyright and IP laws. Some companies are already seeking ways to address these concerns. For example, Microsoft has pledged to assume responsibility for the potential legal risks involved in the use of its tool Copilot.

We need not only to check what data we use to train our own models, but also to be aware that most companies will leverage existing foundation models for which the provenance of the training data has not been disclosed. To go a step further, some of the big players, such as Meta, provide open use of models such as Llama 2, but their conditions of use prohibit using the outputs of these models to train new models. How can we make sure that a foundation model we choose to use has not been trained in breach of such a constraint? 

Private data security 

We must not only deal with security from a traditional IT point of view when putting an AI product into production, but also make sure that only the necessary information is exposed to the end user. If the model is trained on confidential or sensitive data, that data might be inadvertently exposed when providing an answer. Such was the case with Samsung’s data being leaked via ChatGPT, which led several organizations to ban its use. 

Another important safeguard is ensuring that even internal users can only retrieve data relevant to their role. In traditional applications and databases, we have multiple ways to do this, such as RBAC (Role-Based Access Control) and column- and row-level security. With an LLM, we will need to envision new ways to enforce access rules on the knowledge base. 
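One pragmatic pattern is to attach access metadata to every document in the knowledge base and filter retrieval results by the user’s role before anything reaches the prompt. A sketch along those lines; the document tags, roles, and keyword ranking are illustrative placeholders for a permission-aware index.

```python
DOCUMENTS = [
    {"text": "Q4 salary bands ...", "allowed_roles": {"hr"}},
    {"text": "Public product FAQ ...", "allowed_roles": {"hr", "sales", "support"}},
]

def retrieve_for_user(question: str, user_role: str) -> list[str]:
    """Enforce access rules *before* retrieval results can enter the prompt."""
    visible = [d for d in DOCUMENTS if user_role in d["allowed_roles"]]
    # Naive relevance ranking; a real system would query a permission-aware index.
    ranked = sorted(
        visible,
        key=lambda d: sum(w in d["text"].lower() for w in question.lower().split()),
        reverse=True,
    )
    return [d["text"] for d in ranked[:3]]
```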

Where do we go from here? 

While it is true that some unique challenges arise when using Large Language Models, most of the AI process and lifecycle to date remains relevant, and existing NLP knowledge will prove highly valuable. We will need to create new tools, frameworks, and governance around these models, but it is undeniable that they will bring increased productivity and become a game changer across all industries. Curious? Check out our offer for more information here and give us a call if you’d like to sit down with us and explore your options.