How can we trust non-explainable AI?

Kushal Shah
6 min read · Jun 15, 2024


Explainability helps in building trust, but it is not an essential requirement.

How many times have you been able to explain your decisions to your friends, relatives or co-workers? Perhaps not very often. But does that mean they don’t trust you? Explainability surely helps in promoting trust, but from our own experience we can see that explainability and trust are not necessarily soul mates. What is actually essential for building trust is consistency in performance. Most scientists can’t explain where their creative ideas come from or how they arrive at them, but we trust them to keep doing good science because of their consistent past performance. The same goes for reputed chefs, whom we trust to cook good food for us despite knowing nothing about their process. So the first thing to understand is that while explainability of AI models is desirable, it is not essential for building trust in them.

What is explainability?

Before exploring the means and mechanisms of building trustworthy AI models, let’s try to understand whether they can ever become explainable. What does explainable AI (or XAI) even mean?

An AI model would be called explainable if we can understand how it arrives at its output for a given input in human terms.

We, of course, know the model details and the parameter values, and so in a way we have access to the entire AI model, but that’s not what we mean by explainability. What do these parameter values actually mean in human terms?

If a particular tweet is classified as “hate speech” by an AI model, can we explain to the author why that tweet is going to be blocked from distribution? If a particular image on Facebook is classified as “vulgar” by an AI model, can we explain to the user why that image was inappropriate for a social media platform? If a weather AI model predicts rain for tomorrow, can we explain how it arrived at that prediction? The same goes for the myriad other applications for which AI models are deployed.

Why is explainability important?

Explainability helps in building trust! If we fully understand how an AI model works and makes its predictions, it’s easier for us to figure out when and where it will make wrong predictions, and then we can try to fix that through other means. But if we don’t understand how a model works, it can go wrong in unexpected ways, leading to embarrassing situations, biased outputs, monetary losses and even a health crisis! And given the huge number of applications for which AI agents are going to be deployed, explainability of the underlying models can lead to significant benefits.

Can AI models ever become explainable?

There are surely a lot of blogs and articles by major companies and even academicians that try to sell the dream of “Explainable AI”, but all of them are just marketing tools.

It is very important for practitioners to understand that AI models are fundamentally non-explainable, and any investment in building explainable models is just a huge waste of time and resources.

In classical Machine Learning, we use “features” in our input data to make predictions. For tabular data, the input columns act as the features. For unstructured data like text, we try to extract features using our domain knowledge. For some tasks, the number of words and the number of sentences in a text may be good features. For another task, the counts of certain Part-Of-Speech (POS) tags may be good features. For yet another task, the presence or absence of certain specific words may be our features. And the list goes on. These feature-based methods were quite successful for the problems they could solve, but unfortunately that set of problems is very small.
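To make this concrete, here is a minimal sketch (in Python) of what hand-crafted feature extraction for a text task might look like. The specific features and the flagged-word lexicon are purely illustrative assumptions, not taken from any real system; the point is that every feature has an obvious human meaning.

```python
# A minimal sketch of hand-crafted feature extraction for a text task.
# The feature set and the flagged-word lexicon are illustrative assumptions.

def extract_features(text: str) -> dict:
    """Turn a raw document into a small, human-readable feature vector."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    flagged_words = {"free", "winner", "urgent"}  # hypothetical task-specific lexicon
    return {
        "num_words": len(words),
        "num_sentences": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "has_flagged_word": any(w.lower().strip(".,!?") in flagged_words for w in words),
    }

print(extract_features("Congratulations! You are a winner. Claim your free prize now."))
```

A model built on such features is easy to explain precisely because each feature was designed by a human in the first place.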

Modern AI algorithms such as the Transformer model have completely changed the way we approach text-based problems. Nowadays, there is no need to manually figure out appropriate features, since the word embeddings learnt by the transformer model can be easily fine-tuned for any given task.

Transformer models learn complex patterns in the input data by modelling the conditional probabilities of word sequences. The resulting representations capture intricate relationships and dependencies that are typically not reducible to simple, human-understandable ‘features’.
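To illustrate what “modelling the conditional probabilities of word sequences” looks like in practice, here is a minimal sketch using the Hugging Face transformers library with GPT-2; the model choice and the example sentence are assumptions of convenience, and any causal language model would do.

```python
# Minimal sketch: inspect P(next token | context) under a pretrained causal LM.
# Model choice (GPT-2) and the example sentence are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "The weather forecast says it will"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, vocab_size)

probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```

The model’s “knowledge” lives in over a hundred million weights that jointly produce this distribution; no individual weight corresponds to a rule a human could read off.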

This is not a limitation of the transformer model but an intrinsic feature of natural human language. Unlike formal languages, natural languages do not have a well-defined grammar, which is why there is a huge variety in the way humans use words to convey their ideas. Even a highly structured language like Sanskrit has a lot of exceptions in its rule book, which makes it almost impossible to write down all of its grammar rules as a complete computer program. And the problem is much deeper in other languages, which have a much more flexible structure. This huge variety in the usage of any particular natural language cannot be captured by simple rules and features.

For a specific task, it may be possible to come up with good explainable models, but the probability of finding such a model quickly drops to zero as the complexity of the task increases.

So how can we trust non-explainable AI?

Trust comes from consistency!

I don’t understand medical science, but if I know a doctor has a past history of successfully treating his/her patients, then I am likely to trust his/her diagnosis. The same applies to AI models. What we need to do is rigorously test AI models in a diverse set of situations so that we can be quite confident in their performance. So will these tests guarantee that the AI models will never fail? Of course not! As the famous scientist James Clerk Maxwell once said:

“True logic of this world lies in the calculus of probabilities.”

Testing is not about guaranteeing success, but about ensuring that success has a high probability. In other words, rigorous testing significantly decreases the probability of failure. So even after deploying our AI models following rigorous testing, we still need to keep an eye on their performance. This is just how we deal with humans: just because we have elected a certain politician to the chair does not mean that we stop keeping an eye on his/her decisions and actions. Every system, be it human or machine, needs to be continuously evaluated.
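One concrete way to do such rigorous testing is slice-based evaluation: instead of reporting a single aggregate accuracy, measure performance separately on meaningfully different subsets of the test data (by topic, region, input length, and so on). The sketch below assumes a hypothetical model object with a predict method and a labelled test set; it is not tied to any particular framework.

```python
# A minimal sketch of slice-based evaluation: accuracy per subset of the test data.
# The model interface and the slice labels are hypothetical placeholders.
from collections import defaultdict

def evaluate_by_slice(model, examples):
    """examples: list of (text, true_label, slice_name) tuples."""
    correct, total = defaultdict(int), defaultdict(int)
    for text, label, slice_name in examples:
        pred = model.predict(text)            # assumed model interface
        total[slice_name] += 1
        correct[slice_name] += int(pred == label)
    return {name: correct[name] / total[name] for name in total}

# Usage: per-slice accuracies reveal failure modes that an overall average would hide.
# for name, acc in sorted(evaluate_by_slice(my_model, test_examples).items()):
#     print(f"{name:>20s}: {acc:.2%}")
```

The same function can be re-run periodically on fresh production data, which is exactly the kind of continuous evaluation described above.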

What about critical domains like healthcare?

A wrong weather prediction is alright, but not a wrong health diagnosis.

There are some specific domains where failure can be very expensive! It is true that even human doctors make wrong diagnoses, and that should be taken a lot more seriously than it currently is, but society seems to expect much higher standards of performance from AI models than from human doctors. And this is why the adoption of AI models in healthcare has been very slow despite remarkable progress in model capabilities. Now, while we can’t change society, we can surely figure out how to enable greater adoption of these AI models in critical domains.

Machine Learning or AI models are generally classified as “white-box” or “black-box”. White-box models are fully explainable, and there is usually no issue in deploying them even in critical domains, since we fully understand what’s going on inside them. Logistic Regression is one of the most widely used examples of such white-box models. In contrast, black-box models are opaque: we don’t really understand their predictions despite having full access to the model and its parameters. Artificial Neural Networks and their cousins fall in this category. Now, while we would ideally like to use white-box models for all our applications, the problem is that these models are not powerful enough to solve complex tasks, which makes using black-box models necessary, especially when we are dealing with unstructured data like text, images, audio, etc.
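The contrast is easy to see in code. The sketch below uses scikit-learn with a built-in toy dataset (chosen purely for illustration): the white-box Logistic Regression has one learned coefficient per human-named feature, while the black-box neural network spreads its behaviour across thousands of weights with no individual meaning.

```python
# White-box vs black-box on the same data (scikit-learn; toy dataset for illustration).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y, feature_names = data.data, data.target, data.feature_names

# White-box: one weight per human-named feature, directly readable.
white_box = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coefs = white_box.named_steps["logisticregression"].coef_[0]
for name, w in sorted(zip(feature_names, coefs), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name:>25s}: {w:+.2f}")

# Black-box: thousands of weights with no individual human meaning.
black_box = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000)).fit(X, y)
print("MLP weight count:", sum(w.size for w in black_box.named_steps["mlpclassifier"].coefs_))
```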

A good middle ground can be achieved by what are called “glass-box” models (terminology introduced by DARPA XAI program).

These glass-box models keep a human in the loop, which greatly limits the chances of an AI model’s prediction going completely haywire. So why do we need to use AI models at all, if a human is needed to babysit the AI model? The important thing to note is that even human experts make mistakes, and they do so quite often! So in critical systems, it is very important for us to augment human capabilities with AI models, with the hope that together they can bring errors down significantly (close to zero).
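A minimal sketch of such a human-in-the-loop setup is shown below: the model acts on its own only when its confidence is high and defers everything else to a human reviewer. The threshold value and the model/queue interfaces are assumptions made for illustration.

```python
# A minimal sketch of a glass-box, human-in-the-loop triage step.
# The threshold and the model/queue interfaces are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.90  # hypothetical cut-off, tuned per application

def triage(model, case, review_queue):
    """Return an automatic decision, or defer the case to a human reviewer."""
    label, confidence = model.predict_with_confidence(case)  # assumed interface
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"decision": label, "source": "model", "confidence": confidence}
    review_queue.append(case)                 # a human expert decides this one
    return {"decision": None, "source": "human_review", "confidence": confidence}
```

Every deferred case can also be logged as labelled feedback, supporting the kind of continuous evaluation described earlier.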


Kushal Shah

Now faculty at Sitare University. Studied at IIT Madras, and earlier faculty at IIT Delhi. Join my online LLM course: https://www.bekushal.com/llmcourse