January 25, 2023
Machine learning researchers from Meta released a large language model for science called Galactica on November 15th, 2022; they took it down three days later. The public outcry centered on the model’s tendency to produce scientifically formatted and authoritative-sounding but factually inaccurate text. This episode highlights that people are becoming increasingly aware of and sensitive to the potential for harm from the misuse of artificial intelligence (AI). It also raises an interesting question: is it possible to apply AI in such a way that it cannot cause harm? This is the central goal of a framework I call “Zero Trust AI.”
A zero trust application of AI uses AI to solve a problem in such a way that you can trust the solution even if you don’t trust the accuracy of the AI used to produce it.
You may think that zero trust AI is impossible to achieve. In fact, it may be impossible to achieve for some, or even most, applications of AI. But, it is possible for some problems. It’s even possible for some high-stakes applications. As an example, I’ll walk through one application of AI in medical research that requires zero trust even though it’s of critical importance.
Suppose I want to compare the average efficacy of two medical treatments to figure out which treatment is more beneficial for patients, like in a typical clinical trial. In addition, imagine that I have an AI that can predict how a patient will respond to one, or both, of the treatments. How can I use this AI to help answer my question if I can’t trust its predictions?
The typical way to compare the efficacy of two treatments is to enroll a large group of patients into a research study and then randomly assign half of them to receive the first treatment and the other half to receive the second treatment. The difference between the average outcomes of these two groups provides an estimate of the relative treatment effect (i.e., the difference between the efficacies of the two treatments). Of course, there will be an error bar on this estimate, but if I enroll enough patients into the research study, then I can make the error bar small enough to make a confident statement about which treatment is better.
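To make the arithmetic concrete, here is a minimal sketch of that first procedure in Python. All of the numbers (sample size, treatment effect, patient-to-patient variability) are made up for illustration; this is not a real trial analysis. It randomizes half of a simulated population to each treatment and reports the difference in average outcomes with its error bar.

```python
# Sketch of a simple two-arm randomized trial (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(0)

n = 500                                  # patients enrolled (assumed)
prognosis = rng.normal(0, 1, size=n)     # unobserved patient-to-patient variability
treated = rng.permutation(n) < n // 2    # randomly assign half to the first treatment

true_effect = 0.3                        # assumed benefit of the first treatment
outcome = prognosis + true_effect * treated + rng.normal(0, 0.5, size=n)

# The difference in average outcomes estimates the relative treatment effect.
estimate = outcome[treated].mean() - outcome[~treated].mean()

# Standard error of that difference -- "the error bar" -- which shrinks as n grows.
se = np.sqrt(outcome[treated].var(ddof=1) / treated.sum()
             + outcome[~treated].var(ddof=1) / (~treated).sum())

print(f"estimated effect: {estimate:.3f} ± {1.96 * se:.3f}")
```

Rerunning this with a larger `n` shrinks the ± term, which is the sense in which enrolling more patients buys a smaller error bar.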
Consider a simple modification to the same study. At the beginning of the study, I use my AI to predict how well each patient will respond to the first (or second) treatment. I can use these predictions to categorize patients into likely responders and likely non-responders. Within each category, half of the patients are randomized to the first treatment and half to the second treatment. At the end of the study, I compute the difference between the average outcomes of the patients receiving the two treatments within each category, and then I average these differences across the categories (weighting each by its size) to estimate the relative efficacy of the two treatments.
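The modified procedure can be sketched the same way. Here the “AI” is just a stand-in: a noisy prediction that happens to be correlated with each patient’s prognosis. In a real application it would be a trained model, and nothing about the procedure depends on how good it actually is. The median split into likely responders and non-responders, and all of the numbers, are again illustrative assumptions.

```python
# Sketch of the AI-stratified version of the same trial (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(1)

n = 500
prognosis = rng.normal(0, 1, size=n)
true_effect = 0.3

# A hypothetical AI prediction: correlated with prognosis, but imperfect.
ai_prediction = prognosis + rng.normal(0, 0.7, size=n)
likely_responder = ai_prediction > np.median(ai_prediction)

# Randomize half of each category to each treatment.
treated = np.zeros(n, dtype=bool)
for category in (likely_responder, ~likely_responder):
    idx = np.flatnonzero(category)
    treated[rng.permutation(idx)[: len(idx) // 2]] = True

outcome = prognosis + true_effect * treated + rng.normal(0, 0.5, size=n)

# Per-category differences, combined with weights proportional to category size.
estimate = 0.0
for category in (likely_responder, ~likely_responder):
    diff = (outcome[category & treated].mean()
            - outcome[category & ~treated].mean())
    estimate += diff * category.mean()

print(f"stratified estimate of the effect: {estimate:.3f}")
```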
The average estimate for the relative efficacy of the two treatments that I would get from the first procedure (without the AI) is exactly the same as the one I would get from the second procedure (with the AI). Moreover, the size of the error bar from the second procedure is less than or equal to the size of the error bar from the first procedure, on average.
If the AI is producing nonsense, then grouping the patients based on its predictions doesn’t do anything. However, if the predictions from the AI are accurate, then using the AI allows me to be more confident in my assessment of which treatment is better. I can use the AI to help solve the problem even if I don’t trust it; the worst thing that could happen to me is that I get the same answer I would have gotten without it.
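These two claims are easy to check numerically. The simulation below, which reuses the same made-up setup as the sketches above, repeats both procedures a few thousand times: once with a reasonably accurate AI and once with an AI whose predictions are pure noise. Under these assumptions, both procedures center on the same answer, the stratified procedure’s spread is noticeably smaller when the AI is accurate, and it is essentially unchanged when the AI is nonsense.

```python
# Monte Carlo check of the claims above, under made-up simulation settings.
import numpy as np

rng = np.random.default_rng(2)
n, true_effect, n_trials = 500, 0.3, 2000


def simple_trial():
    """One trial with plain randomization; returns the difference in means."""
    prognosis = rng.normal(0, 1, size=n)
    treated = rng.permutation(n) < n // 2
    outcome = prognosis + true_effect * treated + rng.normal(0, 0.5, size=n)
    return outcome[treated].mean() - outcome[~treated].mean()


def stratified_trial(ai_noise_sd):
    """One trial with AI-stratified randomization; returns the weighted estimate."""
    prognosis = rng.normal(0, 1, size=n)
    ai_prediction = prognosis + rng.normal(0, ai_noise_sd, size=n)
    likely_responder = ai_prediction > np.median(ai_prediction)

    treated = np.zeros(n, dtype=bool)
    for category in (likely_responder, ~likely_responder):
        idx = np.flatnonzero(category)
        treated[rng.permutation(idx)[: len(idx) // 2]] = True

    outcome = prognosis + true_effect * treated + rng.normal(0, 0.5, size=n)
    return sum(
        (outcome[c & treated].mean() - outcome[c & ~treated].mean()) * c.mean()
        for c in (likely_responder, ~likely_responder)
    )


simple = np.array([simple_trial() for _ in range(n_trials)])
print(f"no AI:       mean {simple.mean():.3f}, spread {simple.std():.3f}")

for label, noise_sd in [("accurate AI", 0.5), ("nonsense AI", 1e6)]:
    strat = np.array([stratified_trial(noise_sd) for _ in range(n_trials)])
    print(f"{label}: mean {strat.mean():.3f}, spread {strat.std():.3f}")
```

The “spread” printed here is the empirical standard error of each procedure across repeated trials, i.e., the size of the error bar each one would report.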
I chose to highlight this method (i.e., using an AI to stratify the population) only because I thought it was the easiest to explain. In fact, it is just one of many zero trust procedures for incorporating an AI into a clinical trial. For example, see the Qualification Opinion from the European Medicines Agency on Unlearn’s PROCOVA™ methodology.
It’s somewhat remarkable that zero trust applications of AI actually exist, even for critical applications like comparing two medical treatments to decide which is better. However, zero trust may be too high of a bar for most use cases.
Zero trust means that a user can trust the solution to their problem even if the AI they’re using to help solve it is complete garbage, but I’d venture that every AI researcher aims to create models that are at least pretty good rather than complete garbage. In general, then, it’s probably best to pursue limited trust applications. A limited trust application of AI is one that generally produces the right solution as long as the accuracy of the AI is within a certain range. The solution to the problem is robust to some inaccuracy in the AI. In a sense, a limited trust application is one that only needs an AI that is “good enough” for the problem rather than one that is perfect.
A key takeaway from this post is that it is the way an AI is used that creates, or mitigates, the risk of harm. The AI cannot be evaluated separately from its context of use. Most applications of AI that we encounter every day only require limited trust because they can’t create much harm. You may find it annoying when a poorly targeted ad shows up in your web browser, but it’s not that big of a deal. Still, it’s probably prudent to invest more in research on zero (or limited) trust frameworks as AI-based solutions are increasingly used in safety-critical applications.