
Data powers machine learning, machine learning powers data?

August 17, 2023

Unlearn’s purpose is to advance AI to eliminate trial and error in medicine. We create digital twins of patients that forecast individual health outcomes and are used to accelerate clinical innovation. To create these digital twins, we train AI models, which we call digital twin generators, on historical data. Our recent releases include digital twin generators for Alzheimer’s disease, ALS, and frontotemporal dementia (FTD).

If the right machine learning architecture is the sports car behind a release, data is the fuel that powers all of machine learning. To advance AI, Unlearn is building the best database of human health in the world.

To accomplish our Top Secret Plan, we need to bring together data from disparate sources, ranging from clinical trials to doctor visits. These data can realize their full potential for training generative AI algorithms only after sufficient cleaning and standardization. The problem is that, across healthcare, very little of the data collected by practitioners, hospitals, and pharma manufacturers is measured and stored consistently.

To illustrate the problem, I will focus on just a few of the challenges within clinical trials. During a clinical trial, meticulous records must be kept of anything that could be of interest to the FDA and the sponsoring biopharma company. However, clinical trial data are fundamentally stored for audits, not analytics. Because there are no comprehensive standards for clinical trial data, data from different trials cannot be seamlessly consolidated into one large database.

We have therefore had to create our own standards to build large, feature-complete databases for each disease indication. Once the data are consolidated into a single database, we dive deep into them to ensure their quality meets our stringent requirements. So far, our rigorous quality control process has been applied to more than 85,000 participants with more than 470,000 visits across 7 indications. At this scale, our data already rival the size and quality of the databases held by large pharma companies. And the best is yet to come: we are working through a backlog of more than 600,000 participants across 33 indications.

What is the secret sauce that enables a startup to compete with large pharma companies in the data arena? The right people with the right tools. AI is at the core of everything we do at Unlearn, so we are constantly looking for new ways to apply it. At the start of our journey, we followed the typical startup playbook and focused on getting things done, even when our methods didn't scale. As a result, our initial databases were a painstaking labor of love created by our data and clinical scientists. The next generation of our databases, however, is powered by AI. We've found that using large language models to match tables between different data sources can significantly accelerate our database workflow. This only scratches the surface of what AI can do for organizing unstructured data and detecting database anomalies, and we are excited to keep innovating.
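This post does not describe how the LLM-based table matching works. To illustrate only the underlying task (mapping one source's column names onto a common standard), here is a minimal Python sketch. It deliberately swaps in plain string similarity (`difflib.SequenceMatcher`) in place of a large language model, and the standard schema, column names, and 0.7 threshold are all hypothetical, chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical target standard: canonical column names for a harmonized database.
STANDARD_COLUMNS = ["subject_id", "visit_date", "age", "sex", "mmse_total"]


def normalize(name: str) -> str:
    """Lowercase and unify separators so 'Visit Date' and 'visit_date' compare equal."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized column names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_columns(source_columns, standard_columns, threshold=0.7):
    """Map each source column to its most similar standard column.

    Columns whose best score falls below `threshold` map to None,
    meaning they would go to a human reviewer instead.
    """
    mapping = {}
    for src in source_columns:
        best = max(standard_columns, key=lambda std: similarity(src, std))
        mapping[src] = best if similarity(src, best) >= threshold else None
    return mapping


# Hypothetical column names from one trial's export.
source = ["SUBJID", "Visit Date", "AGE", "Sex", "MMSE Total Score", "ADVERSE_EVT"]
for column, target in match_columns(source, STANDARD_COLUMNS).items():
    print(f"{column!r} -> {target!r}")
```

A baseline like this misses matches whose names share few characters; a hypothetical `GENDER` column, for example, scores well below the threshold against `sex`. Closing that kind of gap, using column descriptions and context rather than spelling alone, is plausibly where a language model earns its keep.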

If you are passionate about data and AI, join us as we build the most comprehensive database of human health in the world.