March 28, 2023
TL;DR: It’s almost always better to run a few well-thought-out experiments than many poorly thought-out ones.
At Unlearn, we do a mix of fundamental and applied machine learning research. That is, we aim to invent new neural network architectures and training methods, and we also deploy those methods to solve problems in medicine.
Our fundamental research has always been motivated by unsolved problems we encounter in our applied work. In fact, that’s one of the reasons we decided to focus on machine learning for medicine in the first place — there are so many unsolved problems that there’s constant inspiration for fundamental machine learning research.
Unfortunately, fundamental machine learning research is hard. Or, perhaps that’s fortunate; I’d certainly rather spend my time on hard and important problems than on comparatively easier and less important ones. In any case, I’ve found that I can maximize my chance of success on fundamental research problems by applying four simple principles.
These principles are:
- Build the baseline first.
- Do things that don’t scale.
- Hone your physical intuition.
- Get rid of things that don’t work.
The first principle is “build the baseline first”. Too often, I see researchers set out to solve a machine learning problem by immediately starting to implement some new and complicated method. There are three obvious problems with this. First, without a baseline you have nothing to compare your new and complicated method against, so you can’t judge how well it works. Second, you miss out on the intuition for the problem that you would have developed by implementing a simpler method first. Third, research in industry is rarely completely open-ended; you’re expected to eventually deliver something, and if your new and complicated method doesn’t end up working you’ll be scrambling to implement a simpler one at the last minute (I’ve seen this happen so many times…).
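To make this concrete, here’s a minimal sketch of the kind of baseline I mean, on a made-up tabular classification task (the data, models, and metric are stand-ins for illustration, not anything we actually use): a majority-class predictor and a default logistic regression, each of which gives every later method a number to beat.

```python
# Minimal baseline sketch for a hypothetical tabular classification task.
# The synthetic data stands in for a real dataset; the point is that this
# takes minutes to write and anchors every comparison that follows.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline 0: always predict the majority class (AUC will be 0.5).
# Anything that can't beat this is broken.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Baseline 1: plain logistic regression with default hyperparameters.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("dummy  AUC:", roc_auc_score(y_test, dummy.predict_proba(X_test)[:, 1]))
print("logreg AUC:", roc_auc_score(y_test, logreg.predict_proba(X_test)[:, 1]))
```

If your new and complicated method can’t clearly beat the second number, you’ve learned something important before sinking months into it.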
The second principle is “do things that don’t scale”. Machine learning engineers like things to be modular, extensible, efficient, and automated. However, trying to build something modular, extensible, efficient, and automated before you know which method actually works is, at best, a waste of time and, at worst, will prevent you from ever developing something that works. The best way to advance in machine learning research is by “graduate student descent”: implement something quickly, try it, look at the results, then adjust your method and try it again. The goal of your experiments should be to develop intuition for the problem so that you can eventually make a big, non-obvious leap forward. Just sit at your machine, try stuff out locally, and think about why some things work and others don’t. Only worry about tidying things up, increasing scale, and automating further experiments after you’re 80% sure you’ve figured out the right method.
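As a rough illustration (the task and architecture here are invented for the example), a round of graduate student descent can be as unglamorous as one flat script that you edit by hand between runs:

```python
# Hypothetical "graduate student descent" loop: a single local script,
# no config system, no experiment tracker. Edit a line, rerun, stare
# at the numbers, think, repeat.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 10)
y = (X.sum(dim=1, keepdim=True) > 0).float()  # toy target for illustration

# The block you keep editing between runs: architecture, optimizer, lr.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(500):
    loss = loss_fn(model(X), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(step, loss.item())  # look at the results, then go think
```

Nothing here is reusable, and that’s fine; the output of this phase is intuition, not infrastructure.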
The third principle is “hone your physical intuition”. Machine learning is an empirical science; ultimately, the only way to know if something works is to try it. However, it’s not possible to try everything, so you need a guide to tell you which experiments to run. What should this guide be? Mathematical analysis? No, too hard. Whatever techniques are currently popular? No, too banal. The best way to make progress in machine learning research is to develop your intuition. Build a baseline to get a sense of the problem. Maybe make up a toy problem. Implement some things and do some graduate student descent. Think about the results! Think about them more! Just marinate on them for a while! Keep doing this until something clicks. To quote Richard Feynman: “Now, all these things you can feel. You don't have to feel them; you can work them out by making diagrams and calculations, but as problems get more and more difficult, and as you try to understand nature in more and more complicated situations, the more you can guess at, feel, and understand without actually calculating, the much better off you are!”
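For concreteness, here’s the sort of toy problem I have in mind (everything in it is made up for illustration): generate data from a rule you know, fit a model, and check what it learned against the ground truth, so your intuition gets feedback from something other than a benchmark score.

```python
# Hypothetical toy problem: data generated from a known rule, so you can
# compare what the model learned against ground truth.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.0])  # by construction, feature 2 is irrelevant
X = rng.normal(size=(1000, 3))
y = X @ true_w + 0.1 * rng.normal(size=1000)

# Ordinary least squares; w_hat should land close to true_w.
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(w_hat)  # if this surprises you, your intuition is wrong somewhere
```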
The fourth principle is “get rid of things that don’t work”. Inevitably, you’ll be implementing new architectures, training methods, and tricks as you run your experiments. Over time, you’ll accumulate code for lots of things that you’ve tried but that didn’t work. Go back and delete those things! This is, perhaps, my most controversial and difficult-to-follow principle. Few people want to go back and delete something they spent time working on. I get it, but that time is a sunk cost; who cares. The bigger problem is that it will always be easier to rerun experiments on things you’ve already implemented than to implement new things. You’ll be biased to keep trying the same thing over and over again rather than honing your intuition so that you can make a big, non-obvious leap forward. You’re better off deleting dead ends (you can always recover them from your git history if you have to) and trying something else next time. Burn the bridge.
Caveats. There are two things to keep in mind when applying these principles as part of a team. “Doing things that don’t scale” and “honing your physical intuition,” when taken to the extreme, can lead you to work entirely by yourself. That’s definitely not good, because you want to share your knowledge with the rest of the team and help them develop their intuition too. Therefore, while you are doing things that don’t scale, you should still be doing things that are easy to communicate. For example, your initial implementation shouldn’t be modular, extensible, efficient, and automated, but it should be written well enough that others can understand it and reliable enough that you (and others) believe the results of your experiments. Similarly, you should document the experiments you run and, better yet, keep an updated memo laying out your intuition for the problem. Why do you think you got the results you got? What does it all mean?
I’m sure that other people have different principles that guide their research in machine learning, and maybe theirs are better, but these are mine: “build the baseline first,” “do things that don’t scale,” “hone your physical intuition,” and “get rid of things that don’t work.”