The future of biology
Yet a future where biological solutions are generated from the extracted principles underlying emergent behaviors won’t arrive by simply putting biologists and machine-learning experts in the same room.
There is an urgent need for bilingual scientists. Only those who speak the languages of both biology and computation will be able to frame biological problems in a grammar machines can understand and improve the ways machines learn from the data, enabling AI to generate the most predictive insights. The path from problem to data to machine to insight will be iterative, not linear, with multiple points that require human interpretation and testing. In the same way, humans augmented by machine-learning programs play better chess than either computers or humans playing alone.
For a machine to learn effectively, someone must choose the right algorithm to apply to a given problem, a choice that depends entirely on the specifics of that problem. Biological systems rarely behave like the systems where machine-learning algorithms have typically been deployed. In chess or Go, the rules of the game are the same every time; in biology, the rules are highly dynamic. The data may be very noisy, in ways that are hard to understand. The same molecules that are anti-inflammatory in one context can be inflammatory in another. Cells that are wired one way in health are wired differently when they are diseased. The molecular components that drive unique cellular behaviors are even more complex. Machine-learning experts unversed in biology might guess that analyzing protein structures is an image-recognition problem. But unlike most physical objects, proteins are highly dynamic, vastly modifiable, and not scale-invariant, which means conventional algorithms will often fail. In each case, deep expertise in both biology and computation is essential.
Similarly, biologists and data scientists need to work together to design and run experiments in ways that maximize a computer’s ability to extract the molecular drivers of emergent behaviors. This is not just a question of generating more data but rather of accepting complexity as a vital aspect of useful data. Scientists should continue to knock out all genes systematically with CRISPR, measure the levels of all mRNA transcripts in every cell, read the metabolomic and proteomic profiles of primary tissues, and generate cryo-electron micrographs of complex protein assemblies. But an emergent perspective does not view these data sets as fine-grained information, an opportunity to better separate the signal from the noise. Instead, the bilingual biologist will see all the data as positive examples of the emergent properties of the system and will design experiments to extract those patterns. As Miles Davis said about jazz, the silence is as important as the sound.
Furthermore, an emergent perspective will deprioritize lab-adapted experimental models in favor of primary, patient-derived systems that are as true to the human disease state as possible. Instead of seeking models that can be automated and stripped down to probe single mechanistic hypotheses, the bilingual biologist must recognize that nature is the system from which the most relevant and useful governing principles can be extracted. It will do little good to learn the emergent behavior of a model cell line or the governing principles of a cancer-free mouse.
The successful therapeutics companies of the future won’t apply just one or two of these technologies to see more deeply into data. Rather, they will combine multiple lenses to capture the most nuanced view of the system possible, focus on complex human tissues in primary settings, and leverage machines to extract the fundamental rules that drive this complexity.