[Updated Mar. 2021]
Interactomics today bears a certain resemblance to genomics in the 1990s... Big gaps in knowledge, but an explosively growing field of great promise.
If you're unfamiliar with the terms, genomics is about deciphering the gene sequence of an organism, while interactomics is about describing all the relevant bio-molecules and their web of interactions.
A Detective Story
Think of a good police-detective story; typically there is a multitude of characters, and an impossible-to-remember number of relationships: A hates B, who loves C, who had a crush on D, who always steers clear of E, who was best friends with A until D arrived...
Yes, just like those detective stories, things get very complex with our biological story! Examples of webs of interactions, familiar to many who took intro biology, are the Krebs cycle for metabolism or the Calvin cycle to fix carbon into sugars in plant photosynthesis.
Now, imagine vastly expanding those cycles of reactions - the bane of biology students who need to memorize them - to cover all the cellular functions, in all cell types, at various points in time, in various organisms. Oh, and add quantitative information, such as concentration (a function of location and time), and reaction parameters...
Welcome to Interactomics :)
[We choose to go to the Moon in this decade and do the other things,] not because they are easy, but because they are hard; [...] because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one we intend to win -J. F. Kennedy's speech
The Characters
Back to the detective-story analogy, who are the "characters"? Well, the genome (DNA) is well-described. The proteins, not as well. The number of proteins in humans (the "proteome") is some 20 to 30 thousand. The Human Proteome Map (HPM) project mentions about 30,000 proteins - and that's without counting proteolysis events (protein breakdowns) and other post-translational modifications ("translation" is one of the major steps in the generation of new proteins). In another blog entry, I provide a primer on the complexities of proteins.
In addition to DNA and proteins, the "cast of actors" of course also includes a variety of other biomolecules, such as RNA, lipids and ATP (the molecule widely used for energy storage), not to mention various small molecules.
The Interactions
Just as our detective story would get dull if the characters remained in isolation, the story of Life gets interesting when the biomolecules start interacting with one another. In principle, with 30,000 proteins, one could have about 450 million pairwise interactions! Fortunately, proteins tend to be specific in their interactions, so many of the conceptual pairings don't actually occur.
Still, the numbers are large. And that's just the protein-protein interactions! Needless to say, protein-DNA interactions are equally vital (in particular, to regulate gene expression), and other biomolecules cannot be left out, either...
So, we have large numbers of "actors" (bio-molecules) and dizzying numbers of "relationships" (reactions.)
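The figure of about 450 million quoted above comes from counting unordered pairs: n proteins can form n * (n - 1) / 2 distinct pairs. Here is a minimal Python sketch of that arithmetic; the function name pairwise_count is just an illustrative choice, and 30,000 is the rough proteome size mentioned earlier.

```python
from math import comb


def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n distinct items: n * (n - 1) / 2."""
    return comb(n, 2)


n_proteins = 30_000  # rough proteome size quoted in the text
print(pairwise_count(n_proteins))  # 449985000, i.e. about 450 million
```

Note that this counts only pairs of distinct proteins; self-interactions and complexes of three or more partners would push the number higher still.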
The Missing Parts
To further complicate matters, not all "actors" have been characterized yet. And that's even more true for the "relationships." Projects such as the REACTOME have been hard at work to round up all the known interactions. I think it's fascinating to take a peek at their interactive Pathway Browser: to me, it feels like being let in past the curtains to peek at the Inner Workings of the life force!
But what to do with the unknown parts? Generally speaking, they can be explored experimentally or with computer simulations ("molecular dynamics" simulations.)
Molecular Dynamics Computing
"Molecular dynamics" simulations are very complex, even with powerful computers. In brief, that's because of the large number of atomic nuclei and electrons in biomolecules, interacting with all other atomic nuclei and electrons (less so with those farther away, but still a large number of interactions.)
Hence the Supercomputers and Quantum Computers mentioned in the title. Supercomputers have been riding the recent revolution in GPU performance. "For the first time in history, most of the flops added to the TOP500 [supercomputer] list came from GPUs instead of CPUs" (June 2018 article.)
Very recently, at the end of 2020, a breakthrough Machine-Learning approach, the AlphaFold 2 project by Google's DeepMind company, has been able to find patterns in known protein shapes, to the point of fairly accurately predicting shapes of other proteins (5-min summary, slightly more detailed intro, more depth, and a Nov. 2020 article in the journal Nature).
And quantum computers are expected to be especially helpful for simulating molecular dynamics. A topic for future blog entries! For now, let me just mention a 12-minute PBS video with one of the best intros to qubits and quantum computing (especially its underlying math), and an interesting, though a little dated (2018) online course: The Building Blocks of a Quantum Computer
Systems Biology: Quantitative Modeling
Let's try to put it all together. We have a relatively good set of "actors" and a rather incomplete set of relationships. What's next? Quantitative modeling! But how do we do that, given our rather limited knowledge of "initial conditions" (for example, concentrations in each of the grid partitions introduced for modeling the cell), and given our wobbly knowledge of reaction parameters?
Well... unknown initial conditions... partially known "weights"... that's a job for Machine-Learning style optimization techniques! Perhaps a mix of gradient descent and genetic algorithms (i.e. artificial Directed Evolution, one of my research areas in Theoretical Neuroscience.)
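To make the idea a bit more concrete, here is a toy Python sketch of the gradient-descent half of that mix (no genetic algorithm here). Everything in it is an assumption made for illustration: a single first-order reaction A -> B, synthetic noise-free "measurements", and made-up names such as simulate and fit_by_gradient_descent. It treats the unknown initial concentration c0 and the unknown rate constant k as the "weights" to be learned from observed concentrations.

```python
import math


def simulate(c0, k, times):
    """Concentration of A over time for a first-order reaction A -> B (toy model)."""
    return [c0 * math.exp(-k * t) for t in times]


def fit_by_gradient_descent(times, observed, c0=1.0, k=0.1,
                            learning_rate=0.02, steps=20_000):
    """Estimate c0 and k by minimizing the mean squared error with plain gradient descent."""
    n = len(times)
    for _ in range(steps):
        grad_c0 = 0.0
        grad_k = 0.0
        for t, y in zip(times, observed):
            decay = math.exp(-k * t)
            residual = c0 * decay - y  # model prediction minus observation
            # Partial derivatives of the mean squared error
            grad_c0 += 2.0 * residual * decay / n
            grad_k += 2.0 * residual * (-c0 * t * decay) / n
        c0 -= learning_rate * grad_c0
        k -= learning_rate * grad_k
    return c0, k


# Synthetic "measurements" generated from hypothetical true values c0 = 2.0, k = 0.5
times = [0.5 * i for i in range(11)]
observed = simulate(2.0, 0.5, times)

c0_fit, k_fit = fit_by_gradient_descent(times, observed)
print(f"c0 = {c0_fit:.3f}, k = {k_fit:.3f}")
```

A real cell model would of course involve thousands of coupled reactions, noisy and sparse data, and many local minima, which is one reason why a population-based search such as a genetic algorithm might be worth combining with gradient descent.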
Machine Learning approaches are also discussed in this 2012 article in the Proceedings of the National Academy of Sciences, Computational design of genomic transcriptional networks with adaptation to varying environments. Of course, Machine Learning has many more immediate uses in medicine, such as finding cancerous patterns in medical images (here's a very recent, 2/2019 article on AI in Cancer Imaging), but the focus of this blog entry is quantitative modeling of the cell.
Yes, it's a tall order. A good place to start is probably the simplest of organisms. For example, here's a fascinating 2012 article in Cell, where whole-cell simulations are applied to Mycoplasma genitalium, one of the simplest bacteria known, with just 525 genes in its genome. In that article, the authors' computer simulations provided insight into that bacterium's protein-DNA association, and into its replication.
Envisioning the Future
A possible sequence of events that could profoundly shape Medicine in the 21st century is quantitative modeling of prokaryote (bacterial) cells, followed by quantitative modeling of eukaryote cells (complex cells with a nucleus, including human cells), then of tissues, and finally of whole systems/organisms.
How will all that unfold? Among the key players, I envision institutions or companies that are fluent in bringing together the best available biological datasets (such as the REACTOME) and their frequent updates, and that add good analytics, a web interface and an API for usability... and then work closely with academia to add quantitative modeling and machine learning. A mix of open-source/open-data and licensed offerings might be especially good - to work tightly with academia and public institutions, while at the same time raising money for operations and research.