Interactomics + Super (or Quantum) Computers + Machine Learning : the Future of Medicine?

[Updated Mar. 2022]

Interactomics today bears a certain resemblance to genomics in the 1990s... Big gaps in knowledge, but an explosively-growing field of great promise.

If you're unfamiliar with the terms, genomics is about deciphering the gene sequence of an organism, while interactomics is about describing all the relevant bio-molecules and their web of interactions.

A Detective Story

Think of a good police-detective story; typically there is a multitude of characters, and an impossible-to-remember number of relationships: A hates B, who loves C, who had a crush on D, who always steers clear of E, who was best friends with A until D arrived...

Yes, just like those detective stories, things get very complex with our biological story! Examples of webs of interactions, familiar to many who took intro biology, are the Krebs cycle for metabolism or the Calvin cycle to fix carbon into sugars in plant photosynthesis.

Now, imagine vastly expanding those cycles of reactions - the bane of biochem students who need to memorize them - to cover all the cellular functions, in all cell types, at various points in time, in various organisms. Oh, and add quantitative information, such as concentrations (a function of location and time), and reaction parameters...

Welcome to Interactomics :)

[We choose to go to the Moon in this decade and do the other things,] not because they are easy, but because they are hard; [...] because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one we intend to win -J. F. Kennedy's speech

The Characters

Back to the detective-story analogy, who are the "characters"? Well, the genome (DNA) is well-described. The proteins not as well. The number of proteins in humans (the "proteome") is some 20 to 30 thousand. The Human Proteome Map (HPM) project mentions about 30,000 proteins - and that's without counting proteolysis events (protein breakdowns), and other post-translational modifications ("translation" is one of the major steps in the generation of new proteins.) In another blog entry, I provide a primer on the complexities of proteins.

In addition to DNA and proteins, the "cast of actors" of course also includes a variety of other biomolecules, such as RNA, lipids and ATP (the molecule widely used for energy storage), not to mention various small molecules.

The Interactions

Just as our detective story would get dull if the characters remained in isolation, the story of Life gets interesting when the biomolecules start interacting with one another. In principle, with 30,000 proteins, one could have about 450 million pairwise interactions! Fortunately, proteins tend to be specific in their interactions, so many of the conceptual pairings don't actually occur.
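As a quick sanity check on that figure, here's a minimal Python sketch that counts the unordered pairs among 30,000 items. The function name is just an illustrative choice, and the count covers all conceivable pairings, not the ones that actually occur in a cell.

```python
import math

def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n distinct items, i.e. n*(n-1)/2."""
    return math.comb(n, 2)

n_proteins = 30_000                 # protein count quoted in the text
pairs = pairwise_count(n_proteins)
print(f"{pairs:,}")                 # prints 449,985,000
```

That is just shy of 450 million, in line with the estimate above.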

Still, the numbers are large. And that's just the protein-protein interactions! Needless to say, protein-DNA interactions are equally vital (in particular, to regulate gene expression), and other biomolecules cannot be left out, either...

So, we have large numbers of "actors" (bio-molecules) and dizzying numbers of "relationships" (reactions.)

The Missing Parts

To further complicate matters, not all "actors" have been characterized yet. And that's even more true for the "relationships." Projects such as the REACTOME have been hard at work to round up all the known interactions. I think it's fascinating to take a peek at their interactive Pathway Browser : to me, it feels like being let in "past the curtains" to peek at the Inner Workings of the life force!

But what to do with the unknown parts? Generally speaking, they can be explored experimentally or with computer simulations ("molecular dynamics" simulations.)

Molecular Dynamics Computing

"Molecular dynamics" simulations are very complex, even with powerful computers. In brief, that's because of the large number of atomic nuclei and electrons in biomolecules, interacting with all other atomic nuclei and electrons (less so with those farther away, but still a large number of interactions.)

Hence the Supercomputers and Quantum Computers mentioned in the title. Supercomputers have been riding the recent revolution in GPU performance. "For the first time in history, most of the flops added to the TOP500 [supercomputer] list came from GPUs instead of CPUs" (June 2018 article.)
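To get a feel for why these simulations call for such hardware, here's a toy Python sketch that counts interacting pairs among randomly placed particles, first all of them and then only those within a cutoff distance. Everything in it is an assumption made for illustration (point particles, the box size, the cutoff value); real molecular-dynamics packages are far more sophisticated.

```python
import itertools
import math
import random

def count_pairs(positions, cutoff=None):
    """Count particle pairs; if `cutoff` is given, only pairs closer than that distance."""
    count = 0
    for p, q in itertools.combinations(positions, 2):
        if cutoff is None or math.dist(p, q) < cutoff:
            count += 1
    return count

rng = random.Random(0)
# Hypothetical toy system: 200 point particles scattered in a 10 x 10 x 10 box
particles = [
    (rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10))
    for _ in range(200)
]

all_pairs = count_pairs(particles)                # 200 * 199 / 2 = 19,900 pairs
near_pairs = count_pairs(particles, cutoff=2.0)   # only the close-range pairs
print(all_pairs, near_pairs)
```

The total number of pairs grows roughly with the square of the particle count, which is why ignoring (or approximating) the far-away pairs is such a tempting shortcut.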

At the end of 2020, a breakthrough Machine-Learning approach, the AlphaFold 2 project by Google's DeepMind company, was able to find patterns in known protein shapes, to the point of fairly accurately predicting the shapes of other proteins (5-min summary, slightly more detailed intro, more depth, and a Nov. 2020 article in the journal Nature.)

And quantum computers are expected to be especially helpful for simulating molecular dynamics. A topic for future blog entries! For now, let me just mention a 12-minute PBS video with one of the best intros to qubits and quantum computing (especially its underlying math.)

Systems Biology : Quantitative Dynamical Modeling

Let's try to put it all together. We have a relatively good set of "actors" and a rather incomplete set of relationships. What's next? Quantitative dynamical modeling! That means: how does a system evolve with time, given an initial state and the interactions among its components.
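To make "initial state plus interactions gives time evolution" concrete, here's a minimal Python sketch for the simplest case I can think of: a single reversible reaction A <-> B with mass-action kinetics, stepped forward in time with the forward-Euler method. The rate constants, concentrations and time step are all made up for illustration, and a serious simulator would use a more careful integration scheme.

```python
def simulate_reversible(a0, b0, k_fwd, k_rev, dt, n_steps):
    """Forward-Euler integration of A <-> B with mass-action kinetics.

    Returns a list of (time, [A], [B]) tuples, starting with the initial state.
    """
    a, b = a0, b0
    history = [(0.0, a, b)]
    for step in range(1, n_steps + 1):
        rate = k_fwd * a - k_rev * b    # net rate at which A converts into B
        a -= rate * dt
        b += rate * dt
        history.append((step * dt, a, b))
    return history

# All numbers below are hypothetical, chosen only to illustrate the idea
trajectory = simulate_reversible(a0=10.0, b0=0.0, k_fwd=3.0, k_rev=1.0,
                                 dt=0.01, n_steps=1000)
t_end, a_end, b_end = trajectory[-1]
print(f"t={t_end:.1f}  [A]={a_end:.3f}  [B]={b_end:.3f}")
```

With these numbers the system settles at the equilibrium where [B]/[A] equals k_fwd/k_rev = 3, that is [A] = 2.5 and [B] = 7.5, while the total A + B stays at 10.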

But how do we do that, given our rather limited knowledge of "initial conditions" (for example, concentrations in each of the grid partitions introduced for modeling the cell), and given our wobbly knowledge of reaction parameters?

Well... unknown initial conditions... partially known "weights"... that's a job for Machine-Learning style optimization techniques! Perhaps a mix of gradient descent and genetic algorithms (i.e. artificial Directed Evolution, one of my research areas in Theoretical Neuroscience.)
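As a tiny illustration of the gradient-descent half of that idea, here's a Python sketch that recovers an unknown rate constant from a time course. The setup is entirely hypothetical: a first-order decay A(t) = a0 * exp(-k t), noise-free synthetic "observations", a mean-squared-error loss, and a hand-picked learning rate.

```python
import math

def predict(k, times, a0=1.0):
    """Concentration of A at each time, for first-order decay with rate constant k."""
    return [a0 * math.exp(-k * t) for t in times]

def loss_and_grad(k, times, observed):
    """Mean squared error between model and data, and its derivative with respect to k."""
    preds = predict(k, times)
    n = len(times)
    loss = sum((p - o) ** 2 for p, o in zip(preds, observed)) / n
    # d(pred)/dk = -t * pred, so d(loss)/dk = mean of 2 * (pred - obs) * (-t * pred)
    grad = sum(2 * (p - o) * (-t * p) for p, o, t in zip(preds, observed, times)) / n
    return loss, grad

def fit_rate_constant(times, observed, k_init=0.1, lr=0.1, n_iter=5000):
    """Plain gradient descent on the single parameter k."""
    k = k_init
    for _ in range(n_iter):
        _, grad = loss_and_grad(k, times, observed)
        k -= lr * grad
    return k

times = [0.5 * i for i in range(11)]    # sampling times from 0 to 5
k_true = 0.7                            # hypothetical "unknown" parameter
observed = predict(k_true, times)       # synthetic, noise-free data
k_fit = fit_rate_constant(times, observed)
print(f"fitted k = {k_fit:.4f}")
```

A real cell model would have a great many such parameters, noisy data, and no closed-form gradient, which is part of why mixing in other search strategies looks attractive.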

But what's the counterpart of the "loss function", aka "fitness function" (that is, a gauge of how well the system is performing)? That seems hard to define, but a simulated cell that can divide appropriately, and interact with simulated environments in ways that mimic real cells - i.e. exhibit appropriate phenotypes - could be equated to better performance scores. In the words of this 2015 article in Trends in Cell Biology, Why build whole-cell models? : "quantify variation in how individual cells in a population express a set of genes in response to an environmental signal."
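To show how a fitness function of that kind could drive a search, here's a stripped-down evolutionary loop in Python (mutation and selection only, no crossover, so simpler than a full genetic algorithm). Everything here is a hypothetical stand-in: the "genome" is just a pair of rate constants for a reversible reaction, and the "phenotype" is computed from a formula rather than from a simulated cell.

```python
import random

# Hypothetical target "phenotype" that the evolved system should reproduce
TARGET = {"fraction_b": 0.75, "relaxation_rate": 4.0}

def phenotype(genome):
    """Observable behavior of a reversible reaction with rates (k_fwd, k_rev)."""
    k_fwd, k_rev = genome
    total = k_fwd + k_rev
    return {"fraction_b": k_fwd / total, "relaxation_rate": total}

def fitness(genome):
    """Higher is better; 0 means the phenotype matches the target exactly."""
    ph = phenotype(genome)
    return -sum((ph[key] - TARGET[key]) ** 2 for key in TARGET)

def mutate(genome, rng, scale=0.2):
    """Gaussian perturbation of each rate, kept strictly positive."""
    return tuple(max(1e-6, g + rng.gauss(0.0, scale)) for g in genome)

def evolve(pop_size=30, n_generations=100, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0))
                  for _ in range(pop_size)]
    best_per_generation = [max(fitness(g) for g in population)]
    for _ in range(n_generations):
        # Selection with elitism: the fitter half survives unchanged
        survivors = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        # The rest of the next generation are mutated copies of survivors
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
        best_per_generation.append(max(fitness(g) for g in population))
    return max(population, key=fitness), best_per_generation

best, history = evolve()
print("best genome:", best, "fitness:", fitness(best))
```

Because the fittest individuals are carried over unchanged, the best fitness can never get worse from one generation to the next. In this toy setup a perfect score corresponds to k_fwd = 3 and k_rev = 1.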

Machine Learning approaches are also discussed in this 2012 article in the Proceedings of the National Academy of Sciences, Computational design of genomic transcriptional networks with adaptation to varying environments. Of course, Machine Learning has many more immediate uses in medicine, such as finding cancerous patterns in medical images (here's a 2019 article on AI in Cancer Imaging), but the focus of this blog entry is quantitative modeling of the cell.

Yes, it's a tall order. A good place to start is probably the simplest of organisms. For example, here's a fascinating 2012 article in Cell, where whole-cell simulations are applied to Mycoplasma genitalium, one of the simplest bacteria known, with just 525 genes in its genome. In that article, the authors' computer simulations provided insight into that bacterium's protein-DNA association, and into its replication.

Envisioning the Future

A possible sequence of events that could profoundly shape Medicine in the 21st century is quantitative dynamical modeling of prokaryote (bacterial) cells, followed by quantitative modeling of eukaryote cells (complex cells with a nucleus, including human cells), then of tissues, and finally of whole systems/organisms.

How will all that unfold? Among the key players, I envision institutions or companies that are fluent in bringing together the best available biological datasets (such as the REACTOME) and their frequent updates, and that then work closely with academia and private companies to add quantitative dynamical modeling and machine learning. A mix of open-source/open-data and licensed offerings might be especially good - to work tightly with academia and public institutions, while at the same time raising money for operations and research.

Bringing Together the Community

In March 2022, an open-source project called Life123 was launched, to lay the foundation for an important element of this ambitious project : a new-generation approach to Dynamical Modeling for Systems Biology. Its goals and plans are described in this article.


Julian's Polymath Explorations
