Interactomics + Super (or Quantum) Computers + Machine Learning : the Future of Medicine?

[Updated Mar. 2022]

Interactomics today bears a certain resemblance to genomics in the 1990s...  Big gaps in knowledge, but an explosively growing field of great promise.

If you're unfamiliar with the terms, genomics is about deciphering the gene sequence of an organism, while interactomics is about describing all the relevant bio-molecules and their web of interactions.

A Detective Story

Think of a good police-detective story; typically there is a multitude of characters, and an impossible-to-remember number of relationships: A hates B, who loves C, who had a crush on D, who always steers clear of E, who was best friends with A until D arrived...

Yes, just like those detective stories, things get very complex with our biological story!  Examples of webs of interactions, familiar to many who took intro biology, are the Krebs cycle for metabolism or the Calvin cycle to fix carbon into sugars in plant photosynthesis.

Now, imagine vastly expanding those cycles of reactions - the bane of biochem students who need to memorize them - to cover all the cellular functions, in all cell types, at various points in time, in various organisms.  Oh, and add quantitative information, such as concentrations (a function of location and time), and reaction parameters...

Welcome to Interactomics :)
[We choose to go to the Moon in this decade and do the other things,] not because they are easy, but because they are hard;  [...] because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one we intend to win  -J. F. Kennedy's speech

The Characters   

Back to the detective-story analogy, who are the "characters"?  Well, the genome (DNA) is well-described.  The proteins not as well.  The number of proteins in humans (the "proteome") is some 20 to 30 thousand.  The Human Proteome Map (HPM) project mentions about 30,000 proteins - and that's without counting proteolysis events (protein breakdowns), and other post-translational modifications ("translation" is one of the major steps in the generation of new proteins.)  In another blog entry, I provide a primer on the complexities of proteins.

In addition to DNA and proteins, the "cast of actors" of course also includes a variety of other biomolecules, such as RNA, lipids and ATP (the molecule widely used for energy storage), not to mention various small molecules.

The Interactions

Just as our detective story would get dull if the characters remained in isolation, the story of Life gets interesting when the biomolecules start interacting with one another.  In principle, with 30,000 proteins, one could have about 450 million pairwise interactions!  Fortunately, proteins tend to be specific in their interactions, so many of the conceivable pairings don't actually occur.
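
Where does the 450 million come from?  It's simply the number of unordered pairs among n items, n(n-1)/2.  Here's a quick check in Python, using the rough 30,000 estimate quoted above (the function name is just mine, for illustration):

```python
from math import comb

def pairwise_count(n: int) -> int:
    """Number of distinct unordered pairs among n items: n * (n - 1) / 2."""
    return comb(n, 2)

n_proteins = 30_000  # rough estimate quoted in the text, not a precise count
print(f"{pairwise_count(n_proteins):,}")  # 449,985,000
```

Note that this counts only pairs of distinct proteins; it ignores self-interactions and complexes of three or more partners.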

Still, the numbers are large.  And that's just the protein-protein interactions!  Needless to say, protein-DNA interactions are equally vital (in particular, to regulate gene expression), and other biomolecules cannot be left out, either...

So, we have large numbers of "actors" (bio-molecules) and dizzying numbers of "relationships" (reactions.)
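
A natural way to hold "actors" and "relationships" in a program is a network (graph).  Below is a minimal sketch with an adjacency map; the molecule names and the interactions between them are made-up placeholders for illustration, not real biological data:

```python
from collections import defaultdict

# Hypothetical "actors" and "relationships", for illustration only
interactions = [
    ("protein_A", "protein_B"),
    ("protein_B", "protein_C"),
    ("protein_C", "gene_D"),   # stands in for a protein-DNA interaction
    ("protein_A", "gene_D"),
]

def build_network(pairs):
    """Build an undirected adjacency map: actor -> set of interaction partners."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return dict(network)

network = build_network(interactions)
for actor in sorted(network):
    print(actor, "interacts with", sorted(network[actor]))
```

A real interactome would also need attributes on the links (type of reaction, rate parameters, location), which is where a graph database starts to look attractive.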

The Missing Parts

To further complicate matters, not all "actors" have been characterized yet.  And that's even more true for the "relationships."  Projects such as the REACTOME have been hard at work to round up all the known interactions.  I think it's fascinating to take a peek at their interactive Pathway Browser : to me, it feels like being let in "past the curtains" to peek at the Inner Workings of the life force!

But what to do with the unknown parts?  Generally speaking, they can be explored experimentally or with computer simulations ("molecular dynamics" simulations.)

Molecular Dynamics Computing

"Molecular dynamics" simulations are very demanding, even with powerful computers.  In brief, that's because of the large number of atomic nuclei and electrons in biomolecules, each interacting with all the other atomic nuclei and electrons (less so with those farther away, but that still leaves a large number of interactions.)
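
To get a feel for why the cost balloons, and why ignoring far-away partners helps, here's a toy sketch: it counts all pairs of particles on a small cubic grid, and then only the pairs within a cutoff distance.  The grid, the spacing and the cutoff are arbitrary choices of mine; real molecular dynamics packages use far more sophisticated neighbor-search schemes.

```python
from itertools import combinations
from math import dist

def count_pairs(points, cutoff=None):
    """Count particle pairs; if a cutoff is given, only pairs at most that far apart."""
    return sum(
        1
        for p, q in combinations(points, 2)
        if cutoff is None or dist(p, q) <= cutoff
    )

# Toy system: 64 particles on a 4 x 4 x 4 cubic grid with unit spacing
points = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]

print(count_pairs(points))              # all pairs: 64 * 63 / 2 = 2016
print(count_pairs(points, cutoff=1.0))  # nearest neighbors only: 144
```

The all-pairs count grows with the square of the number of particles, while the within-cutoff count grows roughly in proportion to it.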

Hence the Supercomputers and Quantum Computers mentioned in the title.  Supercomputers have been riding the recent revolution in GPU performance.  "For the first time in history, most of the flops added to the TOP500 [supercomputer] list came from GPUs instead of CPUs" (June 2018 article.)

At the end of 2020, a breakthrough Machine-Learning approach, the AlphaFold 2 project by Google's DeepMind company, was able to find patterns in known protein shapes, to the point of fairly accurately predicting the shapes of other proteins (5-min summary, slightly more detailed intro, more depth, and a Nov. 2020 article in the journal Nature.)

And quantum computers are expected to be especially helpful for simulating molecular dynamics.  A topic for future blog entries!  For now, let me just mention a 12-minute PBS video with one of the best intros to qubits and quantum computing (especially its underlying math.)

Systems Biology : Quantitative Dynamical Modeling

Let's try to put it all together.  We have a relatively good set of "actors" and a rather incomplete set of relationships.  What's next?  Quantitative dynamical modeling!  That means: how does a system evolve with time, given an initial state and the interactions among its components?
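
As a tiny taste of what "evolving a system in time" means, here's a sketch of about the simplest possible case: a single reversible reaction A <-> B in one well-mixed compartment, stepped forward with the forward-Euler method.  The rate constants, initial concentrations and time step are arbitrary values I picked for illustration.

```python
def simulate(a0, b0, k_fwd, k_rev, dt, n_steps):
    """Forward-Euler integration of A <-> B with first-order kinetics.

    Returns a list of (time, [A], [B]) tuples, starting from the initial state.
    """
    a, b = a0, b0
    history = [(0.0, a, b)]
    for step in range(1, n_steps + 1):
        net_rate = k_fwd * a - k_rev * b   # net conversion rate of A into B
        a -= net_rate * dt
        b += net_rate * dt
        history.append((step * dt, a, b))
    return history

# Arbitrary illustrative values: all A at the start, forward reaction 3x faster
history = simulate(a0=10.0, b0=0.0, k_fwd=3.0, k_rev=1.0, dt=0.01, n_steps=1000)
t, a, b = history[-1]
print(f"t={t:.1f}  [A]={a:.3f}  [B]={b:.3f}")
```

With these values the system settles where the forward and reverse rates balance, i.e. [B]/[A] = k_fwd/k_rev = 3.  A whole-cell model has to do this for a vast number of coupled reactions, spread across many compartments.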

But how do we do that, given our rather limited knowledge of "initial conditions" (for example, concentrations in each of the grid partitions introduced for modeling the cell), and given our wobbly knowledge of reaction parameters?

Well...  unknown initial conditions...  partially known "weights"...  that's a job for Machine-Learning style optimization techniques!  Perhaps a mix of gradient descent and genetic algorithms (i.e. artificial Directed Evolution, one of my research areas in Theoretical Neuroscience.)  

But what's the counterpart of the "loss function", aka "fitness function" (that is, a gauge of how well the system is performing)?  That seems hard to define, but a simulated cell that can divide appropriately, and interact with simulated environments in ways that mimic real cells - i.e. exhibit appropriate phenotypes - could be given a better performance score.  In the words of this 2015 article in Trends in Cell Biology, Why build whole-cell models?, one goal is to "quantify variation in how individual cells in a population express a set of genes in response to an environmental signal."
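
To make the idea of a loss function concrete on a drastically simplified problem, here's a sketch that recovers one unknown reaction parameter by gradient descent.  Everything here is an assumption of mine for illustration: the model is a simple first-order decay of a substance A, the "observations" are synthetic data generated from a known rate constant, and the loss is the mean squared error between simulated and observed concentrations.  A whole-cell fitness function would be far harder to define.

```python
import math

def model(k, t, a0=1.0):
    """Concentration of A at time t, for first-order decay with rate constant k."""
    return a0 * math.exp(-k * t)

def loss(k, times, observed):
    """Mean squared error between simulated and observed concentrations."""
    return sum((model(k, t) - y) ** 2 for t, y in zip(times, observed)) / len(times)

def loss_gradient(k, times, observed):
    """Analytical derivative of the loss with respect to k."""
    return sum(
        2 * (model(k, t) - y) * (-t) * model(k, t)
        for t, y in zip(times, observed)
    ) / len(times)

def fit_rate_constant(times, observed, k_init=0.2, learning_rate=0.1, n_steps=2000):
    """Plain gradient descent on the single parameter k."""
    k = k_init
    for _ in range(n_steps):
        k -= learning_rate * loss_gradient(k, times, observed)
    return k

# Synthetic, noise-free "observations" generated from a known rate constant
TRUE_K = 0.8
times = [0.5 * i for i in range(11)]           # t = 0.0, 0.5, ..., 5.0
observed = [model(TRUE_K, t) for t in times]

k_fit = fit_rate_constant(times, observed)
print(f"fitted k = {k_fit:.4f}, loss = {loss(k_fit, times, observed):.2e}")
```

With many parameters and a loss that isn't differentiable (did the simulated cell divide or not?), gradient-free methods such as genetic algorithms become a natural complement.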

Machine Learning approaches are also discussed in this 2012 article in the Proceedings of the National Academy of Sciences, Computational design of genomic transcriptional networks with adaptation to varying environments.  Of course, Machine Learning has many more immediate uses in medicine, such as finding cancerous patterns in medical images (here's a 2019 article on AI in Cancer Imaging), but the focus of this blog entry is quantitative modeling of the cell.

Yes, it's a tall order.  A good place to start is probably the simplest of organisms.  For example, here's a fascinating 2012 article in Cell, where whole-cell simulations are applied to Mycoplasma genitalium, one of the simplest bacteria known, with just 525 genes in its genome.  In that article, the authors' computer simulations provided insight into that bacterium's protein-DNA association, and into its replication.

Envisioning the Future

A possible sequence of events that could profoundly shape Medicine in the 21st century is quantitative dynamical modeling of prokaryote (bacterial) cells, followed by quantitative modeling of eukaryote cells (complex cells with a nucleus, including human cells), then quantitative modeling of tissues, and finally of whole systems/organisms.

How will all that unfold?  Among the key players, I envision institutions or companies that are fluent in bringing together the best available biological datasets (such as the REACTOME) and their frequent updates, and that then work closely with academia and private companies to add quantitative dynamical modeling and machine learning.  A mix of open-source/open-data and licensed offerings might be especially good - to work tightly with academia and public institutions, while at the same time raising money for operations and research.

Bringing Together the Community

In March 2022, an open-source project called Life123 was launched, to lay the foundation for an important element of this ambitious project : a new-generation approach to Dynamical Modeling for Systems Biology.  Its goals and plans are described in this article.
