
What are Graph Databases - and Why Should I Care?? : "Graph Databases for Poets"

 

This is a very gentle introduction to the subject.  The subtitle is inspired by university courses such as "Physics for Poets"!  (If you're technically inclined, there's an alternate article for you.)

It has been said that "The language of physics (or of God) is math".  On a similar note, it could be said that:

The language of the biological world - or of any subject or endeavor involving complexity - is networks ('meshes')

What is a network?  Think of  it as the familiar 'friends of friends' diagram from social media.

Everywhere one turns in biology, there's a network – at the cellular level, tissue level, organ level, ecosystem level.  The weather and other earth systems are networks.  Human societal organization is a network.  Electrical circuits, the Internet, our own brains...  Networks are everywhere!

What can we do with networks, to better understand the world around us, or to create something that we need?

Broadly speaking, one approach is to measure and compute quantities; for example, how they vary with time.  That's what happens in weather forecasting.  And that's something I explore in a research project, Life123.science, on dynamical modeling.  But that's another story.

In this article, we'll focus on another broad approach : looking at the parts of the network, and their interactions.

A powerful, relatively new - but not bleeding edge - tool to store, search and retrieve the complex web of information in networks is something called a Graph Database.

Disclaimer : "Some Restrictions Apply – Graph NOT included!"

Let's get something out of the way : the word "graph" in the name has NOTHING whatsoever to do with the familiar "math graphs" you loved/hated in grade school!

It so happens that in higher math, "graph" is a formal name for networks: "little circles connected by lines".

So, there's no "graph" (in the way non-mathematicians use the word) in Graph Databases!  

This is a textbook lesson in what happens when the Marketing Department doesn't get consulted before naming a product!

ET Just Landed in a Small Town...

Let's get concrete!  Imagine a small town, or perhaps a neighborhood where people have been around and tend to know each other.

Suppose you're a Cultural Anthropologist - or an Extraterrestrial! - who would like to develop a deep understanding of what that community is like.  What kind of information would you be dealing with, and what does it look like?

Well, let's start with the people.  Imagine we represent each person with a little ball in a diagram, and we attach to it some basic information specific to that person, such as their name, DOB, profession, marital status, current political affiliation, etc.  Maybe not terribly exciting, is it?  But things get far more interesting the moment we start exploring the connections among those people...

For example, who lives with whom - and we draw a line whenever two people share a household.

Or, who is married to whom - and, again, we draw a line.

And then we can go wild with this!  For example, who knows whom?  Who is related to whom?  Who is friends with whom?  Who lusts after whom?  Who has gone to school with whom?  Who works for whom?  Etc., etc.  Correspondingly, we draw lines between people, labeling those lines with the name of the relationship, such as "lives with", "works for", "has gone to school with", etc.



Voilà, just a few arrows among people tell quite a story: Eve and Joe are a married couple who live together but lack passion.  Lo and behold, Eve (probably secretly) lusts after Max, a friend of her husband with whom she went to school...  and - the plot thickens - Max has a thing for Joe, not reciprocated!  A ton of insight!
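For the curious, here is a minimal sketch of how that little story might be written down for a computer, in plain Python.  It is only a toy stand-in, not Neo4j itself, and not how a real Graph Database stores things internally.  The relationship names (MARRIED_TO, LUSTS_AFTER, etc.) and the helper `one_way` are made up for illustration; only the people and their connections come from the story above.

```python
# A toy, in-memory stand-in for the little diagram described above.
# Nodes: one entry per person (the "little balls").
people = {"Eve", "Joe", "Max"}

# Relationships: (from, NAME, to) triples, i.e. the labeled arrows.
# The relationship names are illustrative assumptions.
relationships = [
    ("Eve", "MARRIED_TO", "Joe"),
    ("Eve", "LIVES_WITH", "Joe"),
    ("Max", "FRIENDS_WITH", "Joe"),
    ("Eve", "HAS_GONE_TO_SCHOOL_WITH", "Max"),
    ("Eve", "LUSTS_AFTER", "Max"),
    ("Max", "LUSTS_AFTER", "Joe"),
]


def one_way(name):
    """Return (from, to) pairs for arrows called `name` with no matching arrow coming back."""
    return [
        (a, b)
        for (a, rel, b) in relationships
        if rel == name and (b, name, a) not in relationships
    ]


unrequited = one_way("LUSTS_AFTER")
print(unrequited)  # [('Eve', 'Max'), ('Max', 'Joe')]
```

Notice that the "insight" falls out of simply following arrows: nothing in the data says "unrequited", yet the question is easy to ask.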

The above scenario was created using a commercial Graph Database named "Neo4j", also available as a free open-source product.  What does the "4j" stand for?  There's a technical story behind it - but I like to think that it means "4 (for) Julian"!!

Relationship (link) information can be stored with other types of databases, including old classics ("SQL") that first emerged in the 1970s... but not natively : it is clunky and indirect to use those tools to store, search and retrieve networks, which is exactly what Graph Databases are designed to do easily.

Incremental Changes

Another huge shortcoming of the old venerable databases ("SQL"), besides lack of native support for relationships, is that they tend to be rigid, and make design changes relatively hard and awkward, especially when - you can guess! - relationships are involved.  These old classic tools are better suited for situations where everything is fully planned in advance, and rarely changes - yep, the diametrical opposite of research environments, and of other complex business or science endeavors!

By contrast, Graph Databases are very nimble - and extremely friendly to making incremental changes...  as real life typically dictates!

Let's make an incremental change to our earlier example : we have only a coarse/vague knowledge of Eve and Max having gone to school together.  But which school?  Which level?  And what did Joe attend instead?

By introducing blue circles representing schools, and dropping the old (now redundant) HAS_GONE_TO_SCHOOL_WITH relationship between Eve and Max, we can "incrementally upgrade" to an enhanced, finer understanding:

Eve and Max have both attended "Jefferson High"; that's a finer level of understanding than just knowing they had gone to school together.  Joe, by contrast, attended "Washington High".
There might be an insight hiding behind the shared high school of Eve and Max : maybe social class or personality type; perhaps that contributes to making Eve feel closer to Max than to her husband?
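To make the "incremental upgrade" tangible, here is a rough sketch of those same three steps in plain Python.  Again, this is a toy illustration rather than actual Graph Database code; the "Person"/"School" labels, the ATTENDED relationship name and the `schoolmates` helper are assumptions made for the example.

```python
# Toy in-memory graph; labels and relationship names are illustrative assumptions.
nodes = {
    "Eve": "Person",
    "Joe": "Person",
    "Max": "Person",
}
relationships = [
    ("Eve", "MARRIED_TO", "Joe"),
    ("Eve", "HAS_GONE_TO_SCHOOL_WITH", "Max"),
]

# Step 1: add a new kind of circle (schools); existing data is left untouched.
nodes["Jefferson High"] = "School"
nodes["Washington High"] = "School"

# Step 2: drop the old, now redundant, coarse relationship.
relationships = [r for r in relationships if r[1] != "HAS_GONE_TO_SCHOOL_WITH"]

# Step 3: add the finer-grained relationships.
relationships += [
    ("Eve", "ATTENDED", "Jefferson High"),
    ("Max", "ATTENDED", "Jefferson High"),
    ("Joe", "ATTENDED", "Washington High"),
]


def schoolmates(person):
    """People who attended at least one school that `person` attended."""
    schools = {s for (p, rel, s) in relationships if p == person and rel == "ATTENDED"}
    return {
        p
        for (p, rel, s) in relationships
        if rel == "ATTENDED" and s in schools and p != person
    }


print(schoolmates("Eve"))  # {'Max'}
print(schoolmates("Joe"))  # set()
```

The point to notice: nothing had to be redesigned.  We added a few circles, swapped a few lines, and the old "went to school together" knowledge can now be derived rather than stored.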

The Process of Discovery

To get even more insight, we can distinguish among degrees; for example, the "FRIENDS_WITH" relationships (lines) could be labeled with quantifiers such as "casual", "close", "best friend", etc.  That adds a lot of nuance!   (To avoid clutter, we won't show it in our drawings.)

Now, imagine you zoom out of the diagram, full of little circles - and a big web of interactions across all directions.  

Now you're beginning to have a deep insight...  

That is, if you have the tools to read and manipulate such a large, possibly gigantic, diagram!  Well, that's where a Graph Database (combined with a suitable user interface) comes to the rescue!

Now we have the tools to gain a deeper understanding of that community.  And we can also formulate interesting questions, such as "who lives with their best friend?  Is that common?"  The sky is the limit!
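As a hedged illustration of how a question like "who lives with their best friend?" could be answered, here is a small plain-Python sketch where each line carries its own extra information (a "degree" for friendships).  The people "Ann" and "Bea", every "degree" value, and the helper `linked` are invented purely for this example; they are not part of the earlier diagram.

```python
# Each relationship: (from, NAME, to, properties).
# "Ann", "Bea" and all "degree" values are invented for this example.
relationships = [
    ("Eve", "LIVES_WITH", "Joe", {}),
    ("Max", "FRIENDS_WITH", "Joe", {"degree": "close"}),
    ("Ann", "LIVES_WITH", "Bea", {}),
    ("Ann", "FRIENDS_WITH", "Bea", {"degree": "best friend"}),
]


def linked(name, degree=None):
    """Unordered pairs joined by relationship `name`, optionally filtered by its "degree"."""
    return {
        frozenset((a, b))
        for (a, rel, b, props) in relationships
        if rel == name and (degree is None or props.get("degree") == degree)
    }


housemates = linked("LIVES_WITH")
best_friends = linked("FRIENDS_WITH", degree="best friend")

# "Who lives with their best friend?"  ->  pairs found in both sets.
live_with_best_friend = housemates & best_friends

# "Is that common?"  ->  fraction of households where it holds.
share = len(live_with_best_friend) / len(housemates)

print(sorted(sorted(pair) for pair in live_with_best_friend))  # [['Ann', 'Bea']]
print(share)  # 0.5
```

With a handful of circles this is trivial; the value of a real Graph Database is that the same kind of question stays easy to ask when the diagram becomes gigantic.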

And then we could take a further step: we had little circles to represent people and schools.  Now, let's consider another entity that's very relevant to understanding a small community, namely businesses : one circle for each business in the community.  And just as we did with the people, we start with the boring part, attaching to each circle things like the name of the business, the date it was established, etc.

As usual, things get more interesting when we start creating connections.  

For starters, connections between people and businesses, such as "works at" or "used to work at"; just like before, we draw lines.  We can also draw lines between businesses, such as "supplies to" or "competes against".

Now we can answer more elaborate questions that combine various elements, such as "are there spouses who work for competing businesses?"

Let's revisit our earlier example with additional information about Businesses (green) and Business Types (red):

Well, we just discovered that Eve and Max - while NOT co-workers - both work in the food industry.  Maybe that's another element in Eve's attraction to Max?  Both are foodies?

And Joe isn't just a generic "store clerk" (as we saw before): more specifically, he works at "Brown's Hardware" - which further layers of network may reveal is located in another town, some distance away; that reduced time together could be corroding his marriage with Eve...

Graph databases can greatly help such a process of discovery.

Have you noticed how detectives in films have a penchant for drawing something very similar to the above diagram, in their process of discovery?   The following screenshot is from a documentary about a huge bank heist in Brazil (bonus points if you can find me, photoshopped among the suspects!)

But how do you get answers to your questions, whether you're after a simple lookup or a discovery?  You probably wouldn't want to chase thousands of little circles and millions of lines with pen and paper!  Well, that's where our old friend, the Graph Database, comes in really handy!

Graph Databases come with a powerful programming language (called a "query language") that makes asking those questions very easy - with some training.

Typical end users would use a graphical user interface that further simplifies the process.

Piling Layer Upon Layer Upon Layer

So far, we have looked at networks of various relationships among people ("is married to"), among businesses ("supplies to"), and crossing over between people and businesses ("works at"), or people and schools ("attended").

But why stop now?  We could add new entities such as "Dwellings",  "Professions", "Towns", "Nations", "Illnesses", etc.

And, alongside them, all sorts of relationships!  Just a few examples:

  • Between Schools : "prepares for"  (e.g. a Middle School to a High School)
  • From People to Illnesses:  "suffers from"
  • From People to Dwellings : "resides in"
  • Between Dwellings : "is adjacent to"
  • From People to Towns : "lives in", "used to live in"
  • From Business to Towns : "headquartered in", "has a branch in"
  • Between Business Types : "subcategory of"   (e.g. "Bakery" is a subcategory of "Food")

At this point, we have reached a Deep Understanding - as deep as we need it to be - of that community.

And we can ask incredibly convoluted questions:  

"Did co-workers married to each other typically attend the same school?"   

"How common is it for best friends to end up working in the same broad industry (i.e. business type)?" 

"Does attending a particular middle school have a strong influence on profession?"  

The sky is the limit!  But one needs a heavy-duty tool to keep, and sift through, this whole tangled web of information.  We need a Super Hero : this is a job for Graph Databases!
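To give a flavor of what answering such layered questions involves, here is a toy plain-Python sketch that hops across several layers: from a person, to their business, to its business type, and then up the "subcategory of" chain to the broadest category.  The business names for Eve and Max, the types "Restaurant", "Hardware Store" and "Retail", the relationship names and both helper functions are hypothetical, made up for the example; only "Brown's Hardware" and "Bakery is a subcategory of Food" come from the text above.  The sketch also assumes the "subcategory of" lines never loop back on themselves.

```python
# Business names for Eve and Max, and all type names other than
# "Bakery" and "Food", are hypothetical.
relationships = [
    ("Eve", "WORKS_AT", "Sunrise Bakery"),
    ("Max", "WORKS_AT", "Corner Diner"),
    ("Joe", "WORKS_AT", "Brown's Hardware"),
    ("Sunrise Bakery", "IS_OF_TYPE", "Bakery"),
    ("Corner Diner", "IS_OF_TYPE", "Restaurant"),
    ("Brown's Hardware", "IS_OF_TYPE", "Hardware Store"),
    ("Bakery", "SUBCATEGORY_OF", "Food"),
    ("Restaurant", "SUBCATEGORY_OF", "Food"),
    ("Hardware Store", "SUBCATEGORY_OF", "Retail"),
]


def follow(start, name):
    """Nodes reached from `start` along one arrow called `name`."""
    return [b for (a, rel, b) in relationships if a == start and rel == name]


def broad_industries(person):
    """Broadest business types reached via WORKS_AT, IS_OF_TYPE, then SUBCATEGORY_OF (repeated)."""
    tops = set()
    for business in follow(person, "WORKS_AT"):
        to_visit = follow(business, "IS_OF_TYPE")
        while to_visit:  # assumes no cycles among SUBCATEGORY_OF arrows
            business_type = to_visit.pop()
            parents = follow(business_type, "SUBCATEGORY_OF")
            if parents:
                to_visit.extend(parents)
            else:
                tops.add(business_type)  # no parent: this is the broadest category
    return tops


def same_broad_industry(a, b):
    """True if the two people share at least one broadest business type."""
    return bool(broad_industries(a) & broad_industries(b))


print(broad_industries("Eve"))            # {'Food'}
print(same_broad_industry("Eve", "Max"))  # True
print(same_broad_industry("Eve", "Joe"))  # False
```

Every extra layer means one more hop to follow by hand in a sketch like this; that growing chain of hops is exactly the kind of work a Graph Database's query language is meant to take off your shoulders.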

A Compelling Need for Graph Databases?

Ok, the world often naturally manifests itself as networks...  and the unfortunately-named Graph Databases (better called "Network Databases") are particularly adept at dealing with networks...

But humanity has managed for a long time to keep track of things with other tools, such as spreadsheets and their beefed-up cousins "Relational (SQL) Databases."  So, is there a real scientific, technical or business need for Graph Databases?

Can we use different tools?  The short answer is "yes" - but, do we really want to?  I'll make the case that we don't.  Examples abound of endeavors where simpler, less specialized, tools could be used... but it's probably unwise to do so.   You could build a house with hand saws and hammers - and indeed people used to - but would you do that, rather than turn to power tools?  Most likely, not!

Likewise, one could write computer programs in the early languages, such as BASIC, from the days of the first Personal Computers of the 1980s...  But would you want to?  Sensibly, nowadays one would make use of more sophisticated modern languages and related tools, such as Python or JavaScript, to simplify and speed up the task, not to mention ease maintenance.

Well, various database tools that have generally been around since the 1970s ("Relational SQL Databases") could be used to store any type of data of any complexity, but it doesn't mean that they are the ideal tool for the job - in particular when :

  1. the data is very interlinked (i.e. lots of relationships)
  2. we are in a research environment with a design in flux, and a need for many incremental changes - as is often the case in the biomedical sciences and in many other fields

Graph Databases provide NATIVE support for networks - which is convenient, streamlined and productive - while with the alternative older tools, it takes concerted effort to represent networks of relationships, deal with many incremental changes, and carry out a process of complex discovery.

You might say, "but my data is simple: I just want to store the prices of the ice creams I sell in my store."  Well, tomorrow, you may well want to store the price history of the ice creams, the suppliers of the various flavors, the large corporate clients that your store caters to, and their orders, etc. etc.  Anything that isn't just a shopping list, easily grows into a complex web of information to manage!

In simpler words, it's wise not to try to fit a square peg (the old tools) in a round hole (the Real World)!

The "Hand of God"

Networks pop up everywhere.  Kick up dirt - and a network will pop up.  Literally!  A network of micro-organisms in the topsoil, and their ecosystem...

And then, transportation networks...  A network of computers called the Internet (perhaps you've heard of it?)...  Trade networks...  Disease-transmission networks...  Power grids...  Neural networks...

How about we end this article with a network so spectacular that beholding it feels like peering at the "Hand of God"?

That "hand" is often depicted as in the cover image at the top of this article... but this is how I'd depict it:

In the above diagram, the little circles are hidden to reduce clutter.  That's taken from Reactome.org, an organization that compiles and curates data about all the known interactions among molecules in human cells.

That network is what infuses life into us!

Want More?

This introduction has been about general concepts and their motivation.  If you want to hear about actual products, and how to use them - for example to find out which movie director worked with Tom Hanks in 2016  - check out this more technical series of articles.

No worries though, that series starts out very gently - just with a few more concrete details - and then builds up gradually in complexity for professionals interested in the subject.
