What are Graph Databases - and Why Should I Care?? : "Graph Databases for Poets"

This is a very gentle introduction to the subject. The subtitle is inspired by university courses such as "Physics for Poets"! (If you're technically inclined, there's an alternate article for you.)

It has been said that "The language of physics (or of God) is math". On a similar note, it could be said that:

The language of the biological world - or of any subject or endeavor involving complexity - is networks ('meshes')

What is a network? Think of it as the familiar 'friends of friends' diagram from social media.

Everywhere one turns in biology, there's a network – at the cellular level, tissue level, organ level, ecosystem level. The weather and other earth systems are networks. Human societal organization is a network. Electrical circuits, the Internet, our own brains... Networks are everywhere!

What can we do with networks, to better understand the world around us, or to create something that we need?

Broadly speaking, one approach is to measure and compute quantities; for example, how they vary with time. That's what happens in weather forecasting. And that's something I explore in a research project, Life123.science, on dynamical modeling. But that's another story.

In this article, we'll focus on another broad approach : looking at the parts of the network, and their interactions.

A powerful, relatively new - but not bleeding edge - tool to store, search and retrieve the complex web of information in networks is something called a Graph Database.

Disclaimer : "Some Restrictions Apply – Graph NOT included!"

Let's get something out of the way : the word "graph" in the name has NOTHING whatsoever to do with the familiar "math graphs" you loved/hated in grade school!

It so happens that in higher math, "graph" is a formal name for networks: "little circles connected by lines".

So, there's no "graph" (in the way non-mathematicians use the word) in Graph Databases!

This is a textbook lesson in what happens when the Marketing Department doesn't get consulted before naming a product!

ET Just Landed in a Small Town...

Let's get concrete! Imagine a small town, or perhaps a neighborhood where people have been around and tend to know each other.

Suppose you're a Cultural Anthropologist - or an Extraterrestrial! - who would like to develop a deep understanding of what that community is like. What kind of information would you be dealing with, and what does it look like?

Well, let's start with the people. Imagine we represent each person with a little ball in a diagram, and we attach to it some basic information specific to that person, such as their name, DOB, profession, marital status, current political affiliation, etc. Maybe not terribly exciting, is it? But things get far more interesting the moment we start exploring the connections among those people...

For example, who lives with whom - and we draw a line whenever two people share a household.

Or, who is married to whom - and, again, we draw a line.

And then we can go wild with this! For example, who knows whom? Who is related to whom? Who is friends with whom? Who lusts after whom? Who has gone to school with whom? Who works for whom? Etc., etc. Correspondingly, draw lines between people, labeling those lines with the name of the relationship, such as "lives with", "works for", "has gone to school with", etc.

Voilà, just a few arrows among people tell quite a story: Eve and Joe are a married couple who live together but lack passion. Lo and behold, Eve (probably secretly) lusts after Max, a friend of her husband's with whom she went to school... and - the plot thickens - Max has a thing for Joe, not reciprocated! A ton of insight!
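For the technically curious, here is a minimal sketch of that little story as plain Python data: circles become `Node` objects carrying some properties, and arrows become named `Relationship` objects. This is only an illustration of the idea, not how Neo4j or any real Graph Database stores things; the class names and relationship names such as `MARRIED_TO` and `LUSTS_AFTER` are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                        # kind of entity, e.g. "Person"
    properties: dict = field(default_factory=dict)    # name, profession, etc.


@dataclass
class Relationship:
    start: Node   # where the arrow begins
    name: str     # what the arrow means
    end: Node     # where the arrow points


# The three little circles
eve = Node("Person", {"name": "Eve"})
joe = Node("Person", {"name": "Joe", "profession": "store clerk"})
max_ = Node("Person", {"name": "Max"})

# The arrows among them (relationship names are illustrative)
relationships = [
    Relationship(eve, "MARRIED_TO", joe),
    Relationship(eve, "LIVES_WITH", joe),
    Relationship(eve, "LUSTS_AFTER", max_),
    Relationship(max_, "FRIENDS_WITH", joe),
    Relationship(eve, "HAS_GONE_TO_SCHOOL_WITH", max_),
    Relationship(max_, "LUSTS_AFTER", joe),
]


def outgoing(person):
    """List (relationship name, other person's name) for arrows leaving `person`."""
    return [(r.name, r.end.properties["name"])
            for r in relationships if r.start is person]


print(outgoing(max_))   # [('FRIENDS_WITH', 'Joe'), ('LUSTS_AFTER', 'Joe')]
```

Notice that the relationships are first-class items in their own right, on equal footing with the people they connect.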

The above scenario was created using a commercial Graph Database named "Neo4j", also available as a free open-source product. What does the "4j" stand for? There's a technical story behind it - but I like to think that "4j" means "4 (for) Julian"!!

Relationship (link) information can be stored with other types of databases, including old classics ("SQL") that first emerged in the 1970's... but not natively : using those tools to store, search and retrieve networks is clunky and indirect, whereas that is exactly what Graph Databases are designed to do easily.
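To give a flavor of that indirection, here is a rough sketch using SQLite (a small SQL database that ships with Python). It is just one of many possible table layouts, and the table and column names are invented for this example. The question "who is friends with Eve's spouse?" spans only two arrows in the diagram, yet the query has to join the link table to itself and translate ids back to names.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # throwaway in-memory database
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (
        start_id INTEGER REFERENCES person(id),
        name     TEXT,
        end_id   INTEGER REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Eve'), (2, 'Joe'), (3, 'Max');
    INSERT INTO relationship VALUES
        (1, 'MARRIED_TO', 2),
        (3, 'FRIENDS_WITH', 2);
""")

# Two hops in the diagram: Eve -> her spouse <- the spouse's friend.
# Each hop costs one more join on the relationship table.
rows = db.execute("""
    SELECT friend.name
    FROM person AS eve
    JOIN relationship AS marriage
         ON marriage.start_id = eve.id AND marriage.name = 'MARRIED_TO'
    JOIN relationship AS friendship
         ON friendship.end_id = marriage.end_id AND friendship.name = 'FRIENDS_WITH'
    JOIN person AS friend
         ON friend.id = friendship.start_id
    WHERE eve.name = 'Eve'
""").fetchall()

print(rows)   # [('Max',)]
```

It works, but every extra hop in the question adds another join to spell out by hand.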

Incremental Changes

Another big shortcoming of the venerable old databases ("SQL"), besides the lack of intuitive native support for relationships, is that they tend to be rigid : design changes are relatively hard and awkward to make, especially when - you can guess! - relationships are involved. These classic tools are better suited to situations where everything is fully planned in advance and rarely changes - yep, the diametrical opposite of research environments, and of other complex business or science endeavors!

By contrast, Graph Databases are very nimble - and extremely friendly to making incremental changes... as real life typically dictates!

Let's make an incremental change to our earlier example : we have only a coarse/vague knowledge of Eve and Max having gone to school together. But which school? Which level? And what did Joe attend instead?

By introducing blue circles representing schools, and dropping the old (now redundant) HAS_GONE_TO_SCHOOL_WITH relationship between Eve and Max, we can "incrementally upgrade" to an enhanced, finer understanding:

Eve and Max have both attended "Jefferson High"; that's a finer level of understanding than just knowing they had gone to school together. Joe, by contrast, attended "Washington High".

There might be an insight hiding behind the shared high school of Eve and Max : maybe social class or personality type; perhaps that contributes to making Eve feel closer to Max than to her husband?
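The incremental upgrade just described can be sketched in a few lines of Python, with relationships kept as simple (start, NAME, end) triples. This is a toy stand-in for a Graph Database, and the helper `schoolmates` is a hypothetical name made up for the example. The point is that nothing already in place needs to be redesigned: we add new circles, add new lines, and drop the one line that became redundant.

```python
# Starting point: people only, with a vague "went to school together" link
nodes = {"Eve": "Person", "Max": "Person", "Joe": "Person"}
relationships = {
    ("Eve", "MARRIED_TO", "Joe"),
    ("Eve", "HAS_GONE_TO_SCHOOL_WITH", "Max"),
}

# Step 1: introduce a brand-new kind of circle; existing data is untouched
nodes["Jefferson High"] = "School"
nodes["Washington High"] = "School"

# Step 2: connect people to the schools they attended
relationships |= {
    ("Eve", "ATTENDED", "Jefferson High"),
    ("Max", "ATTENDED", "Jefferson High"),
    ("Joe", "ATTENDED", "Washington High"),
}

# Step 3: drop the old relationship, now implied by the finer-grained data
relationships.discard(("Eve", "HAS_GONE_TO_SCHOOL_WITH", "Max"))


def schoolmates(person):
    """People who attended at least one school that `person` attended."""
    schools = {end for (start, name, end) in relationships
               if start == person and name == "ATTENDED"}
    return sorted({start for (start, name, end) in relationships
                   if name == "ATTENDED" and end in schools and start != person})


print(schoolmates("Eve"))   # ['Max']
print(schoolmates("Joe"))   # []
```

The old, coarse fact can still be recovered (Eve and Max are schoolmates), but now we also know where.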

The Process of Discovery

To get even more insight, we can distinguish among degrees; for example, the "FRIENDS_WITH" relationships (lines) could be labeled with qualifiers such as "casual", "close", "best friend", etc. That adds a lot of nuance! (To avoid clutter, we won't show it in our drawings.)

Now, imagine you zoom out of the diagram, full of little circles - and a big web of interactions across all directions.

Now you're beginning to have a deep insight...

That is, if you have the tools to read and manipulate such a large, possibly gigantic, diagram! Well, that's where a Graph Database (combined with a suitable user interface) comes to the rescue!

Now we have the tools to gain a deeper understanding of that community. And we can also formulate interesting questions, such as "Who lives with their best friend? Is that common?" The sky is the limit!
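Here is a small sketch of how a question like "who lives with their best friend?" might be answered once the lines themselves carry extra information. The residents (Ann, Bob, Cy, Dee) and the property name `closeness` are invented for illustration; a real Graph Database would let you ask this in its query language rather than in hand-written Python.

```python
# Each relationship: (start, NAME, end, properties attached to the line)
relationships = [
    ("Ann", "LIVES_WITH", "Bob", {}),
    ("Ann", "FRIENDS_WITH", "Bob", {"closeness": "best friend"}),
    ("Cy", "LIVES_WITH", "Dee", {}),
    ("Cy", "FRIENDS_WITH", "Dee", {"closeness": "casual"}),
    ("Ann", "FRIENDS_WITH", "Dee", {"closeness": "close"}),
]


def pairs(name, **required):
    """Unordered pairs joined by relationship `name` whose properties match `required`."""
    return {
        frozenset((start, end))
        for (start, rel_name, end, props) in relationships
        if rel_name == name
        and all(props.get(key) == value for key, value in required.items())
    }


# Pairs that share a household AND a "best friend" friendship
live_with_best_friend = pairs("LIVES_WITH") & pairs("FRIENDS_WITH", closeness="best friend")

print([sorted(pair) for pair in live_with_best_friend])   # [['Ann', 'Bob']]

# "Is that common?" : share of households where it holds
print(len(live_with_best_friend) / len(pairs("LIVES_WITH")))   # 0.5
```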

And then we could take a further step: we had little circles to represent people and schools. Now, let's consider another entity that's very relevant to understanding a small community, namely businesses : one circle for each business in the community. And just like we did with the people, we start with the boring part, attaching to each circle things like the name of the business, the date it was established, etc.

As usual, things get more interesting when we start creating connections.

For starters, connections between people and businesses, such as "works at" or "used to work at"; just like before, we draw lines. We can also draw lines between businesses, such as "supplies to" or "competes against".

Now we can answer more elaborate questions that combine various elements, such as "are there spouses who work for competing businesses?"
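As a sketch of how such a combined question could be answered, here is a toy version in Python. The people and businesses are made up for the example, and the helper names `linked` and `employers` are hypothetical. The question is really a small pattern in the diagram: two people joined by "married to", each with a "works at" line to a business, and those two businesses joined by "competes against".

```python
relationships = [
    ("Ann", "MARRIED_TO", "Bob"),
    ("Ann", "WORKS_AT", "Rosa's Bakery"),
    ("Bob", "WORKS_AT", "Main Street Bakery"),
    ("Rosa's Bakery", "COMPETES_AGAINST", "Main Street Bakery"),
    ("Cy", "MARRIED_TO", "Dee"),
    ("Cy", "WORKS_AT", "Rosa's Bakery"),
    ("Dee", "WORKS_AT", "Town Pharmacy"),
]


def linked(a, name, b):
    """True if a relationship called `name` joins a and b, in either direction."""
    return (a, name, b) in relationships or (b, name, a) in relationships


def employers(person):
    """Businesses that `person` has a WORKS_AT line to."""
    return [end for (start, name, end) in relationships
            if start == person and name == "WORKS_AT"]


# Follow the pattern: spouse -> employer ... competing employer <- other spouse
rival_spouses = [
    (a, b)
    for (a, name, b) in relationships if name == "MARRIED_TO"
    for x in employers(a)
    for y in employers(b)
    if linked(x, "COMPETES_AGAINST", y)
]

print(rival_spouses)   # [('Ann', 'Bob')]
```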

Let's revisit our earlier example with additional information about Businesses (green) and Business Types (red):

Well, we just discovered that Eve and Max - while NOT co-workers - both work in the food industry. Maybe that's another element in Eve's attraction to Max? Both are foodies?

And Joe isn't just a generic "store clerk": more specifically, he works at "Brown's Hardware" - which further layers of network may reveal is located in another town, some distance away; that reduced time together could be corroding his marriage with Eve...

Graph databases can greatly help such a process of discovery.

Have you noticed how detectives in films have a penchant for drawing something very similar to the above diagram, in their process of discovery? The following screenshot is from a documentary about a huge bank heist in Brazil (bonus points if you can find me, photoshopped among the suspects!)

But how do you get answers to your questions, whether simple lookups or open-ended discovery? You probably wouldn't want to chase thousands of little circles and millions of lines with pen and paper! Well, that's where our old friend, the Graph Database, comes in really handy!

Graph Databases come with a powerful specialized language (called a "query language") that makes asking those questions very easy - with some training.

Typical end users would use a graphical user interface that further simplifies the process.

Piling Layer Upon Layer Upon Layer

So far, we have looked at networks of various relationships among people ("is married to"), among businesses ("supplies to"), and crossing over between people and businesses ("works at"), or people and schools ("attended").

But why stop now? We could add new entities such as "Dwellings", "Professions", "Towns", "Nations", "Illnesses", etc.

And, alongside them, all sorts of relationships! Just a few examples:

Between Schools : "prepares for" (e.g. a Middle School to a High School)
From People to Illnesses: "suffers from"
From People to Dwellings : "resides in"
Between Dwellings : "is adjacent to"
From People to Towns : "lives in", "used to live in"
From Businesses to Towns : "headquartered in", "has a branch in"
Between Business Types : "subcategory of" (e.g. "Bakery" is a subcategory of "Food")
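Relationships between entities of the same kind, such as "subcategory of", can be followed repeatedly, hop after hop. Here is a minimal sketch in Python; apart from "Bakery" being under "Food", the categories are invented for the example, and it assumes each type has at most one parent and that the chain never loops back on itself.

```python
# child business type -> parent business type ("subcategory of")
subcategory_of = {
    "Pastry Shop": "Bakery",
    "Bakery": "Food",
    "Restaurant": "Food",
    "Hardware Store": "Retail",
}


def broad_type(business_type):
    """Follow "subcategory of" lines upward until reaching a top-level type."""
    while business_type in subcategory_of:
        business_type = subcategory_of[business_type]
    return business_type


print(broad_type("Pastry Shop"))   # Food
print(broad_type("Restaurant"))    # Food
print(broad_type("Retail"))        # Retail
```

With this in hand, two businesses can be said to be in the same broad industry whenever `broad_type` gives the same answer for both.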

At this point, we have reached a Deep Understanding - as deep as we need it to be - of that community.

And we can ask incredibly convoluted questions:

"Did co-workers married to each other typically attend the same school?"

"How common is it for best friends to end up working in the same broad industry (i.e. business type)?"

"Does attending a particular middle school have a strong influence on profession?"

The sky is the limit! But one needs a heavy-duty tool to keep, and sift through, this whole tangled web of information. We need a Super Hero : this is a job for Graph Databases!

A Compelling Need for Graph Databases?

Ok, the world often naturally manifests itself as networks... and the unfortunately-named Graph Databases (better called "Network Databases") are particularly adept at dealing with networks...

But humanity has managed for a long time to keep track of things with other tools, such as spreadsheets and their beefed-up cousins "Relational (SQL) Databases." So, is there a real scientific, technical or business need for Graph Databases?

Can we use different tools? The short answer is "yes" - but, do we really want to? I'll make the case that we don't.

Examples abound of endeavors where simpler, less specialized, tools could be used... but it's probably unwise to do so. You could build a house with hand saws and hammers - and indeed people used to - but would you do that, rather than turn to power tools? Most likely, not!

Likewise, one could write computer programs in early languages such as BASIC, from the days of the first Personal Computers of the 1980's... But would you want to? Sensibly, nowadays one would make use of more sophisticated modern languages and related tools, such as Python or JavaScript, to simplify and speed up the task, not to mention ease maintenance.

Well, various database tools that have generally been around since the 1970's ("Relational SQL Databases") could be used to store any type of data of any complexity, but it doesn't mean that they are the ideal tool for the job - in particular when :

the data is very interlinked (i.e. lots of relationships)
we are in a research environment with a design in flux, and a need for many incremental changes - as is often the case in the biomedical sciences and in many other fields

Graph Databases provide NATIVE support for networks - which is convenient, streamlined and productive - while with the alternative older tools, it takes concerted effort to represent networks of relationships, deal with many incremental changes, and carry out a process of complex discovery.

You might say, "but my data is simple: I just want to store the prices of the ice creams I sell in my store." Well, tomorrow, you may well want to store the price history of the ice creams, the suppliers of the various flavors, the large corporate clients that your store caters to, and their orders, etc. Anything that isn't just a shopping list easily grows into a complex web of information to manage!

In simpler words, it's wise not to try to fit a square peg (the old tools) in a round hole (the Real World)!

The "Hand of God"

Networks pop up everywhere. Kick up dirt - and a network will pop up. Literally! A network of micro-organisms in the topsoil, and their ecosystem...

And then, transportation networks... A network of computers called the Internet (perhaps you've heard of it?)... Trade networks... Disease-transmission networks... Power grids... Neural networks...

How about we end this article with a network so spectacular that beholding it feels like peering at the "Hand of God"?

That "hand" is often depicted like in the cover image at the top of this article... but this is how I'd depict it:

In the above diagram, the little circles are hidden to reduce clutter. That's taken from Reactome.org, an organization that compiles and curates data about all the known interactions among molecules in human cells.

That network is what infuses life into us!

Want More?

This introduction has been about general concepts and their motivation. If you want to hear about actual products, and how to use them - for example to find out which movie director worked with Tom Hanks in 2016 - check out this more technical series of articles.

No worries though, that series starts out very gently - just with a few more concrete details - and then builds up gradually in complexity for professionals interested in the subject.

Julian's Polymath Explorations
