
Neo4j & Cypher Tutorial : Getting Started with a Graph Database and its Query Language

You have a general idea of what Graph Databases - and Neo4j in particular - are...  But how do you get started?  Read on!


This article is part 3 of a growing, ongoing series on Graph Databases and Neo4j

 

If you're new to graph databases, please check out part 1 for an intro and motivation about them.  There, we discussed an example about an extremely simple database involving actors, movies and directors...  and saw how easy the Cypher query language makes it to answer questions such as "which directors have worked with Tom Hanks in 2016" - questions that, when done with relational databases and SQL, turn into a monster of a query and an overly-complicated data model involving a whopping 5 tables!

In this tutorial, we will actually carry out that query - and get acquainted with Cypher and the Neo4j browser interface in the process.  This is the dataset we'll be constructing:

Get the database in place

If you don't already have a database installed locally or in the cloud, the good news is that there's a super-easy way to try out Neo4j and its query language Cypher: the "Neo4j sandbox" that we discussed in Part 2.

Ready to take it for a spin? 

Whether from your own install or from the convenient "sandbox", go ahead and start the "Neo4j browser", a UI that comes with Neo4j.  

If you've ever used Jupyter Notebooks, you should feel right at home; however, be aware that the most recent cell appears on top rather than at the bottom (i.e. in reversed order):

Populating the Database

If you just got a new account, the database is empty... but let's get in the habit of clearing everything out (DETACH DELETE removes each node together with all of its relationships):

MATCH (n) DETACH DELETE n


Now, let's create the first "node" (record):

CREATE (a :actors {name: "Tom Hanks"})
RETURN a

With the above lines, we're creating a node - think of it as a "database record" - setting its label to "actors" and its attribute "name" to the value "Tom Hanks", and then returning it.

What's the "label"?   Very loosely speaking, think of it as a table name in a relational database.  Or think of it as a "class" or "type" of record.   Or a "tag".  Multiple labels are allowed on a node (in that respect, the "table" analogy doesn't hold.)

CYPHER SYNTAX

The round parentheses indicate a node.

The curly braces enclose attribute (property) declarations, which may look familiar from JSON, JavaScript objects or Python dictionaries

The colon introduces the label, as in ":actors"

The n or a in the above examples are just dummy variable names

Voilà, you now have a solid foot in the door with Cypher!

To locate the record you just created, you can issue:

MATCH (a :actors {name: "Tom Hanks"}) RETURN a
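If it helps to see the node/label/property vocabulary as data, here's a toy, in-memory Python model of a labeled property graph. It's a sketch for intuition only, not how Neo4j actually stores anything; `Node` and `match_nodes` are hypothetical names of mine, and the "people" label and the "Big" movie are thrown in just for contrast. `match_nodes` loosely mirrors `MATCH (a :actors {name: "Tom Hanks"})`: keep the nodes that carry the label and have every listed property with the listed value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy labeled-property-graph node (hypothetical, for intuition only)."""
    labels: set
    properties: dict = field(default_factory=dict)


def match_nodes(nodes, label, **props):
    """Loosely mirrors MATCH (x :label {key: value, ...}): keep the nodes that
    carry `label` and have every listed property with the listed value."""
    return [
        n for n in nodes
        if label in n.labels
        and all(k in n.properties and n.properties[k] == v for k, v in props.items())
    ]


graph = [
    # "people" is a hypothetical second label: a node may carry several
    # labels at once, which is where the "table" analogy breaks down
    Node({"actors", "people"}, {"name": "Tom Hanks"}),
    Node({"movies"}, {"name": "Big", "release_date": 1988}),
]

print([n.properties["name"] for n in match_nodes(graph, "actors", name="Tom Hanks")])  # ['Tom Hanks']
print([n.properties["name"] for n in match_nodes(graph, "people")])  # ['Tom Hanks']
print(match_nodes(graph, "movies", release_date=2016))  # []
```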

Now, locate the "Tom Hanks" actor record, and create a relationship from it to a new "movie" record:

MATCH (a :actors {name: "Tom Hanks"})
MERGE (a)-[:stars_in]->(m :movies {name: "Big", release_date:1988})
RETURN a, m

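Notice the syntax "-[:stars_in]->" for a relationship between two nodes, pointing in the indicated direction and with the specified type (name).  The MERGE clause first looks for the entire pattern; if it isn't found, it creates everything in the pattern beyond the already-matched node "a" (here, the new movie node and the relationship), while re-running the same statement won't create duplicates.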
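MERGE is easiest to grasp as "match the whole pattern, or else create it". Below is a toy Python sketch of that behavior (the `ToyGraph` class and its methods are hypothetical, and it allows only one label per node for simplicity), for a pattern like `(a)-[:stars_in]->(m :movies {...})` where `a` is already bound by the MATCH. It shows two things: running the same merge twice creates nothing new, and an existing but unconnected "Big" node does not count as a match, because the whole pattern (relationship included) has to exist.

```python
from itertools import count


class ToyGraph:
    """Toy in-memory graph for illustrating MERGE (hypothetical; one label per node)."""

    def __init__(self):
        self._next_id = count()
        self.nodes = {}  # node id -> (label, properties dict)
        self.rels = []   # (start id, relationship type, end id)

    def create_node(self, label, **props):
        node_id = next(self._next_id)
        self.nodes[node_id] = (label, props)
        return node_id

    def _node_matches(self, node_id, label, props):
        node_label, node_props = self.nodes[node_id]
        # A node matches if it has the label and at least the given properties
        return node_label == label and all(
            k in node_props and node_props[k] == v for k, v in props.items()
        )

    def merge_outgoing(self, start, rel_type, label, **props):
        """Roughly MERGE (start)-[:rel_type]->(m :label {props}), with `start`
        already bound: reuse the whole pattern if it exists, otherwise
        create BOTH the end node and the relationship."""
        for s, t, end in self.rels:
            if s == start and t == rel_type and self._node_matches(end, label, props):
                return end  # whole pattern found: nothing is created
        end = self.create_node(label, **props)
        self.rels.append((start, rel_type, end))
        return end


g = ToyGraph()
tom = g.create_node("actors", name="Tom Hanks")
first = g.merge_outgoing(tom, "stars_in", "movies", name="Big", release_date=1988)
again = g.merge_outgoing(tom, "stars_in", "movies", name="Big", release_date=1988)
print(first == again, len(g.nodes), len(g.rels))  # True 2 1

# An existing but unconnected "Big" node does NOT satisfy the whole pattern,
# so the same kind of merge creates a second "Big" node
h = ToyGraph()
tom_h = h.create_node("actors", name="Tom Hanks")
lonely = h.create_node("movies", name="Big", release_date=1988)
merged = h.merge_outgoing(tom_h, "stars_in", "movies", name="Big", release_date=1988)
print(merged == lonely, len(h.nodes))  # False 3
```

The second case is worth remembering in real Cypher too: if the movie might already exist on its own, MERGE the movie node first and then MERGE the relationship in a separate step.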

Similarly, locate the "Big" movie record, and create a relationship from it to a new "director" record:

MATCH (m :movies {name: "Big"})
MERGE (m)-[:directed_by]->(d :directors {name: "Penny Marshall"})
RETURN  m, d

Locate everything created so far:

MATCH (n) RETURN n


One last step - we'll use a process similar to the earlier one, but this time we'll create a new movie and a new director at once, with all their relationships.  As usual, we start by locating the node that is our starting point:

MATCH (a :actors {name: "Tom Hanks"})
MERGE (a)-[:stars_in]->
(m :movies {name: "California Typewriter", release_date:2016})-[:directed_by]->
(d :directors {name: "Doug Nichol"})
RETURN  a, m, d

At this point, we're finished building the dataset, and if you issue a "MATCH (n) RETURN n", you should see the image shown at the very top of this page.

Let's answer our Burning Question!

Finally, we can issue a query to easily answer the question "which directors have worked with Tom Hanks in 2016":

MATCH (a :actors) -- (m :movies) -- (d :directors)
WHERE a.name = "Tom Hanks" AND m.release_date = 2016
RETURN d.name

In the above Cypher query, the double hyphen (--) between the nodes indicates relationships whose type (name) and direction we don't care about.  You can think of it as a term such as "-[:stars_in]->" with the bracketed middle part and the arrowhead left out.

I phrased that query to sound more familiar to people coming from SQL.  Here's a more concise, alternative way to state it:

MATCH (a :actors {name: "Tom Hanks"}) -- (m :movies {release_date: 2016}) -- (d :directors)
RETURN d.name
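Conceptually, the pattern `(a :actors) -- (m :movies) -- (d :directors)` walks from an actor to any neighboring node labeled "movies", then from that movie to any neighbor labeled "directors", ignoring relationship types and directions. Here's a toy Python sketch of that walk over the dataset we just built. It's hypothetical code for intuition, not how Neo4j executes queries, and it uses node names as identifiers, which is a simplification.

```python
# Toy copy of the tutorial dataset: node name -> (label, properties)
NODES = {
    "Tom Hanks": ("actors", {}),
    "Big": ("movies", {"release_date": 1988}),
    "California Typewriter": ("movies", {"release_date": 2016}),
    "Penny Marshall": ("directors", {}),
    "Doug Nichol": ("directors", {}),
}
# Relationships as (start, type, end), with the directions used in the tutorial
RELS = [
    ("Tom Hanks", "stars_in", "Big"),
    ("Big", "directed_by", "Penny Marshall"),
    ("Tom Hanks", "stars_in", "California Typewriter"),
    ("California Typewriter", "directed_by", "Doug Nichol"),
]


def neighbors(name, label):
    """Nodes with `label` linked to `name` by a relationship of any type,
    in either direction (the toy equivalent of `--`)."""
    linked = [e for s, _, e in RELS if s == name] + [s for s, _, e in RELS if e == name]
    return [n for n in linked if NODES[n][0] == label]


def directors_worked_with(actor, year):
    """Toy version of:
    MATCH (a :actors) -- (m :movies) -- (d :directors)
    WHERE a.name = $actor AND m.release_date = $year
    RETURN d.name
    """
    if actor not in NODES or NODES[actor][0] != "actors":
        return []
    return [
        d
        for m in neighbors(actor, "movies")
        if NODES[m][1].get("release_date") == year
        for d in neighbors(m, "directors")
    ]


print(directors_worked_with("Tom Hanks", 2016))  # ['Doug Nichol']
print(directors_worked_with("Tom Hanks", 1988))  # ['Penny Marshall']
```

Note how the walk from "California Typewriter" also reaches "Tom Hanks" (via the incoming stars_in relationship), but the label filter discards him, just as the `(d :directors)` part of the pattern does.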

Now you're officially acquainted with Neo4j and Cypher!

More Cypher

Ready to learn more Cypher?  Here's a handy cheat sheet!   I also highly recommend the tutorials that are built into the Neo4j browser interface (the UI in the earlier screenshots; the tutorials are at the bottom.)

Note that the Cypher query language is open source (just like Neo4j itself) and, while developed by the Neo4j company and often associated with it, can also be used with other graph databases that implement it.  More info about "openCypher".

Interestingly, Amazon Neptune (an AWS-hosted graph database), which initially tried to bulldoze its own way (with an arrogance reminiscent of the old Microsoft Internet Explorer browsers!), finally saw the light in 2021 and started supporting openCypher in its product.

Cypher is a very powerful query language: in addition to leveraging the power of graph databases - like we saw in the example we built - it can also implement cascades of instructions.  I nickname them "badass queries"!  Look for the WITH clause in the guides: "The WITH syntax is similar to RETURN. It separates query parts explicitly, allowing you to declare which variables to carry over to the next part."  Basically, a powerful handover of variables between multiple query parts - all in just one statement.

The WITH clause allows query parts to be chained together, piping the results from one to be used as starting points or criteria in the next.
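To make the "handover" concrete: the docstring below holds an example Cypher query of mine (not taken from the Neo4j guides) for our dataset. It counts each actor's movies, and only then filters on that count, which a WHERE placed directly after the MATCH couldn't do, since `num_movies` doesn't exist yet at that point. The Python body is a toy pipeline (hypothetical function `prolific_actors`, with the MATCH results hardcoded) that mimics the two stages.

```python
from collections import Counter

# (actor, movie) rows that MATCH (a :actors)-[:stars_in]->(m :movies)
# would produce on the tutorial dataset
STARS_IN = [
    ("Tom Hanks", "Big"),
    ("Tom Hanks", "California Typewriter"),
]


def prolific_actors(rows, min_movies=2):
    """Toy pipeline mirroring:

        MATCH (a :actors)-[:stars_in]->(m :movies)
        WITH a, count(m) AS num_movies
        WHERE num_movies >= $min_movies
        RETURN a.name, num_movies
    """
    # Stage 1 (WITH): aggregate per actor; only the actor and the count
    # are carried over, the individual movies are no longer in scope
    carried_over = Counter(actor for actor, _movie in rows)
    # Stage 2 (WHERE ... RETURN): can use only what WITH handed over
    return [(actor, n) for actor, n in carried_over.items() if n >= min_movies]


print(prolific_actors(STARS_IN))  # [('Tom Hanks', 2)]
```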


Want more extensive knowledge?

The Neo4j company provides a series of good courses at zero cost - and even offers free certifications that you can put on your LinkedIn profile: the Neo4j academy.

For beginners, I recommend the following 3 short courses:

  1. Neo4j Fundamentals
  2. Cypher Fundamentals
  3. Graph Data Modeling Fundamentals

 

Now that we've taken a closer look at Neo4j and Cypher, how do we use them in typical real-world projects?

Want to access Neo4j through Python?

Neo4j provides official support for a powerful but complex library called Neo4j Bolt Driver for Python (in some places referred to as the "Neo4j Python Driver".)
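For a taste of what using the official driver directly looks like, here's a minimal sketch that runs our "Tom Hanks in 2016" query. It assumes version 5.x of the `neo4j` package (`pip install neo4j`), which provides `GraphDatabase.driver()` and `driver.execute_query()`; the function names, connection URI, credentials and the "neo4j" database name are placeholders of mine. The actor and year are passed as query parameters (`$actor`, `$year`) rather than spliced into the query string.

```python
DIRECTORS_QUERY = """
MATCH (a :actors {name: $actor}) -- (m :movies {release_date: $year}) -- (d :directors)
RETURN d.name AS director
"""


def fetch_directors(driver, actor, year, database="neo4j"):
    """Run the tutorial's query via a driver object and return director names.
    Keyword arguments of execute_query() become the Cypher parameters;
    `database_` (trailing underscore) selects the database."""
    records, _summary, _keys = driver.execute_query(
        DIRECTORS_QUERY, actor=actor, year=year, database_=database
    )
    return [record["director"] for record in records]


def run_example(uri, user, password):
    """Example entry point; needs the `neo4j` package and a reachable database."""
    from neo4j import GraphDatabase  # official Neo4j Python driver (5.x)

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        return fetch_directors(driver, "Tom Hanks", 2016)
```

Against the dataset built in this tutorial, a call such as `run_example("neo4j://localhost:7687", "neo4j", "your-password")` should return `['Doug Nichol']`. Keeping `fetch_directors` independent of how the driver is created also makes it easy to test with a stand-in object, without a running database.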

To harness its power without getting bogged down in its complex low-level details, I wrote a library to make interfacing Python with Neo4j easier, and released it as open source at the beginning of 2021: https://github.com/BrainAnnex/neo4j-liaison .

As of Dec. 2021, it has been superseded by "NeoAccess", an expanded library that I released (source code on GitHub) as part of the new version of Brain Annex; it's also based on work that I and others did at GSK pharmaceuticals, which the company graciously made open source.

The NeoAccess library is discussed in part 4.

The NeoAccess library also comes with an optional companion library, NeoSchema: a schema layer that harmoniously brings together the best of the flexibility ("anything goes!") of graph databases and the "law and order" aspect of relational databases!  (For details, see part 5 of this series.)

Other open-source libraries exist to access Neo4j from Python: users with simple, limited needs might benefit from Py2neo, and Django users might want to look at Neomodel (details about both.)

Putting it All Together : a Technology Stack on top of a Graph Database

One typically needs a full data-management solution, not just a database.  The Schema Layer, briefly mentioned in the previous section, as well as an API and a UI, are all discussed in part 6 of this series.

