Neo4j & Cypher Tutorial : Getting Started with a Graph Database and its Query Language

You have a general idea of what Graph Databases - and Neo4j in particular - are... But how to get started? Read on!

This article is part 3 of a growing, ongoing series on Graph Databases and Neo4j

If you're new to graph databases, please check out part 1 for an intro and motivation about them. There, we discussed an example about an extremely simple database involving actors, movies and directors... and saw how easy the Cypher query language makes it to answer questions such as "which directors have worked with Tom Hanks in 2016" - questions that, when done with relational databases and SQL, turn into a monster of a query and an overly-complicated data model involving a whopping 5 tables!

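In this tutorial, we will actually carry out that query - and get acquainted with Cypher and the Neo4j browser interface in the process. This is the dataset we'll be constructing: one actor (Tom Hanks), two movies he starred in ("Big", 1988, and "California Typewriter", 2016), and their directors (Penny Marshall and Doug Nichol).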

Get the database in place

If you don't already have a database installed locally or in the cloud, the good news is that there's a super-easy way to try out Neo4j and its query language, Cypher: the "Neo4j sandbox" that we discussed in Part 2.

Ready to take it for a spin?

Whether from your own install or from the convenient "sandbox", go ahead and start the "Neo4j browser", a UI that comes with Neo4j.

If you've ever used Jupyter Notebooks, you should feel right at home; however, be aware that the most recent cell is on top rather than at the bottom (i.e. reversed order).

Populating the Database

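If you just got a new account, the database is empty... but let's get in the habit of clearing everything out (this deletes every node along with all of its relationships, so don't run it on a database you want to keep!):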

MATCH (n) DETACH DELETE n

Now, let's create the first "node" (record):

CREATE (a :actors {name: "Tom Hanks"})
RETURN a

With the above statement, we're creating a node - think of it as a "database record" - setting its label to "actors" and its attribute "name" to the value "Tom Hanks", and then returning it.

What's the "label"? Very loosely speaking, think of it as a table name in a relational database. Or think of it as a "class" or "type" of record. Or a "tag". Multiple labels are allowed on a node (in that respect, the "table" analogy doesn't hold.)

CYPHER SYNTAX
The round parentheses indicate a node.
The curly braces enclose attribute declarations (they may look familiar from JSON, JavaScript objects or Python dictionaries)
The n or a in the above examples are just dummy names: variables that let the rest of the same statement refer to that node
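If it helps to think in Python terms, a node is roughly a set of labels plus a dictionary of attributes (Neo4j's documentation calls them "properties"). The toy class below uses hypothetical names and is only an analogy for the CREATE statement above. It says nothing about how Neo4j actually stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in (hypothetical) for a graph node: a set of labels plus a dict of attributes."""
    labels: set = field(default_factory=set)
    props: dict = field(default_factory=dict)

    def has_label(self, label):
        return label in self.labels


# Rough analogue of:  CREATE (a :actors {name: "Tom Hanks"})
a = Node(labels={"actors"}, props={"name": "Tom Hanks"})

# Unlike a row that lives in exactly one table, a node can carry several labels.
# (In Cypher, a label can be added later with something like:  SET a:directors)
a.labels.add("directors")

print(sorted(a.labels))  # ['actors', 'directors']
print(a.props["name"])   # Tom Hanks
```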

Voilà, you now have a solid foot in the door with Cypher!

If you wish to locate the record you just created, you can issue:

MATCH (a :actors {name: "Tom Hanks"}) RETURN a

Now, locate the "Tom Hanks" actor record, and create a relationship from it to a new "movie" record:

MATCH (a :actors {name: "Tom Hanks"})
MERGE (a)-[:stars_in]->(m :movies {name: "Big", release_date:1988})
RETURN a, m

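Notice the syntax "-[:stars_in]->" for a relationship between two nodes, in the indicated direction and with the specified name (Neo4j calls it the relationship "type"). The "MERGE" part looks for the entire pattern and, only if it can't find it, creates it: here, a new "movies" node plus the relationship, while reusing the already-matched actor node. Running the same statement a second time won't create duplicates, because the pattern now exists.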
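The "whole pattern" behavior of MERGE can be surprising at first. Below is a toy, in-memory Python model of `MERGE (a)-[:stars_in]->(m :movies {...})` with `a` already bound. The class and method names are hypothetical, each node gets a single label for brevity, and this is not Neo4j's actual algorithm. If `a` already has a matching relationship to a matching movie, that movie is reused. Otherwise a new movie node and a new relationship are created, even if an identical but unconnected movie node exists elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy graph (hypothetical): nodes are dicts, relationships are (start, type, end) index triples."""
    nodes: list = field(default_factory=list)
    rels: list = field(default_factory=list)

    def create_node(self, label, props):
        self.nodes.append({"label": label, "props": dict(props)})
        return len(self.nodes) - 1

    def merge_out(self, start, rel_type, label, props):
        """Mimic MERGE (start)-[:rel_type]->(x :label {props}), with `start` already bound."""
        # 1) Look for the WHOLE pattern: a relationship from `start` to a matching node
        for s, t, end in self.rels:
            node = self.nodes[end]
            if (s == start and t == rel_type and node["label"] == label
                    and all(node["props"].get(k) == v for k, v in props.items())):
                return end  # found: reuse it, create nothing
        # 2) Not found: create the unbound part of the pattern (new node + relationship)
        end = self.create_node(label, props)
        self.rels.append((start, rel_type, end))
        return end


big = {"name": "Big", "release_date": 1988}

g = Graph()
hanks = g.create_node("actors", {"name": "Tom Hanks"})
g.merge_out(hanks, "stars_in", "movies", big)
g.merge_out(hanks, "stars_in", "movies", big)  # second run: pattern exists, nothing new
print(len(g.nodes), len(g.rels))               # 2 1

# A "Big" node that exists but is NOT connected to Tom Hanks doesn't satisfy the pattern
g2 = Graph()
hanks2 = g2.create_node("actors", {"name": "Tom Hanks"})
g2.create_node("movies", big)
g2.merge_out(hanks2, "stars_in", "movies", big)
print(len(g2.nodes), len(g2.rels))             # 3 1
```

In practice, when connecting two nodes that may already exist, a common approach is to MATCH (or MERGE) each node on its own first, and then MERGE only the relationship between the two bound variables.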

Similarly, locate the "Big" movie record, and create a relationship from it to a new "director" record:

MATCH (m :movies {name: "Big"})
MERGE (m)-[:directed_by]->(d :directors {name: "Penny Marshall"})
RETURN m, d

Locate everything created so far:

MATCH (n) RETURN n

One last step - we'll use a similar process as earlier, but this time we'll create a new movie and a new director at once, with all their relationships. As usual, we start by locating the node that is our starting point:

MATCH (a :actors {name: "Tom Hanks"})
MERGE (a)-[:stars_in]->
(m :movies {name: "California Typewriter", release_date:2016})-[:directed_by]->
(d :directors {name: "Doug Nichol"})
RETURN a, m, d

At this point, we're finished building the dataset, and if you issue a "MATCH (n) RETURN n", you should see the image shown at the very top of this page.

Let's answer our Burning Question!

Finally, we can issue a query to easily answer the question "which directors have worked with Tom Hanks in 2016":

MATCH (a :actors) -- (m :movies) -- (d :directors)
WHERE a.name = "Tom Hanks" AND m.release_date = 2016
RETURN d.name

In the above Cypher query, the double hyphen (--) between the nodes indicates relationships whose name and direction we don't care about: any relationship, pointing either way, will match. You can think of it as "-[:stars_in]->" with the middle part and the arrowhead left out.
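To make the meaning of "--" concrete, here is a toy Python sketch of what the query does conceptually on our five-node dataset. It walks every actor-movie-director chain, ignoring relationship names and directions, and keeps the directors for which the actor is Tom Hanks and the movie is from 2016. The data layout and the brute-force loops are illustrative assumptions; real Neo4j relies on indexes and a query planner.

```python
# The tutorial's dataset: node id -> (label, attributes)
nodes = {
    1: ("actors", {"name": "Tom Hanks"}),
    2: ("movies", {"name": "Big", "release_date": 1988}),
    3: ("directors", {"name": "Penny Marshall"}),
    4: ("movies", {"name": "California Typewriter", "release_date": 2016}),
    5: ("directors", {"name": "Doug Nichol"}),
}
# Directed relationships: (start, name, end)
rels = [
    (1, "stars_in", 2),
    (2, "directed_by", 3),
    (1, "stars_in", 4),
    (4, "directed_by", 5),
]


def neighbors(node_id):
    """Nodes linked to node_id by ANY relationship, in EITHER direction (like Cypher's --)."""
    for start, _name, end in rels:
        if start == node_id:
            yield end
        elif end == node_id:
            yield start


def directors_with(actor_name, year):
    """Toy version of:
    MATCH (a :actors) -- (m :movies) -- (d :directors)
    WHERE a.name = actor_name AND m.release_date = year
    RETURN d.name
    """
    results = []
    for a, (a_label, a_props) in nodes.items():
        if a_label != "actors" or a_props.get("name") != actor_name:
            continue
        for m in neighbors(a):
            m_label, m_props = nodes[m]
            if m_label != "movies" or m_props.get("release_date") != year:
                continue
            for d in neighbors(m):
                d_label, d_props = nodes[d]
                if d_label == "directors":
                    results.append(d_props["name"])
    return results


print(directors_with("Tom Hanks", 2016))  # ['Doug Nichol']
```

Changing the year to 1988 returns `['Penny Marshall']` instead. The traversal runs from movie to director along the "directed_by" arrow and from movie back to actor against the "stars_in" arrow, which is exactly what ignoring direction permits.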

I phrased that query to sound more familiar to people coming from SQL. Here's a more concise, alternate way to state it:

MATCH (a :actors {name: "Tom Hanks"}) -- (m :movies {release_date: 2016}) -- (d :directors)
RETURN d.name

Now you're officially acquainted with Neo4j and Cypher!

More Cypher

Ready to learn more Cypher? Here's a handy cheat sheet! I also highly recommend the tutorials that are built into the Neo4j browser interface (the UI in the earlier screenshots; the tutorials are at the bottom.)

Please note that the Cypher query language is open source (just like the Community Edition of Neo4j itself) and, while developed by the Neo4j company and often associated with it, can also be used with other graph databases that implement it. More info is available from the "openCypher" project.

Interestingly, Amazon Neptune (an AWS-hosted graph database), which initially was trying to bulldoze its own way (with arrogance reminiscent of the old Microsoft Internet Explorer browsers!), finally saw the light in 2021 and started supporting openCypher in its product.

Cypher is a very powerful query language: in addition to leveraging the power of graph databases - as we saw in the example we built - it can also implement cascades of instructions. I nickname them "badass queries"! Look for the WITH clause in the guides: "The WITH syntax is similar to RETURN. It separates query parts explicitly, allowing you to declare which variables to carry over to the next part." Basically, a powerful handover of variables between multiple query parts - all in just 1 statement.

The WITH clause allows query parts to be chained together, piping the results from one to be used as starting points or criteria in the next.
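Here is one possible WITH query against our tutorial dataset, run from Python through the official Neo4j Python driver's session API. The query, the function name, the `$min_movies` parameter, the database name and the connection details are illustrative assumptions. The first part counts each actor's movies. WITH then hands only `a` and `num_movies` to the second part, which filters on the count. A WHERE placed directly after MATCH couldn't do that, because the count doesn't exist yet at that point.

```python
# Two query parts chained by WITH:
#   part 1 matches actor-movie pairs and counts movies per actor;
#   part 2 only sees what WITH hands over (a, num_movies) and filters on that count.
ACTORS_WITH_MANY_MOVIES = """
MATCH (a :actors) -- (m :movies)
WITH a, count(m) AS num_movies
WHERE num_movies >= $min_movies
RETURN a.name AS actor, num_movies
ORDER BY num_movies DESC
"""


def actors_with_many_movies(driver, min_movies=2):
    """Run the WITH query through an open neo4j driver; return (actor, movie count) pairs."""
    with driver.session(database="neo4j") as session:  # assumption: default database name
        result = session.run(ACTORS_WITH_MANY_MOVIES, {"min_movies": min_movies})
        # Consume the records while the session is still open
        return [(record["actor"], record["num_movies"]) for record in result]


# Usage (requires `pip install neo4j` and a running database;
# the URI and credentials below are placeholders):
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password")) as driver:
#       print(actors_with_many_movies(driver))
```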

Want more extensive knowledge?

The Neo4j company provides a series of good courses at zero cost - and even offers free certifications that you can put on your LinkedIn profile: the Neo4j academy.

For beginners, I recommend the following 3 short courses:

Now that we've taken a closer look at Neo4j and Cypher, how do we use them in actual, typical projects?

Want to access Neo4j thru Python?

Neo4j provides official support for a powerful but complex library called Neo4j Bolt Driver for Python (in some places referred to as the "Neo4j Python Driver".)
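For a taste of the official driver, here is a minimal sketch that runs this tutorial's "Burning Question" query from Python. It assumes a 5.x version of the driver, which offers `Driver.execute_query`. The function name, the `$name`/`$year` parameter names, the database name and the connection details are hypothetical placeholders. The actor name and year are passed as query parameters rather than pasted into the query string. This avoids Cypher-injection problems and lets Neo4j reuse the query plan.

```python
# The "Burning Question" query, with $-parameters instead of hard-coded values
DIRECTORS_QUERY = """
MATCH (a :actors {name: $name}) -- (m :movies {release_date: $year}) -- (d :directors)
RETURN d.name AS director
"""


def directors_who_worked_with(driver, name, year):
    """Return the names of directors linked to actor `name` through a movie released in `year`."""
    records, _summary, _keys = driver.execute_query(
        DIRECTORS_QUERY,
        parameters_={"name": name, "year": year},
        database_="neo4j",  # assumption: the default database name
    )
    return [record["director"] for record in records]


# Usage (requires `pip install neo4j` and a running database;
# the URI and credentials below are placeholders):
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password")) as driver:
#       print(directors_who_worked_with(driver, "Tom Hanks", 2016))
```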

To harness the driver's power without getting bogged down in its complex low-level details, I wrote a library to make Python interfacing to Neo4j easier, and released it to open source at the beginning of 2021: https://github.com/BrainAnnex/neo4j-liaison

As of Dec. 2021, it has been superseded by "NeoAccess", an expanded library that I released (source code on GitHub) as part of the new version of Brain Annex, also based on work that I and others did at GSK pharmaceuticals, and graciously made open source by the company.

The NeoAccess library is discussed in part 4.

The NeoAccess library also comes with an optional companion library, NeoSchema: a schema layer that harmoniously brings together the best of the flexibility ("anything goes!") of graph databases and the "law and order" aspect of relational databases! (For details, see part 5 of this series.)

Other open-source libraries exist to access Neo4j from Python: users with simple, limited needs might benefit from Py2neo, and Django users might want to look at Neomodel (details about both.)

Putting it All Together : a Technology Stack on top of a Graph Database

One typically needs a full data-management solution, not just a database. The Schema Layer, briefly mentioned in the previous section, as well as an API and a UI, are all discussed in part 6 of this series.

Julian's Polymath Explorations