You have a general idea of what Graph Databases - and Neo4j in particular - are... But how to get started? Read on!
If you're new to graph databases, please check out part 1 for an intro and motivation about them. There, we discussed an example about an extremely simple database involving actors, movies and directors... and saw how easy the Cypher query language makes it to answer questions such as "which directors have worked with Tom Hanks in 2016" - questions that, when done with relational databases and SQL, turn into a monster of a query and an overly-complicated data model involving a whopping 5 tables!
In this tutorial, we will actually carry out that query - and get acquainted with Cypher and the Neo4j browser interface in the process. This is the dataset we'll be constructing:
Get the database in place
If you don't already have a database installed locally or in the cloud, the good news is that there's a super-easy way to try out Neo4j and its query language Cypher: the "Neo4j sandbox" that we discussed in part 2.
Ready to take it for a spin?
Whether from your own install or from the convenient "sandbox", go ahead and start the "Neo4j browser", a UI that comes with Neo4j.
If you've ever used Jupyter Notebooks, you should feel right at home; however, be aware that the most recent cell is on top rather than at the bottom (i.e. reversed order):
Populating the Database
If you just got a new account, the database is empty... but let's get in the habit of clearing everything out:
MATCH (n) DETACH DELETE n
Now, let's create the first "node" (record):
CREATE (a :actors {name: "Tom Hanks"})
RETURN a
With the above statement, we're creating a node - think of it as a "database record" - setting its label to "actors" and its attribute "name" to the value "Tom Hanks", and then returning it.
What's the "label"? Very loosely speaking, think of it as a table name in a relational database. Or think of it as a "class" or "type" of record. Or a "tag". Multiple labels are allowed on a node (in that respect, the "table" analogy doesn't hold.)
CYPHER SYNTAX
The round parentheses indicate a node.
The curly brackets encircle attribute declarations (may look familiar from JSON or JavaScript or Python dictionaries)
The n or a in the above examples are just dummy names (variables) that let you refer to the node elsewhere in the same query
If you wish to locate that record just created, you can issue:
MATCH (a :actors {name: "Tom Hanks"}) RETURN a
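For readers who think more readily in Python than in Cypher, here's a rough analogy of what a node, its labels and its attributes amount to, and of what that MATCH does. It's only a sketch: the `Node` class and the `match` function are made-up names for illustration (not part of Neo4j or of any library), and the second node carries two labels purely to show that this is allowed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # a node may carry several labels
    properties: dict = field(default_factory=dict)


def match(nodes, label, **props):
    """Rough analog of: MATCH (n :label {props}) RETURN n"""
    return [
        n for n in nodes
        if label in n.labels
        and all(n.properties.get(k) == v for k, v in props.items())
    ]


nodes = [
    Node({"actors"}, {"name": "Tom Hanks"}),
    # Illustration only: one node tagged with two labels at once
    Node({"actors", "directors"}, {"name": "Penny Marshall"}),
]

found = match(nodes, "actors", name="Tom Hanks")
print([n.properties["name"] for n in found])
```

Notice how a "table name" couldn't do what the second node does: it belongs to two "classes" of records at the same time.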
Now, locate the "Tom Hanks" actor record, and create a relationship from it to a new "movie" record:
MATCH (a :actors {name: "Tom Hanks"})
MERGE (a)-[:stars_in]->(m :movies {name: "Big", release_date:1988})
RETURN a, m
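Notice the syntax "-[:stars_in]->" for a relationship, in the indicated direction and with the specified name, between two nodes. The "MERGE" part first looks for the whole pattern; if it isn't found, it creates the missing pieces (here, the new movie node and the relationship to it), while reusing the node a that was already located by MATCH. Running the same statement a second time creates nothing new.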
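The "find it, or else create it" behavior of MERGE can be mimicked in a few lines of Python. This is just a toy in-memory model under simplifying assumptions (one label per node, a hypothetical `Graph` class with made-up method names), not how Neo4j works internally:

```python
class Graph:
    def __init__(self):
        self.nodes = []   # each node: {"label": str, "props": dict}
        self.rels = []    # each relationship: (start_index, name, end_index)

    def create_node(self, label, **props):
        """Analog of CREATE: always adds a new node; returns its index."""
        self.nodes.append({"label": label, "props": props})
        return len(self.nodes) - 1

    def merge_rel_to_node(self, start, rel_name, label, **props):
        """Analog of MERGE (start)-[:rel_name]->(m :label {props}),
        where `start` was already located (bound) by an earlier MATCH."""
        for s, name, e in self.rels:
            node = self.nodes[e]
            if (s == start and name == rel_name and node["label"] == label
                    and all(node["props"].get(k) == v for k, v in props.items())):
                return e          # the whole pattern already exists: reuse it
        # Pattern not found: create the new node AND the relationship to it
        end = self.create_node(label, **props)
        self.rels.append((start, rel_name, end))
        return end


g = Graph()
tom = g.create_node("actors", name="Tom Hanks")
big = g.merge_rel_to_node(tom, "stars_in", "movies", name="Big", release_date=1988)
again = g.merge_rel_to_node(tom, "stars_in", "movies", name="Big", release_date=1988)
print(big == again, len(g.nodes), len(g.rels))
```

The second call finds the pattern already in place, so the graph still holds 2 nodes and 1 relationship.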
Similarly, locate the "Big" movie record, and create a relationship from it to a new "director" record:
MATCH (m :movies {name: "Big"})
MERGE (m)-[:directed_by]->(d :directors {name: "Penny Marshall"})
RETURN m, d
Locate everything created so far:
MATCH (n) RETURN n
One last step - we'll use a process similar to the earlier one, but this time we'll create a new movie and a new director at once, with all their relationships. As usual, we start by locating the node that is our starting point:
MATCH (a :actors {name: "Tom Hanks"})
MERGE (a)-[:stars_in]->
(m :movies {name: "California Typewriter", release_date:2016})-[:directed_by]->
(d :directors {name: "Doug Nichol"})
RETURN a, m, d
At this point, we're finished building the dataset, and if you issue a "MATCH (n) RETURN n", you should see the image shown at the very top of this page.
Let's answer our Burning Question!
Finally, we can issue a query to easily answer the question "which directors have worked with Tom Hanks in 2016":
MATCH (a :actors) -- (m :movies) -- (d :directors)
WHERE a.name = "Tom Hanks" AND m.release_date = 2016
RETURN d.name
In the above Cypher query, the double hyphen (--) between the nodes indicates relationships whose name and direction we don't care about. You can think of it as leaving out the middle part, and the arrow, in terms such as "-[:stars_in]->".
I phrased that query to sound more familiar to people coming from SQL. Here's a more concise, alternative way to state it:
MATCH (a :actors {name: "Tom Hanks"}) -- (m :movies {release_date: 2016}) -- (d :directors)
RETURN d.name
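If you're curious what that pattern match boils down to, here's a plain-Python sketch that walks the same 5-node dataset by hand. The dictionaries and the `neighbors` and `directors_for` functions are made up for this illustration; a real graph database does this traversal for you, and far more efficiently.

```python
# The dataset built in this tutorial: node id -> (label, attributes)
nodes = {
    1: ("actors", {"name": "Tom Hanks"}),
    2: ("movies", {"name": "Big", "release_date": 1988}),
    3: ("directors", {"name": "Penny Marshall"}),
    4: ("movies", {"name": "California Typewriter", "release_date": 2016}),
    5: ("directors", {"name": "Doug Nichol"}),
}
# Relationships: (start node, name, end node)
rels = [(1, "stars_in", 2), (2, "directed_by", 3),
        (1, "stars_in", 4), (4, "directed_by", 5)]


def neighbors(node_id):
    """Nodes linked by any relationship, in either direction (the `--` in Cypher)."""
    linked = set()
    for start, _name, end in rels:
        if start == node_id:
            linked.add(end)
        if end == node_id:
            linked.add(start)
    return linked


def directors_for(actor_name, year):
    """Hand-rolled analog of the actor -- movie -- director pattern."""
    found = []
    for a_id, (a_label, a_props) in nodes.items():
        if a_label != "actors" or a_props.get("name") != actor_name:
            continue
        for m_id in sorted(neighbors(a_id)):
            m_label, m_props = nodes[m_id]
            if m_label != "movies" or m_props.get("release_date") != year:
                continue
            for d_id in sorted(neighbors(m_id)):
                d_label, d_props = nodes[d_id]
                if d_label == "directors":
                    found.append(d_props["name"])
    return found


print(directors_for("Tom Hanks", 2016))
```

Three nested loops in Python versus a two-line Cypher query: that's the convenience the pattern syntax buys you.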
Now you're officially acquainted with Neo4j and Cypher!
More Cypher
Ready to learn more Cypher? Here's a handy cheat sheet! I also highly recommend the tutorials that are built into the Neo4j browser interface (the UI in the earlier screenshots; the tutorials are at the bottom.)
Please note that the Cypher query language is open source (just like Neo4j itself) and, while developed by the Neo4j company and often associated with it, can also be used with other graph databases that implement it. For more information, look up the "openCypher" project.
Interestingly, Amazon Neptune (an AWS hosted graph database), which initially was trying to bulldoze its own way (with arrogance reminiscent of the old Microsoft Internet Explorer browsers!) finally saw the light in 2021 and started supporting openCypher in their product.
Cypher is a very powerful query language: in addition to leveraging the power of graph databases - like we saw in the example we built - it can also implement cascades of instructions. I nickname them "badass queries"! Look for the WITH clause in the guides: "The WITH syntax is similar to RETURN. It separates query parts explicitly, allowing you to declare which variables to carry over to the next part." Basically, a powerful handover of variables between multiple queries - all in just 1 statement.
The WITH clause allows query parts to be chained together, piping the results from one to be used as starting points or criteria in the next.
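To get a feel for that "handover", here's a small Python sketch of a two-part pipeline, next to the kind of Cypher statement it imitates. Everything in it is illustrative: the function names are made up, the third data row is hypothetical (it isn't in our dataset), and the Cypher text is just an example of a WITH query written against the labels used in this tutorial.

```python
from collections import Counter

# (actor, movie) pairs standing in for "stars_in" relationships.
# The two Tom Hanks rows come from this tutorial; the last row is hypothetical.
stars_in = [
    ("Tom Hanks", "Big"),
    ("Tom Hanks", "California Typewriter"),
    ("Some Other Actor", "Some Other Movie"),
]

# The sort of single Cypher statement that the two functions below imitate
CYPHER_EQUIVALENT = """
MATCH (a :actors)-[:stars_in]->(m :movies)
WITH a, count(m) AS n_movies
WHERE n_movies >= 2
RETURN a.name, n_movies
"""


def part_one(pairs):
    """First query part: count the movies of each actor."""
    return Counter(actor for actor, _movie in pairs)


def part_two(counts, minimum):
    """Second query part: receives the handed-over counts and filters on them."""
    return sorted((name, n) for name, n in counts.items() if n >= minimum)


# The value handed from part_one to part_two plays the role of the WITH variables
busy_actors = part_two(part_one(stars_in), minimum=2)
print(busy_actors)
```

In Cypher, both parts live in one statement, and WITH decides which variables (here `a` and `n_movies`) survive into the second part.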
Want more extensive knowledge?
The Neo4j company provides a series of good courses at zero cost - and even offers free certifications that you can put on your LinkedIn profile: the Neo4j academy.
For beginners, I recommend the following 3 short courses:
Now that we've taken a closer look at Neo4j and Cypher, how do you use them in actual, typical projects?
Want to access Neo4j thru Python?
Neo4j provides official support for a powerful but complex library called Neo4j Bolt Driver for Python (in some places referred to as the "Neo4j Python Driver"). To make use of its power without getting bogged down in its complex low-level details, I wrote a library to make Python interfacing to Neo4j easier, and released it as open source at the beginning of 2021: https://github.com/BrainAnnex/neo4j-liaison
As of Dec. 2021, it has been superseded by "NeoAccess", an expanded library that I released (source code on GitHub) as part of the new version of Brain Annex, also based on work that I and others did at GSK pharmaceuticals, and graciously made open source by the company.
The NeoAccess library is discussed in part 4.
The NeoAccess library also comes with an optional companion library, NeoSchema: a schema layer that harmoniously brings together the flexibility ("anything goes!") of graph databases and the "law and order" aspect of relational databases! (For details, see part 5 of this article.)
Other open-source libraries exist to access Neo4j from Python: users with simple, limited needs, might benefit from Py2neo, and Django users might want to look at Neomodel (details about both.)
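To give a taste of the official driver mentioned above, here's a minimal sketch of running our "Tom Hanks in 2016" query from Python. It assumes that the `neo4j` package is installed and that a database containing our dataset is reachable; the address, user name and password in the usage comments are placeholders, and the helper names `open_driver` and `directors_for` are my own, not part of the driver.

```python
# The earlier query, with $actor and $year as query parameters
QUERY = """
MATCH (a :actors {name: $actor}) -- (m :movies {release_date: $year}) -- (d :directors)
RETURN d.name AS director
"""


def open_driver(uri, user, password):
    """Connect with the official driver (requires: pip install neo4j)."""
    from neo4j import GraphDatabase   # imported here, so the rest works without it
    return GraphDatabase.driver(uri, auth=(user, password))


def directors_for(session, actor, year):
    """Run the query in the given driver session; return the directors' names."""
    result = session.run(QUERY, actor=actor, year=year)
    return [record["director"] for record in result]


# Typical use (placeholder credentials - substitute your own):
#   driver = open_driver("bolt://localhost:7687", "neo4j", "<your password>")
#   with driver.session() as session:
#       print(directors_for(session, "Tom Hanks", 2016))
#   driver.close()
```

Passing the actor and year as parameters, rather than pasting them into the query string, keeps the query text fixed and avoids quoting headaches.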
Putting it All Together: a Technology Stack on top of a Graph Database
One typically needs a full data-management solution, not just a database. The Schema Layer, briefly mentioned in the previous section, as well as an API and a UI, are all discussed in part 6 of this series.