When using graph databases, you may need ORDERED SEQUENCES of data records...
For example:
- ordered sequences of adverse events in clinical trials
- ordered sequences of text messages
- ordered sequences of chemical production steps
- ordered sequences of "content items" for a content management system
Good news: an open-source library exists for conveniently managing this data model in a Neo4j graph database (and, in principle, it could be ported to other graph databases).
This concept of ordered sequences of data nodes:
1) is hugely general: I gave such sequences the name "(Ordered) Collections"
2) can be tricky to implement fully (for example, to deal with multiple insertions in the middle)
3) fortunately, is fully handled by this open-source library! 😁
=> Here's the documentation
=> Here's the source code
This article is part of a growing, ongoing series on Graph Databases and Neo4j
A Simple Example
The attached image is a simple example of a "Collection" (ordered sequence) representing a photo album of a trip to Greece, with data nodes that represent individual photos (which are meant to appear in a particular order).
In this implementation, the position of each item in the sequence is stored as an integer value in the property "pos" of the "in_category" relationships.
You may ignore the green and orange nodes on the left-hand side: those are part of the Schema Layer that we opt to use for data integrity and other reasons, but they are not relevant to this discussion of Collections (ordered sequences).
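To make the "pos" idea concrete, here is a minimal in-memory sketch in plain Python. It is not the library's API or its actual algorithm: the class name Collection, the STEP spacing of 20, and the "take the midpoint, renumber when no gap is left" strategy are all assumptions made purely for illustration. It shows one possible way to handle repeated insertions in the middle of a sequence when positions are integers.

```python
from dataclasses import dataclass, field

STEP = 20  # hypothetical gap left between consecutive "pos" values


@dataclass
class Collection:
    """In-memory stand-in for a Collection node and its "in_category" links."""
    name: str
    # Maps each item to the integer "pos" stored on its relationship
    links: dict = field(default_factory=dict)

    def ordered_items(self):
        """Return the items sorted by their "pos" value."""
        return sorted(self.links, key=self.links.get)

    def append(self, item):
        """Add an item at the end of the sequence."""
        last = max(self.links.values(), default=0)
        self.links[item] = last + STEP

    def insert_after(self, item, anchor):
        """Insert `item` right after `anchor`, renumbering only if no gap is left."""
        items = self.ordered_items()
        idx = items.index(anchor)
        if idx == len(items) - 1:
            self.append(item)
            return
        successor = items[idx + 1]
        if self.links[successor] - self.links[anchor] < 2:
            # No free integer between the two neighbors: re-space everything
            self._renumber()
        self.links[item] = (self.links[anchor] + self.links[successor]) // 2

    def _renumber(self):
        """Re-assign evenly spaced positions, preserving the current order."""
        for i, item in enumerate(self.ordered_items(), start=1):
            self.links[item] = i * STEP


album = Collection("Greece trip")
for photo in ("Acropolis", "Santorini sunset", "Crete beach"):
    album.append(photo)
album.insert_after("Delphi ruins", "Acropolis")  # lands between pos 20 and 40
print(album.ordered_items())
```

Leaving gaps between positions means that most insertions touch a single relationship; only when a gap is exhausted do the other relationships need new "pos" values.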
Collections vs. "Recordsets"
"Collections", as I use the term, are relatively permanent, with only sporadic "tweaks" of relative positions. All the use cases given in this article are generally in this category.
For different use cases, there's a separate data structure slated to be added to the BrainAnnex open-source project, tentatively called "Recordset". It is meant for a group of records that you want to show together at a particular time (for example, a tabular view of data), in response to the user's request for a particular sort order.
The envisioned design for the "Recordset" structure is that no sequential positional information is actually stored in the database; rather, it is the responsibility of a front-end module to request the desired records to show, and to pass in the request a criterion for sorting the records: the backend does the ordering on the fly while servicing the data request.
The front-end module may also offer a way to select a record (node) and optionally follow links out of it. I.e. a lot more responsibility on the front end! This is currently experimental, and slated to be rolled out in future releases...
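Since the "Recordset" structure is still experimental, the following is only a rough sketch of the general idea, not a preview of the eventual design. The function name fetch_recordset, its parameters, and the sample records are all hypothetical. The point it illustrates: no position is stored with the records, and the order comes entirely from the criterion passed in the request.

```python
def fetch_recordset(records, sort_by, descending=False, limit=None):
    """Return the requested records, ordered on the fly by the `sort_by` field.

    Nothing about the order is stored with the records themselves:
    the caller (e.g. a front-end module) supplies the sorting criterion.
    """
    ordered = sorted(records, key=lambda rec: rec[sort_by], reverse=descending)
    return ordered if limit is None else ordered[:limit]


# Hypothetical records, as they might come back from a database query
patients = [
    {"name": "Ada", "age": 36, "enrolled": "2023-04-01"},
    {"name": "Bob", "age": 52, "enrolled": "2022-11-15"},
    {"name": "Cleo", "age": 29, "enrolled": "2023-01-20"},
]

# The same records, shown in two different orders on request
by_age = fetch_recordset(patients, sort_by="age")
newest_first = fetch_recordset(patients, sort_by="enrolled", descending=True)
print([p["name"] for p in by_age])
print([p["name"] for p in newest_first])
```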
Technical note on design considerations:
When I first announced this Python library on LinkedIn and elsewhere, I got a ton of reactions: clearly, there's a broad need, and widespread interest in the matter! Several people mentioned an alternate design, which I'll discuss below.
On multiple occasions, starting in a parent project around 2015, I considered using a "linked list" kind of data structure, with "next" relationships among the child nodes.
But I disliked several aspects when I first considered it - and have continued to feel the same way throughout the years.
What I dislike about the "linked list" design includes:
- In many use cases, the more "natural" ("primary") relationship is the child node belonging to the parent (the Collection) at a particular position (e.g. the "step" in a sequence or relative position on a page, etc); being "next" to another child seems more like a consequence.
- All the traversals of the "linked list" of nodes feel burdensome
But, most importantly, I think that the following consideration is the final nail in the coffin for the "linked list" design:
- Data nodes ought to be allowed to belong to MULTIPLE ordered collections (in which case there's no such thing as a single "next" relationship)
Example - The use case of Content Management Systems:
- the data nodes might be "content items", such as images or text notes or documents
- the ordered collections might be "categories" (topics)
- the sequential order might be the position of the content items on the page, akin to the relative order of paragraphs, figures, etc, in a chapter in a book.
Would anyone want to juggle multiple "next" relationships, one for each collection? I think that would be messy and confusing! Hence I stand by my original design: the positional information is most appropriately described by an integer property value on the "belongs to collection" relationships from the data nodes.
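The multiple-membership argument can be illustrated with a small plain-Python sketch. Everything here is made up for illustration (the item names, the collection names, and the two helper functions); it is not code from the library. Each tuple stands for one "belongs to collection" relationship carrying its own integer position, so the same data node can sit at different places in different collections without any "next" links.

```python
# Each tuple stands for one "belongs to collection" relationship:
# (data node, collection, pos)
memberships = [
    ("photo_acropolis", "Greece trip", 10),
    ("note_itinerary", "Greece trip", 20),
    ("photo_acropolis", "Architecture", 30),
    ("doc_columns", "Architecture", 10),
]


def items_in_order(collection):
    """List the data nodes of one collection, sorted by their position in it."""
    links = [(pos, item) for item, coll, pos in memberships if coll == collection]
    return [item for pos, item in sorted(links)]


def collections_of(item):
    """Map each collection that contains `item` to the item's position there."""
    return {coll: pos for node, coll, pos in memberships if node == item}


print(items_in_order("Greece trip"))
print(items_in_order("Architecture"))
print(collections_of("photo_acropolis"))
```

Moving "photo_acropolis" within one collection changes a single number on a single relationship, and leaves its place in the other collection untouched.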