This is the opening shot of a series about Semantic Technology, and in particular about comparing and contrasting the opposing (but perhaps ultimately complementary) camps of:
- RDF Triple Stores, aka Triples-Based Graphs. For example, Blazegraph or Apache Jena
- (Labeled) Property Graphs. For example, Neo4j or Blazegraph
It’s my opinion that modeling in terms of Subject/Predicate/Object triples (aka RDF) might be appealing to mathematicians or philosophers for its minimalist foundation (though a lot of baroque add-ons quickly come out of the closet!).
Modeling in terms of (Labeled) Property Graphs might be appealing to computer scientists, because such graphs appear more usable and less clunky once you start actually doing something with them.
Perhaps because I straddle both the Math and CS camps, I’m currently on the fence about which model I like best. I wear many hats: as someone involved in a research project on Knowledge Management (BrainAnnex.org), I'm very interested in comparing and contrasting the two models; by contrast, in my role as someone trying to do Bioinformatics and other types of Content Management, I really don't care about the "war of the models" and simply want whatever is best for doing serious work!
N-ary relationships: attaching properties to relationships
There are many aspects, but in this blog entry I’ll discuss just one: N-ary relationships, aka how to attach properties to relationships.
Here's an example from a helpful tutorial by Neo4j, RDF Triple Stores vs. Labeled Property Graphs: What’s the Difference?
With binary relationships, triples are simple and intuitive. To express that there is “a direct flight” (predicate) from NYC (entity) to SF (entity), you can have a subject/predicate/object triple such as:
NYC hasDirectFlight SF
That's just the infix form of a mathematical predicate with 2 variables: P(x, y). In graph form, it’s an edge (the Predicate) connecting the 2 nodes of the cities.
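To make the "triple as a predicate P(x, y)" reading concrete, here is a minimal sketch in plain Python. It is not the API of any triplestore; the helper names `holds` and `objects_of` are made up for illustration, and the "store" is just a set of tuples.

```python
# A triple store reduced to its essence: a set of (subject, predicate, object) tuples.
triples = {
    ("NYC", "hasDirectFlight", "SF"),
}

def holds(predicate, x, y):
    """Return True if the binary predicate P(x, y) is asserted in the store."""
    return (x, predicate, y) in triples

def objects_of(subject, predicate):
    """Follow the edge `predicate` out of the node `subject`; return the nodes it reaches."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)

print(holds("hasDirectFlight", "NYC", "SF"))   # True
print(holds("hasDirectFlight", "SF", "NYC"))   # False: the edge is directed
print(objects_of("NYC", "hasDirectFlight"))    # ['SF']
```

Note that there is nowhere in a 3-tuple to hang a distance or a price: the edge is fully described by its two endpoints and its name.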
But what if you want to model the fact that the direct flight has distance and price values attached to it? Using Property Graph models such as Neo4j's, that’s trivially done: relationships can take as many property/value pairs as you want. End of story!
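As a rough illustration of the Property Graph idea (and not of Neo4j's actual driver API), the sketch below models nodes and relationships as plain Python dataclasses. The class names, the property keys `distance_km` and `price_usd`, and the numeric values are all placeholders invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    # The key point: the relationship itself carries property/value pairs.
    properties: dict = field(default_factory=dict)

nyc = Node("City", {"name": "NYC"})
sf = Node("City", {"name": "SF"})

# Placeholder values, chosen only for illustration.
flight = Relationship(nyc, "HAS_DIRECT_FLIGHT", sf,
                      {"distance_km": 4100, "price_usd": 300})

print(flight.start.properties["name"], "->", flight.end.properties["name"])
print(flight.properties)
```

In Neo4j's Cypher query language, the same shape would be written along the lines of `CREATE (:City {name: 'NYC'})-[:HAS_DIRECT_FLIGHT {distance_km: 4100, price_usd: 300}]->(:City {name: 'SF'})`, again with made-up property names and values.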
With Triple-based Graphs, it’s a somewhat clunky process of adding an extra node, representing the city/city pair as an entity that can have properties such as distance and price. This operation is called reification. It's vaguely reminiscent of those annoying “junction tables” needed in relational databases to represent many-to-many relationships.
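Here is a small sketch of what the extra node looks like when the graph is nothing but triples. Everything in it is an assumption made for illustration: the node name `Flight1`, the predicate names, the numeric values, and the helper functions `describe` and `direct_flights`.

```python
# The single edge NYC --hasDirectFlight--> SF is replaced by an intermediate
# node (Flight1) that the two cities, the distance and the price all hang off.
triples = {
    ("Flight1", "type", "DirectFlight"),
    ("Flight1", "from", "NYC"),
    ("Flight1", "to", "SF"),
    ("Flight1", "distanceKm", 4100),   # placeholder value
    ("Flight1", "priceUsd", 300),      # placeholder value
}

def describe(node):
    """Collect every predicate/object pair attached to `node`."""
    return {p: o for (s, p, o) in triples if s == node}

def direct_flights(origin, destination):
    """Find the intermediate nodes linking `origin` to `destination`."""
    starts = {s for (s, p, o) in triples if p == "from" and o == origin}
    ends = {s for (s, p, o) in triples if p == "to" and o == destination}
    return sorted(starts & ends)

for flight in direct_flights("NYC", "SF"):
    print(flight, describe(flight))
```

The junction-table analogy shows up in `direct_flights`: answering "is there a direct flight from NYC to SF?" now takes a join over two sets of triples instead of a single lookup.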
Reification
The above flight example shows the extra "reification" node as a diagram. Again, it's basically a crutch to attach properties to a relationship.
How does one actually add that extra node? Here's a recipe example from Kno.e.sis that shows two ways to actually add a node to the set of triples. (The triples are shown written as “Turtle code”, which is a concise way to avoid repeating the same subject, etc. Don’t worry about the prefixes such as rdf: , as they’re just namespaces.)
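The Kno.e.sis recipe itself is not reproduced here. As a stand-in, the sketch below uses the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) and a tiny hand-rolled serializer to show how Turtle avoids repeating the subject. The `ex:` namespace, the node `ex:flight1`, the properties `ex:distanceKm` and `ex:priceUsd`, their values, and the function `to_turtle` are all hypothetical; a real project would use an RDF library rather than string formatting.

```python
from collections import defaultdict

PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix ex: <http://example.org/> ."   # hypothetical namespace
)

# One reified statement: ex:flight1 stands for the triple
# (ex:NYC ex:hasDirectFlight ex:SF) and carries the extra properties.
triples = [
    ("ex:flight1", "rdf:type", "rdf:Statement"),
    ("ex:flight1", "rdf:subject", "ex:NYC"),
    ("ex:flight1", "rdf:predicate", "ex:hasDirectFlight"),
    ("ex:flight1", "rdf:object", "ex:SF"),
    ("ex:flight1", "ex:distanceKm", "4100"),   # placeholder value
    ("ex:flight1", "ex:priceUsd", "300"),      # placeholder value
]

def to_turtle(triples):
    """Group triples by subject and join each group with ';' as Turtle does."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    blocks = []
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        blocks.append(f"{s} {body} .")
    return PREFIXES + "\n\n" + "\n\n".join(blocks)

print(to_turtle(triples))
```

The six triples come out as one block in which `ex:flight1` is written once and each further predicate/object pair is introduced by a semicolon.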
I’m not trying to say that Property Graphs are overall better than Triple-based graphs. I'm just pointing out that N-ary relationships, while eminently doable in RDF, are clunkier and a little less intuitive there.
My Tentative Conclusions
At present, I lean heavily in favor of the open-source Neo4j, though for some purposes triplestores do just fine (for example, in the Knowledge-Representation and Media Management open-source project Brain Annex, I make use of ARC2, a PHP library for working with RDF).
I've also had promising initial results with Blazegraph.
By contrast, I advise against Virtuoso, a triplestore I found to be clunky, bloated, buggy, and with an inadequate ecosystem. (At a past job, we squandered a lot of time trying to use it, both versions 6 and 7.)
Beyond RDF
I'll just mention in passing that there is an extension of RDF (and of its query language SPARQL), called RDF*, which somewhat streamlines the clunky reification process of RDF.
The open-source Blazegraph supports RDF*, in addition to supporting plain RDF (both triples and quads) and providing a Property Graph modality.
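The core idea of RDF* is that a whole triple may itself appear as the subject of another triple, written in its Turtle-style syntax roughly as `<< :NYC :hasDirectFlight :SF >> :distanceKm 4100 .` The sketch below imitates that with nested Python tuples. It is only a toy model of the idea, not Blazegraph's API; the predicate names, the values, and the helper `annotations` are assumptions made for illustration.

```python
# The original statement is kept as an ordinary triple...
flight = ("NYC", "hasDirectFlight", "SF")

triples = {
    flight,
    # ...and the triple itself is used as the subject of further triples,
    # so no intermediate "Flight1" node is needed.
    (flight, "distanceKm", 4100),   # placeholder value
    (flight, "priceUsd", 300),      # placeholder value
}

def annotations(statement):
    """Return the predicate/object pairs attached to a quoted triple."""
    return {p: o for (s, p, o) in triples if s == statement}

print(flight in triples)       # the plain edge is still directly queryable
print(annotations(flight))
```

Compared with the extra-node approach, the direct edge from NYC to SF survives intact, which is what makes this form read more like a property on a relationship.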