Introduction to graph databases

It is not surprising that graphs are being employed by more and more companies, especially the big ones. The ever increasing data volume and complexity is a serious problem, and that is precisely why graph databases were introduced. Bear with me as we learn what a graph database is.

Data models

In order to understand how graph databases operate, we have to first go over the data models a little bit. This knowledge is essential to make a decision whether you will use a graph database in the future.

Relational Model

It is by far the most known data model out there. The data in it is organized into relations (tables) which consist of tuples (rows).

When you look at the schema above you may get an idea that relational model creates a graph of entities, but in reality each foreign key is an ID of an entry in another table with the same primary key. These arrows are there just to make us, humans, understand the data layout easily.

The problem with such solution is that it is not coherent with the way we write our applications. It’s due to the fact that we see the data as nested structures whereas it’s completely flat in a table. In such case an awkward layer between the application and a database has to be introduced. Its sole purpose is to map an object to tabular format and vice versa. This is called Object-Relational Mismatch.

Document Model

This model shines when the data is self-contained and highly volatile. It also solves the problem I’ve mentioned above, as the format of the data is more closely (in some cases even directly) mapped to objects in the application. It is often represented as a JSON:

[
  {
    "id": 1,
    "name": "Leanne Graham",
    "username": "Bret",
    "email": "[email protected]",
    "address": {
      "street": "Kulas Light",
      "suite": "Apt. 556",
      "city": "Gwenborough",
      "zipcode": "92998-3874"
    },
    "phone": "1-770-736-8031 x56442",
    "website": "hildegard.org",
    "company": {
      "name": "Romaguera-Crona",
      "catchPhrase": "Multi-layered client-server neural-net",
      "bs": "harness real-time e-markets"
    }
  }
]

Such approach provides better locality than relational model. If you were to display user profile in a relational database, you would have to query quite a bit of tables or perform a sloppy multi-join. The document model on the other hand allows you to fetch all information for a user using a single query as all the data is in one place.

Of course, it’s not perfect either. Most document databases don’t support joins at all. Say you wanted to extract company information to a separate document. In such case when fetching user data, you’d have to manually query for the company and join the data together. Most probably you would do so in application layer, which may slow things down a bit.

Network model

We have to make one last stop before we dive into graph databases. Network model was a direct competitor for relational databases back in the 1970s. This approach differs from a relational one in that it stores the links between records not with foreign keys, but more like pointers in C. Because of that the only way to access a record in the database is to follow the links of a root node.

Network model databases are based on hierarchical data model, which means the data is stored more or less like a tree. Due to that fact, you can’t just link any nodes you want. Each child must have exactly one parent, and each parent can have multiple children. These databases also use fixed records, meaning you can’t add arbitrary properties.

It works much like get/findoperations in a linked list, where you start from the head of the list and follow the next item until you find what you’re looking for. But that’s only the simplest case. In real world there are lots of many-to-many relations which make things more challenging, as there are many ways to create a path between two points on a graph.

Graph model

Last but not least, there’s a graph model. If your data is highly interconnected, then that’s the data model you will want to look into. It’s the reason why lots of big companies started employing this model. You’re already making use of the benefits of graphs if you’re using Facebook, Google, Amazon, Twitter or Instagram. They use them to perform complex analytics on how users interact with other users, item auctions or simple web searches, etc.

While network and graph sound very much alike, these models aren’t the same. Graphs are much more flexible in a sense that they allow for arbitrary properties/relationships.

Graph structure

Each graph consists of:

Graph: An object that contains vertices and edges.
Element: An object that can have any number of key/value pairs associated with it (i.e. properties)
Vertex: An object that has incoming and outgoing edges.
Edge: An object that has a tail and head vertex.

Graph models

Property Graph Model – it boils down to the fact that all elements have an identifier, properties and a label. Additionally, every vertex has a set of outgoing and incoming edges and each edge has a tail and head vertex. There’s a bit more about it in here.
Triple-store – every information is being stored as a very simple statement that consists of three parts: subject, predicate and object, e.g. John, likes, Lucy or John, age, 24

Query languages

Gremlin – functional language designed specifically for graph databases, it’s a de facto standard in the graph world that many vendors utilize and contribute to.

graph.V().has('movie','name','Die Hard')
  .inE('rated')
  .values('stars')
  .mean()

Cypher – it employs an ascii-art technique to represent graph information in queries. It was created for Neo4j database and at the same time it’s being used pretty much solely by it.

MATCH (:Actor {name: 'Nicole Kidman'})-[:ACTED_IN]->(movie:Movie)
WHERE movie.year < 2015
RETURN movie

GraphQL – it’s a less obvious choice, but some databases use a modified version of GraphQL too. The language itself is rather simple but simultaneously it fits the graph model well. The sample below is a GraphQL+- snippet which was created for Dgraph.

{
  brCharacters(func: eq(name@en, "Blade Runner")) {
    name@en
    initial_release_date
    starring {
      performance.actor {
        name@en  # actor name
      }
      performance.character {
        name@en  # character name
      }
    }
  }
}

SQL – it is possible to query graph data with the use of SQL, there are some quirks though. There could be some cases in which you need to follow an edge a number of times. While it’s doable in SQL, the query will be much longer and more complicated. However, there is OrientDB which extends the SQL in order to make it easier for developers to write queries.

SELECT name, OUT("Friend").name AS friendName FROM Person

Graph use cases

As already mentioned, the graph model excels when the data has lots of many-to-many relationships, therefore all of the following can be considered valid use cases for graphs:

social media – e.g. friend network on Facebook, user circles on Google Plus
network architecture – map of all network layers (how applications, servers, datacenters, routers etc. relate to each other)
web graphs – how websites reference each other
collaboration networks – github projects and contributors
communication networks – messaging applications
road networks – graph that shows the possibilities of travelling between places
fraud detection – detecting suspicious actions
chemistry – how compounds react with each other
biology – DNA-trait relations (which gene causes what)
medicine – chemicals and side effects

Comparison of popular graph databases

(scroll the table to the right)

	JanusGraph	OrientDB	Dgraph	ArangoDB	Neo4j Community	Neo4j Enterprise
StackOverflow questions	619	2,665	82	1,682	19,757
Initial release	2017	2010	2016	2011	2007
Last release date (as of 09/2020)	05/2020	09/2020	07/2020	10/2020	07/2020
Built in	Java	Java	Go	C++	Java
Architecture	sharding built-in, scalability handled by storage backend	sharded (with some limitations) and distributed, multi master replication	sharding per predicate, replicas have full dataset unless it’s too big, otherwise replica groups are created	many deployment modes (single instance / master slave / active failover / cluster)	not distributed (only scalable vertically)	a few core (write) servers + many replica (read) servers, sharded
Storage engine	depends on storage backend	plocal	Badger	RocksDB	custom solution
APIs	WS/HTTP, many popular PL	Binary, REST, many popular PL	JSON/protobuf over gRPC/HTTP	REST, Java/JS/PHP/Go drivers (+ more from community, e.g. python, .Net)	BOLT, REST, many popular PL
Query language	Gremlin	SQL	GraphQL or GraphQL+-	AQL	Cypher
License	Apache 2.0	Apache 2.0	Apache 2.0	Apache 2.0	GPLv3	GPLv3 / commercial
Geo search	built-in	built-in	built-in	built-in	external
Full-text search	pluggable, external (lucene, SOLR, ES)	built-in	built-in	built-in	external
Company behind	community-driven (developed by e.g. IBM, Google, Hortonworks, Amazon)	SAP	Dgraph Labs	ArangoDB	Neo Technology
Price (annually)	free	free (enterprise: hidden, approx. $6k – $10k)	free (enterprise: hidden)	free (enterprise: hidden)	free	hidden, approx. $30k – $200k
Spring Data support	Kind of (there’s a plugin but it’s bugged)	No	No	Yes	Yes
Detailed summary	https://docs.janusgraph.org/	https://orientdb.com/docs/3.0.x/misc/Overview.html	https://dgraph.io/docs/faq/#	https://www.arangodb.com/why-arangodb/arangodb-vs-neo4j/	https://neo4j.com/product/

Summary

The thing about graphs is that they are ideal for modern data, especially for Big Data. They just fit perfectly the direction in which the IT goes, that is the one of ever increasing data volume and complexity. I sincerely believe that now is the best time to employ graph databases because of how much interest they gain. And it doesn’t seem like they will vanish in future, I’m sure it’s quite the opposite.

Let’s stay in touch!

Sign up for our newsletter! You will receive a balanced portion of technological knowledge that you can easily transfer to the business world. In addition, once a week, a press with carefully selected information will be waiting for you!

I agree to the processing of my personal data by Stepwise sp. z o.o. in order to provide me with a newsletter. I declare that I have read the information clause contained in the Privacy Policy regarding the processing of personal data.

I declare that I have read the Regulations and I accept it.

Pursuant to the Polish Act of 18.07.2002 on the provision of electronic services, I consent to receive the commercial information via electronic means from Stepwise sp. z o.o.

You can unsubscribe from these communications at any time. For more information on how to unsubscribe, our privacy practices please view our Privacy Policy.

Introduction to graph databases

Data models

Relational Model

Document Model

Network model

Graph model

Graph structure

Graph models

Query languages

Graph use cases

Comparison of popular graph databases

(scroll the table to the right)

Summary

Let’s stay in touch!

Karol Czeryna

Get in touch with us

Contact Info

AI Consulting
& Strategy

Data & Machine Learning
Engineering

AI Software
Development

Contact info

Introduction to graph databases

Data models

Relational Model

Document Model

Network model

Graph model

Graph structure

Graph models

Query languages

Graph use cases

Comparison of popular graph databases

(scroll the table to the right)

Summary

Let’s stay in touch!

Other topics in this category

Get in touch with us

Contact Info

Contact info