Introduction to graph databases

Blog post cover: Introduction to graph databases

It is not surprising that graphs are being employed by more and more companies, especially the big ones. The ever increasing data volume and complexity is a serious problem, and that is precisely why graph databases were introduced. Bear with me as we learn what a graph database is.

Data models

In order to understand how graph databases operate, we have to first go over the data models a little bit. This knowledge is essential to make a decision whether you will use a graph database in the future.

Relational Model

It is by far the most known data model out there. The data in it is organized into relations (tables) which consist of tuples (rows).

When you look at the schema above you may get an idea that relational model creates a graph of entities, but in reality each foreign key is an ID of an entry in another table with the same primary key. These arrows are there just to make us, humans, understand the data layout easily.

The problem with such solution is that it is not coherent with the way we write our applications. It’s due to the fact that we see the data as nested structures whereas it’s completely flat in a table. In such case an awkward layer between the application and a database has to be introduced. Its sole purpose is to map an object to tabular format and vice versa. This is called Object-Relational Mismatch.

post picture: Introduction to graph databases

Document Model

This model shines when the data is self-contained and highly volatile. It also solves the problem I’ve mentioned above, as the format of the data is more closely (in some cases even directly) mapped to objects in the application. It is often represented as a JSON:

    "id": 1,
    "name": "Leanne Graham",
    "username": "Bret",
    "email": "",
    "address": {
      "street": "Kulas Light",
      "suite": "Apt. 556",
      "city": "Gwenborough",
      "zipcode": "92998-3874"
    "phone": "1-770-736-8031 x56442",
    "website": "",
    "company": {
      "name": "Romaguera-Crona",
      "catchPhrase": "Multi-layered client-server neural-net",
      "bs": "harness real-time e-markets"

Such approach provides better locality than relational model. If you were to display user profile in a relational database, you would have to query quite a bit of tables or perform a sloppy multi-join. The document model on the other hand allows you to fetch all information for a user using a single query as all the data is in one place.

Of course, it’s not perfect either. Most document databases don’t support joins at all. Say you wanted to extract company information to a separate document. In such case when fetching user data, you’d have to manually query for the company and join the data together. Most probably you would do so in application layer, which may slow things down a bit.

Network model

We have to make one last stop before we dive into graph databases. Network model was a direct competitor for relational databases back in the 1970s. This approach differs from a relational one in that it stores the links between records not with foreign keys, but more like pointers in C. Because of that the only way to access a record in the database is to follow the links of a root node.

Network model databases are based on hierarchical data model, which means the data is stored more or less like a tree. Due to that fact, you can’t just link any nodes you want. Each child must have exactly one parent, and each parent can have multiple children. These databases also use fixed records, meaning you can’t add arbitrary properties.

Post picture: Introduction to graph databasesIt works much like get/findoperations in a linked list, where you start from the head of the list and follow the next item until you find what you’re looking for. But that’s only the simplest case. In real world there are lots of many-to-many relations which make things more challenging, as there are many ways to create a path between two points on a graph.

Graph model

Last but not least, there’s a graph model. If your data is highly interconnected, then that’s the data model you will want to look into. It’s the reason why lots of big companies started employing this model. You’re already making use of the benefits of graphs if you’re using Facebook, Google, Amazon, Twitter or Instagram. They use them to perform complex analytics on how users interact with other users, item auctions or simple web searches, etc.

While network and graph sound very much alike, these models aren’t the same. Graphs are much more flexible in a sense that they allow for arbitrary properties/relationships.

Post picture: Introduction to graph databases

Graph structure

Each graph consists of:

  • Graph: An object that contains vertices and edges.
  • Element: An object that can have any number of key/value pairs associated with it (i.e. properties)
  • Vertex: An object that has incoming and outgoing edges.
  • Edge: An object that has a tail and head vertex.

Graph models

  • Property Graph Model – it boils down to the fact that all elements have an identifier, properties and a label. Additionally, every vertex has a set of outgoing and incoming edges and each edge has a tail and head vertex. There’s a bit more about it in here.
  • Triple-store – every information is being stored as a very simple statement that consists of three parts: subject, predicate and object, e.g. John, likes, Lucy or John, age, 24

Query languages

  • Gremlin – functional language designed specifically for graph databases, it’s a de facto standard in the graph world that many vendors utilize and contribute to.
graph.V().has('movie','name','Die Hard')
  • Cypher – it employs an ascii-art technique to represent graph information in queries. It was created for Neo4j database and at the same time it’s being used pretty much solely by it.
MATCH (:Actor {name: 'Nicole Kidman'})-[:ACTED_IN]->(movie:Movie)
WHERE movie.year < 2015
RETURN movie
  • GraphQL – it’s a less obvious choice, but some databases use a modified version of GraphQL too. The language itself is rather simple but simultaneously it fits the graph model well. The sample below is a GraphQL+- snippet which was created for Dgraph.
  brCharacters(func: eq(name@en, "Blade Runner")) {
    starring { {
        name@en  # actor name
      performance.character {
        name@en  # character name
  • SQL – it is possible to query graph data with the use of SQL, there are some quirks though. There could be some cases in which you need to follow an edge a number of times. While it’s doable in SQL, the query will be much longer and more complicated. However, there is OrientDB which extends the SQL in order to make it easier for developers to write queries.
SELECT name, OUT("Friend").name AS friendName FROM Person

Graph use cases

As already mentioned, the graph model excels when the data has lots of many-to-many relationships, therefore all of the following can be considered valid use cases for graphs:

  • social media – e.g. friend network on Facebook, user circles on Google Plus
  • network architecture – map of all network layers (how applications, servers, datacenters, routers etc. relate to each other)
  • web graphs – how websites reference each other
  • collaboration networks – github projects and contributors
  • communication networks – messaging applications
  • road networks – graph that shows the possibilities of travelling between places
  • fraud detection – detecting suspicious actions
  • chemistry – how compounds react with each other
  • biology – DNA-trait relations (which gene causes what)
  • medicine – chemicals and side effects

Comparison of popular graph databases

(scroll the table to the right)

JanusGraphOrientDBDgraphArangoDBNeo4j CommunityNeo4j Enterprise
StackOverflow questions6192,665821,68219,757
Initial release20172010201620112007
Last release date (as of 09/2020)05/202009/202007/202010/202007/2020
Built inJavaJavaGoC++Java
Architecturesharding built-in, scalability handled by storage backendsharded (with some limitations) and distributed, multi master replicationsharding per predicate, replicas have full dataset unless it’s too big, otherwise replica groups are createdmany deployment modes (single instance / master slave / active failover / cluster)not distributed (only scalable vertically)a few core (write) servers + many replica (read) servers, sharded
Storage enginedepends on storage backendplocalBadgerRocksDBcustom solution
APIsWS/HTTP, many popular PLBinary, REST, many popular PLJSON/protobuf over gRPC/HTTPREST, Java/JS/PHP/Go drivers (+ more from community, e.g. python, .Net)BOLT, REST, many popular PL
Query languageGremlinSQLGraphQL or GraphQL+-AQLCypher
LicenseApache 2.0Apache 2.0Apache 2.0Apache 2.0GPLv3GPLv3 / commercial
Geo searchbuilt-inbuilt-inbuilt-inbuilt-inexternal
Full-text searchpluggable, external (lucene, SOLR, ES)built-inbuilt-inbuilt-inexternal
Company behindcommunity-driven (developed by e.g. IBM, Google, Hortonworks, Amazon)SAPDgraph LabsArangoDBNeo Technology
Price (annually)freefree (enterprise: hidden, approx. $6k – $10k)free (enterprise: hidden)free (enterprise: hidden)freehidden, approx. $30k – $200k
Spring Data supportKind of (there’s a plugin but it’s bugged)NoNoYesYes
Detailed summary


The thing about graphs is that they are ideal for modern data, especially for Big Data. They just fit perfectly the direction in which the IT goes, that is the one of ever increasing data volume and complexity. I sincerely believe that now is the best time to employ graph databases because of how much interest they gain. And it doesn’t seem like they will vanish in future, I’m sure it’s quite the opposite.

    Let’s stay in touch!

    Sign up for our newsletter! You will receive a balanced portion of technological knowledge that you can easily transfer to the business world. In addition, once a week, a press with carefully selected information will be waiting for you!

    You can unsubscribe from these communications at any time. For more information on how to unsubscribe, our privacy practices please view our Privacy Policy.