JanusGraph cluster deployment by example

Growing load and increasingly complex data call for new technologies, and a graph database is one way to answer that need. The graph model is a natural fit for highly connected data, which is common in Big Data workloads and in fields such as networking or the natural sciences (chemistry, biology, medicine). After reading this post you should be able to create your very own JanusGraph cluster on Kubernetes and connect to it from an application.

This blog post is a continuation of an article posted here.

Why JanusGraph?

JanusGraph is a great all-around database that you can easily tailor to your needs. Unlike many other graph databases, it delegates storage and indexing to external applications. That is a useful property, because it lets JanusGraph focus on doing one thing and doing it well. Its main features are as follows:

free and open source
pluggable data storage and indexing – allows you to choose whether you want to focus on availability with Cassandra or consistency with Google Cloud Bigtable. There are, of course, more storage plugins, and you can write your own if you want to.
scalable out of the box
integrates with big data platforms
uses Gremlin Server and the Gremlin query language
it’s a Titan fork under The Linux Foundation, to which companies like Google, IBM, Amazon or Hortonworks contribute

Prerequisites

We’ll be using a local Kubernetes cluster that runs inside Docker and is created with k3d. That means we’ll need Docker and k3d, plus Helm and kubectl to deploy JanusGraph and Cassandra.

The installation steps differ from OS to OS. For Linux distributions, do the following:

Debian

apt install docker.io
wget -q -O - https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash

You’ll also have to install Helm by following the instructions at https://helm.sh/docs/intro/install/ and kubectl by following https://kubernetes.io/docs/tasks/tools/install-kubectl/

Arch

yay -S rancher-k3d-bin helm kubectl docker

Kubernetes cluster

For demo purposes I am going to run 3 Cassandra instances and 1 JanusGraph server, which adds up to a total of 4 servers that we have to create locally (other configurations are described here: https://docs.janusgraph.org/storage-backend/cassandra).

k3d cluster create multiserver --servers 4

Since we want to run everything inside a namespace, let’s create one:

kubectl create namespace janus

Cassandra cluster

Now that we have the Kubernetes cluster, let’s run Cassandra with 3 instances. To do that we’ll create a Helm values file, since we want to override some defaults (the full list is here: https://github.com/helm/charts/tree/master/incubator/cassandra):

cassandra-values.yaml

image:
  repo: cassandra
  tag: 3.11.8
  pullPolicy: IfNotPresent
resources:
  requests:
    memory: 4Gi
    cpu: 1
  limits:
    memory: 4Gi

Let’s run Cassandra. Note that we don’t create any persistent volumes, so once you delete a pod its data is gone:

helm repo add incubator https://charts.helm.sh/incubator
helm install cassandra --namespace janus --values cassandra-values.yaml incubator/cassandra
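
Cassandra pods take a while to start, so it may be worth waiting until all three are running before moving on. Below is a minimal sketch, assuming the chart prefixes pod names with the release name (cassandra-0, cassandra-1 and so on); count_running_pods is a hypothetical helper, not something kubectl or the chart provides:

```shell
# Hypothetical helper: count pods in namespace $1 whose name starts with $2,
# whose STATUS is Running and whose READY column reads n/n.
count_running_pods() {
  kubectl get pods --namespace "$1" --no-headers |
    awk -v prefix="$2" '
      index($1, prefix) == 1 && $3 == "Running" {
        split($2, ready, "/")
        if (ready[1] == ready[2]) n++
      }
      END { print n + 0 }'
}

# Usage: poll every 10 seconds until the 3 Cassandra pods are up.
# until [ "$(count_running_pods janus cassandra)" -ge 3 ]; do sleep 10; done
```

Counting pods by name and readiness avoids depending on how the chart labels its resources or on its update strategy.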

JanusGraph server

Now it’s time to create a persistent volume for JanusGraph as well as a Service which will allow Janus and Cassandra to interact. To do that, save the file below and run the command that follows it:

janus-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: janus-pv
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  hostPath:
    path: /var/janus
  storageClassName: local-storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: janus-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-storage
---
apiVersion: v1
kind: Service
metadata:
  name: kin-nodeport-service
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 8080
  selector:

kubectl apply --namespace janus -f janus-pv.yaml
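
To check that the claim was matched to the volume, you could query its phase. This is a sketch that uses kubectl’s jsonpath output; pvc_phase is a hypothetical helper name:

```shell
# Hypothetical helper: print the phase (for example Pending or Bound)
# of the PersistentVolumeClaim $2 in namespace $1.
pvc_phase() {
  kubectl get pvc "$2" --namespace "$1" -o jsonpath='{.status.phase}'
}

# Usage: expected to print "Bound" once janus-pvc has been matched to janus-pv.
# pvc_phase janus janus-pvc
```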

Then start the JanusGraph server itself using the values file below (all available values are listed here: https://github.com/helm/charts/tree/master/stable/janusgraph):

janus-values.yaml

image:
  repository: janusgraph/janusgraph
  tag: latest
  pullPolicy: IfNotPresent
replicaCount: 1

persistence:
  existingClaim: janus-pvc

service:
  type: ClusterIP
  port: 8182
  serviceAnnotations:
elasticsearch:
  deploy: false
  rbac:
    create: true
properties:
  storage.backend: cql
  storage.hostname: cassandra

  index.search.backend: lucene
  index.search.directory: /db/searchindex

helm repo add stable https://charts.helm.sh/stable
helm install janusgraph --namespace janus --values janus-values.yaml stable/janusgraph
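
If the server does not come up, its log is usually the first place to look. Below is a small sketch for finding the pod and tailing its output, assuming the pod name starts with the release name janusgraph; first_pod_named is a hypothetical helper:

```shell
# Hypothetical helper: print the first pod in namespace $1 whose name
# starts with $2, in the pod/<name> form that kubectl accepts.
first_pod_named() {
  kubectl get pods --namespace "$1" -o name | grep "^pod/$2" | head -n 1
}

# Usage: show the last 20 log lines of the JanusGraph server.
# kubectl logs --namespace janus "$(first_pod_named janus janusgraph)" --tail=20
```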

Connecting from application

Now that we have everything up and running, let’s try to connect to the database from an application. It won’t be fancy by any means; it’s just meant to show a basic setup.

We’re going to use Kotlin, with Gradle as the build system. There are many ways to bootstrap such a project, but probably the easiest is to delegate it to Gradle itself. Make sure you have Gradle installed and execute the following commands:

# first, expose the JanusGraph server on localhost
janus_pod_id=$(kubectl get pod --namespace janus | grep janusgraph | awk '{print $1}')
# port-forward blocks, so run it in the background (or in a separate terminal)
kubectl port-forward "$janus_pod_id" 8182:8182 --namespace janus &

# create project
mkdir janus-demo
cd janus-demo
gradle init
# choose `application`, then `kotlin`, the rest is up to you, you can skip it by pressing enter a couple of times
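
Before writing any code, it can help to confirm that the Gremlin port is reachable from your machine. This sketch relies on bash’s /dev/tcp pseudo-device, so it assumes bash rather than a plain POSIX shell; port_open is a hypothetical helper:

```shell
# Hypothetical helper (bash only): succeed if a TCP connection
# to host $1 on port $2 can be opened.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Usage:
# port_open localhost 8182 && echo "port 8182 is reachable" || echo "nothing is listening on 8182"
```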

We’re also going to need two files in resources:

src/main/resources/remote-graph.properties

gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=remote-objects.yaml
gremlin.remote.driver.sourceName=g

src/main/resources/remote-objects.yaml

hosts: [ localhost ]
port: 8182
serializer: {
  className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0,
  config: { ioRegistries: [ org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry ] }
}

As a final touch, add the following line to the dependencies block of the build file (build.gradle.kts):

implementation("org.janusgraph:janusgraph-driver:0.5.2")

Once it’s done, we can finally start hacking.

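import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource

fun main() {
    // g is the conventional name for a graph traversal source
    val g = createTraversalUsingProperties()

    // look up the vertex named Alice and print its properties
    g.V().has("name", "Alice")
            .valueMap<String>()
            .next()
            .let(::println)
}

// builds a remote traversal source from the settings in remote-graph.properties
fun createTraversalUsingProperties(): GraphTraversalSource = AnonymousTraversalSource
        .traversal()
        .withRemote("remote-graph.properties")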
This code tries to find a vertex whose name is Alice and prints all of its properties. However, we don’t have any data in the database yet, so next() would fail with a NoSuchElementException. Let’s fix that:

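val g = createTraversalUsingProperties()

// add a vertex with its name property set to Alice
g.addV()
        .property("name", "Alice")
        .tryNext()

// read the vertex back and print its properties
g.V().has("name", "Alice")
        .valueMap<String>()
        .next()
        .let(::println)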

The code above may look a bit mysterious at first, but there’s nothing difficult about it. You can pick up the API quickly with the help of the official Getting Started tutorial on the Apache TinkerPop website: http://tinkerpop.apache.org/docs/current/tutorials/getting-started/

One thing that may puzzle you is the use of next or tryNext. These are called terminal steps, and they are what actually executes the query. You can read more about them here: https://tinkerpop.apache.org/docs/current/reference/#terminal-steps

Summary

This should be enough to get you started with graph databases. As you can see, deploying such a setup isn’t that hard. For the reasons I’ve mentioned in this article, I think it’s well worth at least trying JanusGraph.

Karol Czeryna
