To keep up with the growing volume and complexity of data, new technologies have to be employed; that is simply a consequence of the pace of IT in general. One solution to this problem is to introduce a graph database. The concept fits modern data well: anything related to Big Data, as well as domains like networking or the natural sciences (chemistry, biology, medicine). After reading this blog post you should be able to create your very own JanusGraph cluster on Kubernetes and connect to it from an application.
The following blog post is a continuation of an article posted here.
Why JanusGraph?
JanusGraph is a great all-around database that you can easily tailor to your needs. Its concept differs a bit from other vendors in that it delegates storage and indexing to external applications. That is a really good feature, because it means that JanusGraph tries to do one thing and does it well. The main features of this technology are as follows:
- free and open source
- pluggable data storage and indexing – allows you to choose whether you want to focus on availability with Cassandra or consistency with Google Cloud Bigtable. There are, of course, more storage plugins and additionally you can write your own if you want to.
- scalable out of the box
- integrates with big data platforms
- utilizes Gremlin server and Gremlin Query Language
- it’s a Titan fork under The Linux Foundation, to which companies like Google, IBM, Amazon or Hortonworks contribute
Prerequisites
We’ll be using a local Kubernetes cluster that runs inside Docker and is created with k3d. That means we’ll need Docker and k3d, plus Helm and kubectl to deploy JanusGraph and Cassandra.
The installation steps differ from OS to OS. For Linux distributions do the following:
- Debian
apt install docker.io
wget -q -O - https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
You’ll also have to install Helm by following these instructions: https://helm.sh/docs/intro/install/ and kubectl: https://kubernetes.io/docs/tasks/tools/install-kubectl/
- Arch
yay -S rancher-k3d-bin helm kubectl docker
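Before moving on, it may be worth checking that all four tools actually ended up on your PATH. The snippet below is only a small sketch of such a check; `missing_tools` is a helper name made up for this post, not part of any of the tools.

```shell
# Hypothetical helper: print every tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools docker k3d helm kubectl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names are printed on one line
  echo "Not installed yet:" $missing
else
  echo "All prerequisites found"
fi
```

If anything is listed as missing, go back to the installation step for your distribution before creating the cluster.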
Kubernetes cluster
For demo purposes I am going to run 3 Cassandra instances and 1 JanusGraph server, which adds up to a total of 4 servers that we have to create locally (other configurations can be found here: https://docs.janusgraph.org/storage-backend/cassandra).
k3d cluster create multiserver --servers 4
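Creating the cluster can take a moment, so it helps to confirm that all 4 nodes have registered before deploying anything. Here is a minimal sketch of how that could be checked; `count_ready_nodes` is a name invented for this example, and it assumes the usual `kubectl get nodes --no-headers` layout, where the second column is the node status.

```shell
# Hypothetical helper: count the nodes whose STATUS column is exactly "Ready".
# Expects the output of `kubectl get nodes --no-headers` on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against the live cluster (assumes kubectl points at the k3d cluster):
#   kubectl get nodes --no-headers | count_ready_nodes
```

For the cluster created above, the number printed should eventually reach 4.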
Since we want to run everything inside a namespace, let’s create one:
kubectl create namespace janus
Cassandra cluster
Now that we have the Kubernetes cluster, let’s run Cassandra with 3 instances. To do that we’ll have to create a Helm values file, as we want to override some defaults (you can find a list of all of them here: https://github.com/helm/charts/tree/master/incubator/cassandra):
cassandra-values.yaml
image:
  repo: cassandra
  tag: 3.11.8
  pullPolicy: IfNotPresent
resources:
  requests:
    memory: 4Gi
    cpu: 1
  limits:
    memory: 4Gi
Let’s run cassandra. Note that we don’t create any persistent volumes so once you delete a pod then the data is all gone:
helm repo add incubator https://charts.helm.sh/incubator
helm install cassandra --namespace janus --values cassandra-values.yaml incubator/cassandra
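Cassandra nodes join the ring one after another, so the pods need a while before they are all usable. The following is a rough sketch of a readiness check, not an official recipe; `all_pods_ready` is a made-up helper, and it assumes the standard `kubectl get pods --no-headers` columns (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: succeed only if there is at least one pod and every pod
# is Running with all of its containers ready (READY column like 1/1).
# Expects the output of `kubectl get pods --no-headers` on stdin.
all_pods_ready() {
  awk '
    { total++; split($2, r, "/"); if ($3 == "Running" && r[1] == r[2]) ok++ }
    END { exit !(total > 0 && ok == total) }
  '
}

# Usage against the live cluster (assumes kubectl points at the k3d cluster):
#   until kubectl get pods --namespace janus --no-headers | all_pods_ready; do sleep 5; done
```

Waiting for the loop to finish before installing JanusGraph avoids the server starting against a storage backend that is not reachable yet.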
JanusGraph server
Now it’s time to create a persistent volume for JanusGraph as well as a Service which will allow Janus and Cassandra to interact. To do that, save the file below and execute the following command:
janus-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: janus-pv
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  hostPath:
    path: /var/janus
  storageClassName: local-storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: janus-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-storage
---
apiVersion: v1
kind: Service
metadata:
  name: kin-nodeport-service
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 8080
  selector:
kubectl apply --namespace janus -f janus-pv.yaml
Then start the JanusGraph server itself with the values file below (the list of available values is here: https://github.com/helm/charts/tree/master/stable/janusgraph):
janus-values.yaml
image:
  repository: janusgraph/janusgraph
  tag: latest
  pullPolicy: IfNotPresent
replicaCount: 1
persistence:
  existingClaim: janus-pvc
service:
  type: ClusterIP
  port: 8182
  serviceAnnotations:
elasticsearch:
  deploy: false
  rbac:
    create: true
properties:
  storage.backend: cql
  storage.hostname: cassandra
  index.search.backend: lucene
  index.search.directory: /db/searchindex
helm repo add stable https://charts.helm.sh/stable
helm install janusgraph --namespace janus --values janus-values.yaml stable/janusgraph
Connecting from application
Now that we have everything up and running, let’s try to connect to the database from an application. It’s not going to be fancy by any means; it’s just to show you a basic setup.
We’re going to use Kotlin, with Gradle as the build system. There are many ways to bootstrap such a project, but probably the easiest is to delegate it to Gradle itself. Make sure you have Gradle installed and execute the following commands:
# first, expose janusgraph server to localhost
janus_pod_id=$(kubectl get pod --namespace janus | grep janusgraph | awk '{print $1}')
kubectl expose pod "$janus_pod_id" --port=8182 --name=janus-exposed --namespace janus
# the service is only reachable inside the cluster, so forward its port to localhost (runs in the background)
kubectl port-forward --namespace janus service/janus-exposed 8182:8182 &
# create project
mkdir janus-demo
cd janus-demo
gradle init
# choose `application`, then `kotlin`, the rest is up to you, you can skip it by pressing enter a couple of times
We’re also going to need two files in resources:
src/main/resources/remote-graph.properties
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=remote-objects.yaml
gremlin.remote.driver.sourceName=g
src/main/resources/remote-objects.yaml
hosts: [ localhost ]
port: 8182
serializer: {
  className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0,
  config: { ioRegistries: [ org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry ] }
}
As a final touch, we have to add the following line to the build file (build.gradle.kts), inside the dependencies block:
implementation("org.janusgraph:janusgraph-driver:0.5.2")
Once it’s done, we can finally start hacking.
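import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource

fun main() {
    // g is a conventional name for graph traversal
    val g = createTraversalUsingProperties()
    g.V().has("name", "Alice")
        .valueMap<String>()
        .next()
        .let(::println)
}

// builds a remote traversal source from the settings in remote-graph.properties
fun createTraversalUsingProperties(): GraphTraversalSource = AnonymousTraversalSource
    .traversal()
    .withRemote("remote-graph.properties")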
This code will try to find a vertex that has a name equal to Alice and print all of its properties. However, we don’t have any data in the database yet, so let’s fix that:
val g = createTraversalUsingProperties()
g.addV()
    .property("name", "Alice")
    .tryNext()
g.V().has("name", "Alice")
    .valueMap<String>()
    .next()
    .let(::println)
The code above may look a bit mysterious at first, but in reality there’s nothing difficult about it. You can pick up the API really quickly with the help of the official Getting Started tutorial on the Apache TinkerPop website: http://tinkerpop.apache.org/docs/current/tutorials/getting-started/
One thing that may bother you is the use of next or tryNext. These are called Terminal Steps and they execute the query. You can read more about them here: https://tinkerpop.apache.org/docs/current/reference/#terminal-steps
Summary
This should be enough to get you started with graph databases. As you can see, deploying such a setup isn’t that hard. For the reasons I’ve mentioned in this article, I think it’s totally worth it to at least try JanusGraph.