Kubernetes 1.8: Using Cronjobs to take Neo4j backups
With the release of Kubernetes 1.8, Cronjobs have graduated to beta, which means we can now more easily run Neo4j backup jobs against Kubernetes clusters.
Before we learn how to write a Cronjob, let's first create a local Kubernetes cluster and deploy Neo4j.
Spin up Kubernetes & Helm
minikube start --memory 8192

helm init && kubectl rollout status -w deployment/tiller-deploy --namespace=kube-system
Deploy a Neo4j cluster
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/

helm install incubator/neo4j --name neo-helm --wait --set authEnabled=false,core.extraVars.NEO4J_dbms_backup_address=0.0.0.0:6362
Populate our cluster with data
We can run the following command to check our cluster is ready:
kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "CALL dbms.cluster.overview() YIELD id, role RETURN id, role"

+-----------------------------------------------------+
| id                                     | role       |
+-----------------------------------------------------+
| "0b3bfe6c-6a68-4af5-9dd2-e96b564df6e5" | "LEADER"   |
| "09e9bee8-bdc5-4e95-926c-16ea8213e6e7" | "FOLLOWER" |
| "859b9b56-9bfc-42ae-90c3-02cedacfe720" | "FOLLOWER" |
+-----------------------------------------------------+
Now let’s create some data:
kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "UNWIND range(0,1000) AS id CREATE (:Person {id: id})"

0 rows available after 653 ms, consumed after another 0 ms
Added 1001 nodes, Set 1001 properties, Added 1001 labels
Now that our Neo4j cluster is running and contains data, we want to take regular backups.
Neo4j backups
The Neo4j backup tool supports full and incremental backups.
Full backup
A full backup streams a copy of the store files to the backup location, along with any transactions that happened while the backup was running. Those transactions are then applied to the backed-up copy of the database.
Incremental backup
An incremental backup is triggered if there is already a Neo4j database in the backup location. In this case no store files are copied; instead the tool copies any new transactions from Neo4j and applies them to the backup.
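Both scenarios are driven by the same neo4j-admin command, which we'll embed in the Cronjob below. As a rough sketch of running it by hand (the hostname here is a placeholder for whatever address your cluster exposes for backups):

# Minimal sketch of a manual backup run; <neo4j-host> is a placeholder.
# If /tmp/backup already contains a backup, the tool does an incremental
# backup, otherwise it does a full one.
bin/neo4j-admin backup \
  --backup-dir /tmp \
  --name backup \
  --from <neo4j-host>:6362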
Backup Cronjob
We will use a Cronjob to execute the backup. In the example below the job runs every minute, and we attach a PersistentVolumeClaim to the Cronjob so that we can see both of the backup scenarios in action.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: backupdir-neo4j
  labels:
    app: neo4j-backup
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: neo4j-backup
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: backupdir-neo4j
              persistentVolumeClaim:
                claimName: backupdir-neo4j
          containers:
            - name: neo4j-backup
              image: neo4j:3.3.0-enterprise
              env:
                - name: NEO4J_ACCEPT_LICENSE_AGREEMENT
                  value: "yes"
              volumeMounts:
                - name: backupdir-neo4j
                  mountPath: /tmp
              command: ["bin/neo4j-admin", "backup", "--backup-dir", "/tmp", "--name", "backup", "--from", "neo-helm-neo4j-core-2.neo-helm-neo4j.default.svc.cluster.local:6362"]
          restartPolicy: OnFailure
$ kubectl apply -f cronjob.yaml
cronjob "neo4j-backup" created
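As an extra check (not in the original walkthrough), we can confirm the Cronjob resource itself exists and see its schedule with:

kubectl get cronjobs

After that, each scheduled run shows up as a regular job: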
kubectl get jobs -w
NAME                      DESIRED   SUCCESSFUL   AGE
neo4j-backup-1510940940   1         1            34s
Now let’s view the logs from the job:
kubectl logs $(kubectl get pods -a --selector=job-name=neo4j-backup-1510940940 --output=jsonpath={.items..metadata.name})

Doing full backup...
2017-11-17 17:49:05.920+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db.labels
...
2017-11-17 17:49:06.038+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.labelscanstore.db 48.00 kB
2017-11-17 17:49:06.038+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 18 files
2017-11-17 17:49:06.094+0000 INFO [o.n.b.BackupService] Start recovering store
2017-11-17 17:49:07.669+0000 INFO [o.n.b.BackupService] Finish recovering store
Doing consistency check...
2017-11-17 17:49:07.716+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /tmp/backup
2017-11-17 17:49:07.716+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2017-11-17 17:49:07.755+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
...
....................  100%
Backup complete.
All good so far. Now let’s create more data:
kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "UNWIND range(0,1000) AS id CREATE (:Person {id: id})"

0 rows available after 114 ms, consumed after another 0 ms
Added 1001 nodes, Set 1001 properties, Added 1001 labels
And wait for another backup job to run:
kubectl get jobs -w
NAME                      DESIRED   SUCCESSFUL   AGE
neo4j-backup-1510941180   1         1            2m
neo4j-backup-1510941240   1         1            1m
neo4j-backup-1510941300   1         1            17s
kubectl logs $(kubectl get pods -a --selector=job-name=neo4j-backup-1510941300 --output=jsonpath={.items..metadata.name})

Destination is not empty, doing incremental backup...
Doing consistency check...
2017-11-17 17:55:07.958+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /tmp/backup
2017-11-17 17:55:07.959+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2017-11-17 17:55:07.995+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
...
....................  100%
Backup complete.
If we were to extend this job further, we could have it copy each backup it takes to an S3 bucket, but I'll leave that for another day. A rough sketch of that idea follows.
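Just to illustrate the shape of such an extension (this is not part of the setup above): the container command could be replaced with a small wrapper script that takes the backup and then pushes it to S3. The bucket name is a placeholder, and it assumes the AWS CLI and credentials are available in the image, which the stock neo4j image does not provide.

#!/bin/sh
# Hypothetical wrapper: take the backup, then sync it to S3.
# Assumes the AWS CLI is installed and credentials are supplied
# (e.g. via a Kubernetes Secret); "my-backup-bucket" is a placeholder.
set -e
bin/neo4j-admin backup \
  --backup-dir /tmp \
  --name backup \
  --from neo-helm-neo4j-core-2.neo-helm-neo4j.default.svc.cluster.local:6362
aws s3 sync /tmp/backup "s3://my-backup-bucket/neo4j/$(date +%Y-%m-%d-%H%M)/"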