Persisting Couchbase Data Across Container Restarts

The Best Practices for Virtualized Platforms guide provides recommendations for running Couchbase on a virtualized platform such as Amazon Web Services or Azure. It also provides some recommendations for running Couchbase as a Docker container.

One of the recommendations is to map Couchbase node-specific data to a local folder. Let’s look at that in more detail.

Implicit Per-Container Storage

Suppose a Couchbase container is started as:

docker run -d -p 8091-8093:8091-8093 -p 11210:11210 --name db couchbase/server:sandbox

This command:

  • Starts the container in detached mode using -d
  • Maps the query, caching and administration ports using -p
  • Names the container using --name
  • Uses the couchbase/server:sandbox image

By default, the data for the container is stored in a managed volume. Checking volume mounts using the docker inspect command shows:

docker inspect --format '{{json .Mounts }}' db  | jq
[
  {
    "Name": "aa3c06f9c506d52bfb5d3d265f7b63045df0fea996998f12ce08b2543345e948",
    "Source": "/var/lib/docker/volumes/aa3c06f9c506d52bfb5d3d265f7b63045df0fea996998f12ce08b2543345e948/_data",
    "Destination": "/opt/couchbase/var",
    "Driver": "local",
    "Mode": "",
    "RW": true,
    "Propagation": ""
  }
]

The data for Couchbase is stored on the Docker host's filesystem, in the directory given by the value of the Source attribute. This can be verified by logging into the root filesystem of the Docker host:

docker run -it --pid=host --privileged debian:jessie nsenter -t 1 -m -p -n

Now you can see the data directory:

010e52853bc6:~# ls /var/lib/docker/volumes | grep aa3c
aa3c06f9c506d52bfb5d3d265f7b63045df0fea996998f12ce08b2543345e948

A new directory is created for each new run of the container. The directory is still around after the container is stopped and removed, but it is no longer easily accessible, and a new container does not pick it up. Thus no data is preserved when the container is removed and started again.
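To see which volumes have been left behind this way, something like the following sketch may help. It assumes the docker CLI is on the PATH. orphaned_volume_paths is a hypothetical helper name, not a Docker command. Note that the dangling filter matches any volume that no container references, so unused named volumes show up as well.

```shell
# Hypothetical helper: print "<volume name> <host path>" for every volume
# that is not referenced by any container.
orphaned_volume_paths() {
  docker volume ls --quiet --filter dangling=true |
  while read -r vol; do
    # Mountpoint is the directory on the Docker host that backs the volume.
    printf '%s %s\n' "$vol" \
      "$(docker volume inspect --format '{{ .Mountpoint }}' "$vol")"
  done
}
```

Running orphaned_volume_paths after removing the db container without -v should list the volume shown above, assuming nothing else has cleaned it up.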

The volume can be explicitly removed, along with the container, using the command:

docker rm -v db

Either way, once the container is removed, the entire state of the application is effectively lost.

Explicit Host Directory Mapping

Now, let’s start a Couchbase container with explicit volume mapping:

docker run -d -p 8091-8093:8091-8093 -p 11210:11210 --name db -v ~/couchbase:/opt/couchbase/var couchbase/server:sandbox

This container is very similar to the container started earlier. The main difference is that the host directory ~/couchbase is mapped to the container directory /opt/couchbase/var.

The Couchbase container persists all of its data in the /opt/couchbase/var directory of the container filesystem. That directory is now mapped to a directory on the host filesystem, so the state of the container is kept outside the container, on the host. This bypasses the union filesystem used by Docker and exposes the host filesystem to the container. As a result, the state persists even when the container is removed and started again. The new container only needs to start with the exact same volume mapping.
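Since everything depends on using the same mapping each time, it can help to wrap the command in a small function. The following is a minimal sketch, not an official script: start_couchbase is a hypothetical name, and the default directory simply mirrors the ~/couchbase path used above.

```shell
# Hypothetical wrapper: start Couchbase with its data directory mapped to a
# host directory, so every run uses the same volume mapping.
start_couchbase() {
  # Pass an absolute path; Docker treats a bare name as a named volume.
  data_dir="${1:-$HOME/couchbase}"

  # Create the host directory up front if it does not exist yet.
  mkdir -p "$data_dir" || return 1

  docker run -d \
    -p 8091-8093:8091-8093 -p 11210:11210 \
    --name db \
    -v "$data_dir":/opt/couchbase/var \
    couchbase/server:sandbox
}
```

Calling start_couchbase with no argument is equivalent to the docker run command above. Passing a different absolute path keeps a separate data set.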

More details about the container can be seen using:

docker inspect --format '{{json .Mounts }}' db | jq

jq is a JSON processor that needs to be installed separately. The output is:

[
  {
    "Source": "/Users/arungupta/couchbase",
    "Destination": "/opt/couchbase/var",
    "Mode": "",
    "RW": true,
    "Propagation": "rprivate"
  }
]

The output shows the source and destination directories. RW set to true indicates that the volume is mounted read/write.
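If jq is not available, the same information can be pulled out with a Go template passed to docker inspect. This is a sketch under the assumption that only the Source of one mount is needed. mount_source is a hypothetical helper name.

```shell
# Hypothetical helper: print the host-side Source of the mount whose
# Destination matches the given container path.
mount_source() {
  container="$1"
  dest="${2:-/opt/couchbase/var}"

  # Walk the .Mounts list and print Source only for the matching Destination.
  docker inspect --format \
    "{{ range .Mounts }}{{ if eq .Destination \"$dest\" }}{{ .Source }}{{ end }}{{ end }}" \
    "$container"
}
```

For the container above, mount_source db should print the host directory that was passed to -v.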

If the container is started using Docker for Mac, then the Couchbase Web Console is accessible at http://localhost:8091. The Data Buckets tab shows the default travel-sample bucket.

Click on Create New Data Bucket to create a new data bucket. Give it the name sample.

The Data Buckets tab is updated with this newly created bucket.

Now stop and remove the container:

docker stop db
docker rm db

Start the container again using the same command:

docker run -d -p 8091-8093:8091-8093 -p 11210:11210 --name db -v ~/couchbase:/opt/couchbase/var couchbase/server:sandbox

The Data Buckets tab in the Couchbase Web Console shows the same two buckets.
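The same check can be done from the command line by asking the Couchbase REST API for the list of buckets. The sketch below assumes that curl and jq are installed and that the sandbox image uses the credentials Administrator / password. That credential pair is an assumption, so override CB_USER and CB_PASS if your setup differs. list_buckets and the CB_* variables are hypothetical names.

```shell
# Hypothetical helper: print the name of each bucket known to the node.
# CB_USER, CB_PASS and CB_HOST are illustrative variables; the defaults
# are assumptions about the sandbox image, not documented guarantees.
list_buckets() {
  curl --silent --fail \
    --user "${CB_USER:-Administrator}:${CB_PASS:-password}" \
    "http://${CB_HOST:-localhost}:8091/pools/default/buckets" |
  jq -r '.[].name'
}
```

If persistence worked, list_buckets should print both travel-sample and sample once the new container has finished starting.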

In this case, if the container is started on a different host, the state is not available there. And if the host dies, the state is lost.

A more robust alternative for managing persistence in containers is to use a shared network filesystem such as Ceph, GlusterFS or NFS (Network File System). Other common approaches are Docker volume plugins such as Flocker from ClusterHQ, or software-defined storage such as Portworx. All of these storage techniques simplify how the state of a container can be saved in a multi-container, multi-host environment. A future blog will cover these techniques in detail.

Read more details in Managing data in containers.

couchbase.com/containers provides more details about how to run Couchbase in different container frameworks.


Arun Gupta

Arun is a technology enthusiast, avid runner, author of a best-selling book, globe trotter, a community guy, Java Champion, JavaOne Rockstar, JUG Leader, Minecraft Modder, Devoxx4Kids-er, and a Red Hatter.