Docker

This is my attempt to understand the ever-changing world of Docker.

Docker is an ever-moving target, and there are lots of examples of outdated
ways of doing things. Hopefully this will stay short and up to date.

Storage and data volumes

Docker containers should be portable and immutable. This presents a challenge
for storage. A database needs to write its files somewhere. If it writes them
inside the container we break immutability. If it writes them outside the
container we break portability.

Performance is another consideration. The union file system is pretty slow.
Where performance counts, we need to bypass the union file system.

There are many ways to crack the docker storage nut. The best way depends on
our needs.

Mutable data in the container

If we don't care about performance and we don't care about immutability, the
simplest thing (see if you still agree after reading below) is to store mutable
data in the container itself.

Let's say file-upload is an app that lets you upload and download files.

Since we store the data in the container, moving the container to another server
looks like this.

  1. stop the container
  2. build an image from the stopped container
  3. push the image to a docker repo
  4. run the image on another server
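The first flow, sketched as docker commands (the container name, image tag, and
registry are all made up):

```shell
# stop the container
docker stop file-upload

# build an image from the stopped container's current state
docker commit file-upload myregistry/file-upload:with-data

# push the image, data and all, to a docker repo
docker push myregistry/file-upload:with-data

# on the other server: pull and run it
docker run -d --name file-upload myregistry/file-upload:with-data
```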

or this

  1. stop the container
  2. copy the data out of the container
  3. move the data to the new server
  4. run the container on the new server

However, upgrading the file-upload app to version 3.0.0 is not so simple.

  1. stop the container
  2. copy the files out of the container
  3. start version 3.0.0 of the container
  4. copy the files into the container

Finally, performance won't be great, but we said we don't care.

WARNING!!! One very large caveat with storing data directly in your
container: when you remove the container, your data is lost forever. Stopping
and restarting the container is safe, but removing it will also delete your
data.

Data volumes

Data volumes are docker's way to bypass the union file system and store data
directly on the host file system. This is much faster than the union file
system, and it allows your containers to be immutable.

With this approach moving our container to another server is as follows.

  1. stop the container
  2. move the data to the other server
  3. run the container on the other server

Upgrading our file-upload app is as simple as:

  1. stop the old version
  2. start the new version

Docker will never delete data stored in a data volume. Even removing a container
won't delete its volume's data. This can cause disk space issues.

Creating data volumes

There are two ways to create a data volume.

  1. in Dockerfile with the VOLUME instruction
  2. docker run -v

in the Dockerfile

VOLUME ["/some/path/in/my/container/"]

If you were to run a container based on this Dockerfile and then run
docker inspect, you would see something like the following. The mount Name is a
generated volume ID; it is mapped to a path on the host system, Source, and to
the path you specified in the container, Destination.

"Mounts": [
        {
            "Name": "d4bf12ed6684da1f20a9eefcdf2a4f11987bbd1aa0e36007ab4454449f81bf30",
            "Source": "/mnt/sda1/var/lib/docker/volumes/d4bf12ed6684da1f20a9eefcdf2a4f11987bbd1aa0e36007ab4454449f81bf30/_data",
            "Destination": "/some/path/in/my/container",
            "Driver": "local",
            "Mode": "",
            "RW": true
        }
    ],
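If you only want the mounts section, docker inspect's --format flag can extract
it (the container name here is made up):

```shell
docker inspect --format '{{ json .Mounts }}' mycontainer
```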

docker run -v

docker run -v /some/path/in/my/container/ busybox

This does exactly the same thing as the first way, and you will see the same
output from docker inspect.

The great thing about this is performance. Now you are bypassing the union file
system and you will get fast access.

Another good thing is that you can stop your container and even delete it
without losing your data. You can upgrade to version 2.0.0 without having to
copy data out of your old container into the new (sort of).

But this isn't very convenient, because when you start your new container
docker will create a new volume with a new ID and a new path on the host. In
order to use the data from the previous container you will need to copy it from
the old path.

Mapping data volumes to host paths

So a better way than the above is to map the container path to the host path
in a consistent way.

You have two choices for mapping a container path to the host system.

  1. you can specify the host file system path
  2. you can create a named volume and let docker determine the host file system path

un-named or user-mapped volumes

If you want a volume to point to a specific path on the host file system
you can do the following where ~/ refers to the host file system and
/tmp refers to the container file system.

docker run -it -v ~/:/tmp busybox

You must use absolute paths for the container path.

You can use absolute or relative paths for the host file path. However,
relative paths must begin with ./ or ~/. If you just specify somepath it
creates a named volume which is very different and is explained below.

Inspecting the container will display the following.

"Mounts": [
        {
            "Source": "/home/docker",
            "Destination": "/tmp",
            "Mode": "",
            "RW": true
        }
    ],

Ahhhhhh, now we can upgrade to version 2.0.0 just by pointing the new
container to the same path. Look ma, no copying data!
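The upgrade, sketched as commands (the container name, host path, and image tag
are all made up):

```shell
# stop and remove the old version; the data stays on the host
docker stop file-upload && docker rm file-upload

# start the new version against the same host path -- no copying
docker run -d --name file-upload -v ~/file-upload-data:/data file-upload:2.0.0
```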

named volumes

There are at least two ways to create a named volume. The first is explicit:

docker volume create --name=myvol

The second both creates myvol if it doesn't already exist and maps it to the
container path:

docker run -v myvol:/tmp busybox

Either of the above commands will create a volume named myvol.

# docker volume inspect myvol
[
    {
        "Name": "myvol",
        "Driver": "local",
        "Mountpoint": "/mnt/sda1/var/lib/docker/volumes/myvol/_data"
    }
]

This gives the exact same result as specifying the host path yourself, except
that docker determines the host path.

NOTE: While it may seem more convenient to specify the host path yourself --
maybe because you like having it under your home directory -- for
production applications it's generally more work. If you are deploying to
hundreds of servers it's easier to let docker create the host path so you
don't have to.

This will delete the volume.

docker volume rm myvol

Error response from daemon: Conflict: volume is in use

You will likely see this error, because docker won't let you delete a volume
while any container (running or stopped) refers to it.
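To actually delete the volume, first remove every container that references it
(the container name is illustrative):

```shell
# remove the container(s) using the volume, even if stopped
docker rm -f file-upload

# now the volume can be removed
docker volume rm myvol
```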

Data volume containers

Docker seems to push the idea of data volume containers. I don't know why;
there seems to be no advantage over named data volumes. I suspect it's outdated
advice from before named data volumes existed.

The current docker documentation says:

If you have some persistent data that you want to share between containers, or
want to use from non-persistent containers, it’s best to create a named Data
Volume Container, and then to mount the data from it.

So the idea here is to create a container that doesn't do anything but specify
one or more (the examples use unnamed) volumes. Other containers can use
docker run --volumes-from to use the volume in the data volume container.

docker create -v /my/path --name file-upload-data file-upload /bin/true
docker run -d --name file-upload-app1 --volumes-from file-upload-data file-upload
docker run -d --name file-upload-app2 --volumes-from file-upload-data file-upload

The above creates a container, file-upload-data,
that just sits there and isn't even running. Then it runs two containers,
file-upload-app1 and file-upload-app2, that write any data under /my/path
to the volume owned by file-upload-data.

How is this better than having the two containers just use the same named
data volume?
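For comparison, the same sharing with a named volume needs no extra container
at all (names are illustrative):

```shell
docker run -d --name file-upload-app1 -v file-upload-data:/my/path file-upload
docker run -d --name file-upload-app2 -v file-upload-data:/my/path file-upload
```

Both containers read and write the same data, and the volume outlives them.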