Docker
This is my attempt to understand the ever-changing world of Docker.
Docker is a moving target and there are lots of examples of outdated
ways of doing things. Hopefully this will stay short and up to date.
Storage and data volumes
Docker containers should be portable and immutable. This presents a challenge
for storage. A database needs to write its files somewhere. If it writes them
inside the container we break immutability. If it writes them outside the
container we break portability.
Performance is another consideration. The union file system is pretty slow.
Where performance counts, we need to bypass the union file system.
There are many ways to crack the docker storage nut. The best way depends on
our needs.
Mutable data in the container
If we don't care about performance and we don't care about immutability, the
simplest thing (maybe see if you agree after reading below) is to store mutable
data in the container itself.
Let's say file-upload is an app that lets you upload and download files.
Since we store the data in the container, moving the container to another server
looks like this.
- stop the container
- build an image from the stopped container
- push the image to a docker repo
- run the image on another server
or this
- stop the container
- copy the data out of the container
- move the data to the new server
- run the container on the new server
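Here is a rough sketch of both routes as shell functions. The container name, image tag and in-container path in the usage comments are made up, and the DOCKER variable is just a hook I added: set DOCKER=echo to print the commands instead of running them.

```shell
# Thin wrapper so the commands can be dry-run with DOCKER=echo.
d() { ${DOCKER:-docker} "$@"; }

# Route 1: bake the stopped container (data included) into an image and push it.
# usage (hypothetical names): snapshot_and_push file-upload-app myrepo/file-upload:snapshot
snapshot_and_push() {
  d stop "$1" &&
  d commit "$1" "$2" &&
  d push "$2"
  # then, on the other server: docker run -d <image>
}

# Route 2: copy the data out of the stopped container so it can be shipped.
# usage (hypothetical names): export_data file-upload-app /data ./backup
export_data() {
  d stop "$1" &&
  d cp "$1:$2/." "$3"
  # then move the copy to the new server (scp, rsync, ...), create the
  # container there and "docker cp" the data back in.
}
```

Route 1 only works this way because the data lives in the container's own file system; docker commit does not capture data stored in volumes.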
However, upgrading the file-upload app to version 3.0.0 is not so simple.
- stop the container
- copy the files out of the container
- start version 3.0.0 of the container
- copy the files into the container
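The upgrade steps above could be scripted roughly like this. All names in the usage comment are hypothetical, and I'm assuming the app keeps its files in a single directory inside the container. As before, DOCKER=echo gives a dry run.

```shell
# Thin wrapper so the commands can be dry-run with DOCKER=echo.
d() { ${DOCKER:-docker} "$@"; }

# upgrade_in_container OLD_CONTAINER NEW_CONTAINER NEW_IMAGE CONTAINER_DIR TMP_DIR
# usage (hypothetical names):
#   upgrade_in_container upload-v2 upload-v3 file-upload:3.0.0 /data /tmp/upload-data
upgrade_in_container() {
  # stop the old container
  d stop "$1" &&
  # copy the contents of the data directory out of it
  d cp "$1:$4/." "$5" &&
  # start the new version
  d run -d --name "$2" "$3" &&
  # copy the files into the new container
  d cp "$5/." "$2:$4"
}
```

The trailing "/." tells docker cp to copy the contents of the directory rather than the directory itself.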
Finally, performance won't be that great, but we don't care.
WARNING!!! One very large caveat with storing data directly in your
container is that when you remove the container your data is lost forever.
Stopping and restarting the container is safe, but removing it will also
delete your data.
Data volumes
Data volumes are docker's way to bypass the union file system and store data
directly on the host file system. This is much faster than the union file
system, and it allows your containers to be immutable.
With this approach moving our container to another server is as follows.
- stop the container
- move the data to the other server
- run the container on the other server
Upgrading our file-upload app is as simple as:
- stop the old version
- start the new version
Docker does not delete the data in a data volume when you remove a container
unless you explicitly ask it to (for example with docker rm -v, which also
removes the container's anonymous volumes). This can cause disk space issues.
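To keep an eye on the disk space problem, something like the following might help. It's a sketch, not a cleanup policy; the function names are mine, and DOCKER=echo prints the commands instead of running them.

```shell
# Thin wrapper so the commands can be dry-run with DOCKER=echo.
d() { ${DOCKER:-docker} "$@"; }

# List the names of volumes that no container refers to any more.
list_orphaned_volumes() {
  d volume ls -q -f dangling=true
}

# Remove a container together with its anonymous volumes.
# usage (hypothetical name): remove_container_and_volumes old-file-upload
remove_container_and_volumes() {
  d rm -v "$1"
}
```

docker volume prune removes unused volumes in one go, but check the list first so you know what it is going to delete.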
Creating data volumes
There are two ways to create a data volume.
- in Dockerfile with the VOLUME instruction
- docker run -v
in the Dockerfile
VOLUME ["/some/path/in/my/container/"]
If you were to run a container based on this Dockerfile and then run
docker inspect you would see something like the following. The mount Name
will be a randomly generated volume ID, and it will be mapped to some path on
the host system, Source, and to the path you specified in the container,
Destination.
"Mounts": [
  {
    "Name": "d4bf12ed6684da1f20a9eefcdf2a4f11987bbd1aa0e36007ab4454449f81bf30",
    "Source": "/mnt/sda1/var/lib/docker/volumes/d4bf12ed6684da1f20a9eefcdf2a4f11987bbd1aa0e36007ab4454449f81bf30/_data",
    "Destination": "/some/path/in/my/container",
    "Driver": "local",
    "Mode": "",
    "RW": true
  }
],
docker run -v
docker run -v /some/path/in/my/container/ busybox
This does exactly the same thing as the first way, and you will see the same
output from docker inspect.
The great thing about this is performance. Now you are bypassing the union file
system and you will get fast access.
Another good thing is that you can stop your container and even delete it
without losing your data. You can upgrade to version 2.0.0 without having to
copy data out of your old container into the new (sort of).
But this isn't very convenient because when you start your new container Docker
will create a new volume with a new ID, and therefore a new path on the host. In
order to use the data from the previous container you will need to copy it from
the old path.
Mapping data volumes to host paths
So a better way than the above is to map the container path to the host path
in a consistent way.
You have two choices for mapping a container path to the host system.
- you can specify the host file system path
- you can create a named volume and let docker determine the host file system path
un-named or user mapped volumes
If you want a volume to point to a specific path on the host file system
you can do the following, where ~/ refers to the host file system and /tmp
refers to the container file system.
docker run -it -v ~/:/tmp busybox
You must use absolute paths for the container path.
You can use absolute or relative paths for the host file path. However,
relative paths must begin with ./ or ~/. If you just specify somepath it
creates a named volume, which is very different and is explained below.
Inspecting the container will display the following.
"Mounts": [
  {
    "Source": "/home/docker",
    "Destination": "/tmp",
    "Mode": "",
    "RW": true
  }
],
Ahhhhhh, now we can upgrade to version 2.0.0 just by pointing the new
container to the same path. Look ma, no copying data!
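As a sketch, the whole upgrade then boils down to two commands. The container names, image tag and paths in the usage comment are hypothetical; DOCKER=echo prints the commands instead of running them.

```shell
# Thin wrapper so the commands can be dry-run with DOCKER=echo.
d() { ${DOCKER:-docker} "$@"; }

# upgrade_with_host_path OLD_CONTAINER NEW_CONTAINER NEW_IMAGE HOST_DIR CONTAINER_DIR
# usage (hypothetical names):
#   upgrade_with_host_path upload-v1 upload-v2 file-upload:2.0.0 /srv/file-upload /data
upgrade_with_host_path() {
  # stop the old version
  d stop "$1" &&
  # start the new version pointing at the same host directory
  d run -d --name "$2" -v "$4:$5" "$3"
}
```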
named volumes
There are at least two ways you can create a named volume. The first way
creates the volume explicitly.
docker volume create --name=myvol
This second way both creates myvol if it doesn't exist and maps it to the
container path.
docker run -v myvol:/tmp busybox
Either of the above commands will create a volume named myvol.
# docker volume inspect myvol
[
  {
    "Name": "myvol",
    "Driver": "local",
    "Mountpoint": "/mnt/sda1/var/lib/docker/volumes/myvol/_data"
  }
]
This has the exact same result as you specifying the host path, but you let
docker determine the host path.
NOTE: While it may seem more convenient to specify the host path yourself --
maybe because you like having it under your home directory -- for
production applications it's generally more work. If you are deploying to
hundreds of servers it's easier to let docker create the host path so you
don't have to.
This will delete the volume.
docker volume rm myvol
Error response from daemon: Conflict: volume is in use
You will see the error above if any container (running or stopped) still
refers to the volume, because docker won't delete a volume that is in use.
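To get past the conflict you first have to find and remove the containers that still refer to the volume. A sketch, with function names of my own choosing and DOCKER=echo as a dry-run hook:

```shell
# Thin wrapper so the commands can be dry-run with DOCKER=echo.
d() { ${DOCKER:-docker} "$@"; }

# Print the IDs of all containers (running or stopped) that use the volume.
# usage: containers_using_volume myvol
containers_using_volume() {
  d ps -a -q --filter "volume=$1"
}

# Once those containers have been removed, the volume can be deleted.
# usage: remove_volume myvol
remove_volume() {
  d volume rm "$1"
}
```

Removing the containers is deliberately left as a manual step here, since deleting them is not something you want a helper to do silently.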
Data volume containers
Docker seems to push the idea of data volume containers. I don't know why.
There seems to be no advantage over named data volumes. I think it must be
outdated advice from before there were named data volumes.
The current docker documentation says:
If you have some persistent data that you want to share between containers, or
want to use from non-persistent containers, it’s best to create a named Data
Volume Container, and then to mount the data from it.
So the idea here is to create a container that doesn't do anything but specify
one or more (the examples use unnamed) volumes. Other containers can use
docker run --volumes-from to use the volume in the data volume container.
docker create -v /my/path --name file-upload-data file-upload /bin/true
docker run -d --name file-upload-app1 --volumes-from file-upload-data file-upload
docker run -d --name file-upload-app2 --volumes-from file-upload-data file-upload
The above creates a container file-upload-data that just sits there and isn't
even running. Then it runs two containers, file-upload-app1 and
file-upload-app2, that store any data they write to /my/path to the volume
owned by file-upload-data.
How is this better than having the two containers just use the same named
data volume?
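For comparison, here is what the named-volume version of the same setup could look like. The volume name in the usage comment is made up; the container and image names are the ones from the example above, and DOCKER=echo prints the commands instead of running them.

```shell
# Thin wrapper so the commands can be dry-run with DOCKER=echo.
d() { ${DOCKER:-docker} "$@"; }

# share_named_volume VOLUME CONTAINER_DIR IMAGE
# usage (hypothetical volume name): share_named_volume file-upload-vol /my/path file-upload
share_named_volume() {
  # create the volume once
  d volume create "$1" &&
  # mount it at the same path in both app containers
  d run -d --name file-upload-app1 -v "$1:$2" "$3" &&
  d run -d --name file-upload-app2 -v "$1:$2" "$3"
}
```

It's the same number of commands, with no extra stopped container to keep around.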