Common Dockerfile Mistakes

June 30th 2016 Derek Chamorro in Docker, Microservices

We live in a containerized world. As companies transition from monolithic builds to microservice architectures, we often overlook some common mistakes we make when writing our Dockerfiles. Most are simple mistakes, and fixing them lets you make better use of the build cache. Others should be avoided at all costs. The following are some common mistakes I've seen uploaded in the past and some ways to correct them.

Understanding Docker cache

When Docker builds an image, it steps through each line (or instruction) of your Dockerfile in order. For each instruction, Docker looks for an existing layer in its cache that it can reuse, rather than creating a new (duplicate) one. For example, each RUN instruction in your Dockerfile creates a new layer and commits it to the image. If a RUN instruction changes, Docker rebuilds that layer and every layer after it. If nothing has changed, Docker reuses the cached layers from a previous build on the same host.
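
A minimal sketch of how instruction order interacts with the cache, assuming a hypothetical Node.js app whose dependencies are listed in package.json and whose entry point is index.js (both names are illustrative). The dependency manifest is copied before the rest of the source, so the npm install layer can be reused until package.json itself changes:

```dockerfile
FROM node:6.2.1

WORKDIR /app

# Copy only the dependency manifest first. This layer (and the npm install
# layer below it) stays cached as long as package.json is unchanged.
COPY package.json /app/
RUN npm install

# Application source changes often, so it is copied last. Editing source
# files invalidates only this layer and the ones after it.
COPY . /app/

# index.js is a hypothetical entry point used for illustration.
CMD ["node", "index.js"]
```

With this ordering, a source-only change should invalidate just the final COPY layer. Swapping the two COPY steps would likely force npm install to rerun on every source change.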

Keep this in mind when using ADD or COPY commands. COPY copies a file or a directory from your host into your image. ADD can do the same, but it can also fetch remote URLs, extract local TAR files, and so on. Because the range of functionality covered by ADD is quite large, it is usually best to use COPY for copying files or directories from the build context into the image, and RUN instructions for downloading remote resources.

Example:

FROM alpine:3.3

# ADD extracts the local archive into the /add directory
ADD test.tar.gz /add/
# COPY copies the archive as-is, without extracting it
COPY test.tar.gz /copy/

Running apt/apk/yum

Virtually every Debian-based Dockerfile runs apt-get install, usually to satisfy some external package requirements your code needs in order to run. But apt-get, to take one example, comes with its fair share of gotchas:

Example: apt-get upgrade. This updates all your packages to their latest versions, which can be bad because it prevents your Dockerfile from producing consistent, repeatable builds.

Example: Running apt-get update on a different line from your apt-get install command. An apt-get update on its own line gets cached by the build and won't actually run again when you later change the apt-get install line. Instead, run apt-get update in the same RUN instruction as the install, so the package index is fresh whenever packages are installed.
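
As a rough sketch of that last point, the following combines the update and the install in a single RUN instruction. The base image tag and the package names (curl, ca-certificates) are only illustrative assumptions; substitute whatever your code needs:

```dockerfile
FROM debian:bookworm

# update and install share one RUN instruction, so editing the package list
# invalidates the whole layer and apt-get update runs again with it.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
 && rm -rf /var/lib/apt/lists/*
```

Removing /var/lib/apt/lists in the same instruction keeps the downloaded package index out of the committed layer.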

Using :latest

Many Dockerfiles use the FROM package:latest pattern at the top of their Dockerfiles to pull the latest image from a Docker registry. While simple, using the latest tag for an image means that your build can suddenly break if that image gets updated. This can lead to problems where everything builds fine locally (because your local cache thinks it is the latest) while a build server may fail (because something like Bitbucket Pipelines makes a clean pull on every build). Additionally, troubleshooting can prove to be difficult, since the maintainer of the Dockerfile didn't actually make any changes.

To prevent this, just make sure you use a specific tag of an image (example: alpine:3.3). This will ensure your Dockerfile remains immutable.

EXPOSE and ENV

ENVs should only be declared when you need them in your build process. If they are not needed at build time, they should go at the end of your Dockerfile, along with EXPOSE, so that changing them does not invalidate the cache for the layers that follow. Below is an example of a Vault Dockerfile build. The ENV is declared early because the release URL fetch requires it, and the EXPOSE is left at the end:

ENV VERSION 0.5.3

ADD https://releases.hashicorp.com/vault/${VERSION}/vault_${VERSION}_linux_amd64.zip /tmp/
ADD https://releases.hashicorp.com/vault/${VERSION}/vault_${VERSION}_SHA256SUMS /tmp/
ADD https://releases.hashicorp.com/vault/${VERSION}/vault_${VERSION}_SHA256SUMS.sig /tmp/

WORKDIR /tmp/
RUN apk --update add --virtual verify gpgme \
 && gpg --keyserver pgp.mit.edu --recv-key 0x348FFC4C \
 && gpg --verify /tmp/vault_${VERSION}_SHA256SUMS.sig \
 && apk del verify \
 && cat vault_${VERSION}_SHA256SUMS | grep linux_amd64 | sha256sum -c \
 && unzip vault_${VERSION}_linux_amd64.zip \
 && mv vault /usr/local/bin/ \
 && rm -rf /tmp/* \
 && rm -rf /var/cache/apk/*

WORKDIR /

# Expose TCP listener port
EXPOSE 8200
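
To make the ordering explicit, here is a stripped-down sketch that separates a build-time variable from a runtime-only one. The /build-info.txt file is a stand-in for real build work, and setting VAULT_ADDR to a local address is an assumption made purely for illustration:

```dockerfile
FROM alpine:3.3

# Needed by the RUN instruction below, so it has to be declared first.
ENV VERSION 0.5.3
RUN echo "building version ${VERSION}" > /build-info.txt

# Only read at runtime, so it goes at the end. Editing this value
# leaves every layer above it cached.
ENV VAULT_ADDR http://127.0.0.1:8200

# Expose TCP listener port
EXPOSE 8200
```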

FROM statement

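Attempting to combine multiple images by stacking FROM statements will not work. The final image is built only from the last declared FROM statement. (Since Docker 17.05, multiple FROM statements define the stages of a multi-stage build, but nothing from an earlier stage reaches the final image unless it is copied in explicitly.)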

Example:

FROM node:6.2.1
FROM python:2.7

Running docker exec against a container started from that image yields the following:

$ docker exec -it 9a5349f8f0c3 bash
root@9a5349f8f0c3:/# which python
/usr/local/bin/python
root@9a5349f8f0c3:/# which node
root@9a5349f8f0c3:/#

Which leads to my next point....

Using VOLUME

Volumes in your image are added when you run your container, not when you build it. You should never interact with your declared volume in your build process as it should only be used when you run your container.

Example: Creating a file in my build process and then running cat on the file once the image is run works fine:

FROM ubuntu:12.04
RUN echo "Hello Charlie!" > /test.txt

CMD ["cat", "/test.txt"]

$ docker run test-build-volume
Hello Charlie!

If I attempt to do the same thing for a file stored in a volume then it won't work:

FROM ubuntu:12.04
# Declare the volume first; later build steps that write into it are not kept
VOLUME /var/data
RUN echo "Hello Charlie!" > /var/data/test.txt

CMD ["cat", "/var/data/test.txt"]

$ docker run test-build-volume
cat: can't open '/var/data/test.txt': No such file or directory
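
If an image genuinely needs seed data under a volume path, one possible workaround is to write the files before the VOLUME instruction. This is a sketch rather than a recommendation, and it relies on Docker initializing a newly created volume with whatever the image already contains at that path:

```dockerfile
FROM ubuntu:12.04

# Create the directory and the file while /var/data is still an
# ordinary path in the image filesystem.
RUN mkdir -p /var/data && echo "Hello Charlie!" > /var/data/test.txt

# Declare the volume only after the data is in place.
VOLUME /var/data

CMD ["cat", "/var/data/test.txt"]
```

Mounting a host directory over /var/data at run time (for example with -v) would still hide the file, so this only helps when Docker creates the volume itself.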

Storing Secrets

Never store secrets (keys, certs, passwords, etc) in your actual image. It's bad... like REALLY BAD. You could potentially store secrets encrypted in images, but then you still need a way of passing the decryption key, and you are unnecessarily giving an attacker something to work with.

Secrets can be passed in environment variables, as recommended by the Twelve-Factor App methodology, but there are caveats to this as well:

  • Environment variables are visible to all child processes, Docker commands (like inspect), and any linked containers.
  • Since environments can be saved, secrets can appear in debugging logs.
  • They can't be deleted.

Another common option is storing secrets in a shared volume, mounted read-only:

$ docker run -d -v $(pwd)/secret-vol:/secret-vol:ro test

The problem with this solution is that you still keep your secrets in a file, which could potentially have many sets of eyes viewing it.

One of the better solutions is to use a key management system, like Vault or Keywhiz, to keep secrets and retrieve them from the container at runtime. They can help you avoid an embarrassing situation.

Conclusion

Most of us have grown accustomed to making Dockerfiles. By avoiding some simple mistakes, we can take advantage of the following:

  • A better understanding of how Docker leverages/invalidates its build cache
  • A clearer picture of how Docker handles image layering and ordering of statements
  • Greater efficiency in your workflows
  • Safer storage of secrets

I hope these tips will help you as much as they've helped me and my team. And speaking of help, this post from the team at Runnable was really helpful as I was organizing my thoughts. Check it out if you're keen to explore this topic further. Thanks for reading!