Atlassian Orchestration with Docker: multi-host support for the win!

December 16th 2015 Nicola Paolucci in Docker, Orchestration

After the fantastic DockerCon Europe and the recent releases of Docker 1.9.1, Compose 1.5.2 and Swarm 1.0.1, I finally have all the missing bits to automatically deploy a suite of Atlassian products to a Swarm cluster without supervision:

  • Docker Network and Docker Swarm are now production ready.
  • Docker Compose finally works in a multi-host configuration.
  • Swarm can handle 1,000+ hosts and 50,000 containers, as demonstrated live on stage.

This is the dream I have, and that we probably all have as an industry: to describe our software components, describe how they are linked together and let the infrastructure automatically arrange itself to match our needs. It's here! It has been cooking for a while and, depending on the technology stack, maybe it is already there for you. Nonetheless, the Docker suite of tools has reached that moment for me. And it's glorious.

Let me show you an example of the possibilities.

The objective

I start with a meaningful goal: Deploy Bitbucket Server and PostgreSQL to a 3-node cluster created with Docker Machine and managed via Docker Swarm.

This is the end result I have in mind, where my setup does not mention any hard-coded IP address:

[Figure: Architecture]

Prerequisites

As a prerequisite I need an account on an IaaS provider. This time around I choose Digital Ocean, but any other Docker Machine driver will do. I create an API token, exported as DO_TOKEN in the commands below, which allows me to create nodes at will using "docker-machine".

Install and run discovery server

The new multi-host capabilities of Compose and Swarm require a more complete discovery service than the basic Docker Hub Swarm tokens, so in this piece I will use Consul, a discovery server and key/value store from HashiCorp.

  • First step, create the consul node using docker-machine:

    docker-machine create -d digitalocean --digitalocean-access-token=$DO_TOKEN \
      --digitalocean-region "ams2" consul

    This specifies the "ams2" region, passes my token and names this machine "consul".

    Running pre-create checks...
    Creating machine...
    Waiting for machine to be running, this may take a few minutes...
    Machine is running, waiting for SSH to be available...
    Detecting operating system of created instance...
    Provisioning created instance...
    Copying certs to the local machine directory...
    Copying certs to the remote machine...
    Setting Docker configuration on the remote daemon...
    To see how to connect Docker to this machine, run: docker-machine env consul
  • After the machine is ready, switch our docker environment to run commands on that instance by evaluating:

    eval "$(docker-machine env consul)"
  • Finally run the consul server in a simple non-redundant configuration with:

    docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp -h consul progrium/consul -server -bootstrap
  • Test it by curling:

    curl $(docker-machine ip consul):8500/v1/catalog/nodes
    [{"Node":"consul","Address":"172.17.0.2"}]
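
Since every later step depends on Consul answering, the check above can be scripted. This is a minimal sketch, assuming the catalog response keeps the compact JSON shape shown above; the function name catalog_has_node is my own invention, and it does a plain text match rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the Consul catalog JSON read from stdin
# lists a node with the given name. This is a plain text match that assumes
# the compact JSON formatting shown above, not a real JSON parse.
catalog_has_node() {
  grep -q "\"Node\":\"$1\""
}
```

It could be used like this:

    curl -s "$(docker-machine ip consul):8500/v1/catalog/nodes" | catalog_has_node consul && echo "consul is up"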

Set up a 3-node Swarm cluster

Now we can create a cluster of 3 machines, with slightly different requirements.

  • Let's start with the Swarm master, which will control our entire cluster:

    docker-machine create -d digitalocean --digitalocean-access-token=$DO_TOKEN \
      --digitalocean-image "debian-8-x64" \
      --digitalocean-region "ams3" --swarm --swarm-master \
      --swarm-discovery=consul://$(docker-machine ip consul):8500 \
      --engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
      --engine-opt="cluster-advertise=eth0:2376" \
      cluster

    Take note that we chose a specific Debian 8.2 image, debian-8-x64. The default Ubuntu image that Docker Machine chooses on Digital Ocean won't work because its older kernel does not support Docker overlay networks.

    We also pass cluster-store and cluster-advertise to the Docker engine on this new machine with information on how the swarm can store keys and values of the infrastructure we are building. Those are stored on the consul instance we readied before.

  • Next create a machine with 2GB of RAM to run Bitbucket Server:

    docker-machine create -d digitalocean --digitalocean-access-token=$DO_TOKEN \
      --digitalocean-image "debian-8-x64" \
      --digitalocean-region "ams3" \
      --digitalocean-size "2gb" \
      --swarm \
      --swarm-discovery=consul://$(docker-machine ip consul):8500 \
      --engine-label instance=java \
      --engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
      --engine-opt="cluster-advertise=eth0:2376" \
      node1

    We require the machine to have 2GB of RAM and tag it with the label instance=java so that we can deploy our application based on labels.

  • Third, create a machine to host the PostgreSQL database:

    docker-machine create -d digitalocean --digitalocean-access-token=$DO_TOKEN \
      --digitalocean-image "debian-8-x64" \
      --digitalocean-region "ams3" \
      --swarm \
      --swarm-discovery=consul://$(docker-machine ip consul):8500 \
      --engine-label instance=db \
      --engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
      --engine-opt="cluster-advertise=eth0:2376" \
      node2

    We tag this machine with the label instance=db so that we can place the database on it using the same label mechanism.

  • Check that the machines have been created:

    docker-machine ls
    NAME      ACTIVE   DRIVER         STATE     URL                         SWARM
    consul    -        digitalocean   Running   tcp://5.101.98.134:2376     
    cluster   *        digitalocean   Running   tcp://188.166.23.145:2376   cluster (master)
    node1     -        digitalocean   Running   tcp://178.62.247.112:2376   cluster
    node2     -        digitalocean   Running   tcp://178.62.212.73:2376    cluster
  • Connect our local docker command to the entire Swarm:

    eval "$(docker-machine env --swarm cluster)"
    docker info
    Containers: 15
    Images: 12
    Role: primary
    Strategy: spread
    Filters: health, port, dependency, affinity, constraint
    Nodes: 3
     cluster: 188.166.23.145:2376
      └ Containers: 2
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 519.2 MiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), provider=digitalocean, storagedriver=aufs
     node1: 178.62.247.112:2376
      └ Containers: 10
      └ Reserved CPUs: 0 / 2
      └ Reserved Memory: 0 B / 2.061 GiB
      └ Labels: executiondriver=native-0.2, instance=java, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), provider=digitalocean, storagedriver=aufs
     node2: 178.62.212.73:2376
      └ Containers: 3
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 519.2 MiB
      └ Labels: executiondriver=native-0.2, instance=db, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), provider=digitalocean, storagedriver=aufs
    CPUs: 4
    Total Memory: 3.075 GiB
    Name: c5e1ce85f79a
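
The three docker-machine create commands above repeat the same image, region, discovery and engine options. A small helper can keep them in one place. This is only a sketch: the function name swarm_common_flags is hypothetical, and it uses the --flag=value form throughout, which docker-machine accepts just like the space-separated form used above.

```shell
#!/bin/sh
# Hypothetical helper: prints the docker-machine flags shared by all three
# Swarm machines, one per line, for a given Consul address (ip:port).
swarm_common_flags() {
  consul_addr=$1
  printf '%s\n' \
    "--digitalocean-image=debian-8-x64" \
    "--digitalocean-region=ams3" \
    "--swarm" \
    "--swarm-discovery=consul://$consul_addr" \
    "--engine-opt=cluster-store=consul://$consul_addr" \
    "--engine-opt=cluster-advertise=eth0:2376"
}
```

Creating the database node would then look like this. The command substitution is left unquoted on purpose so that each flag becomes its own argument; that is safe here because none of the flags contains spaces.

    docker-machine create -d digitalocean --digitalocean-access-token=$DO_TOKEN \
      $(swarm_common_flags "$(docker-machine ip consul):8500") \
      --engine-label instance=db node2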

Multi-host Docker Compose configuration

Next on the list is to write the multi-host configuration in a docker-compose.yml, which will take care of starting both our Java application and our database in the proper order. It will also create a transparent overlay network between the cluster nodes involved.

The interesting points of the setup are:

  • We do not specify any IP addresses for the physical infrastructure.
  • We allocate applications to nodes using label constraints.
  • We create a data-only container with Bitbucket Server licensing information.
  • We only use official images from the Docker Hub.

This is the complete docker-compose.yml:

bitbucket:
  image: atlassian/bitbucket-server
  ports:
    - "7990:7990"
    - "7999:7999"
  volumes_from:
    - license
  user: root
  privileged: true
  environment:
    - "constraint:instance==java"
db:
  image: postgres
  ports:
    - "5432:5432"
  environment:
    - "POSTGRES_PASSWORD=somepassword"
    - "constraint:instance==db"
license:
  build: .

The license data-only container is built from a Dockerfile like this:

FROM alpine
RUN mkdir -p /var/atlassian/application-data/bitbucket/shared
COPY ./bitbucket.properties /var/atlassian/application-data/bitbucket/shared/bitbucket.properties
VOLUME /var/atlassian/application-data/bitbucket
CMD ["/bin/true"]

The only file it actually stores is a single bitbucket.properties with this content:

setup.displayName=Bitbucket Server
setup.baseUrl=http://localhost:7990
setup.license=<fill your license>
setup.sysadmin.username=admin
setup.sysadmin.password=admin
setup.sysadmin.displayName=<User Name>
setup.sysadmin.emailAddress=<Email Address>
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://orchestration_db_1:5432/postgres
jdbc.user=postgres
jdbc.password=somepassword
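
The host name orchestration_db_1 in jdbc.url is the name Compose gives the db container, which other containers on the same network can use as a host name. As far as I know, Compose 1.x builds that name as project_service_index, where the project defaults to the name of the directory holding docker-compose.yml, lower-cased and stripped of anything other than letters and digits. So the file above assumes the project directory is called orchestration. A minimal sketch of that rule, with a function name of my own choosing:

```shell
#!/bin/sh
# Hypothetical helper mirroring how Compose 1.x is assumed to name containers:
# <project>_<service>_<index>, with the project name lower-cased and reduced
# to the characters a-z and 0-9. The index defaults to 1.
compose_container_name() {
  project=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9')
  printf '%s_%s_%s\n' "$project" "$2" "${3:-1}"
}
```

To see which host name a different checkout directory would produce for the database:

    compose_container_name "$(basename "$PWD")" db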

To start everything we can now invoke docker-compose, making sure we turn on the multi-host networking and specify we want to use an overlay network:

docker-compose --x-networking --x-network-driver=overlay up -d

The result is our application deployed to the cluster:

docker ps
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS                                                          NAMES
0f6adc9a14bb        atlassian/bitbucket-server   "./bin/start-bitbucke"   2 hours ago        Up 2 hours         178.62.247.112:7990->7990/tcp, 178.62.247.112:7999->7999/tcp   node1/orchestration_bitbucket_1
0a305957925f        postgres                     "/docker-entrypoint.s"   2 hours ago        Up 2 hours         128.199.37.223:5432->5432/tcp                                  node2/orchestration_db_1

Note that the Java application "Bitbucket Server" was deployed to the instance with 2GB of RAM labelled java as planned, and the PostgreSQL onto node2 which was labelled db. Beautiful.
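
The placement can also be checked from a script instead of by eye. The sketch below assumes container names are reported in the node/container form shown in the NAMES column above; the function name node_of is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads Swarm-style "node/container" names on stdin and
# prints the node that runs the named container (nothing if it is absent).
node_of() {
  awk -F/ -v name="$1" '$2 == name { print $1 }'
}
```

With the local docker command pointed at the Swarm, this should print node2 for the setup above:

    docker ps --format '{{.Names}}' | node_of orchestration_db_1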

[Figure: Bitbucket Server]

Issues

While creating the setup above I ran into a whole set of issues, partially due to the novelty of the tools and partially due to my hastiness.

  1. Proper orchestration only works with a fully fledged discovery service like Consul, not the default token you get from the basic docker swarm create.

  2. For the flag --engine-opt="cluster-advertise=eth0:2376", the guides I found mentioned eth1, but the right interface depends on the specific machine and provider. On Digital Ocean the correct interface is eth0. I tracked that down by looking into /var/log/upstart/docker.log, where I found this bit:

    Error starting daemon: discovery advertise parsing failed (no available
    advertise IP address in interface (eth1:2376))

    To understand what happened I even went looking into the source code.

  3. At one point I got a very cryptic failure on vxlan interface creation, like the following:

    ERROR: Cannot start container 774f639d4275af7f53dd8c8f3d65387d053c8000ab96ce3c6765b982428c3a2d: subnet sandbox join failed for "10.0.0.0/24": vxlan interface creation failed for subnet "10.0.0.0/24": failed in prefunc: failed to set namespace on link "vxlana389573": invalid argument

    It turns out that to get full-blown multi-host support in Compose and Swarm, you need at least a 3.15+ Linux kernel (as explained here), and the default Digital Ocean Ubuntu image had an older one:

    uname -a
    Linux node3 3.13.0-68-generic #111-Ubuntu SMP Fri Nov 6 18:17:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

    To make things work I had to add --digitalocean-image "debian-8-x64" to my docker-machine create command.

  4. To find the proper Digital Ocean image I installed a neat tool called tugboat, which is a command line tool to provision DO images:

    gem install tugboat
    tugboat authorize
    tugboat images | grep ubuntu
    12.04.5 x64 (slug: ubuntu-12-04-x64, id: 10321756, distro: Ubuntu)
    12.04.5 x32 (slug: ubuntu-12-04-x32, id: 10321777, distro: Ubuntu)
    15.10 x64 (slug: ubuntu-15-10-x64, id: 14169855, distro: Ubuntu)
    15.10 x32 (slug: ubuntu-15-10-x32, id: 14169868, distro: Ubuntu)
    15.04 x64 (slug: ubuntu-15-04-x64, id: 14169884, distro: Ubuntu)
    15.04 x32 (slug: ubuntu-15-04-x32, id: 14169999, distro: Ubuntu)
    14.04.3 x64 (slug: ubuntu-14-04-x64, id: 14530089, distro: Ubuntu)
    14.04.3 x32 (slug: ubuntu-14-04-x32, id: 14530129, distro: Ubuntu)

    I tried all the Ubuntu images and they all failed, including 15.04, so I had to use the Debian 8.2 image, which had the proper kernel version and didn't crash.

  5. Whenever I needed to restart the containers, something went wrong with the overlay network creation: the vxlan network gave me problems. To remove the vxlan configurations I used:

    sudo umount /var/run/docker/netns/* && sudo rm /var/run/docker/netns/* && start docker
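
The kernel problem from issue 3 can be caught before any container is started. Below is a minimal sketch of a version check; the function name kernel_at_least is my own, and it only compares the major and minor numbers of a release string such as the ones printed by uname -r.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel release string such as
# "3.16.0-4-amd64" is at least the given major and minor version.
kernel_at_least() {
  release=$1 want_major=$2 want_minor=$3
  major=${release%%.*}          # text before the first dot
  rest=${release#*.}            # text after the first dot
  minor=${rest%%[!0-9]*}        # leading digits of the remainder
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example, to check a freshly created machine against the 3.15 minimum mentioned above:

    kernel_at_least "$(docker-machine ssh node1 uname -r)" 3 15 || echo "kernel too old for overlay networks"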

Conclusions

The source of the above configurations can be found on Bitbucket.

For me this was the first magical step towards having an entire suite of Atlassian tools deployed and run automatically on a Docker Swarm. Stay tuned for the next chapter in the series. If you found this interesting and want more, follow me at @durdn or my awesome team at @atlassiandev.