Comparing Docker to traditional virtual machines and finding its spot in a development life cycle.
Docker has been creating a lot of buzz in the development community lately. This article will compare Docker to traditional virtual machines and hopefully define its sweet spot in a development life cycle. This is written from a Docker newbie’s perspective and will highlight some valuable lessons and tricks I have learned over the past few months.
When I coach people on running crowdsourced code challenges on Topcoder, there is one critical aspect I always emphasize to achieve good participation: the environment setup should take no more than five minutes. Let me elaborate.
At any given time there may be hundreds of
other challenges that members can choose from. If your challenge
requires an hour to set up the environment before a developer can begin
the solution, he/she will most likely move to the next challenge where
they can dive in right away.
If I am running a simple MEAN.io app, the setup instructions might be:
- git clone
- npm install
- mongod
- grunt
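To make this concrete, the full commands might look something like the sketch below (the repository URL is a placeholder, not a real project):
git clone https://github.com/example/my-mean-app.git
cd my-mean-app
npm install
# start MongoDB in one terminal...
mongod
# ...and the Grunt dev server in another
grunt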
BAM! Thirty seconds later you’re up and
running and ready to code your solution. It is very difficult to improve
on this, and it might be hard to find a place for Docker in this
scenario, but let’s expand on this example.
Let’s say you are a little further along in your life cycle and you want to split your web app from your API layer. Now you need to manage two code repositories, start them both, and wire them together with env vars.
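Without extra tooling, wiring the two apps together by hand might look something like this rough sketch (the ports and variable names are hypothetical, and your apps would have to read them):
# terminal 1: start the API layer on its own port
PORT=3100 node api.js
# terminal 2: tell the web app where to find the API, then start it
API_URL=http://localhost:3100 PORT=3000 node web.js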
Let’s keep going and say you want some
indexes created in Mongo and use Redis to manage sessions. O.k., better
yet, let’s say you want to swap out Mongo for PostgreSQL and now you
have schemas you need to import. If you sat at the front of class at
code school, you might start to see the dilemma. Without Docker you
might be able to orchestrate this to some extent with a task runner like
Grunt or Gulp. If you came from the Ruby camp maybe you would use
db_migrate to manage your database. You might even use foreman
to start all your services. All these techniques are perfectly
acceptable and we’ve been using them for many years but now Docker and
Docker-compose (formerly Fig) offer and brand new paradigms.
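For instance, a foreman Procfile for the hypothetical split above might look like this, one line per process (the file paths are assumptions):
db: mongod
api: node api/api.js
web: node web/web.js
Running foreman start would then bring all three processes up together.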
What is Docker?
When you find a new tool and you visit its
site it is sometimes difficult to weed through the marketing and
understand what the tool really does. The following is my attempt to
explain what Docker is from firsthand experience. I have never seen it
explained this way, so take it with a grain of salt. I will assume that you already understand classic virtual machines, and I will compare Docker to what you already know.
In my opinion there are two major differences between Docker and a virtual machine like VMware.
Difference 1: The fundamental distinction between an image and a running container
Typically when you use something like
VMware, you start with an ISO of Ubuntu or maybe you shop around for a
pre-built image that offers just what you need. You run that image, it becomes a VM, and you start to install packages, configure it, and then use it.
At this point the original image becomes a distant memory. If you have done this before, you probably will clone the VM before you customize it too much, so you always have that virgin foundation in case you need to build a similar machine. With VMs this is a best practice you follow by hand; with Docker it is inherently part of the DNA.
With Docker you build your image (more on that below) and you start it as a container. The image is always saved and preserved. When you start a container it runs a single command. This is a little hard to imagine at first, but it is very complementary to the microservice strategy we’ve talked so much about, because you can launch a separate container for each process.
For example, your container might be run with node web.js or mongod or redis-server or node api.js. You might even start a container just to display a log file. By default a container’s state is persistent, but you can use the --rm switch to make it ephemeral.
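As a rough sketch (the image and container names are made up for illustration), each process gets its own docker run invocation:
# one long-running container per process
docker run -d --name web_container my_app_image node web.js
docker run -d --name api_container my_app_image node api.js
docker run -d --name mongo_container my_mongo_image mongod
# an ephemeral container that is removed as soon as its command exits
docker run --rm my_app_image cat /var/log/app.log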
Not every container needs to originate from a unique image. For example, if you want to run a Cassandra cluster you might start three Cassandra containers from the same image and use the --link option so they can communicate with each other. You can restart stopped containers and their state (even the data) will be preserved, or you can destroy the containers and start new ones. With just a few images you can build a full stack of services, all running in separate containers. You can even mount volumes from one container into another, or from the host system. These are very powerful concepts; however, they are a little more abstract than what we are used to. Once again, I believe this “container revolution” is driving the push toward microservices.
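In docker run terms, linking and volume mounting look something like this (again, image and container names are made up):
# start a first Cassandra node, then link two more to it
docker run -d --name cass1 my_cassandra_image
docker run -d --name cass2 --link cass1:cass1 my_cassandra_image
docker run -d --name cass3 --link cass1:cass1 --link cass2:cass2 my_cassandra_image
# mount a host directory into a container
docker run -d -v /host/data:/data my_cassandra_image
# share volumes from one container with another
# (assumes my_cassandra_image declares a volume, e.g. at /var/lib/cassandra)
docker run --rm --volumes-from cass1 ubuntu ls /var/lib/cassandra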
Difference 2: You are encouraged to build minimal images from layers of images rather than monolithic ones.
As I mentioned in the previous section,
large monolithic images are a thing of the past and Docker encourages
you to build images with a recipe-like convention stored in a
Dockerfile. Then the command docker build -t myfirstcontainer . will
create the image. All Dockerfiles start with the FROM directive. This is the base image, found on Docker Hub. FROM ubuntu:latest is a good place to start. You can use the COPY directive to copy code from your host to be used during the build process, but more typically we would use the RUN directive to use the image’s OS to get the packages we want.
For example:
RUN apt-get update
RUN apt-get install mongodb
RUN apt-get install ssh
This would give us a terse Ubuntu image that includes MongoDB and SSH. We would then add the default container command, called an ENTRYPOINT, so we could simply run the image to create a container that was running Mongo. The ENTRYPOINT might look something like this: ENTRYPOINT /usr/local/bin/mongod, and when we started the container, MongoDB would be running. Of course, all your configuration would also be done at build time using the ENV or RUN directives. Starting a container feels very different from starting a VM because there appears to be no boot process.
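Putting the pieces together, a minimal Dockerfile along those lines might look like the sketch below (the package and image names are illustrative, and the path to mongod depends on how the package installs it):
FROM ubuntu:latest
# install the packages we want from the image's OS
RUN apt-get update
RUN apt-get install -y mongodb ssh
# mongod's default data directory needs to exist
RUN mkdir -p /data/db
# the default command run when a container is started from this image
ENTRYPOINT ["/usr/bin/mongod"]
Building it with docker build -t my_mongo_image . and starting it with docker run -d --name my_mongo_container my_mongo_image would then leave you with a running Mongo container.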
Tips on using Docker
Here are some lessons I have learned over the past few months that are worth sharing.
- docker images shows you images.
- docker ps shows your RUNNING containers (understand the difference between these two concepts and commands, since the output looks almost identical).
- docker ps -a shows ALL containers, even stopped ones (that can be restarted).
- docker run <image> starts a new container from an image running the default entry point.
- docker exec <container> </your/command> runs a command in an ALREADY RUNNING container. This is useful! For example: docker exec my_mongo_container /bin/ps will show the running processes in your Mongo container; the command then exits while the container keeps running.
- Docker runs on Linux, so if you are on a Mac you will need a small Docker virtual machine called Boot2docker, which is VirtualBox under the covers. You should download the VirtualBox GUI from Oracle so you can tweak this type 2 hypervisor. I know this is confusing, but if you follow the instructions on docker.com it will make sense.
- Boot2docker creates a virtual network adapter with the IP address 192.168.59.103, but your containers will be on a virtual network that is 172.17.0.x, so you won’t be able to reach them from your host (unless you are on Linux already). All the documentation gives you instructions to port forward; however, I found this simple trick to be more elegant: just add a route to your container network using your boot2docker host as a gateway with the following command on your Mac: sudo route add 172.17.0.0/24 192.168.59.103
- Once you have added the above route you can run docker inspect <containerId> to get the IP address and hit the container directly.
- I recommend you name your images (with the docker build -t switch) and your containers (with the --name switch) and include the word ‘container’ or ‘image’ in the name to avoid confusion.
- Once you have a container you can convert it back to an image and preserve its state by using docker commit (see the sketch after this list). For example, I started with an Oracle image and ran it as a container. I then created the schemas in the container and loaded some sample data. I then committed this container as an image and uploaded it to Docker Hub. Now someone can do a docker pull kbowerma/sp5 and get an image that not only contains a working version of Oracle 11g on Ubuntu but also has my schema and sample data. This is super powerful!
- For the sake of brevity I have left out some important details you will need. This article is not a Docker how-to (there are plenty already) but is simply a conceptual comparison.
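A rough sketch of the commit-and-share workflow from the Oracle tip above (the container name oracle_container is hypothetical; kbowerma/sp5 is the image mentioned in the tip):
# turn a customized container back into an image, preserving its state
docker commit oracle_container kbowerma/sp5
# push the image to Docker Hub (requires docker login)
docker push kbowerma/sp5
# anyone can now pull it and start a preloaded container from it
docker pull kbowerma/sp5
docker run -d --name sp5_container kbowerma/sp5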
Docker Compose (fig)
Ok, this is really cool. Once you play
with Docker and get a bunch of containers up and running and talking to
each other, you will find that the commands to do this are somewhat lengthy and require a lot of switches. Docker Compose is a small binary that takes a YAML file and lets you orchestrate a complex set of containers that interact with each other.
Below is a simple docker-compose.yml file that will create and run three Cassandra containers as a cluster from a single image.
cassnode1:
  build: .
  links:
    - cassnode2
    - cassnode3
  hostname: cassnode1
  command: -name "DataGuard Cluster" -seeds "cassnode1,cassnode2,cassnode3"
cassnode2:
  build: .
  links:
    - cassnode3
  hostname: cassnode2
  command: -name "DataGuard Cluster" -seeds "cassnode1,cassnode2,cassnode3"
cassnode3:
  build: .
  hostname: cassnode3
  command: -name "DataGuard Cluster" -seeds "cassnode1,cassnode2,cassnode3"
It assumes it is in the same directory as the Dockerfile, hence the build: . entry. The really cool thing is that it will build the three containers and run them. Before it builds them it will check to see whether they have already been built, and if they have, it will just restart them and link them together. If you don’t have them built as containers, it will look for the Dockerfile and build them. The first
one may take a few minutes but the 2nd and 3rd will be almost
instantaneous since they use the same image and all the layers are
already present.
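With that file in place next to the Dockerfile, bringing the cluster up and managing it is just a few commands (assuming docker-compose is installed):
# build the images if needed and start all three containers in the background
docker-compose up -d
# check the state of the composed containers
docker-compose ps
# tail their logs, or stop everything again
docker-compose logs
docker-compose stop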
When an image is built from a Dockerfile, every command is called a ‘Step’ and Docker makes an internal commit (a layer) at the end of every step. This means that if you build an image with 10 steps, to get 10 packages via apt-get, it will make 10 internal commits. But then if you add one more command (step) and run docker build again, it will reuse the previous commits from its cache and will only take a few seconds to get that single new package.
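You can see this caching in action with a toy Dockerfile; the package names below are arbitrary examples:
FROM ubuntu:latest
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
# newly added step: on a rebuild, the steps above come from the cache
# and only this one actually runs
RUN apt-get install -y vim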
Conclusion
If the development environment is simple and requires little setup, then Docker may be overkill; but once things start to become complex, Docker does a great job of simplifying them again and makes replicating the development environment exact, easy, and completely portable. Docker should be in every developer’s toolbox, even if it is down at the bottom next to the grout knife.