Docker is one of the most used tools in the DevOps world for development and deployment. It is considered the de facto containerization application. However, it might be a little vague for newcomers what Docker is exactly used for. To some junior DevOps engineers, they might not see a big difference between a Docker container and a virtual machine (perhaps a Vagrant one). And, more importantly, why not just use a configuration management application (which is another DevOps tool) to do the job? So, to let's be all on the same page and read along.
Any invention or technology must address a certain problem or pain point. Even in non-tech circles, you'll always learn that your product must solve a problem or a pain point. So, what does Docker solve here?
Let's say you're a developer working on a NodeJS web application. Throughout coding the app, you come to install some dependencies. NodeJS uses a tool called
npm to make dependency satisfaction easier. You just keep a file called
package.json in the root directory of the app, where you add all the libraries and modules that your software will need, and then run
npm install. Now all is good. But, what about NodeJS itself? it must be installed first, right? here is where the problem starts to arise. Which version of NodeJS was this app built against? Although NodeJS is very generous when it comes to the platform where is can run, but sometimes, a specific version needs a specific OS and, more importantly, kernel version. What if you are using the cutting edge edition? This means that you will need to compile it first before you can use it! Compiling NodeJS (or any software) is not as easy as just installing it. You will need compilers like
gcc, libraries like
libstdc++ and many others.
As you can see, lots of questions need to be asked by the ops team to the devs team. Each question may lead to a totally different path.
Of course, a seasoned ops engineer or system administrator does not have a problem asking the right questions and deploying the needed infrastructure. But how could they possible do this?
A well-written shell script can do the job. But it has its own shortcomings:
- Now the application needs new components. We need to ensure that they are installed on the environment.
- The devs are using a new tool called
gulpto perform some tasks. We need this tool to be added to the deployment script.
- Running the script several times (to deploy the missing dependencies or to reset the environment) throws errors. We need to modify it so that it only deploys what's missing. This leads to a thicker script file and, thus, a more error-prone, hard-to-debug one.
- The devs want a new environment to test a cutting-edge feature. They can't just mess with the already running one, so they want the ops team to provision a new environment, possibly with some differences than the original one. This means a new script version needs to be written, taking into account the modifications demanded by the dev team.
A virtual machine or Vagrant
A virtual machine might solve the above problems, especially if you combine it with a configuration management tool like Ansible. You'd even save the installation and setup time when you use Vagrant.
But virtual machines and Vagrant suffer their own drawbacks.
As you can see from the diagram, you can use a virtual machine and, using Ansible for example, you can automate creating environments that are exactly the same as each other. You can also made some changes to the Ansible playbook to make any required environment modifications without risking the introduction of bugs or wasting time. You can also use Vagrant to encapsulate the whole environment in a box and use that to spawn already-configured Vagrant machines as much as needed.
But did you notice that that each machine is consuming a dedicated share of the host's CPU, memory, and disk space? Those resources are not used solely for running the NodeJS application. They are also used to power the OS itself. This means a lot of wasted computing power, especially when more environments are needed.
What about a virtual machine that does NOT need a dedicated OS?
That is a rough definition of a Linux container. A container encapsulates whatever application you want to deploy with all the dependencies that it might need, even the OS related ones.
The trick here is, we will not need a complete OS to run it. Only the kernel libraries that are needed will be packaged. This results in:
- Much smaller image size. Some Linux images like Alpine is only a few megabytes!
- Much faster load time. A Docker container can start in literally milliseconds.
- Mush spared resources. Since the container does not need a complete OS, the CPU cycles and memory addresses consumed by a dedicated OS are spared.
- A container is just a process on your system. You can create hundreds (even thousands) of them on a single host. The same is not true for virtual machines unless you own a very powerful host. This results in easier, cheaper, and more robust scalability options.
- Docker is currently supported on Linux, Windows (version 10 pro), and modern macOS. This means that you can build your image on an Ubuntu 16 machine and run it on the newer Ubuntu 18, or even Windows or macOS. This means even easier deployments and more costs saved.
Containers, images, Docker…I'm confused!
Newcomers to Docker often do not realize the difference between an image and a container. They also think that Docker is another name for containerization. Let's make things clearer.
The image is like the template that Docker uses to spawn containers. Just like a class is used to create objects (if you are a developer), or the binary file is used to launch processes (if you are a sysadmin).
A typical image contains:
- Files: like application binaries, dependencies, libraries, kernel modules and so on
- Metadata: instructions for how the container will behave. For example, which process it will run, which network ports it will expose, which volumes it will use for persistent storage, among other settings.
A container is the manifestation of the image. It's just a process like any other process on your system, yet it is much richer. A container encapsulates a complete application, with everything it needs to operate correctly. Let's see how our NodeJS container might look like on your system:
Dockerfile is the instructions that are used when building the image. Once the image is built, it can be used to start multiple containers, all sharing the same characteristics and behavior.
Docker is not containerization
The last ambiguous point here is: what exactly is Docker? Docker is an open-source project that was first released in March 2013. It uses the containerization technology to create an operating-system-level of virtualization. Originally, it was based on the LXC technology, but since 2014, it used its own containerization technology, libcontainer, which is built using the Go programming language.
The point that I want to drive home here is that Docker is an implementation of a technology that already existed before, it is not the containerization method itself.
When not to use Docker?
Docker is very robust when it comes to quickly deploying complete infrastructures with the minimum cost and fastest load times. However, sometimes it may not be your best option if:
- You application runs only on Windows or macOS. Till the time of this writing, Docker can only be used to containerize applications that run on Linux kernel. This means that you cannot convert a native Windows or macOS application and run it elsewhere. You can, however, run a Linux container on Linux, Windows, or macOS.
- Your application is tightly coupled with the OS. Sometimes, you may need to make specific, low-level changes to the OS so that your application can run as expected. Think of security tools that must have direct access to the CPU or memory of the OS to scan for threats.
- Your application works in GUI only. Although the trend is moving fast towards client-server applications, where you access your interface using a thin client (aka web browser), there are still some legacy applications that need a GUI to work. Think of a word processor or a spreadsheet app that does not have an SaaS (Software as a Service) cloud version and you get the point.
Other vendors in the market
Despite its popularity, Docker is not the only software that uses Linux containerization. We have other vendors like Apache Mesos.
Technologies built on top of Docker
In this article, I outlined the challenges that containerization and Docker try to solve. I demonstrated a simple web application deployment example and walked you through the possible paths that you might follow if you will use virtualization technologies vs. containerization. Then, I briefly stated some of the use cases where Docker is not the best choice. Finally, I showed you examples of other Docker-related technologies. I hope you liked this article, please drop me a comment if you want to ask or suggest anything.