The value of your project increases significantly when you can put it into production. Docker containers are an easy technology for achieving this. Chiel briefly explains the technology behind Docker and demonstrates how you can use it yourself.
You’ve been coding in Python for a while, but have always run your scripts on your local machine. You want to take it to the next level and deploy your scripts on a server. Here, you will discover an easy way of doing this, using a technology called a container.
Virtual Machines versus Containers
There are various ways of running programs on a server, with varying degrees of flexibility. A couple of years ago, virtual machines (VMs) were the standard. This meant that, with multiple applications requiring different dependencies, we needed to create a new VM for each application. This is cumbersome and resource intensive, as each VM comes with its own OS, resulting in a low utilization rate of your infrastructure.

A more modern and agile approach is to use containers. Containers are a way of isolating programs (or processes) from each other. They share parts of the underlying OS, but can support different dependencies. Thanks to their efficient use of resources, a containerized application starts fast and can run anywhere with the same underlying OS (kernel), which means containers are easily deployed on, for instance, the cloud.
Docker is a program that enables the packaging of applications as containers with a repeatable execution environment. Docker is open source and has great documentation on its website. While deploying VMs on a server can be complex, working with Docker containers is not, so every aspiring data scientist can easily learn how to use this technology.
Note: I will use the terms image and container in the following sections. They are closely related, but not the same. At the end of this blog I will elaborate on the differences.
To give you an idea of how easy Docker is to use, I’ll give you a small demonstration by creating a ‘Hello World’ application in Docker. The following steps use Docker on a Mac. Starting in the terminal, run the following commands.
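The original commands are not shown here, so this is a minimal sketch of the setup; the directory name `helloworldapp` and the script name `hello.py` are assumptions:

```shell
# Create a working directory for the app (name is illustrative)
mkdir helloworldapp && cd helloworldapp

# Second statement: create an empty Dockerfile
touch Dockerfile

# Create the Python script our image will run (file name is an assumption)
echo 'print("Hello World!")' > hello.py
```

Then open the Dockerfile in an editor, for instance with `vim Dockerfile`.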
The second statement creates an empty Dockerfile. We use Vim to edit the Dockerfile; however, you can use any text editor. Within Vim, press “i” to start editing and enter the following statements:
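A minimal Dockerfile matching the three instructions explained below might look like this (the script name `hello.py` is an assumption):

```dockerfile
FROM python:3
ADD hello.py /
CMD [ "python", "./hello.py" ]
```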
You can save your file by pressing “esc”, typing “:wq”, and hitting “enter”.
We have now created a Dockerfile. The Dockerfile is one of the benefits of Docker: the ability to define a Docker image with a simple text file. This raises the question: what do the commands in this file do?
- FROM pulls an existing Docker image that will be the basis of our new image. In our case, we use Python 3 as our basis.
- ADD defines which files to copy from our computer into the image. We add our Python file to our container.
- CMD specifies what command to run within the container.
Now we can build our Docker image with docker build. The -t flag enables us to name our app helloworldapp. The “.” tells Docker to look for the Dockerfile in the current directory. When we run the app, the container is created and we see our hello world statement printed. Not very exciting, but it is a start.
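The build and run step would look roughly like this, executed from the directory containing the Dockerfile:

```shell
# Build the image and tag it as "helloworldapp"
docker build -t helloworldapp .

# Create a container from the image and run it
docker run helloworldapp
```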
Once we have run our app, we want to clean up the container. With docker ps -a we see all running and previously run Docker containers, and we can remove one by providing its container id to docker rm.
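A sketch of the cleanup, where the container id is taken from the `docker ps -a` output:

```shell
docker ps -a               # list running and exited containers
docker rm <container_id>   # remove a container by its id
```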
On Docker Hub you can create a personal repository; mine is called test. We can push our image to the repository, so we can pull it on a different machine (with the same kernel and Docker installed) and run it there. If we had changed something in our Docker container, we could turn the container back into an image with the commit command. In our case this step is redundant.
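Assuming a repository called test under your Docker Hub account (the username below is a placeholder), pushing could look like this:

```shell
# (optional) snapshot a changed container back into an image; redundant here
docker commit <container_id> <username>/test

# Tag our local image for the remote repository and push it
docker tag helloworldapp <username>/test
docker login
docker push <username>/test
```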
Now our Docker image is pushed to our repository. To validate that we can pull it to our machine and run it, we first delete the Docker image from our system and verify it is no longer present.
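Removing the local copies and pulling the image back could be sketched as:

```shell
docker rmi helloworldapp <username>/test   # delete the local image tags
docker images                              # verify the image is gone
docker run <username>/test                 # pulls from the repository, then runs
```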
Et voilà. With Docker, you can easily share Python apps without someone else needing to create a virtual environment with the exact same dependencies as you. All dependencies are handled inside the Docker container and are independent of the local environment. You can also push images to public repositories so everyone can access them.
Differences between Images and Containers
A Docker image is built up from a series of instructions in the image’s Dockerfile. These instructions are also called “layers”. The layers are stacked on top of each other and are read-only, except the very last one.
With the docker run command, you create a Docker container. When you create a new container, you add a new writable layer on top of the underlying layers. This layer is often called the “container layer”. The major difference between a container and an image is the top writable layer. All writes to the container that add new or modify existing data are stored in this writable layer. When the container is deleted, the writable layer is also deleted. The underlying image remains unchanged.
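You can observe the writable container layer with a small experiment: a file created inside one container disappears with that container, while the image stays untouched. This sketch assumes the python:3 image is available locally:

```shell
# Create a file inside a throwaway container (--rm deletes it on exit);
# the write lands in this container's writable layer only
docker run --rm python:3 bash -c "touch /newfile && ls /newfile"

# A fresh container from the same image does not have that file,
# because the underlying image layers were never modified
docker run --rm python:3 bash -c "ls /newfile"
```

The second command fails with “No such file or directory”, confirming the image itself was not changed.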
Schematic overview of the different commands in Docker. (Source: Unknown)