Containerization with Docker

BADAS Slides

Installation

To view detailed installation instructions you may visit the Docker website. For this tutorial we have taken out much of the details and provided you with the commands you will most likely need.

MacOS

All you need is to download the dmg here. Alternatively you will need to visit the Docker Hub website, create an account, and download it there.

Windows

Download Docker for Windows Installer.exe and follow the installation instructions.

Linux

For this tutorial we will assume that you are running Ubuntu. If you are running a different flavor of Linux please refer to the installation page on Docker Hub.

Set up the repository

Output from the apt-key command should be the same as the output below

If all is good then set up the stable repository

Install Docker CE

Now that you have linked the Docker repository you must update your list of repositories so that you can then install via apt-get.

Validate the Installation

In order to test that your installation was successful we must run the hello-world command using docker on the command line. To do so, open your terminal and run the following command.

Note: Linux installations will most likely need to run all docker commands as sudo

You should see the following output describing what just happened:

Congratulations! You’ve now got docker installed and working. Let’s move on to some cool stuff!

Pre-Built Docker Images

When setting up a particular environment with Docker you have two options on how to proceed:

  1. Configure it yourself (pretty tedious and not too fun)
  2. Grab a prebuilt image and modify if needed (preferred).

Docker Hub is a great resource for finding pre-built docker images that will save you from doing most of the leg work. Not only are there hundreds of thousands of publicly available options to choose from, you can also share your own images!

As seen in the hello-world example output, docker run contacts Docker Hub, looks for a prebuilt image called “hello-world” and (if it exists) it will essentially download that image to your computer and will execute the startup commands.

Now that we know the order of operations, let’s start by checking to see if there are any Jupyter images currently available. You can find a list of stacks available on both Docker Hub as well as Jupyter’s Docker page.

Let’s start out by grabbing the basic Jupyter Notebook image:

The -p option binds the host port to the container’s port which is necessary in this instance. --rm automatically removes the container when it exits. This is good practice to prevent future headaches.

You can check to see that it is running by using docker ps:

The CONTAINER ID is how you can access this particular image instance. With this you can kill the image docker kill 04af1ce8822c, but leave it for now.

Docker Image Workflow

Important to understand what just happened to make the best of what Docker has to offer. To summarize, Docker Hub hosts a plethora of prebuilt, immutable images which we can pull onto our local machine and run. A container is an instance of an image which can then be modified, rebuilt, and executed. In the next section we will be dealing primarily with containers.

Accessing Your Container

Containers can be accessed in a couple of ways, depending on how they’re configured as well as what purpose they serve. Since this particular container is hosting a Jupyter Notebook we will most likely want to access via a GUI.

Access via Web Interface

The output from the docker run command provided you with a URL which you can use to access via a web browser. It is running locally and thus can be accessed by visiting 127.0.0.1:8888. Note that you will need to copy and paste the alphanumeric token that is output in your terminal after running the docker run command as this is the password for the Jupyter notebook. Play around with the notebook to make sure it works as expected. 

Upon successful login you should see a familiar interface. If you are not familiar with Jupyter Notebooks then please see our tutorial.

Access via Command Line

There will be instances where your container has no graphical interface or you need to make changes while it is running. To do so Docker provides an exec subcommand to make this possible. Not only can you access your container but you can send commands to the container.

For now we will access this via the command line. To do so you will need the container ID.

* NOTE your container ID will be different for every container so copying and pasting this will not work!

  •  -i tells docker that you would like an interactive session, so you will actually enter the container.
  • -t is defining the name of this particular container (aka a tag) by which we access it.
  • bash is the command to run upon entering the container, which is essentially a terminal environment.

Take some time to look around and test it out. Do all the commands work? Is there anything missing?

Try ping google.com. So I can just install ping right?

sudo apt-get install ping doesn’t work though? Needs a password?

This is a common issue when dealing with images; often they don’t have everything we need so we need to install them ourselves but there are passwords that are unknown. One way to overcome this is to enter the container as sudo so that you can modify the container as needed.

Type exit or ctrl+d to exit then run:

Defining the user as root instead of jovyan will enter the container as su so you can proceed as normal.

Modifying the container in this manner is ideal for testing purposes. However, what if you want to always have certain packages or configurations every time you run the container? What if you want to publish the image exactly as it is in the current instance? Thankfully there are configuration files for this!

Dockerfiles

EVERY docker image must contain a Dockerfile in order to build an image. You didn’t see one when we pulled from because that image was already built – a requirement for pushing to Docker Hub. Dockerfiles, in this instance, are useful for defining a base image and then modifying it as you deem fit.

The documentation is quite thorough and can be referenced here. In this tutorial we will simply configure the dockerfile to include ping upon starting the image.

Let’s start by creating a brand new directory on your local machine: mkdir jupyter. Enter this directory and create a Dockerfile: touch DockerfileNOTE: Every Dockerfile must be named as Dockerfile (case sensitive), no exceptions.

Now, consider the Dockerfile as a layered configuration that describes what exactly you want in your environment, while layering meaning that every subsequent instruction is applied to the previous instructions (called stacking). There is a specific format to adhere to and you can reference the best practices for further details and use cases.

Consider the following Dockerfile:

Every instruction follows the same format: INSTRUCTION arguments. So COPY is the instruction and . and /app are  the arguments.

Building an Image

Once your Dockerfile is ready to go, you’re ready to build your modified image! To do so, docker provides a build command that looks for a Dockerfile in the specified directory. It is also best a good practice to tag each build if you plan on keeping these images around and/or distributing/maintaining (think of this as versioning software).

This may take some time to build as it has to download the image from the repository and then run the set of build instructions. Upon successful build you can then run the image using the docker run command.

Adding Data to an Image

Being the expert data analysts y’all are, it’s common that we’ll have lots of data we need to work with. Adding it to the image sounds easy, right? It’s the COPY command you learned. Well it depends; it is named COPY because it does just that, so for smaller files this is fine (think scripts, small csv files, etc.). However, what happens if you’re analyzing sequence data and the files are > 20GB? It takes forever to copy outside of docker!

Behold volumes. Volumes allow a particular directory or directories to be accessed via the docker container without copying the entire dataset into the container itself. The only real drawback is that the content within the mounted directories is not packages with the image, though the data are persisted, meaning that if the instance is killed the data will be accessible upon a restart.

Let’s give it a shot!

Notice how there is only the work directory in the login page of your Jupyter notebook. Let’s add a test file and see if we can get it to show up in the notebook.

Kill the container then create a file named test in your current directory. Using the same docker run command, add the volume command:

-v is the argument for adding a volume to the container, with the syntax being host_directory:container_directory. This command adds everything from the local current working directory to the container’s directory. Now visit the work directory in your notebook:

Test Your Skills!

Now that you’ve went through the basics of how to get up and running with Docker see if you can set up a custom container yourself:

  1. Add the ping command to the Jupyter image you downloaded at the beginning of this tutorial
  2. Build the image
  3. See if ping works

 

Leave a Reply

Your email address will not be published. Required fields are marked *