Planning and Designing Your Docker Image Hierarchy
Create your Docker image with robust design and image hierarchy in mind.
Over the past few years, I’ve had the need to create Docker images for various applications/microservices. In my case, the most predominant use cases were Java and Python. Of course, there are a myriad of Java and Python images available on Docker Hub, and often these images can be used as a good basis for applications. However, as time went on, I found myself having to manage multiple applications requiring different versions of these platforms.
It’s easy enough to reuse and customize a base image, but it tends to lead to repetitive configuration work. Over time, like many open source projects, Docker Hub images evolve. That is to be expected. However, challenges arise when that change breaks the way dependent images are assembled or required versions aren’t available. In addition, I have also worked in environments where regulatory considerations were a priority, and pulling images from Docker Hub tended to generate questions regarding the source and assurances of meeting specified guidelines. Anyone who’s been involved in compliance and security audits understands that the use of external resources always requires some type of additional validation demonstrating the authenticity of the source. It can be done, but, with audits, the fewer questions raised, the more straightforward the process.
Given these caveats, I decided to build a set of base images that can be leveraged by applications within our organization. Building your own Docker images provides a level of independence, control, and security. It also helps in establishing Docker image patterns and practices within an organization. Plus, it’s not difficult to do at all. The following presents a few examples. As is the case when presenting any type of code, there are always variations and possible improvements. The primary concept here is the presentation of a reproducible pattern that can be applied in many organizations in building their Docker image catalogs.
An Image Hierarchy
Establishing a well-thought-out Docker image hierarchy is time well spent. Apart from certain special cases, a Docker image inherits from another Docker image. The most common scenario is a Docker image inheriting from a base operating system image. It can be thought of like a class hierarchy in an object-oriented language. A Docker image inheriting or “extending” from another Docker image is capable of all functionality packaged in the inherited image. Similarly, a Docker image inheriting from another Docker image can replace or “override” functionality packaged in the base image.
The benefits of leveraging inheritance with your Docker images are similar to the notion of object-oriented inheritance.
- Reuse – Functionality added to the base image is available to all inherited images
- Extension – Additional capabilities can be added to the image while maintaining inherited functionality
- Overriding – Functionality in a base image can be replaced
- Structure – The base image file system layout is the same across all inherited images
- Maturity – As the base images evolve, so do the inherited images. Addressing security vulnerabilities is a prime example
The following image presents a sample image hierarchy:
This diagram may appear a bit involved, but whether you’re using any of these technologies or others, most larger applications are composed of multiple frameworks – especially when deployed as Docker containers in a microservice architecture. Even if your application uses a single language platform and a single framework for that language, your application deployments will benefit significantly from the use of even the most basic hierarchy.
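To make the diagram concrete, the hierarchy boils down to each image naming its parent in its FROM directive. The sketch below uses placeholder image names purely for illustration; the actual images built in the rest of this article follow the same shape:
# Each layer is its own Dockerfile; the image names are illustrative placeholders.
# Dockerfile for myorg/alpine:3.10.2 (hardened operating system base)
FROM alpine:3.10.2
# Dockerfile for myorg/python:3.7.4 (language runtime)
FROM myorg/alpine:3.10.2
# Dockerfile for myorg/flask:1.1.1 (application framework)
FROM myorg/python:3.7.4
# Dockerfile for myorg/my-service:1.0.0 (application)
FROM myorg/flask:1.1.1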
Starting With the Operating System
Docker images are based on the file system layouts of various Linux distributions. (Docker images can also be built for Microsoft Windows, but for the purpose of this conversation, Linux will be used.) Organizations providing Linux distributions continue to evolve their Docker images to be more Docker or cloud-native friendly. This essentially means that organizations are providing base images that are compact and secure, containing only the essentials. Reading through blogs and articles will surface varying perspectives on Docker images and their size. However, one aspect of image size on which there’s widespread acknowledgment is that of attack surface. Smaller images generally mean fewer files. Fewer files result in fewer opportunities for malicious behavior to occur. Most Linux distributions provide images considered to be cloud-native friendly.
As mentioned, there are various official and certified Linux distributions. The most common place to see what’s available is Docker Hub. Specifically, to see a list of official and certified Linux distributions, click here.
At the base of the image hierarchy is an operating system image extending from the standard Alpine Linux distribution. Alpine Linux has built a reputation in the Docker community. It's non-commercial, resource-efficient, and secure.
Extending a Linux distribution allows you to customize the distribution to meet the needs of your organization. The example below demonstrates a few modifications that might be made to a standard operating system image. The Dockerfile is available on GitHub.
FROM alpine:3.10.2
LABEL relenteny.repository.url=https://github.com/relenteny/alpine
LABEL relenteny.repository.tag=3.10.2
LABEL relenteny.alpine.version=3.10.2
RUN set -x && \
    addgroup -g 1000 -S alpine && \
    adduser -u 1000 -G alpine -h /home/alpine -D alpine && \
    apk add --no-cache curl bind-tools
USER alpine
WORKDIR /home/alpine
ENTRYPOINT ["/bin/sh", "-l"]
CMD []
This example is basic and straightforward:
- It creates a username alpine and defines it to be the user under which instantiated containers will run. As configured, the alpine user is unable to run as a privileged user. This is an important aspect in securing Docker images. The directory in which instantiated containers will start is set to the alpine user’s home directory.
- As an example, curl and bind-tools are added to the base image. When adding additional packages, there’s a balance between packages that will commonly be used by subsequent images versus increasing the size, and attack surface, of the base image.
- Upon instantiation, containers will simply start a login shell. The CMD [] directive is not necessary here, but defining ENTRYPOINT and CMD in a Dockerfile provides a level of documentation by specifying the complete invocation definition.
- LABELs can serve many purposes. They are purely informational but can be useful when tracking images through development and deployment processes.
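For reference, building and running this base image locally looks like the following; the tag used here is just an example, not a published image name:
# build the image from the directory containing the Dockerfile
docker build -t myregistry/alpine:3.10.2 .
# start an interactive container; the entrypoint drops into a login shell as the alpine user
docker run -it --rm myregistry/alpine:3.10.2
Inside the shell, whoami reports alpine and the working directory is /home/alpine, reflecting the USER and WORKDIR directives.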
Example: Supporting Python Versions
Regardless of runtime platform, the question of platform version arises. Over time, it’s to be expected that different applications and services will be using different, or specific, versions of a runtime platform. Python applications are no different.
Operating system distributions will support a subset of runtime platform versions. However, depending on the operating system distribution to provide specific versions of runtime platforms creates a dependency on the operating system version that, over time, can become difficult to manage. In keeping with a common Docker image heritage, runtime platforms should be built using the same parent image.
Fortunately, most versions of Linux support a wide range of application runtime platform versions. What remains is designing a pattern whereby multiple versions of runtime platforms can be quickly and easily assembled while maintaining the image hierarchy.
Runtime platforms often have multiple ways in which they can be installed and configured. For the Python runtime images, my choice was to leverage pyenv. pyenv is typically used to manage multiple versions of Python that are installed on a single system, but pyenv also does a very nice job of installing and configuring specific versions of Python for general use.
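Outside of Docker, the pyenv workflow these images rely on looks roughly like this (the version shown is only an example):
# build and install a specific Python version under $PYENV_ROOT/versions
pyenv install 3.7.4
# make that version the default for the current user
pyenv global 3.7.4
# verify the active interpreter
python --version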
Extending the operating system image just discussed, the next Docker image in the Python image hierarchy is one that installs and configures pyenv. The Dockerfile is available on GitHub.
FROM relenteny/alpine:3.10.2
LABEL relenteny.repository.url=https://github.com/relenteny/python-pyenv
LABEL relenteny.repository.tag=1.2.14
LABEL relenteny.pyenv.version=1.2.14
LABEL relenteny.pyenv.virtualenv.version=1.1.5
COPY build /opt/build
USER root
RUN set -x && \
    apk add --no-cache git bash build-base libffi-dev openssl-dev bzip2-dev zlib-dev readline-dev sqlite-dev && \
    cp -r /opt/build/home/alpine/* /home/alpine && \
    chmod +x /home/alpine/bin/*.sh && \
    chown -R alpine.alpine /home/alpine && \
    rm -rf /opt/build
USER alpine
RUN set -x && \
    cd /home/alpine && \
    git clone https://github.com/pyenv/pyenv.git /home/alpine/.pyenv && \
    cd /home/alpine/.pyenv && \
    git branch pyenv-1.2.14 v1.2.14 && \
    git checkout pyenv-1.2.14 && \
    cd /home/alpine && \
    git clone https://github.com/pyenv/pyenv-virtualenv.git /home/alpine/.pyenv/plugins/pyenv-virtualenv && \
    cd /home/alpine/.pyenv/plugins/pyenv-virtualenv && \
    git branch virtualenv-1.1.5 v1.1.5 && \
    git checkout virtualenv-1.1.5 && \
    cd /home/alpine && \
    echo 'export PYENV_ROOT="$HOME/.pyenv"' >> /home/alpine/.profile && \
    echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> /home/alpine/.profile && \
    echo 'eval "$(pyenv init -)"' >> /home/alpine/.profile && \
    echo 'eval "$(pyenv virtualenv-init -)"' >> /home/alpine/.profile
This Dockerfile follows the documented instructions for installing pyenv. There is also a set of scripts installed that will be discussed below. The following are some notes on this Dockerfile:
- The pyenv installation guidelines are available here. The required packages for an Alpine Linux installation are documented on the pyenv Wiki page.
- Specifics on installing the pyenv-virtualenv plugin are available here.
- Both pyenv and pyenv-virtualenv are locked at specific versions to ensure consistency with downstream images. For a cleaner approach than simply checking out the version tags, local branches are created at those tags.
- The alpine user’s .profile is updated to support the pyenv environment.
The COPY directive adds files from the source build directory into the image directory /opt/build. Over the past several years, this is a pattern I’ve used to copy files into the image. The source build directory contains one or more subdirectories indicating the destination of the source files in the image being built. For this image, the source build directory has a subdirectory structure of home/alpine/bin, indicating that files in this directory are to be placed into the image directory /home/alpine/bin.
The files added are convenience scripts available for subsequent images:
- install-python.sh is used to configure a specific version of Python. The intent is that a new image would be created from the resulting execution of this script. For example, the Dockerfile snippet below will build an image with Python 3.7.4 installed and the default Python installation set for the built-in user alpine.
FROM relenteny/pyenv:1.2.14
RUN /home/alpine/bin/install-python.sh 3.7.4
Typically, although not required, the resultant image would be stored in a registry as a generic Python runtime image. Using the example above, an example image name and tag could be myregistry/python:3.7.4.
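The script itself is in the GitHub repository; as a rough sketch of what it needs to do (the real script’s error handling and any additional setup may differ), it amounts to driving pyenv with the requested version:
#!/bin/sh
# install-python.sh (sketch) - install and activate a Python version via pyenv
set -e
# load the pyenv environment configured in the alpine user's .profile
. /home/alpine/.profile
PYTHON_VERSION="$1"
pyenv install "$PYTHON_VERSION"
pyenv global "$PYTHON_VERSION"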
- install-requirements.sh is a convenience script that will install additional Python modules as specified through a requirements file, by convention named requirements.txt. This script can only be executed once a version of Python has been installed. When invoked, the path to the text file containing the modules to be installed must be provided as an argument as follows:
/home/alpine/bin/install-requirements.sh /home/alpine/requirements.txt
Here requirements.txt is specified as being in the home directory of the built-in user, alpine. This, of course, is dependent on how the actual Dockerfile is written. An example of its usage is below.
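Again as a sketch rather than the script’s literal contents, install-requirements.sh amounts to running pip against the supplied requirements file using the pyenv-managed Python:
#!/bin/sh
# install-requirements.sh (sketch) - install Python modules from a requirements file
set -e
# load the pyenv environment so pip resolves to the installed Python version
. /home/alpine/.profile
REQUIREMENTS_FILE="$1"
pip install -r "$REQUIREMENTS_FILE"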
Building a Standard, Version-Specific Python Image
As discussed in the previous section, the pyenv image is designed to have subsequent Python images built from it. With a standard operating system image configured, and an image that will install and configure versions of Python, it becomes very easy to build Python images that can be used for multiple purposes.
The next image in our hierarchy is the installation of Python 3.7.4. The Dockerfile is available on GitHub.
FROM relenteny/pyenv:1.2.14
LABEL relenteny.repository.url=https://github.com/relenteny/python
LABEL relenteny.repository.tag=3.7.4
LABEL relenteny.python.version=3.7.4
RUN /home/alpine/bin/install-python.sh 3.7.4
That’s all there is to it. Creating new versions of Python images with common ancestry becomes a matter of a few simple directives in a Dockerfile.
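For example, a sibling image for a hypothetical older Python version would differ only in the version string and labels:
FROM relenteny/pyenv:1.2.14
LABEL relenteny.python.version=3.6.9
RUN /home/alpine/bin/install-python.sh 3.6.9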
Adding an Application Framework
It’s rare with any language platform that applications are built from the ground up only using the constructs of the base platform. It's common to add additional frameworks and/or tooling on which applications are built.
The example below demonstrates installing the popular Python web framework, Flask. The Dockerfile is available on GitHub.
FROM relenteny/python:3.7.4
LABEL relenteny.repository.url=https://github.com/relenteny/flask
LABEL relenteny.repository.tag=1.1.1
LABEL relenteny.flask.version=1.1.1
COPY build /opt/build
USER root
RUN set -x && \
    cp -r /opt/build/home/alpine/* /home/alpine && \
    chown -R alpine.alpine /home/alpine/* && \
    rm -rf /opt/build
USER alpine
RUN set -x && \
    cd /home/alpine && \
    bin/install-requirements.sh /home/alpine/requirements.txt && \
    rm /home/alpine/requirements.txt
Other than the LABEL directives, there’s nothing in this Dockerfile that indicates Flask is being installed. This is a repeatable pattern. Whether you’re using Flask, Django, TensorFlow, Ansible, or any of the many Python frameworks, add a requirements.txt, invoke install-requirements.sh, perform any final customizations, and you have Docker images with shared ancestry that can be used across an organization.
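The requirements file copied into this image isn’t shown above; for the Flask case, a minimal version pinned to match the image tag would presumably contain little more than:
# requirements.txt (illustrative) - pin the framework version to match the image tag
flask==1.1.1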
One note on the COPY pattern used here. The contents of the source build directory are copied to /opt/build within the image. In addition, the COPY is not specifying any UID, so the implied user is root. This could be simplified for this image by copying the contents of the source build directory to /home/alpine using the --chown option to COPY. That eliminates the need for user switching between root and alpine. As mentioned earlier, I find this pattern useful. It’s of most use for more complex image build procedures where files may need to be installed in various file system locations (e.g. /etc) where the user must be root to perform the operation. So, even though it’s not required here, it remains as an established pattern for my images.
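For completeness, the simplification described above would look something like this, assuming all copied files belong under the alpine user’s home directory (no switch to root is needed at all):
# copy build files directly into the alpine user's home, owned by alpine
COPY --chown=alpine:alpine build/home/alpine/ /home/alpine/
RUN set -x && \
    cd /home/alpine && \
    bin/install-requirements.sh /home/alpine/requirements.txt && \
    rm /home/alpine/requirements.txt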
Finally, an Application
We’ve now reached the point in this image hierarchy where we can build an image that contains an actual application. Well, at least a simple example, that is. It may seem that we took the long way to get to this point. While I do hope you appreciate the value of laying out a hierarchy of images for reuse across an organization, there’s an additional benefit of getting to this point. Docker image builds can take some time. Whether it be on a developer’s workstation or in a CI/CD pipeline, images that take even a few minutes to build can impede productivity, or, at the very least, be frustrating as team members wait for their test images to build. By inheriting images where most of the assembly work has been accomplished, the final image build step of laying down and configuring application components should literally take seconds.
Using Miguel Grinberg’s The Flask Mega-Tutorial Part I: Hello, World!, the following image build clones the GitHub repo for the tutorial and configures it to execute as a Docker image. The Dockerfile is available on GitHub.
FROM relenteny/flask:1.1.1
LABEL relenteny.repository.url=https://github.com/relenteny/flask-helloworld
LABEL relenteny.repository.tag=1.0.0
RUN set -x && \
    cd /home/alpine && \
    git clone https://github.com/miguelgrinberg/microblog.git && \
    cd microblog && \
    git checkout v0.1 && \
    echo "FLASK_APP=microblog.py" > .flaskenv
WORKDIR /home/alpine/microblog
ENTRYPOINT [ "/bin/sh", "-lc", "flask run --host 0.0.0.0"]
If you build this image and tag it with a name such as myrepository/flask-helloworld:1.0.0, a container can be instantiated using the command docker run -it -p 5000:5000 myrepository/flask-helloworld:1.0.0. Using curl or a browser, the URL http://localhost:5000 or http://localhost:5000/index will respond with “Hello World.”
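For a quick smoke test from the host (assuming the container was started with -p 5000:5000 as shown above):
curl http://localhost:5000/index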
That’s a Wrap
It took some time to get here, but hopefully the value of a clear Docker image ancestry tree is apparent. Building Docker images is no different from developing the code contained within them. The process requires the same concepts as the applications themselves, including principles, patterns, best practices, security and regulatory compliance, traceability, version control, and so on. Adhering to this or similar patterns becomes a key aspect of consistent and mature containerized application deployment processes.