Docker is a powerful tool, and when used with Python, it can effectively streamline software delivery. However, it must be used with care and precision. Let's explore how best to optimize these images and navigate common stumbling blocks.
Joel Burch
COO
In software development and deployment, Docker has emerged as a paradigm-shifting tool. It offers engineers a lightweight, reproducible, and portable environment for running applications. This powerful tool, particularly when used with Python, has streamlined the process of software delivery, making it more efficient and reliable. However, like any powerful tool, Docker must be used with care and precision. Inefficient use, such as creating unnecessarily large images and containers, can lead to slower deployment times, increased bandwidth usage, and even potential security vulnerabilities. In this article, we explore Python-based Docker images and provide a guide on how to optimize them, helping to streamline the software development and deployment process.
If you are familiar with Docker, take the quiz proposed by one of our SRE engineers, Lucy Linder (@derlin).
Docker is a revolutionary tool; it enables developers to package applications into containers—standardized executable components that combine application source code with the operating system (OS) libraries and dependencies required to run that code in any environment. However, when dealing with Python applications, Docker images can often be unoptimized. This can lead to bloated images that consume unnecessary resources. Consider the following example of an unoptimized image containing a Flask application. The application is simple, consisting of a single file, `app.py`, with several routes that return data when called:
```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    return 'Welcome to the Home Page!'

@app.route('/about')
def about():
    return 'About Page - This is a basic Flask app with three routes.'

@app.route('/contact')
def contact():
    return 'Contact Page - You can contact us at info@example.com.'

if __name__ == '__main__':
    app.run(debug=True)
```
A possible Dockerfile is as follows:
```dockerfile
FROM python:3.11
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
```
Another approach:
```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /app
COPY . /app
RUN pip3 install -r requirements.txt
CMD ["python3", "app.py"]
```
While these Dockerfiles will certainly work, they are far from optimized. They use base images that are larger than necessary, and they don't take advantage of Docker's layer caching, which can speed up build times and reduce the size of the final image. Large images will increase build times, as well as potentially increase network transit costs if the image repository is hosted in a cloud platform. Additionally, all of the files in the build directory are being copied over, resulting in larger images (and potential security issues if sensitive data is exposed).
To benchmark the Docker images, three basic metrics can be used:

- Image size
- Build time
- Runtime performance
Runtime performance of the application may not be perceptibly affected here, as this is a simplistic environment with minimal network latency (versus a distributed cloud environment); in a real deployment, however, the runtime costs of a heavier image may not be trivial. It should also be noted that Docker will often intelligently apply caching for repeated builds, which significantly shortens build times on development machines. In build environments like CI runners, however, there is almost never an existing image cache available. To simulate this, `docker build` will be run with the `--no-cache` option.
For the sake of simplicity, this article will focus on the first two metrics, as they are most relevant to image optimization. The testing methodology is as follows:
- System: x86_64 MacBook Pro
- Environment: fresh Docker Desktop install; all cached images, containers, and layers are purged prior to the build with `docker system prune`
- Data collection: build time is collected using the Unix `time` program; image size is collected from the Docker environment using the `docker images` command
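As a concrete illustration, the measurements can be reproduced with commands along these lines (a sketch, not the exact harness used; the image tag `flask-bench` is a placeholder, and a running Docker daemon is assumed):

```shell
# Start from a clean slate so no cached layers skew the build time
docker system prune -af

# Build without the cache, timing the build as a CI runner would experience it
time docker build --no-cache -t flask-bench .

# Report the final image size
docker images flask-bench
```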
Python 3.11 base image:

- Image size: 1.09GB
- Build time: 50.276s
Ubuntu 22.04 base image:

- Image size: 541MB
- Build time: 1m 11.7s
Although the Ubuntu base image actually came in at nearly half the image size, it took ~20 more seconds to complete the build, likely owing to the Dockerfile command to update the base OS. In either case there are definitely some optimization issues.
There are a variety of strategies that can be employed to help optimize Docker images. Remember that the primary goal of optimizing an image is to reduce its size while minimally compromising on performance or security.
The base image is the foundation upon which a Docker image is built. By choosing a smaller base image, users can significantly reduce the size of the final Docker image. For Python applications, the official Python Docker image is a common choice. However, these images can be quite large. An alternative is to use the Alpine-based Python image, which is significantly smaller:
```dockerfile
FROM python:3.11-alpine
```
Multi-stage builds allow the usage of multiple `FROM` statements in a Dockerfile. Each `FROM` statement can use a different base image, and only the final stage will be kept. This is particularly useful when an application requires build-time dependencies that are not needed at runtime. Here's an example:
```dockerfile
# First stage: build
FROM python:3.11-alpine AS builder
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip && pip install -r requirements.txt

# Second stage: runtime
FROM python:3.11-alpine
# Copy the installed dependencies as well as the application code,
# otherwise the packages installed in the builder stage are lost
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /app /app
WORKDIR /app
```
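When debugging a multi-stage build, the standard `--target` flag of `docker build` stops the build at a named stage, which makes it easy to inspect intermediate results (the tag `myapp-builder` below is a placeholder):

```shell
# Build and tag only the first ("builder") stage for inspection
docker build --target builder -t myapp-builder .
```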
Each command in a Dockerfile creates a new layer in the Docker image. By chaining commands together using the `&&` operator, the number of layers, and with it the overall image size, can be reduced:
```dockerfile
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
```
Temporary files created during the build process can take up a significant amount of space. By removing these files in the same layer they are created, they will be prevented from becoming part of the final Docker image:
```dockerfile
RUN pip install --upgrade pip && pip install -r requirements.txt && rm -rf ~/.cache/pip
```
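Alternatively, pip's `--no-cache-dir` flag prevents the cache from being written in the first place, so there is nothing to clean up afterwards; a minimal sketch:

```dockerfile
# --no-cache-dir stops pip from writing to ~/.cache/pip at all
RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt
```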
The `ADD` and `COPY` commands can add unnecessary files to a Docker image if not used carefully. Be selective about what is added to the Docker image and consider using a `.dockerignore` file to exclude unnecessary files and directories. Here's an example of a `.dockerignore` file:
```
.git
__pycache__
*.pyc
*.pyo
*.pyd
```
Docker uses a build cache to speed up image builds by reusing layers from previous builds. By carefully ordering Dockerfile commands, users can take full advantage of the build cache. Commands that change frequently should be placed towards the end of the Dockerfile, while commands that rarely change should be placed at the beginning. Here's an example:
```dockerfile
# These commands rarely change, so they are placed at the beginning
FROM python:3.11-alpine
WORKDIR /app

# These commands change frequently, so they are placed at the end
COPY . /app
RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm -rf ~/.cache/pip
```
Not all of these strategies are going to be relevant for the basic test cases used for this article. But, as applications and Docker builds grow more complex, they can be employed to achieve meaningful reduction in image size and build time. The next section will look at improvements using some of these strategies for the previous unoptimized examples.
The optimized Dockerfile will include some of the strategies described above. Here is the complete file with optimizations:
```dockerfile
# Smaller Alpine image
FROM python:3.11-alpine
WORKDIR /app
# Only copying needed files
COPY requirements.txt .
# Multiple commands in a single RUN invocation
RUN pip install --upgrade pip && pip install -r requirements.txt && rm -rf ~/.cache/pip
COPY app.py .
CMD ["python", "app.py"]
```
A `.dockerignore` file will also be employed, with the same entries described in the previous section.
Optimized Alpine image:

- Image size: 128MB
- Build time: 28.436s
The optimized build cut build time by more than 40% versus the python:3.11 image, and produced an image more than 8x smaller than the python:3.11 version (and over 4x smaller than the Ubuntu one)! This is a simplistic example, but it highlights the gains available from some easy-to-implement configuration changes.
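As a quick sanity check on those figures, a few lines of arithmetic using the sizes and times measured above (treating GB/MB as decimal units, the way `docker images` reports them):

```python
# Measured results from the builds above
python_size_mb = 1090    # python:3.11 image, 1.09 GB
ubuntu_size_mb = 541     # ubuntu:22.04 image
optimized_size_mb = 128  # optimized Alpine image

python_build_s = 50.276
optimized_build_s = 28.436

# Size reduction factors relative to the optimized image
print(round(python_size_mb / optimized_size_mb, 1))  # 8.5x smaller than python:3.11
print(round(ubuntu_size_mb / optimized_size_mb, 1))  # 4.2x smaller than ubuntu:22.04

# Build-time reduction versus the python:3.11 build
print(round(1 - optimized_build_s / python_build_s, 2))  # 0.43, i.e. ~43% faster
```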
The return on investment (ROI) of making builds more efficient multiplies as the number of developers and applications grow. By keeping Docker images lean, software organizations will enjoy faster deployments, save on bandwidth and storage, and reduce potential security risks.
Docker image optimization is not just a good practice — it's a necessity for efficient software development and deployment. By understanding and applying these principles, developers can significantly improve their Docker usage, leading to faster, more efficient, and more secure software delivery.
If you are building Docker images using GitHub Actions and want to learn some best practices, have a look at this article by our SRE engineer Lucy Linder (@derlin).
Interested to learn more about how Python can be used with the Divio PaaS? Reach out now!
Don't forget to join our LinkedIn and X/Twitter community. Access exclusive insights and be the first to know about our latest blog posts.