Docker images are a major source of leaked secrets.
Security researchers at RWTH Aachen University sampled nearly 400,000 images on DockerHub and found 9% leaked unverified secrets. That meant attackers could publicly access 52,107 private keys and 3,158 API secrets.
RedHunt Labs analyzed millions of public Dockerfiles and discovered that 46,076 of them exposed sensitive information.
These aren’t just statistics either. The extensive supply chain attack against Codecov users in 2021 started with a threat actor extracting a secret from a public Codecov Docker image.
To illustrate the challenges associated with managing secrets in Docker, let’s scan this test Docker image using our open-source secret scanning tool TruffleHog.
trufflehog docker --image=ghcr.io/trufflesecurity/node-app-with-canary-token:main --only-verified
Scanning a Docker Image with TruffleHog
TruffleHog’s Docker detection engine found an AWS key in the /app/.env file. That’s a pretty common finding.
Let’s exec into our running Docker container and cat the file.
The .env File Does Not Exist in the Final Image
What’s going on? The .env file does not exist in the final image, so how did TruffleHog find that AWS key?
To understand how we can find a valid secret in a seemingly non-existent file, we need to understand more about Docker layers and how secrets get leaked.
Insecurely Using Secrets in Docker
Complexities in Docker’s layered architecture and caching have created an environment where developers can easily leak secrets by accident. Most Docker secret leaks fall into three categories: overly permissive file operations, hardcoded secrets in Dockerfiles, and the misuse of build arguments.
According to the RWTH Aachen University study, misunderstood or misconfigured file operations cause most Docker image secret leaks. As an example, let’s review a pretty standard Dockerfile for a Node application.
We declare Alpine Linux as the base image, copy the current host directory (our source code) into the Docker image, install Node dependencies with yarn, and then run the Node application on port 3000. Do you see the vulnerability? Despite countless tutorials and Docker’s own “Best Practices” page suggesting developers run COPY . ., this command can leak sensitive files.
The COPY command in our Dockerfile copies the entire current directory into our Docker image. As a result, sensitive files like .env and our Git history live forever in the Docker image. Copying every file in the current directory is convenient, and fairly common for Docker images, but it can be dangerous.
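As a rough sketch, a Dockerfile matching this description might look like the following. The image tag (node:18-alpine), the /app working directory, and the index.js entry point are assumptions made for illustration, not details taken from the image scanned above. The snippet writes the file into a scratch directory so it can be inspected.

```shell
# Write an illustrative Dockerfile into a scratch directory.
# Image tag, working directory, and entry point are assumptions.
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# Node.js on Alpine Linux as the base image
FROM node:18-alpine
WORKDIR /app
# Copies the ENTIRE build context, including .env and .git/
COPY . .
# Install Node dependencies
RUN yarn install
# Run the application on port 3000
EXPOSE 3000
CMD ["node", "index.js"]
EOF
echo "Dockerfile written to $demo_dir/Dockerfile"
# To try it, copy the file into a project directory and run (requires Docker):
#   docker build -t node-app-demo .
```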
For perspective, an advanced GitHub search revealed 289,000 Dockerfiles containing the term “COPY . .”. Not all Docker images built off of these repositories are vulnerable, but undoubtedly many are.
COPY then Delete?
A logical next thought is to delete any sensitive files brought into the image by the COPY command. Unfortunately, that still does not prevent the secrets in the .env file from leaking. Docker caches the output of each command into its own layer; if step 1 copies all source code files (including .env) and step 2 deletes the .env file, step 1’s layer will still contain the contents of the .env file.
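The same behavior can be reproduced without Docker, since image layers are stored as tar archives and a deletion in a later layer is recorded as a whiteout marker (a file named .wh.<name>) rather than by rewriting the earlier layer. The sketch below is a simplified simulation under that assumption; the directory layout and the secret value are hypothetical.

```shell
# Simulate two image layers as plain tar archives (no Docker needed).
work=$(mktemp -d)
mkdir -p "$work/layer1/app" "$work/layer2/app"

# Layer 1: roughly what COPY . . would add, including a hypothetical secret.
printf 'AWS_SECRET_ACCESS_KEY=hypothetical-value\n' > "$work/layer1/app/.env"
printf 'console.log("hi")\n' > "$work/layer1/app/index.js"
tar -cf "$work/layer1.tar" -C "$work/layer1" .

# Layer 2: roughly what RUN rm .env records, a whiteout marker for .env.
: > "$work/layer2/app/.wh..env"
tar -cf "$work/layer2.tar" -C "$work/layer2" .

# Layer 2 only marks the file as deleted...
tar -tf "$work/layer2.tar"

# ...while layer 1 is untouched, so the secret can still be read from it.
tar -xOf "$work/layer1.tar" ./app/.env
```

Anyone who can pull the image can read each layer archive on its own, which is why the deletion step offers no protection.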
To visualize this, we’ll make a small adjustment (adding the step RUN rm .env) to our Dockerfile above and then load it into a tool called Dive to explore the Docker layers.*
Note: Follow along with this Docker image.
After building the Docker image, we load it into Dive by running the dive command with the image name.
On the left-hand side of the Dive terminal screen, we view each layer in the Docker image. The right-hand side shows the files that were deleted, added, or modified during that step in the Dockerfile.
Highlighting the COPY . . layer reveals that we added the .env file and the .git/ directory in this step (both containing sensitive information).
Unpacking the COPY . . Layer
The next layer, RUN /bin/sh -c rm .env, shows we deleted the .env file.
Unpacking the rm .env Layer
Despite deleting the .env file, the previous layer still contains its contents. To verify this, we’ll attempt to extract the .env file from the COPY . . layer.
First, in Dive, highlight the COPY . . layer once more and take note of the Id value.
Accessing the Layer’s Id Value in Dive
Second, save the Docker image as a tar file:
docker save <image_name> -o vulnerable.tar
> Note: We recommend creating a new folder to run the next command, since it will create a lot of files.
Third, untar the saved Docker image:
tar xvf vulnerable.tar
Fourth, untar the layer.tar file corresponding to the Layer Id above:
tar xvf 9e006c89ce028192e631139eccaf2d2bfab4ac12164481390f1b38aa99abe855/layer.tar
Finally, cat out the contents of the .env file.
Secret Accessible in .env File Hidden in Docker Image Layer
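The manual steps above can also be scripted. The following is a minimal sketch, assuming the archive was produced by docker save; the function name find_in_layers is our own, and instead of reading the layer Id from Dive it simply tries every file in the unpacked archive as a tar archive and reports the ones that contain the requested path.

```shell
# find_in_layers IMAGE_TAR PATH_IN_IMAGE
# Unpacks a docker save archive and prints each layer file that contains an
# entry named exactly PATH_IN_IMAGE (for example app/.env).
# The function name and approach are illustrative, not part of Docker or Dive.
find_in_layers() {
    image_tar=$1
    wanted=$2
    unpack_dir=$(mktemp -d)
    tar -xf "$image_tar" -C "$unpack_dir"
    # Try each unpacked file as a tar archive; files that are not archives
    # (such as the JSON metadata) produce no listing and are skipped.
    find "$unpack_dir" -type f | while IFS= read -r candidate; do
        if tar -tf "$candidate" 2>/dev/null | grep -qxF -- "$wanted"; then
            printf '%s\n' "${candidate#"$unpack_dir"/}"
        fi
    done
    rm -rf "$unpack_dir"
}

# Usage, with the archive saved earlier:
#   find_in_layers vulnerable.tar app/.env
```

Any layer it prints can then be unpacked with tar xvf as in the fourth step.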
*For an excellent article on secrets hiding in Docker layers and how you can use Dive to access them, please see this post by Dana Epp.
What about .gitignore?
What if there were a .gitignore file that excluded .env, among other sensitive files, from being committed to Git? Unfortunately, Docker uses a different ignore file called .dockerignore and treats .gitignore as a regular file. Unless .dockerignore enumerates files to exclude, secrets in .env and similar files will end up in the Docker image.
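As a small illustration, the snippet below creates a minimal .dockerignore and defines a helper that reports which sensitive names are missing from it. The helper name missing_ignores and the two entries it checks are assumptions for this sketch, and it only looks for exact lines, so it does not understand wildcard or negation patterns.

```shell
# Create a minimal .dockerignore in a scratch build context (entries are illustrative).
ctx=$(mktemp -d)
cat > "$ctx/.dockerignore" <<'EOF'
# Keep secrets and version-control history out of the build context
.env
.git
EOF

# missing_ignores DIR: print sensitive names that are not listed verbatim
# in DIR/.dockerignore (or all of them if the file does not exist).
missing_ignores() {
    for entry in .env .git; do
        grep -qxF -- "$entry" "$1/.dockerignore" 2>/dev/null || printf '%s\n' "$entry"
    done
}

missing_ignores "$ctx"   # prints nothing, since both entries are present
```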
Fortunately, thousands of developers (at least on GitHub) know to add .env (and .git) to the .dockerignore file.
GitHub Search for .dockerignore Files Containing .env
> Note for npm users: By default, npm will ignore all files enumerated in a .gitignore file (unlike Docker). However, if you create a .npmignore file, the .gitignore file is no longer used to block files and directories from being published. See official documentation for more details.
The situation becomes more complicated when developers want to intentionally expose secrets to the Docker build process. Suppose the Node project has an internal dependency that lives in the developer’s private npm namespace, meaning that they need to authenticate with npm to install it. Somewhere in the Dockerfile, they’ll have to use npm credentials.
Let’s add one line to the original Dockerfile to configure NPM credentials:
RUN npm set…
Unfortunately, hardcoding the secret into a Dockerfile is extremely insecure. Attackers can download the image and then list all build commands using the command docker image history --no-trunc.
Exposed Secret from Hardcoded String in Dockerfile
As an alternative to hardcoding credentials inside a Dockerfile, many developers use build arguments. While this option might feel more secure, it is not. Docker will store any build argument passed into the Docker image in its history. Attackers can then browse the Docker image history using the docker image history --no-trunc command.
Consider the example Dockerfile below. We removed the hardcoded NPM_TOKEN value from above and replaced it with a build argument.
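A Dockerfile along these lines might look like the following sketch. The registry URL, the image tag, and the exact npm config set invocation are assumptions for illustration; only the ARG NPM_TOKEN line is taken from the discussion here.

```shell
# Write an illustrative Dockerfile that takes the token as a build argument.
# Registry URL, image tag, and entry point are assumptions.
arg_dir=$(mktemp -d)
cat > "$arg_dir/Dockerfile" <<'EOF'
FROM node:18-alpine
WORKDIR /app
# Value is supplied at build time with --build-arg
ARG NPM_TOKEN
# The build argument's value ends up in the image history
RUN npm config set //registry.npmjs.org/:_authToken=${NPM_TOKEN}
COPY . .
RUN yarn install
CMD ["node", "index.js"]
EOF
# To build from a project directory (requires Docker):
#   docker build --build-arg NPM_TOKEN=<token> -t node-app-demo .
```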
Exposed NPM Token with Docker Build Arguments
docker image history revealed the token. An attacker can easily access the values of any secret passed in via a build argument.
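To check your own images for this kind of exposure, the history output can be filtered for credential-like names. The helper below is a rough sketch: the name scan_history and the keyword list are our assumptions, and a keyword match only flags a line for review rather than proving that a secret is present.

```shell
# scan_history: read build-history text on stdin and print the lines that
# mention credential-like names. The keyword list is an assumption.
scan_history() {
    grep -iE 'token|secret|passw(or)?d|api[_-]?key' || true
}

# Usage (requires Docker and a locally available image):
#   docker image history --no-trunc <image_name> | scan_history
```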
Securely Using Secrets in Docker Images
Docker provides developers with two primary methods to securely pass secrets into images.
The most secure way to use secrets in a Docker image is with a multi-stage build. In a multi-stage build, developers create multiple stages, which can be based on different base images and are isolated from one another. Developers must choose what (if any) content from one stage is accessible in another stage. Only the final stage is published in the final image. This architecture helps secure secrets by providing developers with the ability to “selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image”.
Consider the following Dockerfile.
Note: Consider using the “scratch” base image when building super minimal images. The scratch image is empty and works well for the final stage in a multi-stage build where the goal is to just run a single binary (like a Go binary).
There are two stages: the first authenticates to npm, installs dependencies, and builds a package; and the second (final stage) copies the built package from the first stage and sets it as the container command.
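The two stages described above can be sketched as follows. The stage name build, the yarn build script, the dist/ output directory, and the image tag are all assumptions for illustration, and the sketch assumes the build produces a self-contained bundle.

```shell
# Write an illustrative two-stage Dockerfile.
# Stage name, build script, output directory, and image tag are assumptions.
ms_dir=$(mktemp -d)
cat > "$ms_dir/Dockerfile" <<'EOF'
# Stage 1: authenticate to npm, install dependencies, build the package
FROM node:18-alpine AS build
WORKDIR /app
ARG NPM_TOKEN
RUN npm config set //registry.npmjs.org/:_authToken=${NPM_TOKEN}
COPY package.json yarn.lock ./
RUN yarn install
COPY . .
RUN yarn build

# Stage 2 (final): copy only the built output from the first stage
FROM node:18-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]
EOF
# To build from a project directory (requires Docker):
#   docker build --build-arg NPM_TOKEN=<token> -t node-app-demo .
```

The only instruction that crosses the stage boundary is COPY --from=build, so anything not named there stays behind in the first stage.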
Notice the build argument in the first stage (ARG NPM_TOKEN). While this method can be insecure (as discussed above), the NPM_TOKEN value is not passed into the second (final) stage. Only the final stage ends up in the published image, so its history won’t include build arguments from the first stage unless they are used in the last stage as well.
Multi-stage builds are both secure and compact. The final image won’t contain the source code, the first stage’s layers, or the sensitive npm token. The latter guarantee is so strong that some insecure approaches above (copying a secrets file or specifying build arguments) can be used securely, as long as they aren’t in the last stage. Note that hardcoding credentials is never recommended.
BuildKit has been the default builder backend since Docker Engine version 23.0. Among other improvements, BuildKit provides developers with a way to inject sensitive information: users can mount secret files without adding them to the final image. There are two steps to use BuildKit’s secrets feature:
First, in the Dockerfile, developers must mount the secret so that the RUN command can use the secret information.
The official documentation gives an example in which the AWS command line interface is installed on a Python image and the AWS S3 copy command is then run. By default, this command looks for credentials at ~/.aws/credentials. The developer used BuildKit to mount AWS credentials directly at the /root/.aws/credentials path, so that the AWS command can run authenticated.
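A Dockerfile along the lines just described might look like the sketch below. The secret id aws and the target path come from the description above; the bucket name, object key, and destination path are hypothetical placeholders.

```shell
# Write an illustrative Dockerfile that mounts AWS credentials as a BuildKit secret.
# Bucket name, object key, and destination path are hypothetical placeholders.
bk_dir=$(mktemp -d)
cat > "$bk_dir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
FROM python:3
RUN pip install awscli
# The secret with id "aws" is mounted only while this RUN step executes
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials \
    aws s3 cp s3://example-bucket/example-file /tmp/example-file
EOF
```

The mounted credentials file is not written into the layer that this RUN step produces, although anything the command itself writes elsewhere still is.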
Second, during the build process, users must pass in a --secret argument, which requires an id and a source path to the secret file.
Continuing with the AWS S3 example above, users would run this command to securely inject their AWS credentials:
docker build --secret id=aws,src=$HOME/.aws/credentials .
> Note: BuildKit secrets default to mounting at the path
Unfortunately, BuildKit secrets have one major pitfall. If the application that uses the injected secret writes that secret into another file, the public image would still contain that secret. For example, authenticating with npm set stores the token in /root/.npmrc. Despite attempts to securely use an NPM_TOKEN with BuildKit, an attacker could still access the secret in the .npmrc file. By contrast, multi-stage builds don’t have the same problem: as long as the .npmrc file is not in the last stage, it won’t end up in the published image.