Joe Leon

THE DIG

September 11, 2023

How Secrets Leak out of Docker Images

Docker images are a major source of leaked secrets. 

Security researchers at RWTH Aachen University sampled nearly 400,000 images on Docker Hub and found that 9% of them leaked (unverified) secrets. That meant attackers could publicly access 52,107 private keys and 3,158 API secrets.

RedHunt Labs analyzed millions of public Dockerfiles and discovered that 46,076 of them exposed sensitive information.

These aren’t just statistics, either. The extensive supply chain attack against Codecov users in 2021 started with a threat actor extracting a secret from a public Codecov Docker image.

To illustrate the challenges associated with managing secrets in Docker, let’s scan this test Docker image using our open-source secret scanning tool TruffleHog.

trufflehog docker --image=ghcr.io/trufflesecurity/node-app-with-canary-token:main --only-verified

Scanning a Docker Image with TruffleHog

TruffleHog’s Docker detection engine found an AWS key in the /app/.env file. That’s a pretty common finding. 

Let’s exec into our running Docker container and cat the file.

The .env File Does Not Exist in the Final Image

What’s going on? The .env file does not exist in the final image, so how did TruffleHog find that AWS key?

To understand how we can find a valid secret in a seemingly nonexistent file, we need to understand more about Docker layers and how secrets leak.

Insecurely Using Secrets in Docker

Complexities in Docker’s layered architecture and caching have created an environment where developers can easily leak secrets unintentionally. Most Docker secret leaks fall into 3 categories: overly permissive file operations, hardcoded secrets in Dockerfiles, and the misuse of build arguments.

FILE OPERATIONS

According to the RWTH Aachen University study, misunderstood/misconfigured file operations cause most Docker image secret leaks. As an example, let’s review a pretty standard Dockerfile for a Node application.

FROM node:18-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
CMD ["node", "src/index.js"]
EXPOSE 3000

We use the Alpine Linux-based Node 18 image as the base, copy the current host directory (our source code) into the Docker image, install Node dependencies with yarn, and then run the Node application, which listens on port 3000. Do you see the vulnerability? Despite countless tutorials and Docker’s own “Best Practices” page suggesting developers run COPY . ., this command can leak sensitive files.

The COPY command in our Dockerfile copies the entire current directory into our Docker image. As a result, sensitive files like .env and our Git history live forever in the Docker image. Although copying every file in the current directory is convenient—and fairly common for Docker images—it can be dangerous.

For perspective, an advanced GitHub search revealed 289,000 Dockerfiles containing the term “COPY . .”. Not all Docker images built off of these repositories are vulnerable, but undoubtedly many are.

COPY then Delete?

A logical next thought is to delete any sensitive files brought into the image by the COPY command. Unfortunately, that still does not prevent the secrets in the .env file from leaking. Docker stores the filesystem changes from each instruction in its own read-only layer, and a later deletion only adds a marker in a new layer that hides the file. So if step 1 copies all source code files (including .env) and step 2 deletes the .env file, step 1’s layer still contains the contents of the .env file.
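To make this concrete, here is a toy Python model of how layers stack up. It is a simplification, not Docker’s implementation: each layer is a dictionary of paths to file contents, and a deletion is recorded the way the OCI image spec does it, as a whiteout entry named .wh.<filename> in the new layer. Opaque whiteouts, permissions, and the yarn install layer are left out.

```python
from typing import Dict, List, Optional

# Toy model: a layer maps file paths to file contents.
Layer = Dict[str, bytes]

WHITEOUT_PREFIX = ".wh."  # OCI convention for "this file was deleted"


def whiteout_path(path: str) -> str:
    """Return the whiteout entry that hides `path`, e.g. app/.env -> app/.wh..env."""
    head, _, name = path.rpartition("/")
    return f"{head}/{WHITEOUT_PREFIX}{name}" if head else WHITEOUT_PREFIX + name


def merged_view(layers: List[Layer]) -> Layer:
    """Stack layers bottom to top and return what a running container would see."""
    view: Layer = {}
    for layer in layers:
        for path, data in layer.items():
            head, _, name = path.rpartition("/")
            if name.startswith(WHITEOUT_PREFIX):
                hidden = name[len(WHITEOUT_PREFIX):]
                # A whiteout hides the file from lower layers; it does not erase it.
                view.pop(f"{head}/{hidden}" if head else hidden, None)
            else:
                view[path] = data
    return view


def first_layer_containing(layers: List[Layer], path: str) -> Optional[int]:
    """Return the index of the first layer that still stores `path`, if any."""
    for index, layer in enumerate(layers):
        if path in layer:
            return index
    return None


# Layer 0: COPY . .     (brings in .env)
# Layer 1: RUN rm .env  (adds a whiteout; layer 0 is left untouched)
layers: List[Layer] = [
    {"app/src/index.js": b"console.log('hi')", "app/.env": b"AWS_SECRET=..."},
    {whiteout_path("app/.env"): b""},
]

print("app/.env" in merged_view(layers))           # False
print(first_layer_containing(layers, "app/.env"))  # 0
```

The merged view, which is what a container sees, no longer has app/.env, but the COPY layer still carries its bytes, and anyone who downloads the image can read that layer directly.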

To visualize this, we’ll make a small adjustment (add a step: RUN rm .env) to our Dockerfile above and then load it into a tool called Dive to explore the Docker layers.*

FROM node:18-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
RUN rm .env # INSECURE
CMD ["node", "src/index.js"]
EXPOSE 3000

Note: Follow along with this Docker image.

After building the Docker image, we load it into Dive with the following command:

dive <image_name>

On the left-hand side of the Dive terminal screen, we view each layer in the Docker image. The right-hand side shows the files that were deleted, added or modified during that step in the Dockerfile.

Highlighting the COPY . . layer reveals that this step added the .env file and the .git/ directory (both containing sensitive information).

Unpacking the COPY . . Layer

The RUN /bin/sh -c rm .env layer shows that we deleted the .env file.

Unpacking the rm .env Layer

Despite deleting the .env file, the previous layer still contains the contents of the .env file. To verify this, we’ll attempt to extract the .env file from the COPY . . layer.

First, in Dive, highlight the COPY . . layer once more and take note of the Id value.

Accessing the Layer’s Id Value in Dive

Second, save the Docker image as a tar file:

docker save <image_name> -o vulnerable.tar

> Note: We recommend creating a new folder to run the next command, since it will create a lot of files.

Third, untar the saved Docker image:

tar xvf vulnerable.tar

Fourth, untar the layer.tar file corresponding to the Layer Id above:

tar xvf 9e006c89ce028192e631139eccaf2d2bfab4ac12164481390f1b38aa99abe855/layer.tar

Finally, cat out the contents of the .env file.

Secret Accessible in .env File Hidden in Docker Image Layer
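The manual steps above can also be scripted. The Python sketch below is our own illustration, not part of Dive or TruffleHog: it opens a docker save archive, reads manifest.json to locate each layer’s tarball (so it should not matter whether layers are stored as <id>/layer.tar or, as in some newer Docker releases, under blobs/sha256/), and returns every file with a given name from every layer, including files that a later layer deleted.

```python
import json
import posixpath
import tarfile
from typing import List, Tuple


def find_files_in_saved_image(tar_path: str, filename: str) -> List[Tuple[str, str, bytes]]:
    """Search every layer of a `docker save` archive for files named `filename`.

    Returns (layer_path, file_path, contents) tuples, bottom layer first.
    Files hidden by a later whiteout are still reported, because the
    original layer keeps their bytes.
    """
    findings: List[Tuple[str, str, bytes]] = []
    with tarfile.open(tar_path) as image:
        # manifest.json lists each image's layer tarballs, bottom layer first.
        manifest = json.load(image.extractfile("manifest.json"))
        for entry in manifest:
            for layer_path in entry["Layers"]:
                layer_file = image.extractfile(layer_path)
                # "r:*" lets tarfile handle both plain and compressed layers.
                with tarfile.open(fileobj=layer_file, mode="r:*") as layer:
                    for member in layer.getmembers():
                        if member.isfile() and posixpath.basename(member.name) == filename:
                            contents = layer.extractfile(member).read()
                            findings.append((layer_path, member.name, contents))
    return findings
```

For example, find_files_in_saved_image("vulnerable.tar", ".env") should return the .env file from the COPY . . layer even though the final image no longer shows it.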

*For an excellent article on secrets hiding in Docker layers and how you can use Dive to access them, please see this post by Dana Epp.

What about .gitignore?

What if a .gitignore file excluded .env, among other sensitive files, from being committed to Git? Unfortunately, Docker uses a different ignore file called .dockerignore and treats .gitignore as a regular file. Unless .dockerignore lists the files to exclude, secrets in .env and similar files will end up in the Docker image.

Fortunately, thousands of developers (at least on GitHub) know to add .env (and .git) to the .dockerignore file.

GitHub Search for .dockerignore Files Containing .env
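Here is a rough Python check for whether a .dockerignore covers .env and .git. It is an approximation that assumes Python’s fnmatch behaves closely enough to Docker’s matcher for simple top-level patterns; the two are not identical (for example, * in fnmatch also matches /). As in Docker, the last matching line wins, so a later !.env exception re-includes the file.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple


def uncovered_entries(dockerignore_text: str,
                      required: Iterable[str] = (".env", ".git")) -> List[str]:
    """Return the top-level entries in `required` that .dockerignore does not exclude."""
    rules: List[Tuple[bool, str]] = []
    for raw in dockerignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored, as in Docker
        negate = line.startswith("!")
        pattern = (line[1:] if negate else line).strip("/")
        if pattern.startswith("**/"):
            pattern = pattern[3:]  # "**/" may match zero directories
        rules.append((negate, pattern))

    missing = []
    for entry in required:
        excluded = False
        for negate, pattern in rules:  # the last matching rule wins
            if fnmatchcase(entry, pattern):
                excluded = not negate
        if not excluded:
            missing.append(entry)
    return missing
```

For example, a .dockerignore containing only .git and node_modules would yield [".env"], a sign that the secrets file would still be sent to the build.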

> Note for npm users: By default, npm leaves files listed in .gitignore out of published packages (unlike Docker). However, if you create a .npmignore file, npm stops using .gitignore to exclude files and directories from publishing. See the official documentation for more details.

DOCKERFILE

The situation becomes more complicated when developers intentionally need to expose secrets to the Docker build process. Suppose the Node project has an internal dependency that lives in the developer’s private npm namespace, meaning they need to authenticate with npm to install it. Somewhere in the Dockerfile, they’ll have to use npm credentials.

Let’s add one line to the original Dockerfile to configure NPM credentials: RUN npm set…

FROM node:18-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
RUN npm set "//registry.npmjs.org/:_authToken=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" # INSECURE
CMD ["node", "src/index.js"]
EXPOSE 3000

Unfortunately, hardcoding the secret into a Dockerfile is extremely insecure. Attackers can download the image and then list all build commands using the command docker image history --no-trunc.

Exposed Secret from Hardcoded String in Dockerfile
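Checking your own images for this mistake is straightforward. The Python sketch below shells out to docker image history with --no-trunc and a --format template that prints only each step’s command, then searches the output with a regular expression. The regex assumes the npm_ prefix plus 36 alphanumeric characters used by current npm access tokens; it only covers npm tokens, whereas a scanner like TruffleHog covers many more formats.

```python
import re
import subprocess
from typing import List

# Assumed format of current npm access tokens: "npm_" + 36 alphanumerics.
NPM_TOKEN_RE = re.compile(r"\bnpm_[A-Za-z0-9]{36}\b")


def image_history(image: str) -> List[str]:
    """Return the untruncated command behind each history entry, newest first."""
    result = subprocess.run(
        ["docker", "image", "history", "--no-trunc",
         "--format", "{{.CreatedBy}}", image],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def find_npm_tokens(history: List[str]) -> List[str]:
    """Return every npm-token-shaped string found in the history commands."""
    return [token for line in history for token in NPM_TOKEN_RE.findall(line)]
```

For example, find_npm_tokens(image_history("<image_name>")) returns any npm-style tokens baked into the image’s build commands.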

BUILD ARGUMENTS

As an alternative to hardcoding credentials inside a Dockerfile, many developers use build arguments. While this option might feel more secure, it is not. Docker will store any build argument passed into the Docker image in its history. Attackers can then browse the Docker image history using the docker image history --no-trunc command.

Consider the example Dockerfile below. We removed the hardcoded NPM_TOKEN value from above and replaced it with a build argument.

FROM node:18-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
# INSECURE
ARG NPM_TOKEN
RUN npm set "//registry.npmjs.org/:_authToken=$NPM_TOKEN" # INSECURE
CMD ["node", "src/index.js"]
EXPOSE 3000

Exposed NPM Token with Docker Build Arguments

Simply running docker image history revealed the token. An attacker can easily access the values of any secret passed in via a build argument.
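In the history output we have seen, RUN steps that executed with build arguments in scope are prefixed with |<count> followed by the KEY=VALUE pairs (for example, RUN |1 NPM_TOKEN=... /bin/sh -c ...). The Python sketch below pulls those pairs out of a single history line. That prefix format is an observation rather than a documented contract, so treat it as an assumption; the sketch also assumes values contain no spaces.

```python
import re
from typing import Dict

# Assumed prefix: "|<count> KEY=VALUE ...", optionally preceded by "RUN ".
BUILD_ARG_PREFIX = re.compile(r"^(?:RUN )?\|(\d+) (.*)$")


def build_args_from_history_line(created_by: str) -> Dict[str, str]:
    """Extract the KEY=VALUE build args recorded in one history entry."""
    match = BUILD_ARG_PREFIX.match(created_by.strip())
    if not match:
        return {}
    count = int(match.group(1))
    # The first `count` space-separated tokens are the recorded build args.
    tokens = match.group(2).split(" ", count)[:count]
    args: Dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            args[key] = value
    return args
```

Feeding it each line of docker image history --no-trunc --format "{{.CreatedBy}}" <image_name> lists every build argument value recorded in an image.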

Securely Using Secrets in Docker Images

Docker provides developers with two primary methods to use secrets during a build without exposing them in the final image.

MULTI-STAGE BUILDS

The most secure way to use secrets in a Docker image is with a multi-stage build. In a multi-stage build, developers create multiple stages, each of which can use a different base image and is isolated from the other stages. Developers must choose what (if any) content from one stage is copied into another. Only the final stage ends up in the published image. This architecture helps secure secrets by letting developers “selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.”

Consider the following Dockerfile:

FROM node:alpine AS builder
WORKDIR /src
COPY package.json index.js ./
ARG NPM_TOKEN
RUN npm set "//registry.npmjs.org/:_authToken=$NPM_TOKEN"
RUN npm install
RUN npm run build

FROM node:alpine
WORKDIR /dist
COPY --from=builder /src/dist/ ./
CMD ["node", "index.js"]

Note: Consider using the “scratch” image when building super minimal images. The scratch image is empty and works well for the final stage of a multi-stage build whose goal is to run a single binary (such as a statically linked Go binary).

There are two stages: the first authenticates to npm, installs dependencies, and builds a package; and the second (final stage) copies the built package from the first stage and sets it as the container command. 

Notice the build argument in the first stage (ARG NPM_TOKEN). While this method can be insecure (as discussed above), the NPM_TOKEN value is not passed into the second (final) stage. Only the final stage’s layers and history end up in the image you publish. In other words, the first stage’s build arguments won’t appear in the published image unless the last stage uses them as well.

Multi-stage builds are both secure and compact. The final image won’t contain the source code, the first stage’s layers, or the sensitive npm token. The latter guarantee is so strong that some insecure approaches above (copying a secrets file or specifying build arguments) can be used securely, as long as they aren’t in the last stage. Note that hardcoding credentials is never recommended.
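A quick way to apply this rule is to check that no secret-looking build argument is declared in the last stage. The Python sketch below is a heuristic, not a full Dockerfile parser: it splits the file into stages at each FROM line, does not handle line continuations, skips ARGs declared before the first FROM (a stage cannot use those unless it redeclares them), and flags names containing TOKEN, SECRET, PASSWORD, or KEY, a naming convention we are assuming.

```python
import re
from typing import List

FROM_RE = re.compile(r"^FROM\s", re.IGNORECASE)
ARG_RE = re.compile(r"^ARG\s+(.+)$", re.IGNORECASE)
SECRET_HINTS = ("TOKEN", "SECRET", "PASSWORD", "KEY")  # assumed naming convention


def secret_args_in_final_stage(dockerfile_text: str) -> List[str]:
    """Return secret-looking ARG names declared in the last stage of a Dockerfile."""
    stages: List[List[str]] = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if FROM_RE.match(line):
            stages.append([])  # each FROM starts a new, isolated stage
            continue
        match = ARG_RE.match(line)
        if match and stages:
            for decl in match.group(1).split():
                stages[-1].append(decl.split("=", 1)[0])  # drop any default value
    if not stages:
        return []
    return [name for name in stages[-1]
            if any(hint in name.upper() for hint in SECRET_HINTS)]
```

Run against the multi-stage Dockerfile above, it should return an empty list, because ARG NPM_TOKEN lives only in the builder stage.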

BUILDKIT

Starting with Docker Engine 23.0, BuildKit is the default builder backend. Among other improvements, BuildKit gives developers a way to inject sensitive information: users can mount secret files into a build step without adding them to the final image. Using BuildKit’s secrets feature takes two steps:

First, in the Dockerfile, developers must mount the secret on the RUN instruction that needs it, so that the command can read the secret information.

Here’s an example from the official documentation:

# syntax=docker/dockerfile:1
FROM python:3
RUN pip install awscli
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials \
  aws s3 cp s3://... …

In this Dockerfile, the AWS command line interface is installed on a Python image. Then the aws s3 cp command runs against an S3 bucket. By default, the AWS CLI looks for credentials in ~/.aws/credentials, so the developer used BuildKit to mount the AWS credentials file directly at /root/.aws/credentials, allowing the command to run authenticated.

Second, during the build, users must pass a --secret argument, which requires an id and a source path to the secret file.

Continuing with the AWS S3 example above, users would run this command to securely inject their AWS credentials:

docker build --secret id=aws,src=$HOME/.aws/credentials .

> Note: By default, BuildKit mounts a secret at /run/secrets/<id>.
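If a build step is written in Python, reading the mounted secret could look like the sketch below. It assumes the Dockerfile runs the script with something like RUN --mount=type=secret,id=npm python3 build_step.py (the npm id and build_step.py name are hypothetical) and relies on the default /run/secrets/<id> location. The value stays in memory instead of being copied into the image.

```python
from pathlib import Path


def read_build_secret(secret_id: str, base_dir: str = "/run/secrets") -> str:
    """Return the contents of a BuildKit secret mounted at <base_dir>/<secret_id>.

    The mount only exists while the RUN step that declares it is executing,
    so this must be called from inside that step.
    """
    path = Path(base_dir) / secret_id
    if not path.is_file():
        raise FileNotFoundError(
            f"secret {secret_id!r} not mounted; was the RUN step declared with "
            f"--mount=type=secret,id={secret_id} and the build run with --secret?"
        )
    return path.read_text().strip()
```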

Unfortunately, BuildKit secrets have one major pitfall. If the application that uses the injected secret writes that secret to another file, the public image will still contain it. For example, authenticating with npm set stores the token in /root/.npmrc. So despite attempts to use an NPM_TOKEN securely with BuildKit, an attacker could still read the secret from the .npmrc file. Multi-stage builds don’t have this problem: as long as the .npmrc file is not in the last stage, it won’t end up in the published image.
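One way around this pitfall, when using BuildKit secrets in a single-stage build, is to make sure the credential file exists only while the RUN step is executing. The Python sketch below assumes Python 3 is available in the build image (it may need to be installed in node:alpine images) and relies on npm reading its user config location from the npm_config_userconfig environment variable. It writes a temporary .npmrc, runs the given command, and deletes the file before the step ends, so the file never lands in a layer.

```python
import os
import subprocess
import tempfile
from typing import Callable, List


def run_with_ephemeral_npmrc(token: str, command: List[str],
                             runner: Callable = subprocess.run):
    """Run `command` with npm credentials that never touch a persistent path.

    The .npmrc lives in a temporary directory that is deleted before this
    function returns, i.e. before the RUN step's layer is committed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        npmrc = os.path.join(tmp, ".npmrc")
        with open(npmrc, "w") as f:
            f.write(f"//registry.npmjs.org/:_authToken={token}\n")
        # Point npm at the temporary file instead of /root/.npmrc.
        env = dict(os.environ, npm_config_userconfig=npmrc)
        return runner(command, env=env, check=True)
```

In a Dockerfile, this might be invoked as RUN --mount=type=secret,id=npm python3 install_deps.py, where the hypothetical install_deps.py reads the token from /run/secrets/npm and calls run_with_ephemeral_npmrc(token, ["npm", "ci"]).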