tl;dr: Scroll to the bottom for a script that scans every tag and architecture of a Docker image for secrets.
In the past, we’ve explained how secrets leak out of Docker images: overly permissive file operations, hardcoded secrets in Dockerfiles, and misused build arguments. While working on new research into secrets leakage, our team developed a script to more thoroughly scan Docker images for secrets using TruffleHog. By open-sourcing our script, we’re hoping to help code owners answer two questions:
"Do any of my tagged Docker images leak secrets?"
"What about secrets in images with multiple architectures?"
How TruffleHog Scans a Docker Image
Docker images are just fancy archive files containing “layers” that are stacked on top of each other. Each layer represents a change in the filesystem from the previous layer, such as adding, modifying, or deleting files.
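To make the layer model concrete, here is a minimal Python sketch that treats each layer as a dict mapping paths to file contents and merges the layers bottom to top. It assumes deletions follow the OCI whiteout convention (a file named .wh.<name> hides <name> from lower layers) and is simplified: it only handles whiteouts of individual files, not directories or opaque whiteouts. What it illustrates is that a file deleted in a later layer vanishes from the final filesystem but still sits, intact, in the earlier layer.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."  # OCI marker: ".wh.<name>" deletes <name> from lower layers


def apply_layers(layers):
    """Merge layers bottom-to-top into the filesystem a container would see."""
    fs = {}
    for layer in layers:
        for path, content in layer.items():
            directory, name = posixpath.split(path)
            if name.startswith(WHITEOUT_PREFIX):
                # Whiteout: hide the target file in the merged view only.
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                fs.pop(target, None)
            else:
                # Additions and modifications simply overwrite the path.
                fs[path] = content
    return fs


def paths_containing(layers, needle):
    """Search every layer individually, the way a per-layer scan would."""
    return [
        (index, path)
        for index, layer in enumerate(layers)
        for path, content in layer.items()
        if needle in content
    ]


layers = [
    {"app/config.env": "AWS_SECRET=example"},  # layer 0 adds a secret
    {"app/.wh.config.env": ""},                 # layer 1 "deletes" it
]
print(apply_layers(layers))                    # {}
print(paths_containing(layers, "AWS_SECRET"))  # [(0, 'app/config.env')]
```

The merged view is empty, yet the per-layer search still finds the secret in layer 0, which is why a scanner has to look inside every layer rather than only at the final filesystem.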
When you point TruffleHog at a Docker image, it automatically pulls the image and then unpacks each layer to search for secrets.
As an example, the following command scans a sample image we uploaded that contains a live AWS canary token.
You’ll see output like this:
TruffleHog tells you which layer, and which file within that layer, contained the secret.
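If you want those layer and file locations in machine-readable form, TruffleHog’s --json flag prints one JSON object per finding. The sketch below builds the docker command and parses its output. The field layout it reads (SourceMetadata.Data.Docker.layer and .file, plus top-level DetectorName and Verified) is our assumption based on TruffleHog’s JSON output format, so check it against your TruffleHog version; the code tolerates missing keys and skips non-JSON lines.

```python
import json
import subprocess


def trufflehog_docker_cmd(image):
    """Argv for scanning one image; --json emits one finding per line."""
    return ["trufflehog", "docker", "--image", image, "--json"]


def docker_findings(lines):
    """Yield (detector, verified, layer, file) for each JSON finding.

    Assumes Docker findings nest location data under
    SourceMetadata.Data.Docker; absent keys come back as None.
    """
    for line in lines:
        try:
            finding = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(finding, dict):
            continue
        data = (finding.get("SourceMetadata") or {}).get("Data") or {}
        docker = data.get("Docker") or {}
        yield (
            finding.get("DetectorName"),
            finding.get("Verified"),
            docker.get("layer"),
            docker.get("file"),
        )


def scan(image):
    """Run TruffleHog (must be on PATH) and return the parsed findings."""
    proc = subprocess.run(
        trufflehog_docker_cmd(image), capture_output=True, text=True, check=False
    )
    return list(docker_findings(proc.stdout.splitlines()))
```

Calling scan("trufflesecurity/secrets") returns a list of (detector, verified, layer, file) tuples, one per finding.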
Docker Image Tags
A Docker image tag is a label applied to a Docker image in a repository that identifies a specific version of the image. Tags are used to reference different versions of the same image, such as different versions of a software application or different builds from the same codebase. Secrets can exist in any layer in any tagged version.
The trufflesecurity/secrets repository we referenced has one tagged version on DockerHub.
Many (most?) repositories have more than one tag; sometimes a repository will have thousands of tags. Additionally, image creators can provide images for multiple platforms, such as linux/amd64, linux/386, linux/arm64, and more.
This means each tag, and each architecture variant within that tag, could contain slightly different data (and secrets). Each platform-specific variant is identified by a unique sha256 hash of its manifest, called a manifest digest.
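Two facts from the OCI image spec make digests easy to work with: a digest is the algorithm name plus the hex hash of the manifest’s exact bytes, and a multi-platform image is an index whose manifests list pairs each digest with a platform. The sketch below shows both. The sample index and its sha256:aaa/sha256:bbb values are made-up placeholders, and real indexes may also contain non-runnable entries such as attestations (often reported as unknown/unknown).

```python
import hashlib
import json


def manifest_digest(manifest_bytes):
    """Digest = "sha256:" + hex SHA-256 of the manifest's exact bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def platform_digests(index_json):
    """Map "os/arch[/variant]" to the manifest digest listed in an image index."""
    index = json.loads(index_json)
    result = {}
    for entry in index.get("manifests", []):
        platform = entry.get("platform") or {}
        parts = (platform.get("os"), platform.get("architecture"), platform.get("variant"))
        key = "/".join(p for p in parts if p)
        result[key] = entry["digest"]
    return result


index = json.dumps({"manifests": [
    {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
    {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}},
]})
print(platform_digests(index))  # {'linux/amd64': 'sha256:aaa', 'linux/arm64/v8': 'sha256:bbb'}
```

Because the digest is computed over the exact bytes, changing even one byte of a platform’s manifest produces a different digest, so each variant can be addressed unambiguously.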
Within TruffleHog, users can supply a manifest digest to scan a platform-specific variant of a tagged Docker image. For example, the following command scans Alpine’s linux/386 build from the 20240329 tag:
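Selecting one platform variant comes down to pinning the image reference by digest with the standard repository@sha256:<hex> syntax. A minimal sketch, assuming --image accepts such a digest-pinned reference; the all-zero digest below is a placeholder, not Alpine’s real linux/386 digest.

```python
import re

# A sha256 digest: the algorithm prefix followed by exactly 64 lowercase hex characters.
SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_reference(repository, digest):
    """Return "repository@digest", rejecting anything that isn't a sha256 digest."""
    if not SHA256_DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repository}@{digest}"


def trufflehog_cmd_for_digest(repository, digest):
    """Argv that scans exactly one platform-specific variant."""
    return ["trufflehog", "docker", "--image", pinned_reference(repository, digest)]


placeholder = "sha256:" + "0" * 64
print(" ".join(trufflehog_cmd_for_digest("alpine", placeholder)))
```

Pinning by digest also makes a scan reproducible: a tag can be moved to point at a new image, but a digest always names the same manifest bytes.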
By default, TruffleHog scans the linux/amd64 version of the tag you pass to the --image flag. That’s because TruffleHog uses Google’s go-containerregistry library, which defaults to that platform.
So, if you run the following command to scan Alpine’s latest version, you’ll scan the linux/amd64 version.
Note: If there is no linux/amd64 version, TruffleHog will scan a different platform’s version instead.
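The selection rule described above can be sketched as follows. This is a hypothetical illustration of "prefer linux/amd64, otherwise pick something else", not TruffleHog’s or go-containerregistry’s actual code; falling back to the first listed entry is our assumption.

```python
DEFAULT_PLATFORM = ("linux", "amd64")


def choose_digest(entries, default=DEFAULT_PLATFORM):
    """Pick the default platform's digest from an index, else fall back.

    entries: list of (os, architecture, digest) tuples in index order.
    Hypothetical fallback: use the first entry when the default is absent.
    """
    if not entries:
        raise ValueError("image index lists no manifests")
    for os_name, arch, digest in entries:
        if (os_name, arch) == default:
            return digest
    return entries[0][2]


entries = [("linux", "386", "sha256:b"), ("linux", "amd64", "sha256:a")]
print(choose_digest(entries))                           # sha256:a
print(choose_digest([("linux", "arm64", "sha256:c")]))  # sha256:c
```

Whatever the exact fallback, the effect is the same: a single default scan picks one entry and skips every other platform in the index.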
For most use cases, scanning one architecture is sufficient, given the significant code overlap across multi-platform builds. However, that’s not always the case.
Scan All Tags/Architectures of a Docker Image
During our research, we wanted to scan everything: all tags and all architectures of a particular image. So, we put together a Python script that enumerates all relevant image manifest digests and then passes each one to TruffleHog’s docker command.
The code is rather straightforward. First, we query DockerHub’s API for a list of all tags. Second, we iterate through the tags and extract each architecture’s manifest digest (the sha256 hash). Finally, we pass the manifest digests one by one to an OS command that invokes TruffleHog.
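The three steps can be sketched in Python as follows. This is our reconstruction, not the original script: it assumes Docker Hub’s v2 tags endpoint (https://hub.docker.com/v2/repositories/<namespace>/<name>/tags) returns paginated JSON with results[].name, results[].images[].os/architecture/digest, and a next URL, and it pins each scan with the repository@digest reference form. It adds one design choice of its own: skipping digests it has already scanned, since several tags often point at the same manifest.

```python
import json
import subprocess
import urllib.request

# Assumed Docker Hub endpoint; page_size=100 reduces the number of requests.
HUB_TAGS_URL = "https://hub.docker.com/v2/repositories/{repo}/tags?page_size=100"


def fetch_json(url):
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_tag_pages(repo):
    """Step 1: walk every page of the tag listing by following "next"."""
    url = HUB_TAGS_URL.format(repo=repo)
    while url:
        page = fetch_json(url)
        yield page
        url = page.get("next")


def digests_from_page(page):
    """Step 2: (tag, os/arch, digest) for each architecture of each tag."""
    found = []
    for tag in page.get("results", []):
        for image in tag.get("images") or []:
            digest = image.get("digest")
            if digest:  # skip entries the API returns without a digest
                platform = f"{image.get('os')}/{image.get('architecture')}"
                found.append((tag.get("name"), platform, digest))
    return found


def scan_all(repo):
    """Step 3: run TruffleHog once per unique manifest digest."""
    scanned = set()
    for page in iter_tag_pages(repo):
        for tag, platform, digest in digests_from_page(page):
            if digest in scanned:
                continue  # several tags can share one manifest
            scanned.add(digest)
            print(f"Scanning {repo}:{tag} ({platform}) {digest}", flush=True)
            subprocess.run(
                ["trufflehog", "docker", "--image", f"{repo}@{digest}"],
                check=False,
            )
```

Calling scan_all("trufflesecurity/secrets") scans every tag and architecture of that repository. Large repositories mean many API requests, and unauthenticated Docker Hub requests may be rate-limited.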
To run this script, simply replace “trufflesecurity/secrets” with the name of the Docker image you want to scan.
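One wrinkle when swapping in another name: Docker Hub’s API addresses official images under the library namespace, so alpine has to become library/alpine. A small helper (hypothetical, not part of the original script) can normalize whatever name you paste in, also dropping a docker.io/ prefix, a :tag, or an @digest, since the script enumerates tags and digests itself.

```python
def hub_repository(name):
    """Normalize an image name into Docker Hub's "namespace/name" form."""
    name = name.strip()
    if "@" in name:
        name = name.split("@", 1)[0]  # drop a pinned digest
    # A colon after the last slash marks a tag, not a registry port.
    if name.rfind(":") > name.rfind("/"):
        name = name[: name.rfind(":")]
    if name.startswith("docker.io/"):
        name = name[len("docker.io/"):]
    if "/" not in name:
        name = "library/" + name  # official images live under "library"
    return name


print(hub_repository("alpine"))                          # library/alpine
print(hub_repository("trufflesecurity/secrets:latest"))  # trufflesecurity/secrets
```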
For our research, we added concurrency and robust error handling (since it’s very easy to max out your RAM when scanning many Docker images at once), and ported the entire script to Go. But we’ll leave those additions as an exercise for the reader.
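As a starting point for that exercise, here is one way to add bounded concurrency and per-image error handling in Python. It is a sketch, not the authors’ Go implementation. The max_workers value is an assumed knob: each worker runs its own trufflehog process, so keeping it small limits how many images are being unpacked, and how much memory is in use, at any one time. Nonzero exit codes are recorded in results; exceptions such as a missing trufflehog binary land in failures instead of stopping the run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_trufflehog(reference):
    """Scan one image reference and return trufflehog's exit code."""
    return subprocess.run(
        ["trufflehog", "docker", "--image", reference], check=False
    ).returncode


def scan_concurrently(references, max_workers=2, runner=run_trufflehog):
    """Scan references with a small worker pool, collecting failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(runner, ref): ref for ref in references}
        for future in as_completed(futures):
            ref = futures[future]
            try:
                results[ref] = future.result()
            except Exception as exc:  # e.g. FileNotFoundError if trufflehog is missing
                failures[ref] = exc
    return results, failures
```

The runner parameter exists so the pool logic can be exercised with a stand-in function; in normal use, scan_concurrently(references) returns a (results, failures) pair of dictionaries keyed by image reference.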
If you think this should become a part of TruffleHog, we encourage you to open a PR! If you’d like to collaborate with someone on our research team, we’d be happy to work together.