Joe Leon

April 4, 2024

Scan Every Tag and Architecture of a Docker Image for Secrets

tl;dr Scroll to the bottom for a script that scans for secrets within every tag and architecture of a Docker image. 

In the past, we’ve explained how secrets leak out of Docker images: overly permissive file operations, hardcoded secrets in Dockerfiles, and misused build arguments. While working on new research into secrets leakage, our team developed a script that uses TruffleHog to scan Docker images for secrets more thoroughly. By open-sourcing our script, we’re hoping to help code owners answer two questions: 

  • "Do any of my tagged Docker images leak secrets?"

  • "What about secrets in images with multiple architectures?"

How TruffleHog Scans a Docker Image

Docker images are just fancy archive files containing “layers” that are stacked on top of each other. Each layer represents a change in the filesystem from the previous layer, such as adding, modifying, or deleting files. 



When you run TruffleHog to scan for secrets in Docker images, it will automatically pull down the image, and then unarchive each layer to search for secrets.
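To see why scanning every layer matters, here is a minimal sketch that models layers as plain Python dictionaries. It is not how TruffleHog works internally; the `flatten` and `find_in_layers` helpers, the file paths, and the fake key are all made up for illustration. Real images record deletions with whiteout files, which this sketch simplifies to a `None` value.

```python
# Toy model: each layer maps a file path to its content.
# A value of None stands in for a deletion (real images use whiteout files).
layers = [
    {"/app/.env": "AWS_KEY=AKIAFAKEEXAMPLE", "/app/main.py": "print('hi')"},
    {"/app/.env": None},  # a later layer "deletes" the secret file
]

def flatten(layers):
    """Apply layers in order to get the filesystem a running container sees."""
    fs = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                fs.pop(path, None)
            else:
                fs[path] = content
    return fs

def find_in_layers(layers, needle):
    """Return (layer index, path) for every layer file containing needle."""
    return [
        (index, path)
        for index, layer in enumerate(layers)
        for path, content in layer.items()
        if content is not None and needle in content
    ]

final_fs = flatten(layers)
hits = find_in_layers(layers, "AWS_KEY")
print("Visible in final filesystem:", "/app/.env" in final_fs)
print("Found in layers:", hits)
```

The secret is gone from the final filesystem, yet it still sits in the first layer's archive, which is exactly where a layer-by-layer scan finds it.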

As an example, the following command scans a sample image that we uploaded containing a live AWS canary token.


trufflehog docker --image trufflesecurity/secrets


In the output, TruffleHog reports which layer, and which file within that layer, the secret was found in.

Docker Image Tags

A Docker image tag is a label applied to a Docker image in a repository that identifies a specific version of the image. Tags are used to reference different versions of the same image, such as different versions of a software application or different builds from the same codebase. Secrets can exist in any layer in any tagged version.

The trufflesecurity/secrets repository we referenced has one tagged version on DockerHub. 



Many (most?) repositories have more than one tag; sometimes a repository will have thousands of tags. Additionally, image creators can provide images for multiple platforms, such as linux/amd64, linux/386, linux/arm64 and more. 



What this means is that each tag, and each architecture version within that tag, could contain slightly different data (and secrets). Docker provides each platform-specific variant with a unique sha256 hash, called a manifest digest. 
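A manifest digest is the sha256 hash of the manifest's exact bytes, prefixed with `sha256:`. The sketch below shows the idea with a made-up manifest; `manifest_digest`, `image_reference`, and the `example/image` name are hypothetical helpers and placeholders, not part of Docker or TruffleHog.

```python
import hashlib
import json

def manifest_digest(raw_manifest: bytes) -> str:
    """Hash the raw manifest bytes the way registries identify a manifest."""
    return "sha256:" + hashlib.sha256(raw_manifest).hexdigest()

def image_reference(name: str, digest: str) -> str:
    """Build the name@digest form that pins one specific image variant."""
    return f"{name}@{digest}"

# A made-up, heavily trimmed manifest. A real one lists config and layer blobs.
raw = json.dumps({"schemaVersion": 2, "layers": []}).encode()
digest = manifest_digest(raw)
print(image_reference("example/image", digest))
```

Because the hash covers the raw bytes, any change to a variant's layers or config produces a different digest, which is why a digest pins exactly one platform-specific build.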


Within TruffleHog, users can supply a manifest digest to scan a platform-specific variant of a tagged Docker image. For example, the following command will scan Alpine’s linux/386 build from the 20240329 tag:


trufflehog docker --image alpine@sha256:c4a262d530f57d1b7b68b52ba8383c2e55fd1a0cb5b4f46b11eed7a2c4e143da


By default, TruffleHog scans the linux/amd64 version of the tag you pass into the --image flag. That’s because TruffleHog uses Google’s go-containerregistry library and that’s its default.



So, if you run the following command to scan Alpine’s latest version, you’ll scan the linux/amd64 version.


trufflehog docker --image alpine:latest

Note: If there is no linux/amd64 version, then TruffleHog will scan another version.

For most use-cases, scanning one architecture version is sufficient, given the significant code-overlap across multi-platform builds. However, that’s not always the case.

Scan All Tags/Architectures of a Docker Image

During our research, we wanted to scan everything: all tags and all architectures for a particular image. So, we put together a Python script that enumerates all relevant image manifest digests and then passes each one into TruffleHog’s docker command. 


import subprocess

import requests

YELLOW = "\033[93m"
RESET = "\033[0m"


def dockerhub_tag_endpoint(name, page):
    # DockerHub API URL listing a repository's tags, 100 per page.
    return f"https://hub.docker.com/v2/repositories/{name}/tags?page_size=100&page={page}"


def skip_tag(tag):
    # Skip tags ending in one of the deny-listed suffixes.
    deny_list = [".sig", ".enc"]
    return any(tag.endswith(t) for t in deny_list)


def get_container_tag_page(name, page):
    # Fetch one page of tag metadata and fail loudly on HTTP errors.
    url = dockerhub_tag_endpoint(name, page)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def get_container_tags(name):
    # Yield every tag, following pagination until there is no "next" page.
    page = 1
    while True:
        response = get_container_tag_page(name, page)
        results = response.get("results", [])
        for result in results:
            yield result
        if not response.get("next"):
            break
        page += 1


if __name__ == "__main__":
    name = "trufflesecurity/secrets"
    for tag in get_container_tags(name):
        if skip_tag(tag["name"]):
            continue
        for img in tag["images"]:
            # Guard against entries without a digest; there is nothing to scan.
            digest = img.get("digest")
            if not digest:
                continue
            print("--------------------------------------------------")
            print(f"Scanning tag {YELLOW}{tag['name']}{RESET} with architecture {YELLOW}{img['architecture']}{RESET} for secrets...\n")
            # Pass arguments as a list so nothing is interpreted by a shell.
            subprocess.run(["trufflehog", "docker", "--image", f"{name}@{digest}", "--only-verified", "--no-update"])

The code is rather straightforward. First, we query DockerHub’s API for a list of all tags. Second, we iterate through all tags and extract each architecture’s manifest digest (the sha256 hash). Finally, we pass the manifest digests one-by-one to an OS command that invokes TruffleHog. 
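One refinement worth considering: several tags can point at the same manifest digest (for example, a version tag and latest), so the loop may scan identical content more than once. The sketch below groups tags by digest before scanning. It assumes the same response fields the script already reads ("name", "images", "digest", "architecture"); the `unique_digests` helper and the sample data are hypothetical.

```python
def unique_digests(tags):
    """Map each manifest digest to the (tag, architecture) pairs that use it."""
    seen = {}
    for tag in tags:
        for img in tag.get("images", []):
            digest = img.get("digest")
            if not digest:
                continue  # nothing to scan without a digest
            seen.setdefault(digest, []).append((tag["name"], img.get("architecture")))
    return seen

# Made-up data shaped like the tag entries the script iterates over.
sample_tags = [
    {"name": "latest", "images": [
        {"architecture": "amd64", "digest": "sha256:aaa"},
        {"architecture": "arm64", "digest": "sha256:bbb"},
    ]},
    {"name": "1.0", "images": [
        {"architecture": "amd64", "digest": "sha256:aaa"},
    ]},
]

targets = unique_digests(sample_tags)
for digest, refs in targets.items():
    print(digest, refs)
```

Each key in `targets` then needs only one TruffleHog run, and the recorded pairs tell you which tags and architectures a finding applies to.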

To run this script, simply replace “trufflesecurity/secrets” with the name of the Docker image you want to scan.

For our research, we added concurrency and robust error handling (since it’s very easy to max out your RAM when you’re scanning many Docker images at once), and ported the entire script to Go. But we’ll leave those additions as an exercise for the reader.
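As a starting point for the concurrency piece, here is a rough Python sketch, not the version our research used. It caps the number of simultaneous scans with a small thread pool to keep memory in check. The `scan_digest` and `scan_all` names and the `max_workers` default are assumptions; the TruffleHog flags are the ones from the script above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def scan_digest(image_ref):
    """Run TruffleHog on one name@digest reference and capture its output."""
    cmd = ["trufflehog", "docker", "--image", image_ref,
           "--only-verified", "--no-update"]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return image_ref, proc.returncode, proc.stdout
    except OSError as exc:
        # For example, the trufflehog binary is not installed or not on PATH.
        return image_ref, None, str(exc)

def scan_all(image_refs, scan=scan_digest, max_workers=2):
    """Scan references with at most max_workers scans running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as image_refs.
        return list(pool.map(scan, image_refs))
```

Capturing each scan's output avoids interleaved terminal output from parallel runs. Passing a different `scan` function lets you exercise the pooling logic without TruffleHog installed.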

If you think this should become a part of TruffleHog, we encourage you to open a PR! If you’d like to collaborate with someone on our research team, we’d be happy to work together.