TL;DR: TruffleHog automagically scans for secrets in several different encoded string formats (e.g., Base64) and archived file types (e.g., zip).
Is this string a secret?
I know. You’re frustrated. You have to open up an online Base64 decoder or remember the command line argument to decode it. I’ll save you the trouble. Yes. Yes it is.
And as much as we’d love to just pretend secrets are never encoded, and are never buried in compressed and archived files, the reality is we find lots of secrets this way.
Consider your typical AWS access key. It starts with the prefix AKIA.
Base64-encoding AKIA outputs a string starting with QUtJQ.
A simple code search on GitHub for QUtJQ returns over 4,000 results!
While most are invalid AWS keys, there are plenty of live ones.
Fun fact: GitHub’s Advanced Secret Scanner prevents users from committing AWS keys to public repositories, but not if they are Base64-encoded.
To help our enterprise clients and open-source community members, our engineering team has dedicated considerable effort to developing efficient ways to scan strings and files for secrets in various obfuscated formats.
In this post, we’ll break down all of the different encoded and archived data formats that TruffleHog supports and provide examples for you to test TruffleHog’s secret detection yourself.
Encoded Data
TruffleHog currently scans for secrets in four different types of encoded strings: UTF-8, UTF-16, Base64, and Escaped Unicode.
Source: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/decoders/decoders.go
UTF-8
UTF-8 is the de facto standard for text encoding. Most of the secrets TruffleHog finds are simple UTF-8 encoded strings.
UTF-16
UTF-16 is another widely used encoding standard, especially in systems built around two-byte (16-bit) code units. It’s common in Windows environments and often used for international text. Secrets can easily end up encoded in UTF-16, especially when they’re part of applications supporting multi-language content. TruffleHog automatically scans for secrets within UTF-16-encoded files, whether little-endian or big-endian.
Base64
TruffleHog’s Base64 decoder is pretty cool. When analyzing a string of data, it looks for all substrings that look like Base64-encoded data and then attempts to decode them one by one. If a substring decodes to a valid string, TruffleHog then searches that decoded string for secrets. See the code for yourself here.
Escaped Unicode
For those unfamiliar, Escaped Unicode often looks like this: \u0061\u0077\u0073\u005F\u0061\u0063\u0063. While it’s not the most common format for encoding secrets, some tooling allows users to supply credentials via Escaped Unicode (like Maven passwords in build.gradle files).
What’s up with that command? The command (1) converts our AWS access key string from UTF-8 to UTF-16 big-endian format, (2) transforms that output into a hexadecimal string, (3) formats it into the \u####\u#### Unicode escape sequence format, and (4) removes any line breaks.
I need support for a different decoder type.
Need to find secrets in Base32 or some other encoding? The decoders.go file is purposefully extensible. Adding an additional decoder should only require a little bit of code. In fact, a well-worded GPT prompt could do most of the heavy lifting for you.
This could also make for an interesting research project for Truffle’s Open CFP. Interested? Apply here.
Archived Files
In addition to scanning encoded strings, TruffleHog detects secrets hidden within archived files. Using specialized file handlers, TruffleHog efficiently processes a couple dozen different compressed and archive file types.
Below are the supported archive formats, along with examples you can use to test TruffleHog.
Unix Archives and Debian Packages
TruffleHog’s arHandler processes Unix archive files and Debian packages, with support for three (3) MIME types: application/x-archive, application/x-unix-archive, and application/vnd.debian.binary-package. These formats are often used in Linux distributions and package management systems.
This command creates a simple Unix archive (unix_archive.ar) with a file (secret.txt) that contains an AWS access key and secret.
RPM and CPIO Archives
TruffleHog's rpmHandler handles RPM package files and CPIO archives, with support for the following two (2) MIME types: application/x-rpm and application/cpio. The RPM file format is often used in Red Hat-based Linux distributions.
This command creates a CPIO archive (archive.cpio) containing the secret.txt file.
Common Archive Formats
TruffleHog’s archiveHandler leverages the awesome archiver library to handle popular archive and compression formats such as .zip, .tar, and .gz. The archiver library’s broad compatibility allows TruffleHog to scan a wide range of compressed file types for secrets, as well as expand its reach as the archiver project grows.
The archiver repository’s README documents all of the currently supported compression and archive formats (source: https://github.com/mholt/archiver?tab=readme-ov-file#supported-compression-formats).
The full list is: .br, .bz2, .zip, .gz, .lz4, .lz, .sz, .xz, .zz, .zst, .tar, .tar.gz, .rar, and .7z.
We can test TruffleHog's secret scanning on any of those, but for simplicity, we'll just create a .zip archive.
I need support for a different archive type.
TruffleHog's file handler system is highly extensible. Check out the handlers.go file, specifically the selectHandler function.
Add a new case to the switch statement in that function, create a supporting file that actually decompresses and unpacks your files (plus a few other minor code changes), and you’ll be able to scan for secrets in a custom archive type.
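To make that concrete, here is a hypothetical, heavily simplified Go sketch of MIME-type dispatch. The Handler interface, the handler types, the fallback to archiveHandler, and the application/x-my-format MIME type are all assumptions; check handlers.go for the real types and signatures. The MIME types in the first two cases are the ones listed earlier in this post.

```go
package main

import "fmt"

// Handler is a hypothetical stand-in for TruffleHog's real handler types.
type Handler interface {
	Name() string
}

type arHandler struct{}
type rpmHandler struct{}
type archiveHandler struct{}
type myFormatHandler struct{} // hypothetical handler for the new archive type

func (arHandler) Name() string       { return "ar" }
func (rpmHandler) Name() string      { return "rpm" }
func (archiveHandler) Name() string  { return "archive" }
func (myFormatHandler) Name() string { return "my-format" }

// selectHandler dispatches on MIME type; supporting a new format means adding a case.
func selectHandler(mimeType string) Handler {
	switch mimeType {
	case "application/x-archive", "application/x-unix-archive", "application/vnd.debian.binary-package":
		return arHandler{}
	case "application/x-rpm", "application/cpio":
		return rpmHandler{}
	case "application/x-my-format": // the new case (hypothetical MIME type)
		return myFormatHandler{}
	default:
		return archiveHandler{} // fallback assumed for this sketch
	}
}

func main() {
	for _, m := range []string{"application/vnd.debian.binary-package", "application/x-rpm",
		"application/x-my-format", "application/zip"} {
		fmt.Println(m, "->", selectHandler(m).Name())
	}
}
```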
Conclusion
TruffleHog supports searching for secrets in a lot of different data encodings and archive formats. While we’ve covered a lot here, it’s worth noting that keeping everything running smoothly takes ongoing effort from both our engineering team and the open-source community. We're always excited to add support for new formats, but our top priority is making sure the existing ones are rock-solid and accurate across all detectors (meaning the 800+ secret types we support). Got an idea for a new decoder or file handler? We’d love to hear from you!