Joe Leon

The Dig

October 18, 2024

Secret Scanning Encoded and Archived Data


Tl;dr TruffleHog automagically scans for secrets in several different encoded string formats (e.g., Base64) and archived file types (e.g., zip).

Is this string a secret?


YXdzX2FjY2Vzc19rZXlfaWQgPSBBS0lBMlVDM0JTWE1JWUVHSUVPRiBhd3Nfc2VjcmV0X2FjY2Vzc19rZXkgPSBWRUpDY2R0WS9sUGR2WmdWbnJqV1UxT3VzT1BKcEVPdkZtV1dycUJkCg

I know. You’re frustrated. You have to open up an online Base64 decoder or remember the command line argument to decode it. I’ll save you the trouble. Yes. Yes it is. 
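If you'd rather check for yourself, here is a minimal sketch. It assumes a `base64` utility that accepts `-d` (GNU coreutils does; older macOS versions use `-D`). The string above is missing its trailing `==` padding, so the sketch adds it back before decoding.

```shell
# The string from above, exactly as shown (no trailing "=" padding).
encoded='YXdzX2FjY2Vzc19rZXlfaWQgPSBBS0lBMlVDM0JTWE1JWUVHSUVPRiBhd3Nfc2VjcmV0X2FjY2Vzc19rZXkgPSBWRUpDY2R0WS9sUGR2WmdWbnJqV1UxT3VzT1BKcEVPdkZtV1dycUJkCg'

# Many Base64 decoders expect padded input, so append "==" before decoding.
# Assumption: your base64 accepts -d for decoding.
decoded=$(printf '%s==' "$encoded" | base64 -d)

# Print the decoded text: an AWS access key ID and secret access key.
printf '%s\n' "$decoded"
```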

And as much as we’d love to just pretend secrets are never encoded, and are never buried in compressed and archived files, the reality is we find lots of secrets this way.

Consider your typical AWS access key. It starts with the prefix AKIA. 



Base64-encoding AKIA produces a string starting with QUtJQ.
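The prefix is easy to reproduce locally. The sketch below assumes only a `base64` utility on your PATH. It also shows a caveat worth knowing: Base64 works on 3-byte groups, so QUtJQ only appears when the key starts on a group boundary. Shift the key by a single byte and the encoded text looks completely different, which means QUtJQ matches only a subset of encoded keys.

```shell
# Encode the bare prefix.
printf 'AKIA' | base64                      # QUtJQQ==

# Encode a full key ID (the example key used throughout this post).
key_b64=$(printf 'AKIA2UC3BSXMIYEGIEOF' | base64)
printf '%s\n' "$key_b64"                    # starts with QUtJQ

# Put one extra byte (a space) in front of the key. The 3-byte groups shift,
# and the encoded output no longer contains QUtJQ at all.
shifted_b64=$(printf ' AKIA2UC3BSXMIYEGIEOF' | base64)
printf '%s\n' "$shifted_b64"
```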

A simple code search on GitHub for QUtJQ returns over 4,000 results!



While most are invalid AWS keys, there are plenty of live ones. 

Fun Fact: GitHub’s Advanced Secret Scanner prevents users from committing AWS keys to public repositories, but not if they are base64 encoded.

To help our enterprise clients and open-source community members, our engineering team has dedicated considerable effort to developing efficient ways to scan strings and files for secrets in various obfuscated formats.

In this post, we’ll break down all of the different encoded and archived data formats that TruffleHog supports and provide examples for you to test TruffleHog’s secret detection yourself.

Encoded Data

TruffleHog currently scans for secrets in 4 different types of encoded strings: UTF-8, UTF-16, Base64 and Escaped Unicode.


Source: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/decoders/decoders.go

UTF-8

UTF-8 is the de facto standard for text encoding. Most of the secrets TruffleHog finds are simple UTF-8 encoded strings.


echo "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = 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" > utf8.txt 
trufflehog filesystem utf8.txt

UTF-16

UTF-16 is another widely used encoding standard, especially for handling text in systems that require two-byte characters. It’s common in Windows environments and often used for international text. Secrets can easily end up encoded in UTF-16, especially when they’re part of applications supporting multi-language content. TruffleHog automatically scans for secrets within UTF-16 encoded files, whether little-endian or big-endian.


echo -n "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = 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" | iconv -f UTF-8 -t UTF-16LE > utf16.txt
trufflehog filesystem utf16.txt
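To see why a plain byte-for-byte search misses UTF-16 content, it helps to look at the raw bytes. This is a small sketch, assuming `iconv` and `od` are available. In UTF-16LE, every ASCII character is followed by a zero byte, so the bytes for AKIA are no longer adjacent.

```shell
# Encode a short marker as UTF-16LE and dump the raw bytes as one hex string.
utf16_hex=$(printf 'AKIA' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1 | tr -d ' \n')

# 41 = A, 4b = K, 49 = I, each followed by a 00 byte.
printf '%s\n' "$utf16_hex"   # 41004b0049004100
```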

Base64

TruffleHog’s Base64 decoder is pretty cool. When analyzing a string of data, it looks for all substrings that look like Base64 and then attempts to decode them one by one. If it finds a valid decoded string, it’ll then search for secrets in that decoded string. See the code for yourself here.


echo "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = VEJCcdtY/lPdvZgVnrjWU1OusOPJpEOvFmWWrqBd" | base64 > base64.txt
trufflehog filesystem base64.txt
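The find-candidates-then-decode idea can be sketched in a few lines of shell. This is not TruffleHog's implementation (that lives in the Go decoder), just an illustration of the approach. The 20-character minimum, the sample log line, and the AKIA pattern are all assumptions made for the example.

```shell
# Hypothetical input: a Base64-encoded key ID embedded in an ordinary log line.
blob=$(printf '%s' 'AKIA2UC3BSXMIYEGIEOF' | base64)
line="level=info token=${blob} retries=3"

# 1. Extract runs that look like Base64 (20+ alphabet characters, optional padding).
# 2. Try to decode each candidate; skip any that base64 rejects.
# 3. Search the decoded text with an illustrative AWS key ID pattern.
found=$(printf '%s\n' "$line" |
  grep -oE '[A-Za-z0-9+/]{20,}={0,2}' |
  while IFS= read -r candidate; do
    if decoded=$(printf '%s' "$candidate" | base64 -d 2>/dev/null); then
      printf '%s\n' "$decoded"
    fi
  done |
  grep -oE 'AKIA[A-Z0-9]{16}')

printf '%s\n' "$found"
```

The short words in the log line never reach the decode step because they fall under the length threshold, which keeps the number of decode attempts small.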

Escaped Unicode

For those unfamiliar, Escaped Unicode often looks like this: \u0061\u0077\u0073\u005F\u0061\u0063\u0063. While it’s not the most common format for encoding secrets, some tooling allows users to supply credentials via Escaped Unicode (like Maven passwords in build.gradle files).


echo "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = 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" | iconv -f UTF-8 -t UTF-16BE | xxd -p | sed 's/\(..\)\(..\)/\\u\1\2/g' | tr -d '\n' > escaped_unicode.txt
trufflehog filesystem escaped_unicode.txt


What’s up with that command? The command (1) converts our AWS access key string from UTF-8 to UTF-16 Big Endian format, (2) transforms that output into a hexadecimal string, (3) formats it into the \u####\u#### Unicode escape sequence format, and (4) removes any line breaks.
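Running the first three steps on a short string makes the transformation easier to follow. This sketch swaps `xxd -p` for `od`, which is more widely preinstalled; otherwise it mirrors the command above.

```shell
escaped=$(printf 'AKIA' |
  iconv -f UTF-8 -t UTF-16BE |    # (1) UTF-8 -> UTF-16 big-endian
  od -An -tx1 | tr -d ' \n' |     # (2) bytes -> one hex string (od stands in for xxd -p)
  sed 's/\(....\)/\\u\1/g')       # (3) every 4 hex digits -> \uXXXX

printf '%s\n' "$escaped"          # \u0041\u004b\u0049\u0041
```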

I need support for a different decoder type.

Need to find secrets in Base32 or some other encoding? The decoders.go file is purposefully extensible. Adding an additional decoder should only require a little bit of code. In fact, a well-worded GPT prompt could do most of the heavy lifting for you. 

This could also make for an interesting research project for Truffle’s Open CFP. Interested? Apply here.

Archived Files

In addition to scanning encoded strings, TruffleHog detects secrets hidden within archived files. By utilizing specialized file handlers, TruffleHog efficiently processes a couple dozen different compressed file types. 

Below are the supported archive formats, along with examples you can use to test TruffleHog.

Unix Archives and Debian Packages

TruffleHog’s arHandler processes Unix archive files and Debian packages, with support for three (3) MIME types: application/x-archive, application/x-unix-archive, and application/vnd.debian.binary-package. These formats are often used in Linux distributions and package management systems.


echo "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = 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" > secret.txt
ar r unix_archive.ar secret.txt
trufflehog filesystem unix_archive.ar

This command creates a simple Unix archive (unix_archive.ar) with a file (secret.txt) that contains an AWS access key and secret.

RPM and CPIO Archives

TruffleHog's rpmHandler handles RPM package files and CPIO archives, with support for the following two (2) MIME types: application/x-rpm and application/cpio. The RPM file format is often used in Red Hat-based Linux distributions.


echo "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = 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" > secret.txt
echo secret.txt | cpio -ovF archive.cpio
trufflehog filesystem archive.cpio

This command creates a CPIO archive (archive.cpio) containing the secret.txt file.

Common Archive Formats

TruffleHog’s archiveHandler leverages the awesome archiver library to handle popular archive formats such as .zip, .tar, .gz, etc. The archiver library’s broad compatibility allows TruffleHog to scan a wide range of compressed file types for secrets, as well as expand its reach as the archiver project grows.

Here’s a screenshot from the archiver repository’s README file documenting all of the currently supported compression and archive formats:


[Source: https://github.com/mholt/archiver?tab=readme-ov-file#supported-compression-formats]

The full list is: .br, .bz2, .zip, .gz, .lz4, .lz, .sz, .xz, .zz, .zst, .tar, .tar.gz, .rar, .7z.

We can test TruffleHog's secret scanning on any of those, but for simplicity, we'll just create a .zip archive.


echo "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = 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" > secret.txt
zip secret.zip secret.txt
trufflehog filesystem secret.zip
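The same test works for other formats on the list. Here is a sketch using .tar.gz, assuming `tar` with gzip support is installed. It works in a temporary directory and only runs the scan if a `trufflehog` binary is found on your PATH.

```shell
# Work in a throwaway directory so nothing is left in your project.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "aws_access_key_id = AKIA2UC3BSXMIYEGIEOF aws_secret_access_key = VEJCcdtY/lPdvZgVnrjWU1OusOPJpEOvFmWWrqBd" > secret.txt

# Create a gzip-compressed tarball and list its contents to confirm.
tar -czf secret.tar.gz secret.txt
tar -tzf secret.tar.gz

# Scan the archive only if TruffleHog is installed.
if command -v trufflehog >/dev/null 2>&1; then
  trufflehog filesystem secret.tar.gz
fi
```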

I need support for a different archive type.

TruffleHog's file handler system is highly extensible. Check out the handlers.go file, specifically the selectHandler function.



Add a new case to that switch, create a supporting file to actually uncompress and unarchive your files (plus a few other minor code changes), and then you’ll be able to scan for secrets in a custom archive type.

Conclusion

TruffleHog supports searching for secrets in a lot of different data encodings and archive formats. While we’ve covered a lot here, it’s worth noting that keeping everything running smoothly takes ongoing effort from both our engineering team and the open-source community. We're always excited to add support for new formats, but our top priority is making sure the existing ones are rock-solid and accurate across all detectors (meaning the 800+ secret types we support). Got an idea for a new decoder or file handler? We’d love to hear from you!