Joe Leon

The Dig

August 15, 2023

TruffleHog Commands: Git vs Filesystem

TL;DR: Scan local git repos with TruffleHog’s git command (trufflehog git file://local-repo/); the filesystem command could miss hard-coded secrets hiding in git history. There is one exception to this rule: when scanning a corrupted git repository, use the filesystem command, since git will not yield results.

Scenario #1

You’re a security engineer (either in-house AppSec or external penetration tester) tasked with identifying security vulnerabilities in a code base. Once you gain access to the source code, you clone the repository onto your local machine and start a code review. The lowest-hanging fruit (read: vulnerabilities) are often hard-coded API keys and secrets in git commits. So, you decide to run TruffleHog to identify leaked secrets.

What TruffleHog command do you use?

GIT VS FILESYSTEM

TruffleHog has two commands that appear relevant to scanning a repository on a local machine: filesystem and git.

The filesystem command (trufflehog filesystem /path/to/repo) seems like it would scan the files in a local directory and report all detectable secrets committed to git. Unfortunately, it’s not that simple. The inner complexities of git object compression and storage require a separate workflow to detect secrets buried in git history. The best command for finding leaked secrets in locally cloned git repositories is the git command (trufflehog git file://folder/).

To illustrate the difference between running git vs filesystem, we’ll scan Truffle Security’s Test_Keys repository. This repository contains 3 valid secrets: an authentication URI, an SSH private key and AWS credentials. The authentication URI and SSH private key are located in the “keys” file in the current commit.


Valid credentials in TruffleSecurity’s “Test_Keys” GitHub repository


Importantly, the AWS credentials are not present in any of the files in the current commit; they reside in a past git commit.


AWS key in TruffleSecurity’s “Test_Keys” GitHub repository git history. This key is not located in the current commit.


The filesystem command found the 2 valid secrets located in the current commit’s “keys” file. Interestingly, the filesystem command did not identify the AWS credentials located in git history.


Git vs Filesystem: Running TruffleHog’s Filesystem Command


As expected, the git command identified all 3 secrets.


Git vs Filesystem: Running TruffleHog’s Git Command


A Layer Deeper

Instead of using Truffle Security’s “Test_Keys” repository, try replicating this on your own. 

Follow these steps:

  1. Initialize a git repo.

  2. Hard-code a secret (e.g., using Twilio’s free SDK example) and commit it.

  3. Delete the hard-coded secret and commit again.
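
The three steps above can be sketched as a short shell session. This is a minimal sketch, assuming git is installed; the file name send_sms.py, the commit messages, the demo identity and the FAKE_SECRET_FOR_DEMO string are placeholders that do not come from the original walkthrough. A placeholder like this is unlikely to trigger a TruffleHog detector, so substitute a real test credential if you want to reproduce the scan results.

```shell
# Create a throwaway repository (temporary path, placeholder identity).
repo="$(mktemp -d)"
git -C "$repo" init --quiet

# Commit 1: a script with a hard-coded placeholder secret.
printf 'auth_token = "FAKE_SECRET_FOR_DEMO"\n' > "$repo/send_sms.py"
git -C "$repo" add send_sms.py
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "Add SMS script"

# Commit 2: remove the secret from the file and commit again.
printf 'import os\nauth_token = os.environ["AUTH_TOKEN"]\n' > "$repo/send_sms.py"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -am "Read token from the environment"

# The current file no longer holds the secret, but the history still does.
commit_count="$(git -C "$repo" rev-list --count HEAD)"
worktree_hits="$(grep -c FAKE_SECRET_FOR_DEMO "$repo/send_sms.py" || true)"
history_hits="$(git -C "$repo" log -p | grep -c FAKE_SECRET_FOR_DEMO || true)"
echo "commits=$commit_count worktree_hits=$worktree_hits history_hits=$history_hits"
```

The last three commands only confirm the setup: the working tree is clean while git log -p still shows the secret in the commit diffs.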


An Example Python Script using the Twilio SDK to Send an SMS message. The Image Above Does Not Contain a Secret Key; however, the Git Repository Contains a Historical Commit that has a Hard-Coded Secret Key.


You now have a repository with no leaked secrets in the current commit; however, the git history contains a valid secret.

We know the git command will work, but try running filesystem. Did it work? Yes, yes it did.


Running the TruffleHog Filesystem Command


So, why does the filesystem command sometimes look into git history to identify past hard-coded secrets? 

Loose Objects vs. Pack Files

In the example above, the Twilio API key is located inside the /.git/objects directory in a file named “c18809830302036b538c17efcff65b959da6ac”. Simply opening that file reveals a bunch of gibberish. 


Running the cat Command Against a Git Object File


To limit the space required for storing repository data, git compresses its objects (file contents are stored as objects called blobs) using the compression library zlib. Running the git cat-file command decompresses (inflates) a blob and returns the cleartext file. If you follow these steps, you’ll see your own cleartext credentials.
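
To see the compression at work without hunting for a specific object file, the following sketch writes a blob into a scratch repository and reads it back. It assumes git is installed; the placeholder string and the temporary path are illustrative only.

```shell
repo="$(mktemp -d)"
git -C "$repo" init --quiet

# Store a blob directly in the object database; git prints its hash.
hash="$(printf 'auth_token = "FAKE_SECRET_FOR_DEMO"\n' |
    git -C "$repo" hash-object -w --stdin)"

# A loose object lives at .git/objects/<first two hex chars>/<remaining chars>.
object_path="$repo/.git/objects/$(printf '%s' "$hash" | cut -c1-2)/$(printf '%s' "$hash" | cut -c3-)"

# The file on disk is zlib-compressed, so a plain-text search should find nothing.
raw_hits="$(grep -c FAKE_SECRET_FOR_DEMO "$object_path" || true)"

# git cat-file -p decompresses the object and prints the original content.
contents="$(git -C "$repo" cat-file -p "$hash")"
echo "raw_hits=$raw_hits contents=$contents"
```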


Running git cat-file


So, why can’t the filesystem command recognize a zlib-compressed file and inflate it? It does. That’s why the filesystem command worked. In fact, TruffleHog recognizes several archive types and attempts to peek inside for hard-coded secrets.

The challenge with git is Packfiles. The official git tutorial explains it best:

The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.

Inside a packfile, git often stores only the difference (a delta) between two versions of a file instead of two full copies of the “same” file. This saves space and lets git operate more efficiently. Once git has packed multiple objects into a single binary packfile, TruffleHog’s filesystem command can no longer simply decompress an individual git object file to view the cleartext data. Instead, TruffleHog uses a separate workflow (and command) to dive into the git commit history and search for leaked credentials.

To simulate this process occurring in your test repository, run the following command:


git gc


Here, gc stands for “garbage collection”; the command “runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance)…”.


Garbage Collection Packs File Version History into binary Packfiles


The previous loose object files no longer exist, and the file version history is neatly packed into a binary packfile. Running TruffleHog’s filesystem command again no longer finds the hard-coded secret.
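
One way to observe this change yourself is git count-objects, which reports how many objects are loose and how many packfiles exist. The sketch below assumes git is installed and uses a scratch repository with placeholder content and a placeholder identity.

```shell
repo="$(mktemp -d)"
git -C "$repo" init --quiet
printf 'auth_token = "FAKE_SECRET_FOR_DEMO"\n' > "$repo/send_sms.py"
git -C "$repo" add send_sms.py
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "Add SMS script"

# "count" is the number of loose objects; "packs" is the number of packfiles.
loose_before="$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')"

# Pack the loose objects and remove the now-redundant loose copies.
git -C "$repo" gc --quiet

loose_after="$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')"
packs="$(git -C "$repo" count-objects -v | awk '/^packs:/ {print $2}')"
echo "loose_before=$loose_before loose_after=$loose_after packs=$packs"
```

After garbage collection the loose object count should drop to zero, which is the state in which the filesystem command stops seeing historical content.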


Running the Filesystem Command Against a Repository after Garbage Collection


Scenario #2

You come across an exposed git directory on a website while bug bounty hunting. So, you use a tool like goop to download the repository onto your local machine. Then, you decide to search for leaked secrets in the source code using TruffleHog. Unfortunately, the git repository is corrupted and the TruffleHog git command does not work. What’s next?

CORRUPTED GIT REPOSITORIES

Truffle Security will soon publish research highlighting the quantity of .git directories exposed on the Alexa Top 1 million websites. While conducting that research, the team encountered several thousand corrupted git repositories.


Corrupted Git Directories Prevented Basic Git Commands from Running.


Corrupted repositories prevent basic git commands from executing successfully. Since TruffleHog relies on git commands to rebuild commit history and decompress stored files, the TruffleHog git command could not properly analyze the corrupted repositories.


TruffleHog Running Against a Corrupted Git Repository.


In these narrow situations, running filesystem may identify secrets in the corrupted directories. Typically, those secrets will be located in the /.git/logs, /.git/config or /.git/objects files.
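
A simple way to choose between the two commands is to check whether git can still read the repository. The sketch below uses git fsck as that health check, which is an assumption of this example and not a description of how TruffleHog itself behaves; choose_scan_command and scan_repo are hypothetical helper names.

```shell
# Print "git" if the repository passes git fsck, otherwise "filesystem".
#   $1: path to the downloaded repository
choose_scan_command() {
    if git -C "$1" fsck >/dev/null 2>&1; then
        echo git
    else
        echo filesystem
    fi
}

# Run the matching TruffleHog command (requires trufflehog on PATH).
scan_repo() {
    if [ "$(choose_scan_command "$1")" = git ]; then
        trufflehog git "file://$1"
    else
        trufflehog filesystem "$1"
    fi
}
```

Usage would look like scan_repo /path/to/downloaded-repo. When in doubt, running both commands costs little and covers both cases.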

Below is an example of TruffleHog not identifying a valid AWS key in a corrupted git repository using the git command, but finding it using the filesystem command.


Example of TruffleHog not Identifying a Key in a Corrupted Git Repository using the Git Command, but Finding it using the Filesystem Command


Scenario #3

You oversee your team’s DevOps efforts and want to include a secrets check in your CI/CD pipeline. You’ll always want to use the git command. In fact, we have two additional CLI flags that directly help DevOps teams efficiently scan PRs and Pushes. 

To scan the difference between the head branch (PR) and base branch, provide the base branch name as an argument to the --since-commit flag and the feature branch name as an argument to the --branch flag. 

As an example, if you wanted to merge a branch named “feature” into the base branch “main”, you could run this command in CI:


trufflehog git file://. --since-commit main --branch feature


To scan the difference between a push directly to one branch and that branch’s current commit, provide the current commit hash to the --since-commit flag and the most recent commit hash in your push to the --branch flag. 
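
For the push case, here is a hedged sketch that follows the flag usage described above. The helper names push_scan_args and scan_push are hypothetical, and file://. assumes the CI job runs from the root of the checked-out repository; most CI platforms expose the before and after commit hashes of a push under their own variable names.

```shell
# Print the argument list for scanning a push.
#   $1: commit the branch pointed to before the push
#   $2: most recent commit in the push
push_scan_args() {
    printf 'git file://. --since-commit %s --branch %s\n' "$1" "$2"
}

# Run the scan from the repository root (requires trufflehog on PATH).
scan_push() {
    # Unquoted on purpose: commit hashes contain no spaces, so word
    # splitting turns the printed string into separate arguments.
    trufflehog $(push_scan_args "$1" "$2")
}
```

Keeping the argument construction separate from the invocation makes it easy to log the exact command a CI job is about to run.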

The specific implementation in CI varies based on the platform, but those two examples should provide a starting point.

We have a few blog posts created for specific CI/CD tools, such as GitHub Actions and CircleCI, and will continue to create more. Also, please review our enterprise documentation on “Scanning in CI”.

Conclusion

When you’re searching for leaked secrets in git repositories, always use TruffleHog’s git command, even for local repositories. The intricacies of git file compression and storage require a separate workflow from the filesystem command. There is one exception to this rule: when scanning a corrupted repository, use the filesystem command, since git will not yield results.

[BONUS] PRO TIP: VALID GIT URIS

Seeing this error?


TruffleHog Error Due to Lack of File Protocol


The git command requires a valid git URI. For locally cloned repositories, the path to the git repository folder must be prefixed with file:// (for example, trufflehog git file://local-repo/).
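
A small helper can build the URI from a directory path so the prefix is never forgotten. This is a sketch; repo_uri is a hypothetical helper name, and resolving to an absolute path is a choice made for this example.

```shell
# Print a file:// URI for a local directory, resolving it to an absolute path.
#   $1: path to the repository folder (relative or absolute)
repo_uri() {
    printf 'file://%s\n' "$(cd "$1" && pwd)"
}
```

It could then be used as trufflehog git "$(repo_uri ./local-repo)".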


TruffleHog Running with File Protocol Argument
