TL;DR: Scan local git repos with TruffleHog’s git command (trufflehog git file://local-repo/); the filesystem command could miss hard-coded secrets hiding in git history. There is one exception to this rule: when scanning a corrupted git repository, use the filesystem command, since git will not yield results.
Scenario #1
You’re a security engineer (either in-house AppSec or external penetration tester) tasked with identifying security vulnerabilities in a code base. Once you gain access to the source code, you clone the repository onto your local machine and start a code review. The lowest-hanging fruit (read: vulnerabilities) are often hard-coded API keys and secrets in git commits. So, you decide to run TruffleHog to identify leaked secrets.
What TruffleHog command do you use?
GIT VS FILESYSTEM
TruffleHog has two commands that appear relevant to scanning a repository on a local machine: filesystem and git.
The filesystem command (trufflehog filesystem /path/to/repo) seems like it would scan the files in a local directory and report all detectable secrets committed to git. Unfortunately, it’s not that simple. The intricacies of git object compression and storage require a separate workflow to detect secrets buried in git history. The best command to find leaked secrets in locally cloned git repositories is the git command (trufflehog git file://folder/).
To illustrate the difference between running git vs filesystem, we’ll scan Truffle Security’s Test_keys repository. This repository contains 3 valid secrets: an authentication URI, an SSH private key and AWS credentials. The authentication URI and SSH private key are located in the “keys” file in the current commit.
Valid credentials in TruffleSecurity’s “Test_Keys” GitHub repository
Importantly, the AWS credentials are not present in any of the files in the current commit; they reside in a past git commit.
AWS key in TruffleSecurity’s “Test_Keys” GitHub repository git history. This key is not located in the current commit.
The filesystem command found the 2 valid secrets located in the current commit’s “keys” file. Interestingly, the filesystem command did not identify the AWS credentials located in git history.
Git vs Filesystem: Running TruffleHog’s Filesystem Command
As expected, the git command identified all 3 secrets.
Git vs Filesystem: Running TruffleHog’s Git Command
A Layer Deeper
Instead of using Truffle Security’s “Test_Keys” repository, try replicating this on your own.
Follow these steps:
1. Initialize a git repo.
2. Hard-code a secret (ex: using Twilio’s free SDK example) and commit it.
3. Delete the hard-coded secret and commit again.
An Example Python Script using the Twilio SDK to Send an SMS message. The Image Above Does Not Contain a Secret Key; however, the Git Repository Contains a Historical Commit that has a Hard-Coded Secret Key.
You now have a repository with no leaked secrets in the current commit; however, the git history contains a valid secret.
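The steps above can be reproduced with a short script. This is a minimal sketch, assuming git is installed and on your PATH; the file name send_sms.py, the commit messages, the inline demo identity and the placeholder token are all hypothetical stand-ins for the Twilio example.

```python
import pathlib
import shutil
import subprocess
import tempfile

# Hypothetical placeholder; it is not a real credential.
PLACEHOLDER_TOKEN = "PLACEHOLDER_NOT_A_REAL_TOKEN"


def git(repo: pathlib.Path, *args: str) -> str:
    """Run a git command inside repo and return its stdout."""
    cmd = [
        "git",
        "-c", "user.name=Demo User",        # inline identity so commits work anywhere
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout


def build_demo_repo(repo: pathlib.Path) -> None:
    """Commit a hard-coded token, then remove it in a second commit."""
    script = repo / "send_sms.py"
    git(repo, "init", "--quiet")                                # step 1
    script.write_text(f'auth_token = "{PLACEHOLDER_TOKEN}"\n')   # step 2
    git(repo, "add", "send_sms.py")
    git(repo, "commit", "--quiet", "-m", "Add SMS script")
    script.write_text('import os\nauth_token = os.environ["TWILIO_AUTH_TOKEN"]\n')  # step 3
    git(repo, "commit", "--quiet", "-am", "Read token from the environment")


if __name__ == "__main__":
    if shutil.which("git") is None:
        raise SystemExit("git is required for this sketch")
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        build_demo_repo(repo)
        print("token in working tree:", PLACEHOLDER_TOKEN in (repo / "send_sms.py").read_text())
        print("token in git history: ", PLACEHOLDER_TOKEN in git(repo, "log", "-p"))
```

The placeholder is unlikely to match any TruffleHog detector, so swap in a test credential you control if you want to see scan results.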
We know the git command will work, but try running filesystem. Did it work? Yes, yes it did.
Running the TruffleHog Filesystem Command
So, why does the filesystem command sometimes look into git history to identify past hard-coded secrets?
Loose Objects vs. Pack Files
In the example above, the Twilio API key is located inside the /.git/objects directory in a file named “c18809830302036b538c17efcff65b959da6ac”. Simply opening that file reveals a bunch of gibberish.
Running the cat Command Against a Git Object File
To limit the space required for storing repository data, git compresses files (called blobs) using the zlib compression library. Running the git cat-file command inflates (decompresses) the compressed blobs and returns the cleartext files. If you follow these steps, you’ll see your own cleartext credentials.
Running git cat-file
So, why can’t the filesystem command recognize a zlib-compressed file and inflate it? It does. That’s why the filesystem command worked. In fact, TruffleHog recognizes several archive types and attempts to peek inside for hard-coded secrets.
The challenge with git is Packfiles. The official git tutorial explains it best:
The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.
Inside a packfile, git often stores only the difference (a delta) between two versions of a file instead of two full copies of the “same” file. This helps save space and enables git to operate more efficiently. Once git has packed multiple objects into a single binary packfile, TruffleHog’s filesystem command can no longer simply decompress an individual git object file to view the cleartext data. Instead, TruffleHog uses a separate workflow (and command) to dive into the git commit history to search for leaked credentials.
To simulate this process occurring in your test repository, run the git gc command. The name stands for “garbage collection”, and the command “runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance)…”.
Garbage Collection Packs File Version History into binary Packfiles
The previous object files no longer exist and the file version history is neatly packed into a binary packfile. Running the TruffleHog filesystem command again no longer finds the hard-coded secret.
Running the Filesystem Command Against a Repository after Garbage Collection
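One way to check which state a repository is in is to count loose objects and list packfiles under .git/objects. The sketch below is a hypothetical helper, not part of TruffleHog, and it assumes SHA-1 object ids (two-character directories holding 38-character file names). After git gc you would expect the loose count to drop and a .pack file to appear.

```python
import re
import tempfile
from pathlib import Path

LOOSE_DIR = re.compile(r"[0-9a-f]{2}")
LOOSE_FILE = re.compile(r"[0-9a-f]{38}")  # assumes SHA-1 object ids


def summarize_object_store(git_dir: Path) -> dict:
    """Count loose objects and list packfiles under git_dir/objects."""
    objects = git_dir / "objects"
    loose = 0
    for sub in objects.iterdir():
        if sub.is_dir() and LOOSE_DIR.fullmatch(sub.name):
            loose += sum(1 for f in sub.iterdir() if LOOSE_FILE.fullmatch(f.name))
    pack_dir = objects / "pack"
    packs = sorted(p.name for p in pack_dir.glob("*.pack")) if pack_dir.is_dir() else []
    return {"loose_objects": loose, "packfiles": packs}


if __name__ == "__main__":
    # Fake layout for illustration only; point this at a real .git directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp) / ".git"
        (git_dir / "objects" / "ab").mkdir(parents=True)
        (git_dir / "objects" / "ab" / ("0" * 38)).write_bytes(b"")
        (git_dir / "objects" / "pack").mkdir()
        print(summarize_object_store(git_dir))
```

If the summary shows packfiles, treat a clean filesystem scan as inconclusive and run the git command instead.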
Scenario #2
You come across an exposed git directory on a website while bug bounty hunting. So, you use a tool like goop to download the repository onto your local machine. Then, you decide to search for leaked secrets in the source code using TruffleHog. Unfortunately, the git repository is corrupted and the TruffleHog git command does not work. What’s next?
CORRUPTED GIT REPOSITORIES
Truffle Security will soon publish research highlighting the quantity of .git directories exposed on the Alexa Top 1 million websites. While conducting that research, the team encountered several thousand corrupted git repositories.
Corrupted Git Directories Prevented Basic Git Commands from Running.
Corrupted repositories prevent basic git commands from executing successfully. Since TruffleHog relies on git commands to rebuild commit history and decompress stored files, the TruffleHog git command could not properly analyze the corrupted repositories.
TruffleHog Running Against a Corrupted Git Repository.
In these narrow situations, running filesystem may identify secrets in the corrupted directories. Typically, those secrets will be located in the /.git/logs, /.git/config or /.git/objects files.
Below is an example of TruffleHog not identifying a valid AWS key in a corrupted git repository using the git command, but finding it using the filesystem command.
Example of TruffleHog not Identifying a Key in a Corrupted Git Repository using the Git Command, but Finding it using the Filesystem Command
Scenario #3
You oversee your team’s DevOps efforts and want to include a secrets check in your CI/CD pipeline. You’ll always want to use the git command. In fact, we have two additional CLI flags that directly help DevOps teams efficiently scan PRs and pushes.
To scan the difference between the head branch (PR) and base branch, provide the base branch name as an argument to the --since-commit flag and the feature branch name as an argument to the --branch flag.
As an example, to scan a PR that merges a branch named “feature” into the base branch “main”, you would pass main to --since-commit and feature to --branch when running the git command in CI.
To scan the difference between a push directly to one branch and that branch’s current commit, provide the current commit hash to the --since-commit flag and the most recent commit hash in your push to the --branch flag.
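The two invocations described above differ only in the values passed to the two flags, so a CI script could assemble them with one small helper. This is a sketch under stated assumptions: build_scan_command is a hypothetical name, and file://. assumes the CI job has checked the repository out into its working directory. Only the --since-commit and --branch flags named above are used.

```python
import shlex
from typing import List


def build_scan_command(repo_uri: str, since_commit: str, branch: str) -> List[str]:
    """Assemble a trufflehog git invocation that scans only new commits."""
    return [
        "trufflehog", "git", repo_uri,
        "--since-commit", since_commit,  # base branch, or the commit before the push
        "--branch", branch,              # PR branch, or the newest pushed commit
    ]


if __name__ == "__main__":
    # Assumption: the CI job has checked the repository out into the working directory.
    pr_scan = build_scan_command("file://.", "main", "feature")
    print(shlex.join(pr_scan))
    # For a direct push, pass the two commit hashes your CI platform exposes
    # in place of "main" and "feature".
```

Returning a list rather than a single string keeps the arguments safe to hand to a process runner without shell quoting.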
The specific implementation in CI varies based on the platform, but those two examples should provide a starting point.
We have a few blog posts created for specific CI/CD tools, such as GitHub Actions and CircleCI, and will continue to create more. Also, please review our enterprise documentation on “Scanning in CI”.
Conclusion
When you’re searching for leaked secrets in git repositories, always use TruffleHog’s git command, even for local repositories. The intricacies of git file compression and storage require a separate workflow from the filesystem command. There is one exception to this rule: when scanning a corrupted repository, use the filesystem command, since git will not yield results.
[BONUS] PRO TIP: VALID GIT URIS
Seeing this error?
TruffleHog Error Due to Lack of File Protocol
The git command requires a valid git URI. In the context of locally cloned repositories, the path to the git repository folder must be prepended with file://.
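If you wrap TruffleHog in a script, a small guard can add the prefix for you. The helper below is a hypothetical sketch (to_git_uri is not a TruffleHog function): it prepends file:// to a local path and leaves anything that already names a scheme untouched.

```python
def to_git_uri(target: str) -> str:
    """Prefix a local path with file:// unless it already names a scheme."""
    if "://" in target:
        return target  # already a URI, e.g. file:// or https://
    return "file://" + target


if __name__ == "__main__":
    for example in ("local-repo/", "/path/to/repo", "file://folder/"):
        print(example, "->", to_git_uri(example))
```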
TruffleHog Running with File Protocol Argument