Introducing Forager
TruffleHog is an open-source secret scanning engine that detects sensitive credentials such as passwords and API keys: secrets that are inadvertently exposed by individuals and organizations. Two years ago, TruffleHog v3 was released, a complete rewrite with capabilities like credential detectors, native source scanning support, file decoders, and more.
With the new and improved TruffleHog, we wanted to test the capabilities of our tool as well as improve digital safety for those on the web.
Introducing Forager, our powerful new tool to scan the public web: https://forager.trufflesecurity.com/explore
Forager finds live API keys (verified with TruffleHog), finds the emails they’re associated with, and lets the affected company view the details of the exposure. You can also view redacted information for other companies in case you want to give them a heads up, but only they will be able to view the details.
Example of searching for keys associated with @google.com emails
Secret Sources
What are we scanning? Where are we looking for secrets? At the time of writing, Forager scans two sources: GitHub and NPM.
GITHUB
Every day, GitHub processes millions of events, both private and public. The public events are available to anyone via the public events API, which lists near real-time information on all the public activity happening on GitHub, from issue creation to push events.
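As a rough illustration of what consuming that feed can look like, here is a minimal Python sketch that pulls one page from GitHub's public events endpoint and keeps only the push events. This is not Forager's actual pipeline: the function names, the User-Agent string, and the choice of fields to keep are assumptions made for the example, and the payload fields GitHub returns may change over time.

```python
import json
import urllib.request

# GitHub's public events endpoint (unauthenticated requests are rate limited).
EVENTS_URL = "https://api.github.com/events"


def fetch_public_events(url=EVENTS_URL):
    """Fetch one page of recent public events. Makes a network request."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "forager-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def push_summaries(events):
    """Keep only push events and pull out the fields a scanner would need."""
    summaries = []
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        payload = event.get("payload", {})
        summaries.append(
            {
                "repo": event.get("repo", {}).get("name"),
                "ref": payload.get("ref"),
                "head": payload.get("head"),
            }
        )
    return summaries
```

Calling push_summaries(fetch_public_events()) would give a list of repositories and head commits that a scanner could then fetch and inspect.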
Keys are confirmed as still live with TruffleHog
Forager funnels all of this information to TruffleHog. From there, we throw any suspicious-looking information at our detectors, which identify whether a secret is verified or not.
In our CEO’s recent Nahamcon talk, Dylan leaked a valid AWS canary token to see how long it took for someone to interact with a live, verified secret. This was done in a repo with no watchers and no stars: something extremely low visibility that people are unlikely to come across. After just 10 minutes, someone had already attempted to authenticate with our token. Over the next few hours, malicious actors in multiple countries used the leaked key multiple times.
Here are some interesting statistics we’ve gathered:
0.1% of pushes (not just commits!) have live credentials in them.
90.9% of pushes with live credentials are to personal repositories. 9.1% are to organization repositories.
7.8% of pushes with live credentials are to forks rather than the original repository.
NODE PACKAGE MANAGER (NPM)
Secrets in NPM don’t match the GitHub Source Code
If you interact with the web, you have definitely interacted with NPM. NPM is the de facto package registry for all things JavaScript.
Recently I decided to see what it takes to publish an NPM package. I went into an existing test repo and ran npm init to set it up as a basic package. From there, I took the following steps to release my cool new package:
Create an NPM account
Authenticate on the terminal with npm login
Navigate to the root directory of the package
Run npm publish
Et voilà, a public NPM package! As a developer, that was delightful. But as a security practitioner, warning bells went off.
To publish a package, you send off an entire directory to NPM based on your current working directory. There is a whole lot less precision here with what you decide to publish, making it easy to expose something you don’t want to.
Let’s say I have a .env file full of secrets that I want to keep private.
With Git, a .gitignore file will intentionally exclude certain files and directories from your Git history. NPM acknowledges this as a common software pattern and will use .gitignore to prevent the specified files and directories from being published as well. For now, including *.env in the repository’s .gitignore will keep those secrets private.
However, a Git repository and a software package have different priorities. Let’s say I want to minimize the size of my package for faster downloads. A quick search shows that you can add a .npmignore to prevent certain files from publishing. So I create a .npmignore that omits my test files. All good, right?
Unfortunately, the contents of .npmignore and .gitignore are not cumulative. Once a .npmignore exists, the .gitignore is no longer used to block files and directories from being published. If you aren’t careful, there’s a good chance you will inadvertently publish your secret .env file for all to see.
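The precedence described above can be modeled in a few lines of Python. This is a deliberately simplified sketch, not NPM's real packing logic: the function names are made up for the example, patterns are matched with Python's fnmatch rather than full gitignore semantics, and NPM's built-in always-included and always-excluded files are left out.

```python
from fnmatch import fnmatch


def effective_ignore_patterns(ignore_files):
    """Pick the patterns that apply when publishing.

    ignore_files maps an ignore file name to its list of patterns.
    If .npmignore exists it is used alone; .gitignore is only a fallback.
    """
    if ".npmignore" in ignore_files:
        return ignore_files[".npmignore"]
    return ignore_files.get(".gitignore", [])


def published_files(paths, ignore_files):
    """Return the paths that would be published under this simplified model."""
    patterns = effective_ignore_patterns(ignore_files)
    return [
        path
        for path in paths
        if not any(fnmatch(path, pattern) for pattern in patterns)
    ]


paths = ["index.js", ".env", "test/app.test.js"]

# Only .gitignore present: the .env file stays private.
print(published_files(paths, {".gitignore": ["*.env"]}))

# Adding .npmignore for test files replaces .gitignore, so .env is published.
print(published_files(paths, {".gitignore": ["*.env"], ".npmignore": ["test/*"]}))
```

In this model, the second call drops the test file but lets .env through, which is exactly the trap: adding an ignore file for one purpose silently removes the protection the other one gave you.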
The Medium article “For the love of god, don’t use .npmignore” by Jeff Dickey highlights some of the pitfalls and concerns around blocklists. NPM also has a file allowlisting feature, but that requires some research and additional setup in comparison.
Given the ease of leaking a secret, we set up Forager to scan NPM for live credentials. In our early findings, we noticed that older package versions leaked secrets more often than newer ones, highlighting security guardrail improvements in NPM. While leaks still happen, the data suggests that more people are adopting secure practices.
And more…
We know that secrets don’t just live in GitHub and NPM! Future iterations of Forager will explore other secret sources. We have potential sources like PyPI on our roadmap!
We will keep expanding the sources we’re scanning
No Noise, High Signal Secrets
The secrets on Forager were all real, live, active secrets at the time of scan. TruffleHog has 700+ detectors that check whether a credential is actually live. For example, the AWS detector performs a GetCallerIdentity API call against the Amazon Web Services (AWS) API to verify if an AWS credential is active. No more noisy lists of credentials to manually validate!
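To make the idea concrete, here is a hedged Python sketch of that kind of check using the third-party boto3 library. It is not TruffleHog's detector (TruffleHog is a separate codebase); the function name and the client_factory parameter are assumptions added so the logic can be exercised without real credentials.

```python
def verify_aws_key(access_key_id, secret_access_key, client_factory=None):
    """Return the caller's ARN if the key pair is live, otherwise None.

    client_factory is a hypothetical hook for testing; by default the
    sketch uses boto3.client, which requires boto3 to be installed.
    """
    if client_factory is None:
        import boto3  # third-party dependency, imported lazily

        client_factory = boto3.client

    sts = client_factory(
        "sts",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name="us-east-1",
    )
    try:
        # GetCallerIdentity succeeds for any valid credential and needs
        # no IAM permissions, which makes it a low-impact liveness check.
        identity = sts.get_caller_identity()
    except Exception:
        # Broad on purpose for the sketch: invalid keys and network
        # failures are both treated as "not verified".
        return None
    return identity.get("Arn")
```

A real verifier would likely distinguish "key rejected" from "could not reach AWS" rather than collapsing both into None, since only the first means the secret is dead.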
Note that Forager does not keep a copy of the secret for privacy and security reasons. We store the location in which it was found as well as a redacted string to provide context clues. Furthermore, sometimes developers delete their key from history instead of rotating an exposed key. This might break some of our links when investigating a secret.
Forager only identifies what we’ve seen and does not track state. If continuous monitoring and remediation workflows sound interesting to you, contact us to chat about TruffleHog Enterprise.
More details around open source vs enterprise can be found here: trufflesecurity.com/pricing
Forager – Unearth your secrets
With all this credential data, how do we responsibly disclose this information? Manually finding and reporting information for each secret is a monumental task and part of the reason why we built Forager.
With Forager, you can search by domain to see what organizations have leaked secrets. This provides a lightweight redacted view with information around secret types and time of detection.
Forager users can view a lot more information after logging in with their corporate Google or Microsoft account. Secrets are “unmasked” based on the domain you authenticate with. Accounts on “public” email domains, like Gmail and university emails, can only view unmasked secrets that the email itself has leaked. Identities tied to organizations (i.e., work emails) have access to unmasked secrets from anyone within that organization.
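The unmasking rule can be read as a small access check. The Python sketch below is only an interpretation of the rule as described here, not Forager's implementation: the function names and the short list of public domains are hypothetical.

```python
# Hypothetical, incomplete list of domains treated as "public" email providers.
PUBLIC_EMAIL_DOMAINS = {"gmail.com", "outlook.com"}


def domain_of(email):
    """Return the lowercased domain part of an email address."""
    return email.rsplit("@", 1)[-1].lower()


def can_unmask(viewer_email, leaker_email, public_domains=PUBLIC_EMAIL_DOMAINS):
    """Decide whether the viewer may see the unmasked details of a leak."""
    viewer = viewer_email.lower()
    leaker = leaker_email.lower()
    if domain_of(viewer) in public_domains:
        # Public-domain accounts only see secrets tied to their own address.
        return viewer == leaker
    # Organization accounts see secrets leaked by anyone on the same domain.
    return domain_of(viewer) == domain_of(leaker)
```

Under this reading, alice@example.com could unmask a secret leaked by bob@example.com, while alice@gmail.com could not unmask one leaked by bob@gmail.com.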
We hope that Forager will be a helpful addition to your security toolkit. Give it a try and let us know what you think!
Community Slack: trufflehog-community.slack.com