We're happy to announce that we've partnered with Hugging Face to bring TruffleHog’s secret scanning to the Hugging Face Hub.
Hugging Face is a platform that enables the machine learning community to collaborate on models, datasets, and applications. At its core is the Hugging Face Hub, a place where users can discover, share, and contribute to a vast collection of open-source models, datasets, and demos.
Hugging Face organizes data in three main places: models, datasets, and Spaces. Each of these is structured as a git repository, and as we’ve documented before (see our work on GitHub Gists, GitHub Repos, NPM, GitHub Comments, Alexa Top 1M, GitHub Repos Again), developers tend to leak lots of secrets in code repositories.
To combat secret leakage on public (and private) Hugging Face repositories, we worked with the Hugging Face team on two different initiatives:
Creating a native Hugging Face scanner in TruffleHog.
Adding TruffleHog to Hugging Face’s automated scanning pipeline.
Initiative #1 - Scan Hugging Face for Leaked Secrets
The goal of creating a native Hugging Face scanner in TruffleHog was to empower Hugging Face users (and the security teams protecting them) to proactively scan their own account data for leaked secrets.
Our new open-source Hugging Face integration can scan models, datasets, and Spaces, as well as any relevant PRs or Discussions. The only limitation (and this applies to all of our `git`-based sources) is that TruffleHog does not currently scan files stored in LFS. We’re looking to address this for all of our `git` sources soon.
The native Hugging Face integration is invoked similarly to our other native integrations:
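As a minimal sketch of the general shape, assuming the `huggingface` subcommand and its `--model`, `--dataset`, and `--space` flags, a single invocation can target several resources at once. The IDs are placeholders, and the `command -v` guard is only an illustrative convenience so the snippet fails gracefully when TruffleHog is not installed. Run `trufflehog huggingface --help` to confirm the flags in your version.

```shell
# Placeholder IDs (assumptions): substitute resources you own.
MODEL_ID="your-username/your-model"
DATASET_ID="your-username/your-dataset"
SPACE_ID="your-username/your-space"

if command -v trufflehog >/dev/null 2>&1; then
  # One invocation covering a model, a dataset, and a Space.
  trufflehog huggingface --model "$MODEL_ID" --dataset "$DATASET_ID" --space "$SPACE_ID"
else
  echo "trufflehog is not installed or not on PATH" >&2
fi
```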
Here are some of the most common commands you’d likely want to run against your own Hugging Face account.
Scan a Hugging Face Model
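A model scan might look like the following, assuming the `--model` flag takes the repository ID in `owner/name` form. The ID shown is a placeholder.

```shell
# Placeholder model ID (assumption): replace with your own "<owner>/<model-name>".
MODEL_ID="your-username/your-model"

if command -v trufflehog >/dev/null 2>&1; then
  # Scan the git history of a single model repository.
  trufflehog huggingface --model "$MODEL_ID"
else
  echo "trufflehog is not installed or not on PATH" >&2
fi
```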
Scan a Hugging Face Dataset
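For a dataset, the same pattern should apply with the `--dataset` flag. Again, the ID is a placeholder.

```shell
# Placeholder dataset ID (assumption): replace with your own "<owner>/<dataset-name>".
DATASET_ID="your-username/your-dataset"

if command -v trufflehog >/dev/null 2>&1; then
  # Scan the git history of a single dataset repository.
  trufflehog huggingface --dataset "$DATASET_ID"
else
  echo "trufflehog is not installed or not on PATH" >&2
fi
```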
Scan a Hugging Face Space
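A Space is scanned the same way, here assuming the `--space` flag and a placeholder ID.

```shell
# Placeholder Space ID (assumption): replace with your own "<owner>/<space-name>".
SPACE_ID="your-username/your-space"

if command -v trufflehog >/dev/null 2>&1; then
  # Scan the git history of a single Space repository.
  trufflehog huggingface --space "$SPACE_ID"
else
  echo "trufflehog is not installed or not on PATH" >&2
fi
```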
Scan by User or Organization
Similar to our GitHub scanner, if you’d like to scan by an organization or user, you can run the following commands:
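A sketch of account-wide scans, assuming the `--org` and `--user` flags. The names are placeholders, and the two commands are shown separately only for clarity.

```shell
# Placeholder names (assumptions): replace with your organization and username.
ORG_NAME="your-org"
USER_NAME="your-username"

if command -v trufflehog >/dev/null 2>&1; then
  # Scan every model, dataset, and Space owned by an organization.
  trufflehog huggingface --org "$ORG_NAME"
  # Scan every model, dataset, and Space owned by a user.
  trufflehog huggingface --user "$USER_NAME"
else
  echo "trufflehog is not installed or not on PATH" >&2
fi
```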
We also included support for scanning Hugging Face Discussions (`--include-discussions`) and PRs (`--include-prs`). And if you need to pass in an authentication token, you can do so using the `--token` flag or by setting a `HUGGINGFACE_TOKEN` environment variable.
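Putting these options together, a scan of a private model that also covers its Discussions and PRs could look like the sketch below. The model ID is a placeholder, and the snippet assumes the token has already been exported as `HUGGINGFACE_TOKEN` rather than passed with `--token`, which keeps it out of shell history.

```shell
# Placeholder model ID (assumption): replace with your own "<owner>/<model-name>".
MODEL_ID="your-username/your-private-model"

# Assumes HUGGINGFACE_TOKEN was exported earlier in the session.
if [ -z "${HUGGINGFACE_TOKEN:-}" ]; then
  echo "HUGGINGFACE_TOKEN is not set; private repositories may be inaccessible" >&2
fi

if command -v trufflehog >/dev/null 2>&1; then
  # Include Discussions and PRs alongside the repository contents.
  trufflehog huggingface --model "$MODEL_ID" --include-discussions --include-prs
else
  echo "trufflehog is not installed or not on PATH" >&2
fi
```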
Initiative #2 - Hugging Face Adds TruffleHog to their Automated Pipeline
Hugging Face's automated scanning pipeline, which runs on every push, has been extended to include TruffleHog. This integration enables the detection of secret leaks in all files uploaded to Hugging Face repositories.
The `trufflehog filesystem` command is executed on each new or modified file, scanning for potential secrets. If a verified secret is detected, the user is notified via email, allowing them to take corrective action to secure their sensitive information.
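The per-file flow described above can be approximated as follows. This is an illustrative sketch, not Hugging Face's actual pipeline: `notify_owner` is a hypothetical stand-in for the email step, and the flags are assumptions (`--json` for one finding per line, `--results=verified` to keep only verified secrets; older releases use `--only-verified` instead).

```shell
# Hypothetical helper: stands in for the pipeline's email notification.
notify_owner() {
  echo "verified secret found in $1" >&2
}

# Scan one new or modified file and notify if any verified secret is reported.
scan_changed_file() {
  # Findings go to stdout; log output on stderr is discarded.
  findings=$(trufflehog filesystem "$1" --results=verified --json 2>/dev/null)
  if [ -n "$findings" ]; then
    notify_owner "$1"
  fi
}
```

A push hook would then call `scan_changed_file path/to/file` once for each file in the push.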
The Hugging Face team will migrate to the native `trufflehog huggingface` command once we extend support for LFS scanning in git repositories.
We look forward to continuing to collaborate with the security team at Hugging Face. If you would like to partner with us on a similar project, please let us know!