
TruffleHog Partners With Hugging Face to Scan for Secrets

Joe Leon

The Dig

September 4, 2024

We're happy to announce that we've partnered with Hugging Face to bring TruffleHog’s secret scanning to the Hugging Face Hub.

Hugging Face is a platform that enables the machine learning community to collaborate on models, datasets, and applications. At its core is the Hugging Face Hub, a place where users can discover, share, and contribute to a vast collection of open-source models, datasets, and demos.

Hugging Face organizes data in three main places: models, datasets and Spaces. Each of these is structured as a git repository, and as we’ve documented before (see our work on GitHub Gists, GitHub Repos, NPM, GitHub Comments, Alexa Top 1M, GitHub Repos Again), developers tend to leak lots of secrets in code repositories.

To combat secret leakage on public (and private) Hugging Face repositories, we worked with the Hugging Face team on two different initiatives:

Creating a native Hugging Face scanner in TruffleHog.
Adding TruffleHog to Hugging Face’s automated scanning pipeline.

Initiative #1 - Scan Hugging Face for Leaked Secrets

Our goal in creating a native Hugging Face scanner in TruffleHog was to empower Hugging Face users (and the security teams protecting them) to proactively scan their own account data for leaked secrets.

Our new open-source Hugging Face integration can scan models, datasets and Spaces, as well as any relevant PRs or Discussions. The only limitation (and this applies to all of our `git`-based sources) is that TruffleHog does not currently scan files stored in Git LFS. We’re looking to address this for all of our `git` sources soon.
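Why LFS matters here: when a file is tracked with Git LFS, the Git history only holds a small text pointer, while the actual bytes live in separate LFS storage, so a scan that walks Git objects sees the pointer rather than the contents. If you want a quick inventory of which files in a local clone fall into that gap, here is a minimal bash sketch. It assumes the clone was made without downloading LFS content (for example, without git-lfs installed, or with `GIT_LFS_SKIP_SMUDGE=1`); `is_lfs_pointer` and `list_lfs_pointers` are hypothetical helper names, not part of TruffleHog or Git LFS. With git-lfs installed, `git lfs ls-files` gives you a similar list directly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TruffleHog or Git LFS) for spotting
# Git LFS pointer stubs in a checkout that was cloned without LFS content.

LFS_SPEC_LINE='version https://git-lfs.github.com/spec/v1'

# Succeeds if the file at $1 looks like an LFS pointer stub.
is_lfs_pointer() {
  local file="$1"
  [ -f "$file" ] || return 1
  # Pointer files are small text stubs; 1024 bytes is a generous upper bound.
  (( $(wc -c < "$file") < 1024 )) || return 1
  # Every pointer starts with a line naming the LFS spec version.
  [ "$(head -n 1 "$file")" = "$LFS_SPEC_LINE" ]
}

# Prints every pointer file under directory $1 (default: .), skipping .git/.
list_lfs_pointers() {
  local dir="${1:-.}"
  find "$dir" -type f -not -path '*/.git/*' -print0 |
    while IFS= read -r -d '' file; do
      if is_lfs_pointer "$file"; then
        printf '%s\n' "$file"
      fi
    done
}
```

For example, `list_lfs_pointers path/to/clone` prints the stubs a `git`-based scan would only see as pointers. One possible stopgap until LFS support lands (an untested suggestion, not a documented TruffleHog workflow) is to fetch the real content with `git lfs pull` and scan the checkout with `trufflehog filesystem`, the same subcommand Hugging Face’s pipeline uses.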

The native Hugging Face integration is invoked similarly to our other native integrations:

trufflehog huggingface --help

Here are some of the most common commands you’d likely want to run against your own Hugging Face account.

Scan a Hugging Face Model

trufflehog huggingface --model <model_id>

Scan a Hugging Face Dataset

trufflehog huggingface --dataset <dataset_id>

Scan a Hugging Face Space

trufflehog huggingface --space <space_id>

Scan by User or Organization

Similar to our GitHub scanner, if you’d like to scan by an organization or user, you can run the following commands:

# To scan all models, spaces and datasets owned by a User
trufflehog huggingface --user <username>
# To scan all models, spaces and datasets owned by an Organization
trufflehog huggingface --org <orgname>

We also included support for scanning Hugging Face Discussions (--include-discussions) and PRs (--include-prs). And if you need to pass in an authentication token, you can do so using the --token flag or by setting a HUGGINGFACE_TOKEN environment variable.
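If you run these scans regularly, a small wrapper keeps the flags consistent. The bash sketch below assumes you always want Discussions and PRs included; `hf_scan` and the `TRUFFLEHOG` override variable are hypothetical names for illustration, and the only TruffleHog options it passes are the ones listed above. A `HUGGINGFACE_TOKEN` exported in your shell is inherited by the `trufflehog` process, so private repositories can be scanned without `--token`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Hugging Face flags described above.
# TRUFFLEHOG is an illustrative override (default: the real binary), e.g.
# set it to `echo` for a dry run; it is not a TruffleHog setting.
hf_scan() {
  local kind="${1:-}" target="${2:-}"
  case "$kind" in
    model|dataset|space|user|org) ;;
    *) kind="" ;;
  esac
  if [ -z "$kind" ] || [ -z "$target" ]; then
    echo "usage: hf_scan {model|dataset|space|user|org} <id>" >&2
    return 2
  fi
  # HUGGINGFACE_TOKEN, if exported, is inherited by the trufflehog process.
  "${TRUFFLEHOG:-trufflehog}" huggingface "--$kind" "$target" \
    --include-discussions --include-prs
}
```

Usage: `hf_scan org <orgname>` scans everything an organization owns, and `TRUFFLEHOG=echo hf_scan model <model_id>` prints the command instead of running it. Relying on the environment variable rather than `--token` also keeps the token out of your shell history and the process list.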

Initiative #2 - Hugging Face Adds TruffleHog to their Automated Pipeline

Hugging Face's automated scanning pipeline, which runs on every push, has been extended to include TruffleHog. This integration enables the detection of secret leaks in all files uploaded to Hugging Face repositories.

The trufflehog filesystem command is executed on each new or modified file, scanning for potential secrets. If a verified secret is detected, the user is notified via email, allowing them to take corrective action to secure their sensitive information.
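This is not Hugging Face’s actual pipeline code. It is a minimal bash sketch of the per-file flow described above: read the changed paths on stdin, run `trufflehog filesystem` on each one, and print the paths with verified findings, which is where a notification step (such as the email) would hook in. It assumes TruffleHog v3’s `--only-verified` and `--json` flags; `scan_files` and the `TRUFFLEHOG` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of a per-push scan step (hypothetical, not Hugging Face's code).
# Reads newline-separated file paths on stdin, prints each path for which
# TruffleHog reports at least one verified secret, and returns 1 if any
# path was printed. TRUFFLEHOG is an illustrative override for testing.
scan_files() {
  local file findings
  local found=0
  while IFS= read -r file; do
    [ -f "$file" ] || continue   # skip paths that no longer exist
    # --only-verified limits output to secrets TruffleHog could verify;
    # --json emits one JSON object per finding on stdout.
    # stdin is redirected so the scanner cannot consume the path list.
    findings=$("${TRUFFLEHOG:-trufflehog}" filesystem "$file" \
      --only-verified --json < /dev/null)
    if [ -n "$findings" ]; then
      printf '%s\n' "$file"
      found=1
    fi
  done
  return "$found"
}
```

In a push hook you could feed it the files that were added or modified, e.g. `git diff --name-only --diff-filter=AM "$BEFORE" "$AFTER" | scan_files` run from the repository root, where `$BEFORE` and `$AFTER` stand for the push’s old and new commit SHAs. Scanning one file per invocation mirrors the description above and makes it trivial to attribute a finding to a file; batching paths into a single `trufflehog filesystem` call would save process start-up time at the cost of parsing the JSON to map findings back to files.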

The Hugging Face team will migrate to the native trufflehog huggingface command once we extend support for LFS scanning in git repositories.

We look forward to continuing to collaborate with the security team at Hugging Face. If you would like to partner with us on a similar project, please let us know!