Karim Rahal

The Dig

October 25, 2023

Mirror, Mirror, on the Wall, Secrets Leaked from Repos All

About a year ago, we discovered an AWS key exposed in a public npm package.

This AWS key was in an interesting-looking file: package/out.sql.

Naturally, curiosity got the better of us, and we looked inside the file. Unbeknownst to the developer, a copy of the company’s production database was sitting in a public package. Below is a copy of the disclosure email we sent to the developer:

Truffle Security Disclosure Email

The database included password hashes for users and an AWS key, among other data. It was about as bad a leak as we could imagine.

How did the developer react? They immediately unpublished that version from the npm registry. The data and AWS key should’ve been safe, right? Unfortunately, no. By the time they clicked “unpublish”, npm mirrors around the world had already cloned a copy of the package. The npm registry (like other public registries such as PyPI) has no way to force mirrors to remove specific data. For the leaked AWS key, the only secure remediation was key rotation. For the affected user data, there was no putting the cork back in the bottle, despite the developer’s best intentions.

We felt the right thing to do at this point was to escalate the problem to the security team, but that presented challenges of its own.

Registries and Mirrors

A registry is a centralized database or repository designed to store and manage specific types of software. Two of the most popular registries are npm and Docker Hub. npm lets users publish, share, and download JavaScript packages; Docker Hub offers a similar service for Docker images. There are countless other examples as well (Homebrew, PyPI, Maven, etc.).

A mirror is a replica of a registry’s content hosted on a separate server. Users and organizations operate mirrors for a few reasons.

LATENCY

A mirror provides a caching layer between the main registry and end users. Mirrors bring packages geographically closer to users—acting like an edge server. The npmmirror.com registry, for instance, provides China-based developers access to the npm registry with lower latency.

PRIVATE PACKAGES

In addition to serving public registry content, some mirrors serve private packages that override public packages of the same name. These mirrors are usually self-hosted on internal company infrastructure.
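To make the override behavior concrete, here is a minimal Python sketch of how a mirror might resolve a package name when it serves both private and public packages. The function and store names are hypothetical, and in-memory dictionaries stand in for real package storage; this is not how Verdaccio or any specific tool is configured.

```python
from typing import Dict, Optional


def resolve_package(
    name: str,
    private_store: Dict[str, str],
    public_store: Dict[str, str],
) -> Optional[str]:
    """Return a package's contents, preferring the private copy.

    Hypothetical helper: a private package with the same name shadows
    (overrides) the public one; names found in neither store give None.
    """
    if name in private_store:
        return private_store[name]
    return public_store.get(name)


if __name__ == "__main__":
    private = {"internal-utils": "private build 2.0.0"}
    public = {"internal-utils": "public build 1.0.0", "left-pad": "public build 1.3.0"}
    print(resolve_package("internal-utils", private, public))  # private build 2.0.0
    print(resolve_package("left-pad", private, public))        # public build 1.3.0
```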

PREVENTING APPLICATIONS FROM BREAKING

Software increasingly relies on third-party packages. If a public registry removes software relied on by an application, that application could break. Many corporations host private mirrors of public software packages to ensure a revoked package won’t disrupt their software.

Verdaccio is a popular tool for self-hosting an npm mirror. If an internal developer requests a package that is not in Verdaccio’s local database, Verdaccio clones a copy from the central registry. Any subsequent internal requests for the same package return the local copy.

When a third-party developer removes a public npm package, Verdaccio retains its local, private copy. By maintaining an offline repository unaffected by upstream registry changes, Verdaccio prevents applications from breaking.
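The pull-through behavior described above can be sketched in a few lines of Python. This is a simplified model, not Verdaccio’s code: the PullThroughMirror class is hypothetical, packages are keyed by (name, version), and a plain dictionary stands in for the upstream registry.

```python
from typing import Dict, Optional, Tuple

PackageKey = Tuple[str, str]  # (package name, version)


class PullThroughMirror:
    """Hypothetical pull-through cache: fetch on a miss, keep the copy forever."""

    def __init__(self, upstream: Dict[PackageKey, bytes]) -> None:
        self.upstream = upstream  # stands in for the public registry
        self.cache: Dict[PackageKey, bytes] = {}
        self.upstream_fetches = 0  # successful clones from upstream

    def get(self, name: str, version: str) -> Optional[bytes]:
        key = (name, version)
        if key in self.cache:
            # Cache hit: upstream is never consulted again, so an upstream
            # deletion cannot affect what this mirror serves.
            return self.cache[key]
        tarball = self.upstream.get(key)  # cache miss: clone from upstream
        if tarball is not None:
            self.upstream_fetches += 1
            self.cache[key] = tarball
        return tarball
```

Because the mirror never re-checks upstream once a version is cached, deleting the entry from the upstream dictionary (the equivalent of an unpublish) has no effect on what the mirror keeps serving.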

Tencent’s npm mirror https://mirrors.cloud.tencent.com/npm uses Verdaccio.

Mirrors and Secrets Leakage

By their nature, mirrors operate separately from the registry. A registry has no control over what mirrors do with the replicated information. A mirror is free to, for example, retain deleted packages, as Verdaccio does.

npm has warned about this in the context of leaked secrets:

If you quickly realize your mistake [exposing a secret], you can unpublish the package and it will be deleted from our servers. When that happens a “delete” event is sent to the downstream replicas. However, we cannot control what the downstream replicas do with that event; we cannot guarantee that third parties will delete your data. No matter how important, private, or sensitive the information in that package was, we can’t claw it back.

Once a leak occurs and you unpublish your code on the registry, there is no guarantee that downstream mirrors will listen to the delete event.
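The following toy model illustrates why an unpublish cannot guarantee removal. It is a deliberate simplification, not npm’s actual replication protocol: the Registry and Replica classes are hypothetical, and each replica’s honors_deletes flag stands in for its retention policy.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

PackageKey = Tuple[str, str]  # (package name, version)


@dataclass
class Replica:
    name: str
    honors_deletes: bool  # stand-in for the replica's own retention policy
    packages: Set[PackageKey] = field(default_factory=set)

    def on_delete(self, key: PackageKey) -> None:
        # A replica that retains deleted packages simply ignores the event.
        if self.honors_deletes:
            self.packages.discard(key)


@dataclass
class Registry:
    packages: Set[PackageKey] = field(default_factory=set)
    replicas: List[Replica] = field(default_factory=list)

    def publish(self, key: PackageKey) -> None:
        self.packages.add(key)
        for replica in self.replicas:  # replicas copy new versions
            replica.packages.add(key)

    def unpublish(self, key: PackageKey) -> None:
        self.packages.discard(key)  # the registry controls only its own copy
        for replica in self.replicas:
            replica.on_delete(key)  # it can only send the "delete" event


def still_exposed(registry: Registry, key: PackageKey) -> List[str]:
    """Names of replicas that still hold `key` after it was unpublished."""
    return [r.name for r in registry.replicas if key in r.packages]
```

Publishing and then unpublishing a version on a registry with one honoring and one retaining replica leaves the version gone from the registry but still present on the retaining replica, which is the situation npm’s warning describes.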

CASE STUDY: LEAKING A SECRET ON NPM

Let’s follow the lifecycle of a leaked secret on npm.

We created an npm package called gh_issue_monitor. This package is pretty simple; it queries our GitHub account and notifies us of new issues in our projects.

A “gh_issue_monitor” output showing that my GitHub repositories have new issues opened

We hardcoded a GitHub access token into the script, and then published the package to npm under version 1.0.1:

A screenshot of the package’s code: it has an arrow pointing to the hardcoded GitHub access token

Within a few seconds, the package appeared on registry.npmmirror.com, the China-based npm mirror. The mirror even pulled our code into its CDN and made it available for download. Other mirrors, such as servers running Verdaccio instances, then requested the new version and stored local copies.

After our package propagated to the npm mirrors, we issued the npm unpublish --force command to unpublish the code containing the GitHub token. While the official npm registry respected the command and removed that version, we couldn’t control all of the mirrors.

The registry.npmmirror.com mirror removed the version metadata almost immediately, but continued to serve a downloadable copy of the code for a day. Other mirrors ignored the unpublish command and kept a copy of the vulnerable code. For example, Tencent’s Verdaccio instance still serves a downloadable tarball of the vulnerable, unpublished version.

A screenshot of Tencent’s mirror registry. It displays information about the “gh_issue_monitor” package, in addition to a tarball download link
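If you want to check whether a specific version is still downloadable from a mirror, a sketch like the following can help. It assumes the mirror follows the conventional npm tarball layout (registry/name/-/basename-version.tgz) and answers HTTP HEAD requests; both are assumptions to confirm for any given mirror, and the helper names are hypothetical.

```python
import urllib.request


def tarball_url(registry: str, name: str, version: str) -> str:
    """Build a tarball URL using the conventional npm path layout (assumed).

    For scoped packages such as "@scope/pkg", the file name drops the
    "@scope/" prefix: <registry>/@scope/pkg/-/pkg-<version>.tgz
    """
    basename = name.split("/")[-1]
    return f"{registry.rstrip('/')}/{name}/-/{basename}-{version}.tgz"


def is_downloadable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for `url` gets a 2xx response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # HTTPError (e.g. 404), URLError and socket timeouts are all OSErrors.
        return False
```

For example, is_downloadable(tarball_url("https://mirrors.cloud.tencent.com/npm", "gh_issue_monitor", "1.0.1")) would probe Tencent’s mirror for the unpublished version, and the same call with https://registry.npmjs.org as the base gives a point of comparison against the official registry.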

Since each mirror sets its own retention policy, code authors cannot control who retains a vulnerable package, nor who accesses and downloads it. Malicious actors are likely scraping mirrored packages regularly and searching them for secrets.

CASE STUDY: ANALYZING NPM DATA

A few months ago, we released Forager, a public event monitoring tool. Forager subscribes to the GitHub and npm public event streams and scans for secrets in all new git pushes (for GitHub) and package publications (for npm).

We downloaded a sample of 3,240 packages that Forager had identified as leaking a secret, and set out to determine how many still contained live keys and how many were still published.

Finding #1: 85% of Vulnerable Packages Still Contained Valid Secrets

We downloaded every package* and re-scanned for secrets using TruffleHog. 85% of all packages (dating from 2020 to 2023) still contained live secret data.

Despite having had the least time to correct their mistake, owners of packages that leaked secrets in 2023 had revoked their keys in more than 22% of the sampled 2023 packages. By contrast, key revocation for packages from prior years ranged from only 3% to 18%.
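As an illustration of how per-year revocation figures like these can be tallied from re-scan results, here is a short Python sketch. The input format (one (year, still_live) pair per package) is an assumption, and the sample data in the usage note is made up; it is not our study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def revocation_rate_by_year(results: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Percentage of packages per year whose leaked secret is no longer live.

    `results` holds one hypothetical (year_published, secret_still_live)
    pair per re-scanned package.
    """
    totals: Dict[int, int] = defaultdict(int)
    revoked: Dict[int, int] = defaultdict(int)
    for year, still_live in results:
        totals[year] += 1
        if not still_live:
            revoked[year] += 1
    return {year: 100.0 * revoked[year] / totals[year] for year in sorted(totals)}
```

For example, revocation_rate_by_year([(2022, True), (2022, True), (2022, False), (2022, True), (2023, False), (2023, True)]) returns {2022: 25.0, 2023: 50.0}.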

Finding #2: 0.9% of Vulnerable Packages Were Unpublished

Only 33 packages known to have previously contained secrets were unpublished from the npm registry. Interestingly, 32 of those 33 were still accessible on Tencent’s Verdaccio mirror. The earliest published package in this cohort was from 2021. We can reasonably conclude that Tencent retains all npm package versions, published or unpublished, for at least a couple of years.

Finding #3: No Unpublished Packages Contained Live Secrets

This finding surprised us the most. None of the (accessible) unpublished packages still contained a live secret. This may reflect sample-size bias (only 32 packages), or a solid understanding of package security on the part of the publishing organizations, which chose both to revoke their keys and to unpublish the vulnerable version.

*One package was no longer accessible.

Managing Leaked Secrets in Mirrors

The key to managing leaked secrets in mirrors is to identify leaks in the original published artifacts and then promptly rotate the impacted keys. Below, we break down the process into 3 steps.

First, use a secret scanner, like TruffleHog, to scan your published artifacts for leaked credentials. Your artifact could be an npm package, a Docker image, a code repository, or something else. There’s no need to scan mirrored copies, since the goal is to determine whether your original published artifact contains a leaked key.

As an example, we ran TruffleHog against our sample gh_issue_monitor Node package (a .tgz file), and it detected the hardcoded GitHub token, as shown below:

A terminal output of TruffleHog, showing a leaked GitHub personal access token from index.js
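To show the idea behind scanning a published artifact, here is a deliberately simplified Python sketch that searches a .tgz package for strings shaped like classic GitHub personal access tokens (ghp_ followed by 36 alphanumeric characters). It is not TruffleHog: it knows a single pattern, does not verify whether a match is live, and the scan_tarball name is hypothetical. Use a real scanner for actual artifacts.

```python
import io
import re
import tarfile
from typing import List, Tuple

# Simplified pattern for classic GitHub personal access tokens only.
# Real scanners cover many more formats and verify matches with the provider.
GITHUB_PAT = re.compile(rb"ghp_[A-Za-z0-9]{36}")


def scan_tarball(data: bytes) -> List[Tuple[str, str]]:
    """Return (member path, matched token) pairs found in a .tgz package."""
    findings: List[Tuple[str, str]] = []
    # Members are read in memory only; nothing is extracted to disk.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            for match in GITHUB_PAT.finditer(extracted.read()):
                findings.append((member.name, match.group().decode()))
    return findings
```

Passing it the bytes of a locally downloaded package tarball returns a list of (path, token) pairs, with paths such as package/index.js for a typical npm tarball.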

Second, rotate the leaked secrets. It’s impossible to force every mirror to take down the impacted artifact, so the only secure remediation option is to revoke the affected keys.

To help the community rotate leaked API keys and secrets, we created a How-to-Rotate guide (and a related GitHub repo) that documents the two or three steps required to rotate secrets for the most popular SaaS providers.

HowToRotate.com Screenshot
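The order of rotation steps matters: issuing a replacement before revoking the leaked key generally lets legitimate consumers switch over without an outage. Below is a minimal sketch of that ordering, written against a hypothetical CredentialProvider interface. Real SaaS providers each have their own APIs and steps, which is what the How-to-Rotate guide documents.

```python
from typing import Callable, Protocol


class CredentialProvider(Protocol):
    """Hypothetical interface; real providers expose their own APIs."""

    def create_key(self) -> str: ...

    def revoke_key(self, key_id: str) -> None: ...


def rotate(
    provider: CredentialProvider,
    leaked_key_id: str,
    deploy: Callable[[str], None],
) -> str:
    # 1. Issue a replacement so legitimate consumers have something to use.
    new_key_id = provider.create_key()
    # 2. Roll the replacement out to every system that used the leaked key.
    deploy(new_key_id)
    # 3. Revoke the leaked key; copies retained by mirrors become useless.
    provider.revoke_key(leaked_key_id)
    return new_key_id
```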

Third, we recommend reviewing the SaaS provider’s access logs to ensure a threat actor did not gain access to your account.
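Reviewing access logs usually means looking for activity that used the leaked key after it became public and that you cannot account for. Here is a sketch of that filter, assuming a simplified, hypothetical AccessEvent record and a known set of your own source IP addresses; real providers expose logs in their own formats, so the fields would need adapting.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set


@dataclass(frozen=True)
class AccessEvent:
    # Hypothetical, simplified log record.
    timestamp: datetime
    key_id: str
    source_ip: str
    action: str


def suspicious_events(
    events: Iterable[AccessEvent],
    leaked_key_id: str,
    leaked_at: datetime,
    known_ips: Set[str],
) -> List[AccessEvent]:
    """Events that used the leaked key, after it went public, from an unknown IP."""
    return [
        e
        for e in events
        if e.key_id == leaked_key_id
        and e.timestamp >= leaked_at
        and e.source_ip not in known_ips
    ]
```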

Importantly, this process only works if you actively scan every published artifact for leaked keys. Since secret scanners frequently update their detection engines to identify additional secret types, it’s important to periodically re-scan all artifacts.
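One way to keep periodic re-scans manageable is to record which version of the scanner’s detectors last examined each published version of an artifact, and to re-scan anything examined by an older version. A minimal sketch follows; the Artifact record and the integer ruleset version are hypothetical bookkeeping, not a TruffleHog feature.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Artifact:
    name: str
    version: str
    # Hypothetical bookkeeping: detector ruleset version used for the last
    # scan, or None if this artifact version has never been scanned.
    scanned_with: Optional[int] = None


def needs_rescan(artifacts: List[Artifact], current_ruleset: int) -> List[Artifact]:
    """Return artifacts never scanned, or last scanned with older detectors."""
    return [
        a
        for a in artifacts
        if a.scanned_with is None or a.scanned_with < current_ruleset
    ]
```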

Conclusion: “Once on the Internet, Always on the Internet”

When you publish a package or image that leaks a secret, it’s highly likely that the secret will remain public forever. At first, nobody may notice, but eventually a malicious actor may find it. Maybe they’re scraping npm mirrors, or maybe they’re scanning every version of every package you’ve ever published. Whatever the case, deleting and unpublishing your code is not enough: decentralized mirrors will replicate leaked secrets. The only secure way to remediate is key rotation.