tl;dr There are thousands of live API credentials and passwords in public GitHub comments. Unlike accidentally committing a secret to git, GitHub users are inserting passwords into text boxes and publicly posting them for all to see. TruffleHog now supports scanning GitHub issues, pull requests and comments.
Developers accidentally commit secrets to git repositories constantly. And it makes sense – an engineer working on a new integration hardcodes an API key and forgets to remove it before committing their changes. Fortunately, there’s an entire ecosystem of tooling to help prevent these types of leaks. But what about outside of git commits?
How often are GitHub users leaking secrets in other places?
Real screenshot of an exchange in the public comments section
Turns out it’s quite often.
The Findings
1. THOUSANDS OF GITHUB COMMENTS LEAK API KEYS AND PASSWORDS
Using TruffleHog, we sampled a small subset of GitHub’s public Pull Request and Issue comments and discovered 721 live API keys and passwords. By extrapolation, we can reasonably conclude that many thousands of live secrets currently exist in public comments.
GitHub Personal Access Token in an Issue Comment
Distribution of Secret Types
2. COMMENTING A SECRET IS DIFFERENT BEHAVIOR THAN COMMITTING TO GIT
In the past, we’ve researched secrets leakage in code repositories, but this research didn’t focus on code.
Instead, we exclusively reviewed cases where users, often knowingly, copied and pasted their credentials into a text box and clicked “Comment”.
Users Insert Secrets into Text Boxes and Comment them Publicly
While this isn’t surprising, it’s important to keep in mind that although tools like TruffleHog can help with secrets leakage, we must continue to educate developers on the importance of guarding secrets.
3. HUMAN USERS (AND NOT BOTS) ARE COMMENTING SECRETS
Human users authored nearly all comments (97%) containing leaked secrets in our dataset. While we identified a few cases of automated jobs (bots) publicly leaking secrets, they did not represent a statistically significant amount.
Secrets Leaked by a GitHub Bot
4. MOST LEAKERS HAD NO ASSOCIATION WITH THE REPOSITORY
GitHub’s comment metadata indicates the commenting author’s relationship to the repository. A commenter can be the repository owner, a member of the repository owner’s organization, a collaborator/contributor, or have no relation.
The majority of commenters had no relation to the repository.
Commenter’s Association to Repository
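To make the association breakdown concrete, here is a minimal sketch of how such a tally could be computed. It assumes each comment is a dict shaped like GitHub's REST API comment objects, which carry an `author_association` field (values such as OWNER, MEMBER, COLLABORATOR, CONTRIBUTOR and NONE). The sample records are invented for illustration and are not drawn from our dataset.

```python
from collections import Counter


def tally_associations(comments):
    """Count comments per author_association value.

    Comments missing the field are grouped under "UNKNOWN".
    """
    return Counter(c.get("author_association", "UNKNOWN") for c in comments)


# Hypothetical sample records, not real findings.
sample_comments = [
    {"id": 1, "author_association": "NONE"},
    {"id": 2, "author_association": "OWNER"},
    {"id": 3, "author_association": "NONE"},
    {"id": 4, "author_association": "CONTRIBUTOR"},
]

counts = tally_associations(sample_comments)
for association, n in counts.most_common():
    print(f"{association}: {n}")
```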
In addition, we found several examples of users with no association to the repository leaking keys while looking for support.
Customer Leaked a Key for a SaaS Provider on the SaaS Provider’s Repository
Below, we graphed all instances of a specific provider’s secrets leaking in PRs and Issues on their own repositories.
5. REPOS WITH SECRETS IN COMMENTS OFTEN HAD SECRETS IN GIT
One third of all repositories containing secrets in PR and/or Issue comments also had secrets lurking in git history. While not surprising, this establishes a pattern of insecurity across the repository’s community, since the same individual did not necessarily leak both secrets.
6. EDITING COMMENTS DID NOT DELETE THEM
Whether the user realized their mistake, or a fellow developer reminded them, many users edited their original comment to remove the exposed secret. Unfortunately, unless the user deleted their comment, the prior edits remained in the comment history.
In the example below, the user edited their original comment to remove the exposed key.
Edited Comment Appeared to Not Contain a Secret
The drop down next to the “edited” tag revealed prior versions, including the original containing the valid API key.
Edited Comment Version Revealed a Key
This is a common behavior we’ve seen on other platforms too, such as wiki pages.
About 10% of all live secrets lived in past comment versions.
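If you want to check your own comments for this, here is a rough sketch. As far as we know, GitHub's GraphQL API exposes a comment's edit history through a `userContentEdits` connection whose nodes include `editedAt` and `diff`. Treat the query as an assumption and verify it against GitHub's GraphQL reference before relying on it. The response below is a hand-written stand-in, and the `DEMO_` key format is hypothetical.

```python
import re

# Assumed query shape; confirm field names in GitHub's GraphQL documentation.
EDIT_HISTORY_QUERY = """
query($id: ID!) {
  node(id: $id) {
    ... on IssueComment {
      userContentEdits(first: 100) {
        nodes { editedAt diff }
      }
    }
  }
}
"""

# Hypothetical key format used only for this illustration.
DEMO_KEY = re.compile(r"\bDEMO_[A-Z0-9]{16}\b")


def versions_with_secrets(response, pattern=DEMO_KEY):
    """Return (editedAt, matches) for every past version that matches pattern."""
    edits = response["data"]["node"]["userContentEdits"]["nodes"]
    hits = []
    for edit in edits:
        found = pattern.findall(edit.get("diff") or "")
        if found:
            hits.append((edit["editedAt"], found))
    return hits


# Hand-written stand-in for an API response: the latest edit is redacted,
# but the original version still holds the key.
fake_response = {
    "data": {"node": {"userContentEdits": {"nodes": [
        {"editedAt": "2022-01-02T00:00:00Z", "diff": "my key is [redacted]"},
        {"editedAt": "2022-01-01T00:00:00Z", "diff": "my key is DEMO_ABCDEF0123456789"},
    ]}}}
}

hits = versions_with_secrets(fake_response)
print(hits)
```

Any hit in a past version means the key should be treated as exposed, even if the visible comment looks clean.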
7. MOST LEAKED SECRETS WERE IN TEXT BLOCKS, NOT CODE BLOCKS
GitHub comments are markdown-rendered. Users can input any markdown-compatible text, including code blocks. The majority of comments containing leaked secrets did not leak inside of a code block (much to our surprise!). Instead, users manually typed (or copy/pasted) their secret directly into a plain text box and treated the secret like any other word. This further cements the idea that leaking secrets in comments is a fundamentally different behavior than leaking secrets in code.
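One way to make the text-versus-code-block distinction measurable is sketched below. This is not the exact method behind our numbers. It is a simplified illustration that tracks fenced code blocks line by line and labels each match accordingly. It ignores inline code spans and indented code blocks, and it does not enforce every fence rule in the markdown spec. The `DEMO_` key format is hypothetical.

```python
import re

# Opening/closing fence: up to three spaces, then 3+ backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
DEMO_KEY = re.compile(r"DEMO_[A-Z0-9]{16}")  # hypothetical key format


def fenced_line_numbers(markdown):
    """Return the indices of lines that sit inside fenced code blocks."""
    inside, marker, fenced = False, "", set()
    for i, line in enumerate(markdown.splitlines()):
        m = FENCE.match(line)
        if not inside and m:
            inside, marker = True, m.group(1)[0]
            fenced.add(i)
        elif inside:
            fenced.add(i)
            # Simplification: any fence using the same character closes the block.
            if m and m.group(1)[0] == marker:
                inside = False
    return fenced


def classify_matches(markdown, pattern=DEMO_KEY):
    """Label each pattern match as 'code block' or 'plain text'."""
    fenced = fenced_line_numbers(markdown)
    results = []
    for i, line in enumerate(markdown.splitlines()):
        for match in pattern.findall(line):
            results.append((match, "code block" if i in fenced else "plain text"))
    return results


fence = "`" * 3
plain_key = "DEMO_" + "A" * 16
code_key = "DEMO_" + "B" * 16
comment = "\n".join([
    f"My token is {plain_key}, why does it fail?",
    fence,
    f"export TOKEN={code_key}",
    fence,
])

labels = classify_matches(comment)
for secret, location in labels:
    print(location, secret)
```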
The Research
PULL REQUEST COMMENTS
A Pull Request provides developers with the opportunity to suggest changes to a code base. On GitHub, multiple developers often engage in a conversation about the suggestions via comments. The comment field is an HTML text box; users can upload files, link to new sites, quote lines of code, and generally add any text they want.
A Blank Comment Box
ISSUE COMMENTS
Developers use GitHub issues to file bug reports, submit new feature requests, or engage with the repository maintainers when they do not want to submit a Pull Request.
Similar to the PR comment field, the Issue comment field allows users to add most types of HTML content, including freeform text.
OUR RESEARCH PROCESS
Our goal was to evaluate how often developers leak secrets by typing them into a textbox (not committing them to git). We built our data set by downloading a sample of a couple hundred million GitHub comments dating from 2012 to 2022. There are billions of public comments, so we only sampled a small portion of the total volume. We included new Issue posts, Issue Comments and Pull Request Comments.
We did not scan any code referenced inside a Pull Request comment, since referenced code comes directly from the git committed code base. Instead, we only scanned text inputted by a user.
After pulling down each comment, we ran our open-source secret scanner, TruffleHog, against the comment text.
We only included secrets that TruffleHog verified as live. Our headline would have been a lot cooler if we included expired API keys (Hundreds of Thousands of Comments Leak API Keys), but expired secrets rarely present a meaningful security vulnerability. We’re laser-focused on reducing false positives and only reporting meaningful results to our users.
After scanning each text block, we compiled metadata about the comments containing a live API key/secret and analyzed the results.
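The detect-then-verify flow described above can be sketched roughly as follows. This is not TruffleHog's implementation. It is a minimal illustration with a single detector for the classic GitHub Personal Access Token shape (`ghp_` followed by 36 alphanumeric characters) and a verifier passed in as a plain function. The `is_live` callable and the sample comments are hypothetical stand-ins; real verification would involve a request to the provider.

```python
import re
from typing import Callable, Iterable

# Shape of a classic GitHub Personal Access Token.
GITHUB_CLASSIC_PAT = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")


def find_live_secrets(comments: Iterable[dict], is_live: Callable[[str], bool]) -> list:
    """Return findings only for candidates that the verifier reports as live."""
    findings = []
    for comment in comments:
        # Deduplicate within a comment so each candidate is verified once.
        for candidate in sorted(set(GITHUB_CLASSIC_PAT.findall(comment["body"]))):
            if is_live(candidate):
                findings.append({"comment_id": comment["id"], "secret": candidate})
    return findings


# Invented tokens built programmatically; neither is a real credential.
live_token = "ghp_" + "a" * 36
expired_token = "ghp_" + "b" * 36

sample_comments = [
    {"id": 1, "body": f"CI fails when I use {live_token} as the token"},
    {"id": 2, "body": f"old token was {expired_token}"},
    {"id": 3, "body": "no secrets in this one"},
]


def fake_verifier(token: str) -> bool:
    """Stand-in for a real liveness check against the provider."""
    return token == live_token


findings = find_live_secrets(sample_comments, fake_verifier)
print(findings)
```

Keeping verification as a separate step is what lets expired keys drop out of the results instead of inflating them.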
Responsible Disclosure
We attempted to notify all impacted parties. As with most of our research projects, this proved tricky. After identifying an email address for most of the individuals with exposed keys, we sent an email that looked like this.
Sample Disclosure Email
Our outreach was met with a variety of responses. Some disputed the validity of our claims:
A User’s Skeptical Response to our Disclosure
Some thanked us for informing them:
A User’s Positive Response to our Disclosure
Most simply didn’t respond or act on our message. Unfortunately, thousands of keys remain publicly exposed.
How Widespread is this?
There are tens of thousands of secrets lurking in public GitHub PR and Issue comments. By contrast, Truffle Security sees 1,800 new secrets leaked in GitHub git pushes every day.
It’s much more likely that developers are committing secrets to a codebase than commenting them on public repositories. That said, if you maintain a public repository, we recommend regularly sweeping the Issue and PR comments for the presence of secrets.
SCAN GITHUB ISSUES, PRS AND COMMENTS WITH TRUFFLEHOG
We recently open-sourced a new feature in TruffleHog that enables users to scan their public repositories for secrets in Issues and PRs.
When you use TruffleHog’s github module, you can pass in a repository URI as well as the flags --issue-comments and --pr-comments to scan all Issue and PR comments and descriptions.
As an example, here’s how you would scan TruffleHog’s test_keys GitHub repository:
And here’s a sample of a secret found in an Issue comment (don’t worry, it’s just a test key):
Sample Output from Scanning a GitHub Issue using TruffleHog
Note: The --pr-comments flag does not scan the code changes associated with the PR. It only scans the initial PR description and all user comments (which could include user-inserted code blocks).
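If you want to script this scan, the invocation can be sketched as a small Python wrapper. The --issue-comments and --pr-comments flags come from the text above. The `--repo` flag and the repository URL are assumptions here, so check them against `trufflehog github --help` for your installed version. The helper names are hypothetical.

```python
import shutil
import subprocess


def build_scan_command(repo_url, issue_comments=True, pr_comments=True):
    """Assemble the argument list for a TruffleHog GitHub scan."""
    cmd = ["trufflehog", "github", f"--repo={repo_url}"]  # --repo is assumed
    if issue_comments:
        cmd.append("--issue-comments")
    if pr_comments:
        cmd.append("--pr-comments")
    return cmd


def run_scan(cmd):
    """Run the scan if the trufflehog binary is on PATH; return the exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("trufflehog binary not found on PATH")
    return subprocess.run(cmd, check=False).returncode


# Assumed URL for the test_keys repository mentioned above.
cmd = build_scan_command("https://github.com/trufflesecurity/test_keys")
print(" ".join(cmd))
# To execute the scan: run_scan(cmd)
```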
WHAT ELSE CAN YOU DO?
In addition to regular repository scanning, we recommend the following:
1. Review the output of any automated tooling that comments on PRs or Issues (such as a GitHub Actions bot) and ensure all secrets are masked.
2. Review all of your own comments to ensure they do not contain secrets in past edits.
3. If you’re a SaaS provider, ensure your users aren’t inadvertently leaking their keys in Issue support requests.
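For the first recommendation, a bot can mask secrets before it posts anything. The sketch below is one simple approach, not a complete defense: it only masks values that match the patterns you supply, so it will miss anything those patterns do not cover. The `DEMO_` key format and the sample message are hypothetical.

```python
import re

DEMO_KEY = re.compile(r"DEMO_[A-Z0-9]{16}")  # hypothetical key format


def mask_secrets(text, patterns, keep=4):
    """Replace each match with its first `keep` characters plus asterisks."""

    def _mask(match):
        value = match.group(0)
        return value[:keep] + "*" * (len(value) - keep)

    for pattern in patterns:
        text = pattern.sub(_mask, text)
    return text


raw_comment = "Deploy failed with token DEMO_ABCDEF0123456789"
masked_comment = mask_secrets(raw_comment, [DEMO_KEY])
print(masked_comment)
```

Masking at the point where the comment body is built, rather than after posting, avoids the edit-history problem described earlier.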
What should you do if you expose a key?
If you inadvertently expose a credential, we recommend immediately rotating that key. Simply deleting or editing the GitHub comment containing the key is insufficient for several reasons: (1) a threat actor could already have a copy of the key, (2) GH Archive could still contain the key despite attempts to delete it, and (3) editing a comment does not always delete the previous version containing it. Rotating the key is the only way to immediately invalidate it and render it useless to any threat actor who holds a copy.