Scanning 2.6 million public Bitbucket Cloud repositories for secrets
Luke Marshall
November 20, 2025
TL;DR I scanned every public Bitbucket Cloud repository (2,636,562 repositories) using TruffleHog, found over 6,000 secrets verified as live, and made over $10,000 in bounties along the way.
This guest post by Security Engineer Luke Marshall was developed through Truffle Security's Research CFP program. Luke specializes in investigating exposed secrets across open-source ecosystems, a path that led him into bug bounty work and responsible disclosure.
After uncovering surprising results in NPM and PyPI, Luke shifted his focus to public Bitbucket repositories. In this post, he breaks down what he found and how he built automated detection to expose secrets at scale.
Luke will also publish a follow-up post next week, expanding this research to GitLab. Stay tuned!
What is Bitbucket?
Bitbucket is a Git-based code hosting platform created in 2008 and now owned by Atlassian. It was released the same year as GitHub and three years earlier than GitLab.
While often overshadowed by GitHub and GitLab, Bitbucket still hosts code for thousands of enterprise organizations. It’s an attractive target for exposed credentials for two main reasons:
It uses Git, which can bury secrets deep in commit history.
It hasn’t received nearly as much attention from security tooling or researchers as GitHub and GitLab, at least not recently.
Discovering all public Bitbucket Cloud repositories
The goal of this research was to accurately assess the state of exposed credentials across all public Bitbucket Cloud repositories. To get started, I needed a way to list every single public Bitbucket Cloud repository.
Fortunately, Bitbucket provides a public API endpoint that allows paginated access to all public repositories.

The sample script above loops through the bitbucket.org/api/2.0/repositories/ endpoint to retrieve each full repository name.
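A pagination loop of this kind can be sketched as follows. This is a minimal illustration, not necessarily the script used in the research: it assumes each response is a JSON page carrying a `values` list and, on every page but the last, a `next` URL, which is how Bitbucket's paginated 2.0 responses are shaped. The names `fetch_json` and `iter_repo_names` are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Endpoint named in the post; everything else in this sketch is illustrative.
START_URL = "https://bitbucket.org/api/2.0/repositories/"


def fetch_json(url: str) -> dict:
    """Fetch one page of the API and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_repo_names(
    start_url: str = START_URL,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[str]:
    """Yield each repository's full name (workspace/slug), page by page."""
    url = start_url
    while url:
        page = fetch(url)
        for repo in page.get("values", []):
            yield repo["full_name"]
        # The last page carries no "next" link, which ends the loop.
        url = page.get("next")
```

A real run over millions of repositories would also need retries, rate-limit handling, and a way to checkpoint the last URL fetched, all of which are left out here.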
At the time of the initial research (08/11/2025), Bitbucket returned 2,636,562 repositories from this endpoint. Since then, over 50,000 new public repositories have been published!
Building the automation
With 2,636,562 repositories to scan, a single VPS wouldn’t scale. To speed things up, I opted for a serverless approach using AWS Lambda, which I’ve successfully used before in some of my large-scale projects.
My automation consisted of two main components:
A local Python script that sent all 2,636,562 repository names to an AWS Simple Queue Service (SQS) queue, which acted as a durable task list.
An AWS Lambda function to (a) scan the repositories with TruffleHog, and (b) log the results.
The beauty of this architecture was that no repository was accidentally scanned twice, and if something broke, scanning would seamlessly resume.
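The producer side of this design can be sketched roughly as below. It is an illustration under stated assumptions, not the script from the research: `enqueue_repos` and `batched` are hypothetical names, the client is expected to behave like a boto3 SQS client, and one message is sent per repository name. The only SQS call used is `send_message_batch`, which accepts at most 10 entries per request.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List

SQS_MAX_BATCH = 10  # SQS accepts at most 10 messages per SendMessageBatch call


def batched(names: Iterable[str], size: int = SQS_MAX_BATCH) -> Iterator[List[str]]:
    """Split an iterable of names into lists of at most `size` items."""
    it = iter(names)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def enqueue_repos(sqs_client: Any, queue_url: str, names: Iterable[str]) -> int:
    """Send one SQS message per repository name; return how many were sent."""
    sent = 0
    for chunk in batched(names):
        # Entry Ids only need to be unique within a single batch request.
        entries = [{"Id": str(i), "MessageBody": name} for i, name in enumerate(chunk)]
        resp = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed = resp.get("Failed", [])
        if failed:
            raise RuntimeError(f"{len(failed)} messages were rejected by SQS")
        sent += len(entries)
    return sent


# Usage (needs boto3 and AWS credentials; the queue URL is a placeholder):
#   import boto3
#   enqueue_repos(boto3.client("sqs"), "<queue-url>", repo_names)
```

The resume behaviour comes from the queue itself: a message is removed only after a consumer has processed it successfully, so work that was in flight when something failed becomes visible again and is picked up later.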
The scanning architecture looked like this:

AWS Lambda requires container images to include the Lambda Runtime Interface Client (RIC) and a handler. Since the pre-built TruffleHog image is Alpine-based, it won't run as a Lambda function on its own.
I built a custom Lambda function using an AWS Python base image (which already has the RIC and correct entrypoint), copied the TruffleHog binary into that image, and then invoked it from the handler.
My Dockerfile looked like this:

This is the TruffleHog command that I used.

Each Lambda invocation executed a simple TruffleHog scan command with concurrency set to 300. This setup allowed me to complete the scan of 2.6 million repositories over a weekend.
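A handler along these lines might look like the sketch below. It is an assumption-laden illustration rather than the handler used in the research: the binary path, the function names, and the choice of flags (`--results=verified` and `--json`, both standard TruffleHog options) are mine, and the event is assumed to be the usual SQS batch with one repository name per record body.

```python
import json
import subprocess
from typing import Callable, List

TRUFFLEHOG_BIN = "/usr/local/bin/trufflehog"  # hypothetical path inside the image


def build_command(full_name: str) -> List[str]:
    """Build a TruffleHog git scan for one Bitbucket repository."""
    return [
        TRUFFLEHOG_BIN, "git", f"https://bitbucket.org/{full_name}.git",
        "--results=verified",  # report only secrets confirmed live
        "--json",              # one JSON object per finding on stdout
    ]


def run_scan(full_name: str) -> str:
    """Run TruffleHog and return its stdout; the timeout value is illustrative."""
    proc = subprocess.run(
        build_command(full_name), capture_output=True, text=True, timeout=840
    )
    return proc.stdout


def parse_findings(stdout: str) -> List[dict]:
    """Keep every stdout line that decodes to a JSON object."""
    findings = []
    for line in stdout.splitlines():
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank or non-JSON line
        if isinstance(obj, dict):
            findings.append(obj)
    return findings


def handler(event: dict, context: object, scan: Callable[[str], str] = run_scan) -> dict:
    """Lambda entry point: scan each queued repository and log its findings."""
    total = 0
    for record in event.get("Records", []):
        full_name = record["body"]
        for finding in parse_findings(scan(full_name)):
            # Anything printed by a Lambda function ends up in CloudWatch Logs.
            print(json.dumps({"repo": full_name, "finding": finding}))
            total += 1
    return {"findings": total}
```

The `scan` parameter exists only so the handler can be exercised without the binary; Lambda itself calls `handler(event, context)` and the default is used. Logging raw findings means live secrets land in the logs, so access to them should be restricted accordingly.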
Old & Impactful secrets
In total, I found 6,212 verified secrets across the 2,636,562 Bitbucket repositories.
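To put that total in proportion, the two figures above can be turned into a rate. The snippet below is only a back-of-the-envelope calculation; it gives an average number of verified secrets per 10,000 repositories, not the number of affected repositories, since a single repository can hold several secrets.

```python
TOTAL_REPOS = 2_636_562   # public repositories returned by the API
VERIFIED_SECRETS = 6_212  # verified secrets found across them


def per_10k(count: int, total: int = TOTAL_REPOS) -> float:
    """Express a count as a rate per 10,000 repositories."""
    return count / total * 10_000


print(f"{per_10k(VERIFIED_SECRETS):.1f} verified secrets per 10,000 repositories")
```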
Secrets Exposed by Date
What surprised me the most was the number of live credentials committed more than 5 years ago. There was even a live AWS key committed 12 years ago in June 2013!

The graph above shows the frequency of live secrets by date of exposure. The yearly count hovers around 600-700 between 2018 and 2024.
And remember, these are credentials that were still live at the time of scanning (August 2025); this graph does not represent the total number of credentials that were exposed and then revoked.
Secrets Exposed by Type
Google Cloud Platform (GCP) takes the cake for having the most leaked live credentials (977). A GCP secret leaked in 0.038% of all repositories on Bitbucket, which equates to about 4 out of every 10,000 repositories. Below, I plotted the most frequently leaked keys by SaaS/Cloud provider.

Based on their potential access scope, around 10 of the top 20 most frequently leaked credential types could be considered high-impact findings. These include GCP keys, AWS IAM keys, SendGrid API tokens, MongoDB connection strings, OpenAI keys, Atlassian product tokens, Azure Storage keys, and credentials for platforms like Stripe, Slack, and Twilio.
Secrets Exposed by File Extension
The spread of file extensions was interesting:
json files leaked the most live secrets.
php files were the fourth most common. Do people even use PHP anymore??
Language files like py and js had a large footprint, probably due to the popularity of these languages.
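A breakdown like this can be produced by tallying the extension of each file in which a verified secret was found. The sketch below is a minimal illustration that assumes the file paths have already been pulled out of the scan results; `extension_of` and `top_extensions` are hypothetical names.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def extension_of(path: str) -> str:
    """Return the lower-cased extension without its dot, or "(none)"."""
    suffix = PurePosixPath(path).suffix.lower()
    return suffix.lstrip(".") or "(none)"


def top_extensions(paths: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count findings per file extension and return the n most common."""
    return Counter(extension_of(p) for p in paths).most_common(n)
```

Note that dotfiles such as `.env` have no suffix under this rule and fall into the "(none)" bucket; a real analysis might prefer to count them by filename instead.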
Here are the top 10 file extensions where valid secrets were found:

Atlassian Keys Leaked on an Atlassian Product?!
Another standout finding was the disproportionately high number of exposed credentials related to Atlassian products (Jira, Bitbucket, and Opsgenie). It makes sense: folks using Bitbucket for Git hosting would likely use the entire suite of Atlassian products.
In total, I discovered 247 valid Atlassian credentials, compared with the much lower volumes typically seen in my NPM and PyPI datasets.
Of these 247 exposed credentials, three were rated as P1 (critical) severity submissions by the organization’s bug bounty program. The rest were responsibly disclosed to the Atlassian team.
| Credential type | Count |
| --- | --- |
| Atlassian (generic Atlassian tokens) | 142 |
| Jira | 50 |
| Opsgenie | 4 |
| BitbucketAppPassword | 51 |
P1 Bug Bounty Submissions
In total, I submitted 11 priority 1 (P1) vulnerabilities to vulnerability disclosure and bug bounty programs. All of the P1 credentials were committed after 2020.

Summary
This project really challenged my assumption that security researchers are consistently targeting all Git platforms for exposed secrets. To be honest, I was not expecting Bitbucket to have as many exposed secrets as it did. The types of secrets and their impact also surprised me. From this process, I was able to earn $10,000 in bug bounties and report findings to more than 50 organisations.
Alongside the TruffleHog team, I responsibly disclosed thousands of live secrets and got them revoked. This research shows that established enterprise software that has been around for almost two decades remains a goldmine for security researchers.
Thoughts, research findings, reports, and more from Truffle Security Co.
The Dig
STAY STRONG
DIG DEEP
DOING IT THE RIGHT WAY
© 2025 Truffle Security Co.


