Background
S3 buckets are a common place to store files in AWS. These buckets have a feature that lets you make your files readable by anyone on the internet without authentication. If the content is meant for public consumption, like HTML, CSS, and JS assets for a website, this feature can be really useful, but it’s a double-edged sword. Frequently these files contain sensitive information, which has caused several high-profile security incidents.
Typically the data exposed is the end of the reported story, but we’ve found it’s often not the end of the security story. Since we recently added S3 support to TruffleHog, we thought scanning the set of publicly exposed buckets for credentials would be a great way to get ahead of potential security incidents, and we ended up finding thousands of distinct secrets spanning hundreds of customers.
Methodology
The first thing we needed to do was compile a list of open S3 buckets. Luckily, bucket names are globally unique and can be addressed by subdomain. For example, if a bucket were named “trufflehogbucket”, its files could be accessed at: https://trufflehogbucket.s3.amazonaws.com/filename. Because DNS traffic is typically unencrypted, many bucket names are collected by DNS taps, and some vendors like RiskIQ expose this data via their PassiveTotal API.
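As a minimal sketch of what that addressing scheme makes possible (assuming the Python requests library; the bucket names here are hypothetical), a candidate name can be probed with a single unauthenticated request:

```python
import requests

def check_bucket(name: str) -> str:
    """Classify a candidate bucket name by probing its virtual-hosted URL.

    Illustrative only: region redirects (HTTP 301), requester-pays buckets,
    and rate limiting add edge cases a real scanner would need to handle.
    """
    resp = requests.get(f"https://{name}.s3.amazonaws.com/", timeout=10)
    if resp.status_code == 404:
        return "does not exist"
    if resp.status_code == 403:
        return "exists, but anonymous listing is blocked"
    if resp.status_code == 200:
        return "exists and allows anonymous listing"
    return f"exists (HTTP {resp.status_code})"

if __name__ == "__main__":
    for candidate in ["trufflehogbucket", "some-other-guess"]:
        print(candidate, "->", check_bucket(candidate))
```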
Other tools, like GrayhatWarfare, take a different approach: they generate large lists of likely bucket names and make requests to the S3 API to determine whether each bucket exists and contains publicly exposed files. Using these and other techniques, we built our initial list of buckets. Scanning all of the exposed data quickly grew impractical, so we needed a way to narrow the list to buckets and files likely to contain secrets. Fortunately, GrayhatWarfare’s API also lets you search file names, so we searched for common names like ‘.credentials’ and ‘.env’ and only scanned buckets containing matching files.
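As a rough illustration of that filtering step (this is not the GrayhatWarfare API; it assumes boto3 and a hypothetical bucket name), you can anonymously list a public bucket and keep only the keys that look like credential files:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

INTERESTING_SUFFIXES = (".env", ".credentials")

# Anonymous client: no AWS credentials are attached, so only buckets that
# allow public listing will return results. Region handling is omitted.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

def interesting_keys(bucket: str):
    """Yield object keys in a public bucket that look like credential files."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(INTERESTING_SUFFIXES):
                yield obj["Key"]

if __name__ == "__main__":
    # "trufflehogbucket" is a placeholder, not a real target.
    for key in interesting_keys("trufflehogbucket"):
        print(key)
```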
Results
After scanning approximately 4,000 buckets containing .env or .credentials files, we found that files containing secrets averaged 2.5 secrets each, with some containing 10 or more.
Secret results
We also found a wide variety of credential types, many of which follow easily recognizable formats (see the sketch after this list), including:
AWS Keys
GCP service accounts
Azure Blob Storage connection strings
Coinbase API keys
Twilio API keys
Mailgun API keys
RDS passwords
Sendgrid credentials
Pusher credentials
MSSQL passwords
Mailtrap credentials
Google OAuth credentials
Twitter OAuth credentials
LinkedIn OAuth credentials
Google Maps API keys
Segment API keys
Sauce API keys
Hosted MongoDB credentials
Firebase credentials
Stripe credentials
Rollbar credentials
Twilio credentials
Amplitude credentials
Mailjet credentials
SMS partner credentials
Dropbox credentials
Yousign credentials
PayPal credentials
Mandrill credentials
Zendesk credentials
Hosted message queue connection strings
Razorpay credentials
Textlocal credentials
Application signing secrets
JWT signing secrets
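Part of why these are so findable is that many of the formats above are well known. As a simplified sketch of how such formats can be matched (real detectors, including TruffleHog’s, use far more patterns and verify candidates before reporting them), here is what looking for two of these credential types might look like:

```python
import re

# Simplified, illustrative patterns only; real detectors cover many more
# credential types and verify candidate matches before reporting them.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe live secret key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
}

def find_secrets(text: str):
    """Yield (credential type, match) pairs for any patterns found in text."""
    for name, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            yield name, match

if __name__ == "__main__":
    # Hypothetical .env contents; the values are non-functional placeholders
    # (the AWS key ID is AWS's documented example value).
    sample = (
        "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
        "STRIPE_SECRET_KEY=sk_live_abcdefghijklmnopqrstuvwx\n"
    )
    for kind, value in find_secrets(sample):
        print(kind, "->", value)
```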
Impact Magnifier: Wormability
It’s clear from the surrounding context that many of these credentials unlock additional buckets that otherwise require authentication. Here are two examples:
Leaked credentials leading to more buckets
Leaked credentials leading to more buckets
It’s probably fair to assume authenticated buckets contain more secrets than unauthenticated ones, due to the implied higher security bar authentication provides. This means attackers can likely use the first round of buckets to find keys that unlock an additional round of buckets and expose more keys, which could expose more buckets, and so on. We did not use any of these keys or explore this possibility, for obvious reasons, but it makes this type of attack “wormable”, i.e., one bucket can lead to another bucket, and so on, magnifying the impact of the leak.
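To make the pattern concrete (purely as an illustration of the risk; we did not run this, or anything like it, against the credentials we found), a leaked AWS key pair with list permissions is all it takes to enumerate the next round of buckets:

```python
import boto3

def buckets_reachable_with(access_key_id: str, secret_access_key: str):
    """Return the bucket names visible to a given AWS key pair.

    Illustrative only: assumes the key pair has permission to list buckets.
    The arguments are placeholders, never real leaked credentials.
    """
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]

# Each newly reachable bucket can then be listed and scanned for further
# keys, repeating the cycle that makes the exposure wormable.
```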
Worming through S3 buckets
What’s worse, some of these keys, such as GitHub API keys and GCP Storage API keys, led to other large data stores that may themselves contain more keys.
Worming through multiple providers
Next Steps
Naturally, at this point we needed to disclose what we found to the affected companies. This proved challenging at times because buckets often don’t contain much information connecting them to their creators. We made hundreds of disclosures, and in some cases partnered with providers to get keys revoked for buckets whose owners we couldn’t identify. Affected organizations ranged from dozens of Fortune 500 companies to NGOs and small startups.
At any scale, it’s a good idea to have all of your buckets scanned routinely to prevent catastrophe.
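One way to do that, assuming the trufflehog binary is installed and your AWS credentials have read access to your buckets (the exact s3 subcommand flags may vary between TruffleHog versions; check trufflehog s3 --help), is a small scheduled job that walks every bucket in the account:

```python
import subprocess
import boto3

# A minimal sketch of routine scanning: enumerate every bucket in the account
# and hand each one to TruffleHog's s3 subcommand. Assumes the trufflehog
# binary is on PATH and AWS credentials with read access are configured.
s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    print(f"Scanning {name}...")
    subprocess.run(["trufflehog", "s3", "--bucket", name], check=False)
```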