We're happy to announce that we've partnered with Elastic to bring TruffleHog scanning to Elasticsearch.
Elasticsearch is a distributed, RESTful search and analytics engine. It's designed for scale and can run on anything from a single node all the way up to a cluster with multiple distributed nodes.
Elastic had been using TruffleHog internally for some time to look for secrets in their internal logs. Without a native TruffleHog integration, they were using the filesystem integration, which required a journey that looked something like this:
(Elastic (with configuration files) → Log file on disk → TruffleHog → Results) x Scheduled Runs
Working together with the team at Elastic, we developed a native integration that allows them (and you!) to stream documents straight out of Elastic into TruffleHog, with no extra steps required. Now, the journey looks something like this:
(Elastic cluster → TruffleHog (with configuration flags) → Results) → Streaming in real time
We’re so proud of our work together and can’t wait to see what you and your team do with it, too.
"TruffleHog's integration with Elasticsearch represents a substantial leap forward in data security capabilities,” said Amit Kanfer and João Duarte, Engineering @ Elastic. “By leveraging Elasticsearch's distributed architecture and TruffleHog's scanning capabilities, organizations can now conduct real-time scanning for leaked credentials within their Elasticsearch clusters. This integration not only enhances security measures but also underscores Elasticsearch's capacity as a robust platform for scalable document storage and analysis."
Scanning an Elasticsearch Cluster
Whether you host your own Elasticsearch cluster or you use Elastic Cloud, TruffleHog can connect to it.
Scan a Local Cluster
There are two ways to authenticate to a local cluster with TruffleHog: (1) username and password, (2) service token.
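For example, both methods look something like this (a sketch only; substitute your own node address and credentials, and check `trufflehog elasticsearch --help` for the flag names supported by your version):

```shell
# (1) Username and password:
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic \
  --password changeme

# (2) Service token:
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --service-token 'AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVy...'
```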
Scan an Elastic Cloud Cluster
To scan a cluster on Elastic Cloud, you’ll need a Cloud ID and API key.
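A sketch of what that invocation looks like (the Cloud ID and API key values below are placeholders; copy yours from the Elastic Cloud console):

```shell
trufflehog elasticsearch \
  --cloud-id 'my-deployment:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlv...' \
  --api-key 'MY5zV0kzVEJXdUZXLVZYaE...'
```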
Scan Configuration
There are several options available to TruffleHog users to configure and optimize their Elasticsearch secret scans.
You can limit the indices the scan examines with `--index-pattern`, using wildcards or a comma-separated list. You can also specify a query with `--query-json`, or give a timestamp (we support date math) to jump to with `--since-timestamp`.
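Putting those options together might look like this (a sketch with illustrative values; the credentials, index pattern, and query are placeholders for your own):

```shell
# Scan only log indices, matching a query, starting 24 hours ago (date math).
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic \
  --password changeme \
  --index-pattern 'logs-*' \
  --query-json '{"match": {"message": {"query": "deploy"}}}' \
  --since-timestamp 'now-24h'
```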
Once TruffleHog finishes a scan, it can continuously check for new documents to scan (even in new indices) when the `--best-effort-scan` flag is provided. This feature is particularly useful if you have an ELK stack installation, as those often continuously consume log messages from distributed sources.

The `--best-effort-scan` flag lets you avoid costly duplicative scanning, which could result in multiple notifications for the same secret, or tedious bookkeeping to skip documents that were already scanned. It also scales with you: as your cluster grows in size and throughput, you can increase the number of workers with the `--concurrency` flag.
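A continuous, parallelized scan might be launched like this (a sketch; the node address, credentials, and worker count are illustrative, and `8` is just a starting point you'd tune to your cluster's throughput):

```shell
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic \
  --password changeme \
  --best-effort-scan \
  --concurrency 8
```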
Most excitingly, we’ve included “smart skipping” in the `--best-effort-scan` flag. TruffleHog tracks the number of documents processed in each scan as well as the time it took to process them. During a scan, if more documents are added to the Elasticsearch cluster than TruffleHog scanned, the scanning rate is too slow; scanning has already fallen behind. To catch up, TruffleHog will intelligently skip documents. For example, if 100 documents were scanned but 150 documents were added, TruffleHog will log a warning (e.g. “Scan coverage rate is 67% (100/150); skipping documents to catch up”) and then skip 50 documents in the next scan in order to stay current with the incoming stream.
Secret Detection Output
In addition to TruffleHog’s normal output, secrets found in Elasticsearch include the Document ID, Index, and Timestamp, so they’re easy to identify and address.
Things To Keep In Mind
To implement the `--since-timestamp` and `--best-effort-scan` flags, we add a `@timestamp` clause to the search query. If you also include a `@timestamp` clause in a query passed to `--query-json`, our clause will override it. In this case, we advise executing a separate run with that `--query-json` argument.
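In practice, that means splitting the work into two invocations (a sketch; the node address, credentials, and date range below are placeholders):

```shell
# Run 1: continuous scan; TruffleHog manages the @timestamp clause itself.
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic --password changeme \
  --best-effort-scan

# Run 2: a separate one-off scan with your own @timestamp clause.
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic --password changeme \
  --query-json '{"range": {"@timestamp": {"gte": "2024-01-01", "lt": "2024-02-01"}}}'
```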
This is v1
This is the first version of this scanner, and there are plenty of ways to improve it. For starters, there are several useful search parameters we could expose. As an open-source company, contributions are always welcome. Drop by the Slack or the Discord if you’re interested!