We're happy to announce that we've partnered with Elastic to bring TruffleHog scanning to Elasticsearch.
Elasticsearch is a distributed, RESTful search and analytics engine. It's designed for scale and can run on anything from a single node all the way up to a cluster with multiple distributed nodes.
Elastic had been using TruffleHog internally for some time to look for secrets in their internal logs. Without a native TruffleHog integration, they were using the filesystem integration, which required a journey that looked something like this:
(Elastic (with configuration files) → Log file on disk → TruffleHog → Results) x Scheduled Runs
Working together with the team at Elastic, we developed a native integration that allows them (and you!) to stream documents straight out of Elastic into TruffleHog, with no extra steps required. Now, the journey looks something like this:
(Elastic cluster → TruffleHog (with configuration flags) → Results) → Streaming in real time
We’re so proud of our work together and can’t wait to see what you and your team do with it, too.
"TruffleHog's integration with Elasticsearch represents a substantial leap forward in data security capabilities,” said Amit Kanfer and João Duarte, Engineering @ Elastic. “By leveraging Elasticsearch's distributed architecture and TruffleHog's scanning capabilities, organizations can now conduct real-time scanning for leaked credentials within their Elasticsearch clusters. This integration not only enhances security measures but also underscores Elasticsearch's capacity as a robust platform for scalable document storage and analysis."
Scanning an Elasticsearch Cluster
Whether you host your own Elasticsearch cluster or you use Elastic Cloud, TruffleHog can connect to it.
Scan a Local Cluster
There are two ways to authenticate to a local cluster with TruffleHog: (1) username and password, (2) service token.
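For example, both methods look something like this (a sketch only; substitute your own node address and credentials, and check `trufflehog elasticsearch --help` for the flag names supported by your version):

```shell
# (1) Username and password:
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic \
  --password changeme

# (2) Service token:
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --service-token 'AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVy...'
```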
Scan an Elastic Cloud Cluster
To scan a cluster on Elastic Cloud, you’ll need a Cloud ID and API key.
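A sketch of what that invocation looks like (the Cloud ID and API key values below are placeholders; copy yours from the Elastic Cloud console):

```shell
trufflehog elasticsearch \
  --cloud-id 'my-deployment:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlv...' \
  --api-key 'MY5zV0kzVEJXdUZXLVZYaE...'
```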
Scan Configuration
There are several options available to TruffleHog users to configure and optimize their Elasticsearch secret scans.
You can limit the indices the scan examines with `--index-pattern`, using wildcards or a comma-separated list. You can also specify a query with `--query-json`, or give a timestamp (we support date math) to jump to with `--since-timestamp`.
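Putting those options together might look like this (a sketch with illustrative values; the credentials, index pattern, and query are placeholders for your own):

```shell
# Scan only log indices, matching a query, starting 24 hours ago (date math).
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic \
  --password changeme \
  --index-pattern 'logs-*' \
  --query-json '{"match": {"message": {"query": "deploy"}}}' \
  --since-timestamp 'now-24h'
```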
Once TruffleHog finishes a scan, it can continuously check for new documents to scan (even in new indices) when the `--best-effort-scan` flag is provided. This feature is particularly useful if you have an ELK stack installation, as those often continuously consume log messages from distributed sources.

The `--best-effort-scan` flag lets you avoid costly duplicative scanning, which could result in multiple notifications for the same secret, or tedious bookkeeping to skip documents that were already scanned. It also scales with you: as your cluster grows in size and throughput, you can increase the number of workers with the `--concurrency` flag.
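A continuous, parallelized scan might be launched like this (a sketch; the node address, credentials, and worker count are illustrative, and `8` is just a starting point you'd tune to your cluster's throughput):

```shell
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic \
  --password changeme \
  --best-effort-scan \
  --concurrency 8
```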
Most excitingly, we’ve included “smart skipping” in the `--best-effort-scan` flag. TruffleHog tracks the number of documents processed in each scan as well as the time it took to process them. During a scan, if more documents are added to the Elasticsearch cluster than TruffleHog scanned, the scanning rate is too slow; scanning has already fallen behind. To catch up, TruffleHog will intelligently skip documents. For example, if 100 documents were scanned but 150 documents were added, TruffleHog will log a warning (e.g. “Scan coverage rate is 67% (100/150); skipping documents to catch up”) and then skip 50 documents in the next scan in order to stay current with the incoming stream.
Secret Detection Output
In addition to TruffleHog’s normal output, secrets found in Elasticsearch include the Document ID, Index, and Timestamp, so they’re easy to identify and address.
Things To Keep In Mind
To implement the `--since-timestamp` and `--best-effort-scan` flags, we add a `@timestamp` clause to the search query. If you also include a `@timestamp` clause in a query passed to `--query-json`, our clause will override it. In this case, we advise executing a separate run with that `--query-json` argument.
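In practice, that means splitting the work into two invocations (a sketch; the node address, credentials, and date range below are placeholders):

```shell
# Run 1: continuous scan; TruffleHog manages the @timestamp clause itself.
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic --password changeme \
  --best-effort-scan

# Run 2: a separate one-off scan with your own @timestamp clause.
trufflehog elasticsearch \
  --nodes https://localhost:9200 \
  --username elastic --password changeme \
  --query-json '{"range": {"@timestamp": {"gte": "2024-01-01", "lt": "2024-02-01"}}}'
```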
This is v1
This is the first version of this scanner, and there are plenty of ways to improve it. For starters, there are several useful search parameters we could expose. As an open-source company, contributions are always welcome. Drop by the Slack or the Discord if you’re interested!