Miccah Castorina

The Dig

August 30, 2023

Do Pre-Commit Hooks Prevent Secrets Leakage?

Developers and security teams often deploy secret scanners, like TruffleHog, through Git pre-commit hooks. The goal is to stop API keys, passwords, and other secrets from leaking at the code-commit stage, on individual developer workstations. But is that workflow sufficient to prevent all secrets from leaking into production code?

What are Pre-Commit Hooks?

Git hooks are one of Git’s lesser-known but most powerful features. Git lets developers hook into many different points in its execution path and trigger an arbitrary action. If you haven’t used a Git hook before, cd into a Git repository on your local machine and ls the .git/hooks directory. Git seeds every repository with several sample Git hooks. None run by default, but you can customize any of them to fit your workflow.


Git Adds a Folder of Sample Git Hooks after Running `git init`


The most popular type of hook is the pre-commit hook. Opening the sample pre-commit file reveals the hook’s purpose: “verify what is about to be committed”.


Excerpt from the Sample Pre-Commit File Provided by Git


A pre-commit hook enables developers to run arbitrary checks before committing code to a repository. These checks range from verifying code formatting to detecting AWS secrets.

Developers can author custom pre-commit hooks in any language that can create a single executable file named “pre-commit”.
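To make that concrete, here is a minimal sketch of a custom hook written in Python. It assumes the file is saved as .git/hooks/pre-commit and marked executable. The single AWS key pattern and the helper names (added_lines, find_suspects) are illustrative assumptions, not how TruffleHog or any real scanner works.

```python
#!/usr/bin/env python3
"""Illustrative pre-commit hook: block commits whose staged changes
appear to contain an AWS access key ID. A sketch, not a real scanner."""
import os
import re
import subprocess
import sys

# AWS access key IDs commonly look like "AKIA" followed by 16 characters.
# One pattern is used here purely for illustration.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def added_lines(diff_text):
    """Return the lines a unified diff adds, skipping '+++' file headers."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def find_suspects(diff_text):
    """Return the added lines that match the key pattern."""
    return [line for line in added_lines(diff_text) if AWS_KEY_ID.search(line)]


def main():
    # --cached limits the diff to what is staged for this commit.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    suspects = find_suspects(diff)
    for line in suspects:
        print(f"possible AWS key in staged change: {line.strip()}", file=sys.stderr)
    # Any non-zero exit status makes Git abort the commit.
    return 1 if suspects else 0


# Run only when Git invokes this file as the hook (it must be named
# "pre-commit"), so importing or testing the helpers has no side effects.
if os.path.basename(sys.argv[0]) == "pre-commit":
    sys.exit(main())
```

Git only looks at the hook’s exit status: zero lets the commit proceed, and anything else aborts it.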

THE PRE-COMMIT PACKAGE MANAGER

Managing multiple pre-commit hooks can easily become unwieldy. Imagine a single pre-commit file referencing multiple other script files written in a mix of Bash, Perl, and Python. Each script adds new code to the codebase and requires testing and debugging. Ensuring compatibility and consistency across multiple developer environments and code repositories is often untenable without more structure.

A group of engineers created an extensible framework called pre-commit to act as a package manager for Git pre-commit hooks. Developers use pre-commit to add, remove and manage a wide variety of Git pre-commit hooks. A single file named .pre-commit-config.yaml inside the repository’s root directory manages all of the Git pre-commit hooks. For example, here’s a pre-commit configuration file that implements multiple checks prior to committing to Git.


# Example Pre-Commit config 
repos: 
  - repo: https://github.com/pre-commit/pre-commit-hooks 
    rev: v3.2.0 
    hooks:
      - id: check-yaml 
      - id: debug-statements 
      - id: end-of-file-fixer 
      - id: trailing-whitespace 
      - id: check-added-large-files 
  - repo: https://github.com/pycqa/isort 
    rev: 5.12.0 
    hooks: 
      - id: isort 
        name: isort (python) 
        files: "." 
        args: [--settings-path=pyproject.toml] 
  - repo: https://github.com/psf/black 
    rev: 22.3.0 
    hooks: 
      - id: black 
        args: [--config=pyproject.toml]


To install TruffleHog using pre-commit, please review the documentation.

Undeniably, Git pre-commit hooks streamline the development process by automating tedious tasks and upholding the repository’s overall quality. In particular, these hooks prove helpful in preventing accidental commits of sensitive data, such as credentials or API keys, safeguarding that information from ever leaving your computer!

Except…when they don’t.

Pre-Commit Hooks Rarely Scale Well

Git pre-commit hooks, while great in theory, can be a challenge to scale with an organization. We’ve seen a few large organizations with really mature centralized tooling effectively deploy Git pre-commit hooks to all engineers’ repositories. We’ve also seen plenty of large organizations start with this intention, and fall short when they realize the dystopian hellscape their fragmented microservice developer workforce uses to sling code every which way. 

Most organizations have a small handful of repositories they care most about, and a never-ending tail of less important, unorganized internal tooling and microservices. The challenge with API keys and secrets in particular is that it doesn’t matter where the code is deployed, or whether it is deployed at all. A sensitive API key checked into a one-off repo is just as much of a problem as the same key checked into the centralized, well-managed monorepo.

Unfortunately, Git pre-commit hooks are fully local; every developer must install, configure and run the pre-commit code on their local machine. Compounding that complexity, each Git repository could have a different set of local hooks. Surprisingly, there’s no built-in tooling that provides uniform Git pre-commit hook installation for all of an organization’s repositories.

Additionally, because hooks are configured locally, developers can easily bypass them (for example, with git commit --no-verify), turn them off, or simply forget to install them on a fresh clone. Worst of all, Git pre-commit hooks are a one-time, in-the-moment check. When they are used to detect secrets, false negatives (secrets that go unreported) can slip by and live in your Git history undetected until someone finds them.
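One way to see this gap on a single workstation is to audit which local clones have no pre-commit hook installed. The following is a minimal sketch, assuming hooks live in the default .git/hooks directory; it ignores core.hooksPath overrides, worktrees, and submodules. The function name repos_missing_hook is hypothetical.

```python
import os
from pathlib import Path


def repos_missing_hook(root, hook_name="pre-commit"):
    """Return repositories under root that lack an executable hook file."""
    missing = []
    for git_dir in sorted(Path(root).rglob(".git")):
        if not git_dir.is_dir():
            # Worktrees and submodules use a .git *file*; skipped in this sketch.
            continue
        hook = git_dir / "hooks" / hook_name
        # Git's sample hooks end in ".sample" and never run, so only an
        # executable file with the exact hook name counts as installed.
        if not (hook.is_file() and os.access(hook, os.X_OK)):
            missing.append(git_dir.parent)
    return missing
```

Calling repos_missing_hook on the directory that holds your clones returns every repository where a commit would go unchecked. An audit like this only covers one machine, which is the underlying scaling problem.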

What about pre-receive hooks?!

And for all of the Git wizards out there, I know what you’re thinking: pre-receive hooks scale way better than pre-commit hooks, just use that!

For those unaware, a pre-receive hook works similarly to a Git pre-commit hook except it runs on a Git server. These pre-receive hooks can reject any changes pushed to the server and are much easier to maintain than setting up Git pre-commit hooks on every machine and every repository.
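As a rough illustration of the server-side flow, the sketch below reads the ref updates that Git passes to a pre-receive hook on standard input (one "old-sha new-sha ref-name" line per pushed ref) and rejects the push if an added line matches a pattern. The single AWS key pattern and the helper names are assumptions for illustration; a real deployment would call a proper scanner and respect the hosting provider’s execution limits.

```python
#!/usr/bin/env python3
"""Illustrative server-side pre-receive hook (a sketch, not a production scanner)."""
import os
import re
import subprocess
import sys

# Single illustrative pattern: "AKIA" followed by 16 characters.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def parse_updates(lines):
    """Split each '<old-sha> <new-sha> <ref-name>' line into a tuple."""
    return [tuple(line.split()) for line in lines if line.strip()]


def is_zero(sha):
    """An all-zero object name marks a ref that is being created or deleted."""
    return set(sha) == {"0"}


def rev_args(old, new):
    """Return git-log revision arguments covering the commits an update adds."""
    if is_zero(new):
        return None  # ref deletion: nothing new to scan
    if is_zero(old):
        # New ref: commits reachable from it but from no existing ref.
        return [new, "--not", "--all"]
    return [f"{old}..{new}"]


def main():
    leaked = False
    for old, new, ref in parse_updates(sys.stdin):
        args = rev_args(old, new)
        if args is None:
            continue
        patch = subprocess.run(
            ["git", "log", "-p", "--format=commit %H", *args],
            capture_output=True, text=True, errors="replace", check=True,
        ).stdout
        for line in patch.splitlines():
            added = line.startswith("+") and not line.startswith("+++")
            if added and AWS_KEY_ID.search(line):
                # Hook output on stderr is relayed to the person pushing.
                print(f"{ref}: push rejected, possible AWS key", file=sys.stderr)
                leaked = True
    # A non-zero exit status rejects the whole push.
    return 1 if leaked else 0


# Run only when invoked as the hook file itself, named "pre-receive".
if os.path.basename(sys.argv[0]) == "pre-receive":
    sys.exit(main())
```

Because the check runs once on the server, no developer has to install anything, and every pushed commit in the range is inspected rather than only the latest one.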


Source: https://codeburst.io/understanding-git-hooks-in-the-easiest-way-bad9afcbb1b3


While an improvement, there are still several drawbacks to pre-receive hooks. 

Git hosting providers do not uniformly support them (for example, Azure). If a provider does offer pre-receive hooks, they are often limited to enterprise or on-prem customers, or come with other restrictions, such as time limits for execution. It’s understandable that SCM providers restrict this feature in SaaS services due to the sensitivity of executing arbitrary scripts; however, there are safe ways to architect sandboxed script execution.

Pre-receive hooks can be sufficient for deterministic quality checks, like code formatting. However, checks whose rules change over time, like secret detection, require ongoing scanning. A single point-in-time review of code for leaked secrets can produce false negatives, because secret detection engines continually add support for new secret types: a commit that passed an earlier scan may contain a secret that only newer detectors recognize.
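A toy example shows why a point-in-time check is not enough. The two detector sets below are hypothetical stand-ins for a scanner before and after it gains a new detector; the patterns are simplified illustrations, not TruffleHog’s actual rules.

```python
import re

# Hypothetical detector set at the time the code was committed.
DETECTORS_V1 = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# The same set later, after a detector for another token format is added.
DETECTORS_V2 = {
    **DETECTORS_V1,
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan(text, detectors):
    """Return the sorted names of all detectors that match the text."""
    return sorted(name for name, pattern in detectors.items() if pattern.search(text))


# A line that was committed while only DETECTORS_V1 existed.
committed_line = 'token = "ghp_' + "a" * 36 + '"'

at_commit_time = scan(committed_line, DETECTORS_V1)  # nothing is flagged
on_later_rescan = scan(committed_line, DETECTORS_V2)  # the token is flagged
```

The hook that ran at commit time reported nothing, so the token sits in history until something scans that history again with the newer detectors.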

So, what should we do?

A (MINI) CASE STUDY

We asked a senior security leader from a publicly traded tech company how they integrate secret scanning into their pre-commit workflow and this is what they said:

Our team is using TruffleHog to detect and block code pushes that have secrets embedded in commits…TruffleHog [runs] as part of our pre-push process, and it fails the process with [an] error if a secret is detected. We created an escape hatch that can be used to bypass the automated process for an after-the-fact human review, in case of false positives or other sensible reasons (it has not been used yet). This has been part of our security controls for 6 months and have prevented a number of legitimate secrets from landing into our code repository. This is only on our monorepo, we depend on additional scanning for all other repos.

While this workflow is relatively straightforward to set up for a monorepo, the configuration didn’t scale out to other repos, because it needs to be configured on a per-repository basis.

(AGAIN) SO, WHAT SHOULD WE DO?

There is no simple and straightforward solution. Pre-commit, pre-receive and CI/CD secrets detection all contribute to preventing secrets leakage, but are insufficient by themselves. Regular interval scanning can partially address these gaps, but ultimately we recommend combining all four scanning tactics together to establish a defense-in-depth posture.

To add further complexity, secrets also leak in more places than just committed code. Organizations often inadvertently expose secrets in the comments sections of Git tools, in log outputs, or in dozens of other places that pre-commit coverage won’t reach. The best-in-class security organizations deploy robust secret scanning across communication applications (Slack, Teams), internal documentation (Google Drive, etc.), external storage (S3, GCS, etc.), as well as throughout the development process.

Conclusion

Unlike other vulnerability classes, secrets pose a threat simply by existing, regardless of whether they are in a pre-production or post-production environment. This necessitates the use of multiple tools to thoroughly analyze all code, both during the development process and afterwards, in order to identify active secrets and promptly replace any exposed ones.