Joe Leon

The Dig

August 22, 2023

Running TruffleHog in Azure Pipelines

This post is part of a series documenting how to incorporate TruffleHog into your CI/CD pipeline. If you use GitHub Actions or Circle CI, please see those posts. If you use another tool, please let us know and we’ll do our best to write a post for your use case.

In this post, we’ll walk through how to incorporate TruffleHog (Open-Source or Enterprise) into an Azure Pipeline. Click here to skip the tutorial and check out the code.

Let’s start by understanding the code required to prepare an Azure Pipeline to run TruffleHog. 

The Pipeline File

To run an Azure Pipeline workflow, developers must place a file named azure-pipelines.yml in the root directory of their repository. 

PIPELINE TRIGGERS

The first lines in our Azure Pipeline YAML file define when we will run TruffleHog. 


# PRs to main 
pr: 
- main 

# Pushes to main 
trigger: 
- main


At a minimum, we recommend running TruffleHog against all PRs as well as all pushes directly to the production branch (main in this example).

Note: if you host your code on Azure Repos, you cannot use the pr directive. Instead, you’ll need to set up a “Build Validation” manually inside your Azure project. If you use GitHub, GitLab or any other supported VCS hosting provider, Azure says you can use the pr directive.

DEFINE A JOB

Next, we’ll create a job named “SecretsCheck”. 

This job will contain all of the code required to run TruffleHog against PRs and pushes. 

jobs:
- job: SecretsCheck
  pool:
    vmImage: ubuntu-latest

Before executing any code, we have to specify which OS we want to run it on (see the pool attribute in the snippet above). 

We chose Ubuntu (and specifically the latest version). Why? We need cURL and jq in the next step, and Ubuntu has both of those applications pre-installed. 

You can use any OS that supports running TruffleHog. If your selected OS does not come with cURL and jq, you can either (a) install those tools, or (b) adjust the code in the next step.

Within an Azure Pipeline “job”, you define “steps”. Each “step” takes some action within the virtual environment you selected for that job (such as running a script or installing software). The next 3 steps prepare our git repository for scanning and then invoke TruffleHog.

STEP 1: COUNT THE COMMITS


steps:
- script: |
    # Set up cURL headers + URI
    headers="Authorization: Bearer $(System.AccessToken)"
    uri="$(System.TeamFoundationServerUri)$(System.TeamProject)/_apis/build/builds/$(Build.BuildId)/changes"
    # Set count of changes in this Push to the ChangesCount variable
    ChangesCount=$(echo $(curl -sSL -H "$headers" "$uri") | jq '.count')
    # If PR, add 1 to ChangesCount.
    if [ "$(Build.Reason)" == "PullRequest" ]; then ChangesCount=$(($ChangesCount+1)); fi
    # Export ChangesCount variable for use in other steps
    echo "##vso[task.setvariable variable=ChangesCount]$(echo $ChangesCount )"


In our first step, we execute a bash script that counts the number of commits present in the PR or Push. Unfortunately, Azure does not provide a built-in variable to reference this value in a pipeline (as far as we can tell!). 

Developers must query the pipeline API to get the commit count. Using the System.AccessToken, this script executes an authenticated query against the pipeline URI for the “changes” information related to this workflow. We parse the count value from the JSON response, which is an integer value representing the count of commits in the PR or Push, and set it as an environment variable named ChangesCount.  
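As a rough local sketch of that query, the snippet below rebuilds the “changes” URI from its parts and pulls the count out of a sample response. The organization, project, build ID, and JSON body are made-up placeholders, and the helper names build_changes_uri and parse_change_count are ours, not part of Azure's tooling.

```shell
#!/usr/bin/env bash
# Sketch only: all values below are hypothetical placeholders.

# Join the server URI, project name, and build ID the same way the pipeline step does.
# Assumes the server URI already ends with a trailing slash.
build_changes_uri() {
  local server_uri="$1" project="$2" build_id="$3"
  printf '%s%s/_apis/build/builds/%s/changes\n' "$server_uri" "$project" "$build_id"
}

# Extract the integer "count" field from a JSON response read on stdin.
parse_change_count() {
  jq '.count'
}

uri="$(build_changes_uri "https://dev.azure.com/example-org/" "example-project" "42")"
echo "$uri"

# Hypothetical, trimmed-down response body with two changes.
sample_response='{"count": 2, "value": [{"id": "aaa111"}, {"id": "bbb222"}]}'

# jq ships on Azure's Ubuntu agents but may be missing on a local machine.
if command -v jq >/dev/null 2>&1; then
  echo "$sample_response" | parse_change_count
fi
```

Keeping the URI construction in one place makes it easier to spot a missing slash or a wrong build ID when the API call returns something unexpected.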

Interestingly, when executing a workflow against a PR (not a push), Azure creates an additional commit in the context of our CI/CD pipeline. It’s usually titled something like “Merge pull request <#> from <branch> into main”. This additional commit is not included in the count of changes retrieved from the API. As a result, we must increment our ChangesCount variable by one to account for the additional commit in our git history.

In the example below, we made two commits and then created a PR. Running git log during CI/CD execution revealed one commit more current than the last commit in the PR. The ChangesCount value from the pipeline API was 2; however, we needed to increment this value to 3 to properly scan all commits in the PR. 


Example of an Azure-Generated Commit in our CI Runtime
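The adjustment described above is simple arithmetic. Here is a minimal sketch of it as a standalone function, using the two-commit PR as input. The function name effective_fetch_depth is hypothetical; the pipeline itself does this inline.

```shell
#!/usr/bin/env bash
# Sketch only: effective_fetch_depth is a hypothetical helper name.

# Print the number of commits to fetch, given the API's change count
# and the build reason (for example "PullRequest" or "IndividualCI").
effective_fetch_depth() {
  local changes_count="$1" build_reason="$2"
  if [ "$build_reason" = "PullRequest" ]; then
    # PR builds carry one extra Azure-generated merge commit.
    changes_count=$((changes_count + 1))
  fi
  echo "$changes_count"
}

effective_fetch_depth 2 "PullRequest"   # prints 3
effective_fetch_depth 2 "IndividualCI"  # prints 2
```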


Finally, we export our ChangesCount variable for use outside of this step. Just like a Python function, our script has a local context. To use the ChangesCount variable outside of this step, we export it using the odd-looking ##vso[task.setvariable] syntax. Future steps can now reference the number of commits being reviewed by this pipeline by invoking the $(ChangesCount) variable.
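To make that syntax less mysterious: the export is just a specially formatted line written to standard output, which the Azure agent picks up. The sketch below wraps it in a small helper (set_pipeline_variable is our own hypothetical name). Run outside of a pipeline, it simply prints the line.

```shell
#!/usr/bin/env bash
# Sketch only: set_pipeline_variable is a hypothetical helper name.

# Emit the logging command that the Azure agent reads from stdout
# to define a variable for later steps in the same job.
set_pipeline_variable() {
  local name="$1" value="$2"
  echo "##vso[task.setvariable variable=${name}]${value}"
}

set_pipeline_variable "ChangesCount" "3"
# prints: ##vso[task.setvariable variable=ChangesCount]3
```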

STEP 2: GIT CHECKOUT

So why did we go through all of that trouble to get the number of commits present in the PR or Push? 

Efficiency! A naive approach to adding TruffleHog into CI would clone the entire repository and run TruffleHog against the whole git history. Depending on the size of the repository, this could take several minutes (instead of seconds). Also, every pipeline execution would replicate previous TruffleHog scanning, which is a waste of computing resources and time.

Instead, our goal is to efficiently scan only the difference between the base branch and the PR or Push. This cuts down TruffleHog’s scanning time to seconds. To accomplish this, we modified Azure’s built-in “Git Checkout” task by customizing the fetchDepth. 

Azure’s Git Checkout Process

Azure’s default “Git Checkout” step does the following:

  1. Initialize a new, empty git repository on the temporary VM.


git init "/home/vsts/work/1/s"


  2. Create a remote connection between your code (hosted in Azure Repos or elsewhere) and the new git repository.


git


  3. Fetch the relevant git data from the remote repository (i.e., where your code lives).


git --config-env=http.extraheader=env_var_http.extraheader fetch --force --tags --prune --prune-tags --progress --no-recurse-submodules origin --depth=1


  4. Check out the most recent commit on the remote origin, thus forcing the local git status into a detached HEAD state.


git checkout --progress --force


As tempting as it is to use git commands to figure out how many commits there have been in the Push or PR since the base branch, it’s not possible to do this with the default “Git Checkout” step. You would need to fully clone the repository to establish a reference to the base branch (ex: main). 

Alternatively, you could develop a custom “Git Checkout” command, but since that seemed like a lot of work (and likely less stable), we decided to customize Azure’s built-in command.

Shallow Fetch

Azure provides users with a fetchDepth argument for the “Git Checkout” step. This enables developers to specify a value for the --depth argument in the git fetch command that is run during checkout (step 3 in Azure’s Git Checkout process).

Here is the official Git documentation about the --depth flag:


Git Fetch Depth Documentation Screenshot


Essentially, the fetchDepth argument limits how far back in git history Azure grabs commits. This enables developers to only clone down the commits between the base branch and the current Push or PR commit. Azure calls this process Shallow Fetching.

Here’s the YAML code to accomplish shallow fetching:


- checkout: self 
  fetchDepth: $(ChangesCount) 
  displayName: "Git Checkout"


The YAML code uses the ChangesCount variable as a value for the fetchDepth argument. This prevents the entire git history from cloning onto the CI/CD VM and limits the git history reviewed by TruffleHog to the relevant PR or Push commits.
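If you want to see the effect of --depth without touching a pipeline, here is a small local experiment. It builds a throwaway repository with five empty commits, shallow-clones it with a depth of 2, and compares the commit counts. The paths, commit messages, and author details are placeholders, and this uses git clone rather than Azure's own fetch sequence, so treat it as an illustration of the flag only.

```shell
#!/usr/bin/env bash
# Sketch only: demonstrates how --depth truncates history in a local clone.

# Create a throwaway "remote" repository with five commits.
workdir="$(mktemp -d)"
git init --quiet "$workdir/remote"
for i in 1 2 3 4 5; do
  git -C "$workdir/remote" \
    -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet --allow-empty -m "commit $i"
done

# Shallow-clone only the two most recent commits.
# The file:// prefix matters: git ignores --depth for plain local paths.
git clone --quiet --depth=2 "file://$workdir/remote" "$workdir/shallow"

full_count="$(git -C "$workdir/remote" rev-list --count HEAD)"
shallow_count="$(git -C "$workdir/shallow" rev-list --count HEAD)"
echo "full history:  $full_count commits"
echo "shallow clone: $shallow_count commits"
```

The shallow clone only knows about the commits inside the requested depth, which is exactly the slice of history we want TruffleHog to scan.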

STEP 3: RUN TRUFFLEHOG

In CI/CD pipelines, we recommend running TruffleHog Open-Source from Docker. It’s easier than installing Go and compiling from scratch. But you’re welcome to run it however you’d like.


- script: |
    docker run --rm -v "$(pwd)":/tmp ghcr.io/trufflesecurity/trufflehog:latest \
    --only-verified --fail --no-update git file:///tmp/
  displayName: "Running TruffleHog (Open-Source)"


Above we’re running a fancy docker command that mounts our git repository inside the docker container and checks it for secrets. Here’s a breakdown of that code:

docker run : run a command in a new container

--rm : automatically remove the container when it exits

-v "$(pwd)":/tmp : mount our Ubuntu machine’s current working directory (where our git code is) into the /tmp folder in our docker container

ghcr.io/trufflesecurity/trufflehog:latest: use the latest version of TruffleHog’s docker image

--only-verified : only report secrets that are verified to be current/valid

--fail : if a secret is discovered, exit with a non-zero status code (which will fail the pipeline)

--no-update : don’t reach out to our update server (you’re already using the latest version + this would only slow things down)

git file:///tmp/ : look through our git repository located at /tmp/.git in the Docker container
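Since --fail communicates through the exit status, it can help to know how to read that status when debugging a failed step. The sketch below maps a status to a human-readable message. It assumes TruffleHog exits with 183 when --fail is set and results are found, which is what its help text stated when we checked; please verify that against the version you run. The helper name describe_scan_result is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: describe_scan_result is a hypothetical helper name.
# Assumes TruffleHog exits with 183 when --fail is set and results are found;
# check `trufflehog --help` for the version you run.

describe_scan_result() {
  local exit_code="$1"
  case "$exit_code" in
    0)   echo "no verified secrets found" ;;
    183) echo "verified secrets found" ;;
    *)   echo "scan error (exit code $exit_code)" ;;
  esac
}

# In a real step you would capture the status right after the docker command:
#   docker run ... ; status=$?
# Here we feed in example values instead.
describe_scan_result 0
describe_scan_result 183
describe_scan_result 1
```

Telling “secrets found” apart from “the scan itself broke” makes pipeline failures quicker to triage.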

Now that we’ve reviewed the YAML code, let’s get this working inside your Azure project.

Configuring the Pipeline

We’ll assume you already have a project + repository set up in Azure. 

The first step is to add an azure-pipelines.yml file. 

If you don’t have a pipeline set up already, please copy/paste this version. If you already have a pipeline file, copy the SecretsCheck job into your existing file. Either way, please commit your changes.

If this is your first time setting up an Azure Pipeline in this project, you’ll need to click on “Pipelines” and then “Create Pipeline” (as shown below).


Creating a Pipeline in Azure


You should be forwarded to the “Review” stage, since we already committed a YAML file to our repository. Click the “Run” button. If you already had a pipeline set up, please manually trigger a “Run” to ensure the new job was added correctly.


Starting the First Pipeline Run


You should see all green checks from the SecretsCheck job. 


TruffleHog Successfully Completing a Scan in an Azure Pipeline


If this failed, please review the error message and this tutorial to ensure you implemented it correctly. If for some reason it’s still failing, please let us know. We’re happy to help.

Azure Repos User?

Above, we mentioned that code hosted in Azure Repos cannot use the pr directive to run TruffleHog during PRs. Instead, you need to set up a Build Validation. 

To get started, click on “Branches”, then the three dots to the right of whichever base branch you want to run TruffleHog on during a PR, and then “Branch Policies”.


Accessing Branch Policies in Azure


Click the “+” sign under “Build Validation”.


Adding a Build Validation to Azure


Change “Build Expiration” to “Immediately when main is updated”, name your Build Validation, and then click “Save”.


Adding an Azure Build Policy


That’s it! Test it out by creating a new PR containing a Canary Token and ensure TruffleHog catches the leaked secret.

Note: If you get an error message when saving your Build Validation, click “Cancel”, refresh the page and try again. That almost always worked for us.