TL;DR: TruffleHog now automatically decodes Android Package Kit (APK) files and searches them for secrets. It runs ~9x faster than using an external decompiler before calling TruffleHog.
Android Package Kit (apk) files drive the apps on Android phones. Despite their cool extension (.apk), they’re essentially just zip files containing, among other things, compiled Java and Kotlin code.
During the build process, Java and Kotlin code undergoes complex compilation and encoding steps that often obscure secrets. While TruffleHog has long supported scanning zip files for secrets, it lacked the specialized logic to process the compiled code inside apk files. Until now.
TruffleHog now automatically decodes and parses APK files. And it’s really fast. By eliminating the need to decompile, TruffleHog performs apk secret scanning up to nine times faster, drastically reducing the time and effort required to secure Android applications.
9x Faster APK Secret Scanning
We recently downloaded the top 5 most popular apk files from APKMirror: Google Play, Google Authenticator, WhatsApp, Facebook, and Facebook Messenger.
We scanned each apk 6 times:
3 times using our old method (a script combining the jadx decompiler and the pre-updated TruffleHog)
3 times using the updated TruffleHog
The results surprised us.
By decoding apk files natively within TruffleHog, we saw a 9x improvement over our previous open-source apk scanning method.
Why does this matter?
Apk files are known for leaking keys. Until now, researching apk secret leakage at scale was prohibitively expensive. Imagine having to wait over 3 minutes to scan just one apk file (like in the case of Facebook Messenger). Then multiply that by all the different versions and architecture releases of that app. It just took forever.
This version of Facebook Messenger has 40 variants! Imagine waiting 3 minutes to decompile each one!
Now, researchers and other TruffleHog users can efficiently search for leaked secrets in apk files at scale. We’re hopeful this will help bug bounty hunters, internal security teams, and Android developers.
So, what exactly are the old and new methods?
The Old Method
In the past, the most straightforward way to scan an apk file for secrets was a two-part process: (1) decompile the apk using a tool like jadx, and (2) scan the decompiled data using TruffleHog’s filesystem command. It works. And it’s thorough. But it just takes a long time.
Here’s a snippet of Python if you’d like to try it out:
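A minimal sketch of that two-step workflow, assuming `jadx` and `trufflehog` are both on your PATH. The helper names (`build_commands`, `scan_apk_old_method`) are made up for illustration, and this is not the exact script behind the benchmark numbers above.

```python
import subprocess
import tempfile


def build_commands(apk_path: str, out_dir: str) -> list[list[str]]:
    """Return the two commands of the old method: decompile, then scan."""
    return [
        # Step 1: decompile the apk into out_dir with jadx.
        ["jadx", "-d", out_dir, apk_path],
        # Step 2: scan the decompiled tree with TruffleHog's filesystem command.
        ["trufflehog", "filesystem", out_dir, "--json"],
    ]


def scan_apk_old_method(apk_path: str) -> str:
    """Decompile apk_path into a temp dir, scan it, and return TruffleHog's stdout."""
    with tempfile.TemporaryDirectory() as out_dir:
        decompile, scan = build_commands(apk_path, out_dir)
        # Assumption: a non-zero jadx exit is not fatal here, because partial
        # output is still worth scanning.
        subprocess.run(decompile, check=False)
        result = subprocess.run(scan, capture_output=True, text=True, check=False)
        return result.stdout
```

Calling `scan_apk_old_method("app.apk")` returns TruffleHog's JSON lines as one string. Almost all of the wall-clock time goes to the jadx step.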
Note: There are other open-source apk secret scanning tools, like APKLeaks and apkscan, but those tools also leverage external decompilers (like jadx) and rely only on regex matching (without secret verification). Since validating secrets is core to our research at Truffle Security, we did not leverage those tools and instead created this simple Python script. But those tools also search for more than just secrets, so check them out!
The New Method
In adding support for apk files to TruffleHog, we initially looked at external decompilers like jadx and apktool, but those would require users to install a third-party tool, which was a non-starter. Also, while those tools are powerful, we wanted something lighter-weight and faster.
So we went a different route. We researched the most common places secrets leak in Android applications (e.g., AndroidManifest.xml, strings.xml, asset files such as JavaScript, and dex files), identified Golang packages to parse just those files (dextk and apkparser), and then added logic to scan for secrets in those specific locations.
Admittedly, this approach will not conduct as thorough a scan as using an external decompiler and then invoking TruffleHog. Still, it will identify the vast majority of secrets in a fraction of the time. The tradeoff is worth it.
Below is a detailed summary of all the file types TruffleHog’s apk scanner checks for secrets.
XML
Android XML files are stored in a unique Android Binary XML format. If you simply unzip an apk file and inspect the .xml files, they look like this:
Removing those special characters in red makes the text slightly more intelligible. However, there’s still an issue: Android stores XML attribute values (aka strings that could contain API keys and passwords) as reference IDs instead of the actual values.
For example, instead of storing the plaintext string Example AWS Key, an Android XML file would store a reference to the string like this:
The resource ID 7F0300b3 tells us where to find the value, but we need context about the application’s resources to retrieve the actual string value. Android provides that data in a special file named resources.arsc.
By parsing the resources.arsc file, we can build a ResourceTable to look up 7F0300b3 and discover the plaintext string (Example AWS Key).
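To make the lookup concrete, here is a small sketch of how a resource ID breaks apart and resolves. Android resource IDs are 32-bit values laid out as 0xPPTTEEEE (package, type, entry). The `resource_table` dict below is a hand-built, hypothetical stand-in for the table parsed out of resources.arsc, not the apkparser data structure.

```python
from typing import NamedTuple


class ResourceId(NamedTuple):
    package: int  # 0x7f for the app's own resources
    type_id: int  # index of the resource type within the package
    entry: int    # index of the entry within that type


def split_resource_id(res_id: int) -> ResourceId:
    """Split a 32-bit Android resource ID laid out as 0xPPTTEEEE."""
    return ResourceId(
        package=(res_id >> 24) & 0xFF,
        type_id=(res_id >> 16) & 0xFF,
        entry=res_id & 0xFFFF,
    )


# Hypothetical stand-in for the table built from resources.arsc.
resource_table = {0x7F0300B3: "Example AWS Key"}


def resolve(res_id: int, table: dict[int, str]) -> str:
    """Return the plaintext value for res_id, or the raw ID if it is unknown."""
    return table.get(res_id, f"@0x{res_id:08x}")
```

With this table, `resolve(0x7F0300B3, resource_table)` gives back `Example AWS Key`, while an ID that is missing from the table falls through as its raw hex reference.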
Avast’s open-source apkparser package exposes a function called ParseXml() that accepts both an XML file and a ResourceTable object. It uses the ResourceTable to resolve those resource IDs (in addition to reformatting the binary XML) to deliver a more complete picture of the original .xml file.
Example of secrets found in a reconstructed AndroidManifest.xml file.
Every xml file that TruffleHog identifies in the unzipped apk file is processed through this function with the appropriate ResourceTable context.
strings.xml
One of the xml files most likely to contain a secret is named strings.xml. It’s basically just a bunch of key/value pairs.
Sample strings.xml file.
Unfortunately, reconstructing this file was challenging. When we unzipped the apk files, we couldn’t find a file named strings.xml. But we did find a workaround.
It turns out that the resources.arsc file (discussed above) houses all key/value pairs from the strings.xml file. In particular, those values are located in the resource ID range: 0x7f000000-0x7fffffff. By iterating over every string key/value pair in that resource ID range, we can reconstruct the strings.xml file. It’s a hacky workaround, but we end up scanning the same data stored in the original strings.xml file.
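The workaround can be sketched roughly like this. The input shape (a dict of resource ID to name/value pairs for string-type entries) is an assumption made for illustration. TruffleHog's real implementation is in Go and reads these entries from the parsed resources.arsc.

```python
from xml.sax.saxutils import escape, quoteattr

# The resource ID range named above.
APP_ID_MIN = 0x7F000000
APP_ID_MAX = 0x7FFFFFFF


def rebuild_strings_xml(entries: dict[int, tuple[str, str]]) -> str:
    """Rebuild a strings.xml-like document from {resource ID: (name, value)}."""
    lines = ["<resources>"]
    for res_id in sorted(entries):
        if not APP_ID_MIN <= res_id <= APP_ID_MAX:
            continue  # skip anything outside the app's resource ID range
        name, value = entries[res_id]
        lines.append(f"    <string name={quoteattr(name)}>{escape(value)}</string>")
    lines.append("</resources>")
    return "\n".join(lines)
```

The output is not byte-for-byte the developer's original file, but for secret scanning only the key/value pairs matter.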
Dex
A dex file contains compiled code (it’s where the Java or Kotlin source code is transformed into bytecode that runs on Android). Apk files generally include at least one dex file, usually named classes.dex, but if the app is extensive or modular, there might be multiple dex files, like classes2.dex, classes3.dex, etc.
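Since an apk is just a zip, finding those dex files takes only a few lines. This is an illustrative sketch using Python's standard `zipfile` module. The `list_dex_files` helper is hypothetical and only matches the `classes*.dex` naming pattern described above.

```python
import re
import zipfile

# Matches classes.dex, classes2.dex, classes3.dex, ... at the archive root.
DEX_NAME = re.compile(r"^classes\d*\.dex$")


def list_dex_files(apk: zipfile.ZipFile) -> list[str]:
    """Return the top-level classes*.dex entries of an opened apk archive."""
    return sorted(name for name in apk.namelist() if DEX_NAME.match(name))
```

Open the apk with `zipfile.ZipFile("app.apk")` and pass it in. Each returned name can then be read with `apk.read(name)` and handed to a dex parser.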
This is what a dex file looks like when you open it.
It’s chaos. It makes even less sense than the Android XML file. Fortunately, an awesome Golang package called dextk efficiently parses dex files into “source code.” Emphasis on the quotes around “source code” since, technically, the output is a bunch of bytecode instructions and their values.
For example, this is the sample output provided on the tool’s README.
Source: https://github.com/csnewman/dextk#:~:text=%2C%20o)%0A%09%7D%0A%7D-,Output%3A,-android/support/v4
The first line above reads invoke-interface method=android/os/Parcelable$Creator:createFromParcel. This translates to calling a function named createFromParcel(). The source code is mostly there; it just looks funny.
In secret scanning, we primarily care about strings since that’s where developers store API keys and passwords. In dex bytecode, strings all appear as const-string instructions. While we can easily filter and grab the const-string values, we have another problem.
TruffleHog scans for secrets using a technique called keyword pre-flighting. Each secret type (e.g., AWS, Stripe, etc.) has a keyword (e.g., AKIA for AWS, sk_live for Stripe, etc.), and we check that the keyword is near a suspected secret.
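A stripped-down sketch of the pre-flight idea follows. The `KEYWORDS` map only contains the two examples mentioned above, and the case-insensitive substring match is a simplifying assumption rather than a description of TruffleHog's actual matcher.

```python
# Hypothetical keyword map, limited to the two examples from the text.
KEYWORDS = {"AWS": ["AKIA"], "Stripe": ["sk_live"]}


def candidate_detectors(chunk: str, keywords: dict[str, list[str]] = KEYWORDS) -> list[str]:
    """Return the detectors whose keyword appears somewhere in chunk."""
    lowered = chunk.lower()
    return [
        name
        for name, words in keywords.items()
        if any(word.lower() in lowered for word in words)
    ]
```

Only the detectors returned here would go on to run their more expensive matching and verification on the chunk. Everything else is skipped, which is why a secret with no keyword nearby never gets checked.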
To get the required keywords near suspected secrets, we parse several other dex bytecode instruction types (e.g., those that contain a class name or method name) and then place them near each suspected secret. It’s a little convoluted, but it ensures we can scan the decoded source code thoroughly. If you’re curious, the dex scanning logic is here.
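Here is a rough sketch of that idea, not TruffleHog's actual logic (which is written in Go). It assumes a hypothetical, simplified instruction form of (opcode, operand) pairs and treats only `invoke-*` operands as context. The class and method names in the usage note are invented.

```python
from typing import Iterable

# Hypothetical simplified instruction: (opcode, operand).
Instruction = tuple[str, str]


def build_chunks(instructions: Iterable[Instruction]) -> list[str]:
    """Pair every const-string value with the method names seen in the same body."""
    body = list(instructions)
    # Collect names that may carry a detector keyword (e.g. a class or method name).
    names = [operand for opcode, operand in body if opcode.startswith("invoke-")]
    context = " ".join(names)
    # Emit one scannable chunk per string, with the context placed right next to it.
    return [
        f"{context} {operand}".strip()
        for opcode, operand in body
        if opcode == "const-string"
    ]
```

For a method body that calls `com/example/AwsUploader:upload` and loads the string `Example AWS Key`, the resulting chunk puts both on one line, so a keyword found only in the class or method name still sits next to the candidate secret.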
Everything Else
All other files in the decompressed and unarchived apk file are scanned normally (no special decoding). This allows TruffleHog to review many other target file types, like *.properties, *.json, and more. We found all kinds of files included as raw assets, including .git directories, sqlite databases, and more.
Limitations
Our implementation works super well. But it’s not perfect. There are a few limitations.
Incomplete file coverage. Our scanning is not as thorough as fully decompiling an apk and then scanning each file. There’s no way to do this fast enough (or without external dependencies). But we do search all of the usual locations secrets leak in apk files. Unless a secret leaks in an uncommon location, we’ll find it.
We don’t support scanning xapk and apkm files out-of-the-box. If you unzip them and then point TruffleHog to that folder, it will scan the embedded apk files. The reasons are complicated, and we hope to address them soon. Similarly, any apk files embedded in a zip must be unzipped first.
Encrypted and Packed DEX files. We can’t decrypt and unpack dex files. But this is a limitation of other tools too. If you have an idea, we’re open to suggestions!
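For the second limitation, the unzip-first workaround can be scripted. This is a minimal sketch that assumes the bundle is a plain zip archive, as described above. The `extract_bundle` helper is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_bundle(bundle_path: str, dest_dir: str) -> list[Path]:
    """Unzip a bundle (xapk, apkm, or plain zip) and return the apk files inside it."""
    with zipfile.ZipFile(bundle_path) as bundle:
        bundle.extractall(dest_dir)
    # Embedded apks may sit in subfolders, so search recursively.
    return sorted(Path(dest_dir).rglob("*.apk"))
```

After extraction, point TruffleHog's filesystem command at `dest_dir` and it will scan the embedded apk files.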
Special Thanks
Significant updates to TruffleHog’s open-source code base like this are a team effort. We’d like to thank Richard Gomez and Noman Shaikh for their feedback during development. We also thank Brandon Weeks for his expert review of our apk scanning logic.