Security static analysis tools are notorious for throwing a ton of false positives. Did you know there’s a reason for that, and it’s rooted in math?
86 years ago, Alan Turing proved it’s impossible to look at a program’s source code and know for certain what it will do at runtime.
We now refer to this problem as the halting problem, but interestingly, in his original paper Turing didn’t describe whether a program would halt or not; he described an uncertainty in what a program will print out:
…Similarly there is a general process for determining whether M prints 1 infinitely often. By a combination of these processes we have a process for determining whether M prints an infinity of figures, i.e. we have a process for determining whether M is circle-free. There can therefore be no machine E.
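The core of Turing’s argument can be sketched as a short self-referential program. The `halts` function below is a stand-in for a hypothetical halting oracle (here it just always answers “loops forever”); whichever fixed answer it commits to, the `paradox` program is built to do the opposite:

```python
def halts(func, arg):
    # Stand-in for a hypothetical halting oracle. This one always
    # answers "loops forever"; any real implementation must commit
    # to some answer here.
    return False

def paradox(func):
    # Do the opposite of whatever the oracle predicts about func(func).
    if halts(func, func):
        while True:  # oracle said "halts" -> loop forever
            pass
    # oracle said "loops forever" -> halt immediately

paradox(paradox)  # halts, contradicting the oracle's prediction
```

If `halts` instead answered `True`, `paradox(paradox)` would loop forever, again contradicting it. No fixed strategy can be right about `paradox`, so no general halting decider can exist.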
This proof is usually taught in a philosophical computer science context, but I’ve always found it curious that it’s rarely applied to practical computer security problem solving.
Determining the output of a computer program based on its source code is indeed at the heart of many security problems; unsafely outputting user input is the direct cause of XSS, SQLi, and many other security vulnerabilities.
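As a toy illustration of that connection (using Python’s built-in `sqlite3` and a made-up `users` table), compare concatenating user input into a query against passing it as a parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

user_input = "' OR '1'='1"  # classic injection payload

# Vulnerable: user input is concatenated straight into the SQL string,
# so the payload rewrites the query and matches every row.
leaked = conn.execute(
    "SELECT secret FROM users WHERE name = '" + user_input + "'"
).fetchall()

# Safe: a parameterized query treats the input as data, not SQL.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()

print(leaked)  # [('s3cret',)] -- the injection leaked the secret
print(safe)    # []            -- the payload matched nothing
```

Statically deciding whether a given codebase ever reaches the first pattern with attacker-controlled input is exactly the kind of “what will this program output?” question Turing’s proof covers.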
We now have an entire industry that has set out to do the impossible: find every security vulnerability in an application based on its source code. But if we already know this is impossible from Alan’s 1936 proof, how do these programs behave in the real world?
Static analysis tools by definition must at times be uncertain about whether a security vulnerability is present. This forces the tools to make a best guess, and a wrong guess leads to one of two outcomes: a false positive or a false negative.
This outcome is mathematically, provably unpreventable.
So which is the better outcome: a program that sometimes alerts on vulnerabilities that aren’t real (false positives), or a program that misses real vulnerabilities (false negatives)?
Naively, one may want to lean into false positives and see all “potential” vulnerabilities. This is especially tempting for particular bug classes we want to squash, but trying to strike a balance between false positives and false negatives creates a scaling problem. A static analysis tool whose rules each throw a false positive 1% of the time will, by its 230th rule, produce at least one false positive on roughly 90% of clean scans. This rapidly leads to alert fatigue and developers turning off or ignoring the output of the scanner.
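A quick back-of-the-envelope check of that 90% figure, assuming each rule independently false-alarms on 1% of scans of vulnerability-free code:

```python
fp_rate = 0.01  # each rule falsely alerts on 1% of clean scans
rules = 230

# Probability that at least one of the rules raises a false alert
# on a single scan of vulnerability-free code.
p_false_alert = 1 - (1 - fp_rate) ** rules
print(round(p_false_alert, 2))  # 0.9
```

Independence between rules is a simplifying assumption, but the direction of the math holds: per-rule false-positive rates compound across the ruleset.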
For the above reasons, we believe the best approach to static analysis is not to strike a balance between false positives and false negatives, but rather to become intolerant of false positives in any form, and embrace false negatives as the mathematical certainty Alan Turing proved them to be.
How do we eliminate false positives? By automating the triage process with dynamic testing.
Imagine a security engineer reading a line of code they think is vulnerable: the first thing they’d do is test that code to see if they can hit the vulnerability at runtime. Once they trigger it, they’re confident the finding is a true positive.
Similarly, at Truffle Security, when we find keys, we test them at runtime. For every one of the 700+ key detectors we currently support, we test whether the discovered key can authenticate successfully.
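A sketch of that triage loop (the finding shape, key strings, and toy verifier here are all made up for illustration; in practice the verifier would be a live “whoami”-style authentication call against the key’s service):

```python
def triage(findings, verify):
    # Keep only findings whose secret authenticates at runtime.
    # Anything that fails live verification is dropped rather than
    # reported: we accept false negatives to get zero false positives.
    return [f for f in findings if verify(f["secret"])]

# Toy verifier standing in for a real authentication check.
live_keys = {"key-live-1234"}
confirmed = triage(
    [{"secret": "key-live-1234"}, {"secret": "key-dead-5678"}],
    verify=lambda secret: secret in live_keys,
)
print(confirmed)  # [{'secret': 'key-live-1234'}]
```

Only findings that prove themselves at runtime ever reach a human.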
People often ask us if we support detecting things like PII or other types of data that can’t be dynamically triaged and confirmed at runtime. Our answer is no: results that can’t be auto-triaged introduce false positives into the engine, and that’s not worth the inevitable scaling issues and alert fatigue.
For a more detailed proof of the halting problem, as well as an interview with the author of Bandit, one of the most popular Python static analysis tools, check out the video above.