LucaCompagna
Advisor
Static application security testing (SAST) is an essential step in the development lifecycle of large software companies like SAP. It enables the detection of critical vulnerabilities in an application's source code before deployment, when fixing the problem is the least expensive.

While SAST tools have many known limitations, the impact of coding style on their ability to discover vulnerabilities has remained largely unexplored, and the following questions emerge:








What does it mean when a SAST tool reports the green traffic light indicating that no vulnerability was detected?

Was the entire source code fully analyzed, or were some code areas left unexplored, sweeping dangerous vulnerabilities under the carpet?



To answer these questions, we experimented with a combination of commercial and open-source SAST scanners, and compiled a list of over 270 different code testability patterns capturing challenging code instructions (we refer to these as tarpits) that, when present, impede the ability of state-of-the-art SAST tools to analyze application code in PHP and/or JavaScript. In other words, a tarpit for SAST is simply a set of code instructions that may confuse SAST tools in their analysis.

While we targeted the two most widely used web application languages, similar code patterns can be created for other programming languages and, very likely, similar results would be obtained.



By discovering the presence of these tarpits during the software development lifecycle, our approach can provide important feedback to developers about the testability of their code. It can also help them to better assess the residual risk that the code could still contain vulnerabilities even when static analyzers report no findings. Finally, our approach can also point to alternative ways to transform the code to increase its testability for SAST.

Our experiments show that testability tarpits are very common. For instance, an average PHP application contains over 21 of them, and even the best state-of-the-art static analysis tools fail to analyze more than 20 consecutive instructions before encountering one.

To assess the impact of tarpit transformations on static analysis findings, we experimented with both manual and automated code transformations designed to replace a subset of patterns with equivalent, but more testable, code. These transformations allowed existing SAST tools to better understand and analyze the applications, and led to the detection of 440 new potential vulnerabilities in 48 projects. We responsibly disclosed all these issues: 31 projects have already answered, confirming 182 vulnerabilities. Out of these confirmed issues, which had remained unknown due to the poor testability of the applications' code, 38 impact popular GitHub projects (>1k stars), such as PHP Dzzoffice (3.3k), JS Docsify (19k), and JS Apexcharts (11k). 25 CVEs have already been published and others are in process.

That was a short summary of our work; we hope you enjoyed it. If you want to see some technical details, just continue reading...



An example of a tarpit for SAST


To illustrate what a tarpit for SAST is, let us consider the code example shown in the previous picture, reported here enlarged for simplicity.


That code is an excerpt of the Mantis bug-tracking application, found vulnerable to a file injection vulnerability in 2011 that allowed remote attackers to include and execute arbitrary local files (more details in CVE-2011-3357).

Specifically, the file that is executed via the require_once instruction is dynamically defined by the value of the $act_file variable (line 20 in our example). This instruction is what static analysis terminology refers to as a sink, i.e., a dangerous instruction that must only receive clean (sanitized) data. Indeed, if an attacker can influence $act_file (the data processed by the dangerous operation), then she can influence the file that will be executed. By following the backward propagation from the variable $act_file, we can see that it depends on $_POST[$name] and thus on whatever a user passes to the application via the HTTP POST method as the value of the parameter $name.
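To make this backward propagation concrete, here is a simplified, self-contained sketch of the dataflow. The names gpc_get, $args, and $act_file follow the discussion in this post, but the snippet is an illustrative reconstruction, not the original Mantis source:

function gpc_get($name, $default = '') {
    // source: returns attacker-controlled data from the HTTP POST body
    return isset($_POST[$name]) ? $_POST[$name] : $default;
}

$args = array('action', '');
$r = call_user_func_array('gpc_get', $args);   // line 12: dynamic dispatch (discussed below)
$act_file = $r;
require_once($act_file . '.php');              // line 20: sink, includes and executes a file chosen by $act_file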


In static analysis terminology, $_POST[$name] is referred to as a source, i.e., a location in the program where data is read from a potentially risky origin. When a dataflow path exists between a source and a sink without proper sanitization, an injection attack may be possible and SAST tools should report it. For instance, if an attacker makes an HTTP POST request including action=<attacker_payload>, then the file that will be included and executed at line 20 depends on <attacker_payload>. This is referred to as a file injection attack.

Unfortunately, 4 out of the 6 SAST tools we tried in our experiments (two commercial tools and four state-of-the-art open-source ones) were not able to detect that vulnerability. Our hypothesis is that the call_user_func_array dynamic dispatching feature at line 12 confuses the SAST tools that miss the file injection vulnerability.

How can we validate this hypothesis and evaluate whether call_user_func_array, as used at line 12, is indeed a tarpit for those SAST tools? Our idea is to craft a testcase for the SAST tools based on that tarpit. We refer to these testcases as testability pattern instances.

Testability pattern instances share a common structure, shown in this puzzle picture, where the tarpit is embedded within a source-sink dataflow vulnerable to cross-site scripting (XSS), the most common injection vulnerability that SAST tools can detect. The idea is that if a SAST tool cannot detect the XSS, it must be because of the tarpit. The tarpit may require some additional companion code to be fully executable.


Concretizing this discussion on our example, the common skeleton (blue part in the puzzle) for a testability pattern instance would be:
$a = $_GET["p1"]; // source
$b = $a; // replace with the tarpit!
echo $b; // sink

This code reads a parameter from an HTTP GET request and prints the parameter's value in the web page without any sanitization. Notice that SAST tools that do not detect the expected XSS on this trivial skeleton are simply excluded.

By adding the call_user_func_array tarpit as at line 12 of our example, the testability pattern instance becomes:
function F($var) {
    return $var;
}
$a = $_GET["p1"]; // source
$b = call_user_func_array("F", [$a]); // tarpit
echo $b; // sink

The call_user_func_array line and its companion code (the function F) constitute the tarpit. Running a SAST tool against this testability pattern instance amounts to evaluating whether that tool gets confused by that usage of call_user_func_array. Indeed, if the SAST tool does not report the expected XSS, then we can conclude that the tarpit confuses the tool.

To further validate our hypothesis, we removed the tarpit via a simple refactoring of line 12 into:
$r = gpc_get($args); // no tarpit anymore

Indeed, line 12 was using a dynamic feature of PHP even though the function to be called was hardcoded and statically known. This simple refactoring was sufficient to remove the tarpit and enable the SAST tools to detect the file injection vulnerability, confirming that the tarpit was indeed at line 12 and only there.

What to do with tarpits for SAST


You cannot do much with one or a few pattern instances: you can only claim that a SAST tool does not support this or that tarpit. However, when you start creating many of them, aiming to be comprehensive with respect to a programming language, then you can do very interesting things.

We reviewed the documentation, the internal specifications, and the APIs of both PHP and JS, and distilled this information into hundreds of potential tarpits that exercise different functionalities. We then embedded these tarpits into testability pattern instances like the one illustrated above. For instance, 6 pattern instances were created just to capture different variants of call_user_func_array (e.g., a variant where the first parameter is not hardcoded but is a variable; another where that parameter is a variable concatenated with a constant string; etc.), two of which are sketched below. Similar pattern instances are further clustered into a testability pattern that provides an overall textual description of the tarpits being captured and simplifies their presentation to end users.
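As an illustration, two of these variants could look roughly as follows. These are simplified sketches in the spirit of our pattern instances; the exact instances are available in our library [2]:

// Variant A: the callee name is stored in a variable instead of being hardcoded
function F($var) {
    return $var;
}
$fname = "F";
$a = $_GET["p1"];                                // source
$b = call_user_func_array($fname, [$a]);         // tarpit
echo $b;                                         // sink

// Variant B: the callee name is a variable concatenated with a constant string
function Foo($var) {
    return $var;
}
$prefix = "F";
$c = call_user_func_array($prefix . "oo", [$a]); // tarpit
echo $c;                                         // sink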

Now that we have all these testability pattern instances, capturing many tarpits and covering a significant spectrum of the targeted programming languages, we aim to perform three key activities:

  • Measurement: evaluate SAST tools against our pattern instances

  • Discovery: make developers aware of the tarpits in their code via automated discovery rules

  • Mitigation: make apps more testable for SAST by removing tarpits via transformations, or by improving the SAST tools


Measurement of SAST tools


As mentioned, each testability pattern instance is like a testcase for SAST, used to determine whether a SAST tool supports the tarpit captured in the instance. We tested all our pattern instances against a set of commercial and open-source SAST tools to identify the tarpits that could impede the testability of an application for each of these tools. We used 6 SAST tools for PHP: 2 commercial and 4 open-source ones (RIPS [5], PHPsafe [6], WAP [7], and Progpilot [8]). Similarly, we used 5 SAST tools for JS: 3 commercial and 2 open-source ones (LGTM [3] and NodeJSScan [4]).

The detailed results are presented in the graphs below; we refer the interested reader to our technical report [1] for more details. Here we focus only on the overall score (see the blue bars labelled "All"). The best commercial tools were only able to handle 50% of the PHP and 60% of the JS tarpits, thus potentially leaving large parts of an application's code unexplored.


SAST measurement over PHP tarpits



SAST measurement over JS tarpits


Our testability pattern instances are available to the community, and SAST tool owners can thus use them to measure the progress of their tools against tarpits and to improve their support rate over time.

Discovery: make developers aware of SAST tarpits


Measuring SAST tools against tarpits is valuable only as long as those tarpits are actually used in the real world. If they are not, the fact that a SAST tool does not support them is less impactful. To evaluate the impact of the unsupported tarpits, we implemented automated discovery rules for all our PHP patterns and used them to scan 3341 open-source PHP applications drawn from the following four datasets:

  • GH: 1000 applications with high popularity in Github (more than 1000 stars)

  • GM: 1000 applications with medium popularity in Github (between 200 and 700 stars)

  • GL: 1000 applications with low popularity in Github (between 20 and 70 stars)

  • SC: all 341 applications from Sourcecodester [9], which hosts open-source PHP projects that serve as references for other developers who want to implement their own websites


The results, shown in the graph below, demonstrate that the prevalence of our tarpits in the real world is very high. The horizontal axis indicates how many pattern instances per line of code were discovered. The vertical axis indicates how many of the discovery rules created for our pattern instances returned tarpit occurrences in an application. The average project contains 21 different tarpits, and even the best SAST tool cannot process more than 20 consecutive instructions without encountering a tarpit that prevents it from correctly analyzing the code. Again, we refer the interested reader to our technical report [1] for more details about the discovery rules and the prevalence analysis of our tarpits.


Prevalence of PHP tarpits


The ability to automatically discover each tarpit brings many benefits. It can provide immediate and precise feedback to developers about the tarpits in their code (e.g., by integrating the discovery rules into an IDE). This information can then be used to make an informed decision about which combination of SAST tools is better suited to analyze the code, which parts of the application are blind spots for a static analyzer and thus may require a more extensive code review process, and which regions of code could be refactored into more testable alternatives.
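To give a rough idea of what such feedback could look like, the toy script below walks a project and reports every occurrence of call_user_func_array as a potential tarpit. It is a naive, hypothetical illustration only: our actual discovery rules operate on a proper program representation and cover all the patterns, not just this one.

// toy_discovery.php -- naive, illustrative tarpit discovery (hypothetical helper, not our real rules)
$root = isset($argv[1]) ? $argv[1] : '.';
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($root));
foreach ($files as $file) {
    if (!$file->isFile() || $file->getExtension() !== 'php') {
        continue;
    }
    foreach (file($file->getPathname()) as $i => $line) {
        if (strpos($line, 'call_user_func_array') !== false) {
            // report the location of a potential dynamic-dispatch tarpit
            printf("%s:%d: potential tarpit (call_user_func_array)\n",
                   $file->getPathname(), $i + 1);
        }
    }
}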

Mitigation: make apps more testable for SAST or improve SAST tools


Experimental results, both from the measurement and from the discovery activities, show that our tarpits are problematic for SAST tools and that they are prevalent in the real world. All in all, testability for SAST, understood as how well SAST tools can test applications, is problematic. This is further demonstrated by the outcomes of our additional experiments (see below).

How can this problem be mitigated? Two options can be envisaged:

  • Improve SAST tools

  • Make applications more testable for SAST


Indeed, owners of SAST tools can use our publicly available libraries of testability patterns for SAST [2] (we are currently enriching these libraries, so get in touch with us if you want to use the latest version) to determine which tarpits are not supported and improve their tools in forthcoming releases to increase the support rate. The libraries could also be used to monitor the progress of SAST tools against tarpits. Since we are not the owners of a SAST tool, we did not explore this option.

In our research, we explored the second option, which is more interesting in the context of a software company like SAP. In doing so, we also achieved very good results, demonstrating that we can increase testability for SAST and detect more vulnerabilities.

Make applications more testable for SAST


We performed two experiments to assess the use of code refactoring as a means to make an application more testable for SAST tools. In the first, we manually investigated five PHP and five JS applications for which SAST tools were unable to discover the presence of known vulnerabilities. By manually transforming the testability tarpits in those applications, we enabled the tools to detect the vulnerabilities. Moreover, over 200 additional bugs were reported, leading us to the disclosure of 71 confirmed vulnerabilities, as some of the discovered issues still applied to the latest version of the tested projects.

In the second experiment, we instead targeted thousands of popular real-world applications (the same used for the prevalence experiment), to which we applied five pattern transformations in a fully automated fashion. Our tool modified 1170 applications, transforming 32,192 occurrences of the five tarpits. By running SAST tools both before and after the transformations, we could observe the improvement in overall testability, supported by the detection of ~9000 new findings, of which we inspected ~2700 entries, uncovering hundreds of previously unknown vulnerabilities. In particular, we discovered 370 vulnerabilities in 43 different applications, 55 of which affected very popular projects with more than 1000 stars on GitHub. We responsibly disclosed all issues and have received 111 confirmations from the development teams (36 confirmations for the popular projects). These outcomes confirm the added value of our approach and the impact of removing tarpits to increase testability for SAST tools.

More details about these two transformation experiments are available in our technical report [1].
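To give a flavor of what an automated tarpit transformation can look like, here is a deliberately naive sketch that rewrites call_user_func_array calls with a hardcoded function name and an inline argument array into direct calls. It is only an illustration of the idea (the helper name remove_cufa_tarpit is hypothetical): our actual transformations operate on the program representation and handle many more cases and corner conditions.

// Hypothetical, simplified transformation: call_user_func_array("F", [$a]) becomes F($a)
function remove_cufa_tarpit($code) {
    return preg_replace(
        '/call_user_func_array\(\s*[\'"](\w+)[\'"]\s*,\s*\[([^\]]*)\]\s*\)/',
        '$1($2)',
        $code
    );
}

// Example: the tarpit line from the pattern instance above becomes a direct call.
echo remove_cufa_tarpit('$b = call_user_func_array("F", [$a]); // tarpit'), "\n";
// prints: $b = F($a); // tarpit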

Transformation of testability tarpits is a very interesting and challenging research topic. Clearly, not all tarpits can be automatically transformed while preserving the semantics of the program. Sometimes we can transform the code in a way that loses the semantics but ensures that if the original program was vulnerable, then the transformed program, which is easier to test, is vulnerable as well (a transformation that preserves the vulnerability). In all other cases, automated transformations would be impossible without some help from the development team of the application.

Take-away messages


SAST tools are subject to testability issues that may prevent them from detecting important vulnerabilities. Simply accepting a green light from a SAST tool, without knowing which fragments of the application were actually analyzed, may just hide vulnerabilities under the carpet.

By devising measurable, discoverable, and possibly transformable tarpits for SAST, we can gain higher awareness of what a SAST tool is analyzing and even improve testability for SAST by acting on the SAST tool itself or on the application code.

References


[1] Feras Al Kassar, Giulia Clerici, Luca Compagna, Davide Balzarotti, Fabian Yamaguchi. Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications. NDSS 2022. https://www.ndss-symposium.org/wp-content/uploads/2022-150-paper.pdf.

[2] Our libraries of testability patterns. https://github.com/enferas/TestabilityTarpits.

[3] LGTM. https://lgtm.com/, Accessed January 17, 2022. Artifact: LGTM v1.27.0.

[4] Ajin Abraham. NodeJSScan. https://ajinabraham.github.io/nodejsscan/, Accessed January 17, 2022. Artifact: NodeJSScan v4.5.

[5] Johannes Dahse and Thorsten Holz. Simulation of built-in PHP features for precise static code analysis. NDSS 2014. Artifact: RIPS v0.55.

[6] Paulo Jorge Costa Nunes, José Fonseca, and Marco Vieira. PHPsafe: A security analysis tool for OOP web application plugins. DSN 2015. Artifact: PHPsafe version for DSN 2015.

[7] OWASP. OWASP WAP – Web Application Protection Project. https://securityonline.info/owasp-wap-web-application-protection-project/. Artifact: WAP v2.1.

[8] Progpilot. Progpilot - A static analyzer for security purposes. https://github.com/designsecurity/progpilot. Artifact: Progpilot v0.7.

[9] Sourcecodester Website. Sourcecodester - free source codes. https://www.sourcecodester.com/php-project, Accessed January 17, 2022.

Contact and credits





Discover how SAP Security Research serves as a security thought leader at SAP, continuously transforming SAP by improving security.


Contact for further information:

Luca Compagna, research expert at SAP Security Research, luca.compagna

Joint work with: Feras Al-Kassar (PhD student at SAP), Giulia Clerici (former intern at SAP), Fabian Yamaguchi (Chief Scientist at ShiftLeft Inc), and Prof. Davide Balzarotti (EURECOM).

 