O'Reilly    
 Published on O'Reilly (http://oreilly.com/)
 See this if you're having trouble printing code examples


Detecting Web Application Security Vulnerabilities

by Shreeraj Shah
11/02/2006

Web Application Vulnerability Detection with Code Review

Web application source code, independent of languages and platforms, is a major source for vulnerabilities. One of the CSI surveys on vulnerability distribution suggests that 64% of the time, a vulnerability crops up due to programming errors and 36% of the time, due to configuration issues. According to IBM labs, there is a possibility of at least one security issue contained in every 1,500 lines of code. One of the challenges a security professional faces when assessing and auditing web applications is to identify vulnerabilities while simultaneously performing a source code review.

Problem Domain

Several languages are popular for web applications, including Active Server Pages (ASP), PHP, and Java Server Pages (JSP). Every programmer has his own way of implementing and writing objects. Each of these languages has exposed several APIs and directives to make a programmer's life easy. Unfortunately, a programming language cannot offer any guarantee on security. It is the programmer's responsibility to ensure that his own code is secure against various attack vectors, some of which may be malicious in nature.

On the other side, it is imperative to get the developed code assessed from a security standpoint, externally or in-house, prior to deploying the code on production systems. It's impossible to use only one tool to determine vulnerabilities residing in the source code, given the customized nature of applications and the many ways in which programmers can code. Source code review requires a combination of tools and intellectual analysis to determine exposure. The source code may be voluminous, running into thousands or millions of lines in some cases. It is not possible to go through each line of code manually in a short time span. This is where tools come into play. A tool can only help in determining information; it is the intellect--with a security mindset--that must link this information together. This dual approach is the one normally advocated for a source code review.

Assumption

To demonstrate automated review, I present a sample web application written in ASP.NET. I've produced a sample Python script as a tool for source code analysis. This approach can work to analyze any web application written in any language. It is also possible to write your own tool using any programming language.

Method and Approach

I've divided my method for approaching a code review exercise into several logical steps with specific objectives:

Dependency determination

Prior to commencing a code review exercise, you must understand the entire architecture and dependencies of the code. This understanding provides better overview and focus. One of the key objectives of this phase is to determine clear dependencies and to link them to the next phase. Figure 1 shows the overall architecture of a web shop in the case study under review.

architecture for the sample web application
Figure 1. Architecture for web application [webshop.example.com]

The application has several dependencies:

With this information in place, you are in a better position to understand the code. To reiterate, the entire application is coded in C# and is hosted on a web server running IIS. This is the target. The next step is to identify entry points to the application.

Entry point identification

The objective of this phase is to identify entry points to the web application. A web application can be accessed from various sources (Figure 2). It is important to evaluate every source; each has an associated risk.

web app entry points
Figure 2. Web application entry points

These entry points provide information to an application. These values hit the database, LDAP servers, processing engines, and other components in the application. If these values are not guarded, they can open up potential vulnerabilities in the application. The relevant entry points are:

These are the important entry points to the application in the case study. It is possible to grab certain key patterns in the submitted data using regular expressions from multiple files to trace and analyze patterns.

Scanning the code with Python

scancode.py is a source code-scanning utility. It is simple Python script that automates the review process. This Python scanner has three functions with specific objectives:

Using the program is easy; it takes several switches to activate the previously described functions.

D:\PYTHON\scancode>scancode.py
Cannot parse the option string correctly
Usage:
scancode -<flag> <file> <variable>
flag -sG : Global match
flag -sR : Entry points
flag -t  : Variable tracing
                  Variable is only needed for -t option

Examples:

scancode.py -sG details.aspx
scancode.py -sR details.aspx
scancode.py -t details.aspx pro_id

D:\PYTHON\scancode>

The scanner script first imports Python's regex module:

import re

Importing this module makes it possible to run regular expressions against the target file:

p = re.compile(".*.[Rr]equest.*[^\n]\n")

This line defines a regular expression--in this case, a search for the Request object. With this regex, the match() method collects all possible instances of regex patterns in the file:

m = p.match(line)

Looking for entry points

Now use scancode.py to scan the details.aspx file for possible entry points in the target code. Use the -sR switch to identify entry points. Running it on the details.aspx page produces the following results:

D:\PYTHON\scancode>scancode.py -sR details.aspx
Request Object Entry:
22 :    NameValueCollection nvc=Request.QueryString;

This is the entry point to the application, the place where the code stores QueryString information into the NameValue collection set.

Here is the function that grabs this information from the code:

def scan4request(file):
        infile = open(file,"r")
        s = infile.readlines()
        linenum = 0
        print 'Request Object Entry:'
        for line in s:                  
        linenum += 1
        p = re.compile(".*.[Rr]equest.*[^\n]\n")
        m = p.match(line)
        if m:                                   
                print linenum,":",m.group()

The code snippet shows the file being opened and the request object grabbed using a specific regex pattern. This same approach can capture all other entry points. For example, here's a snippet to identify cookie- and session-related entry points:

# Look for cookie and session management
        p = re.compile(".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n")
        m = p.match(line)
        if m:
                print 'Session Object Entry:'
                print linenum,":",m.group()

After locating these entry points to the application, you need to trace them and search for vulnerabilities.

Threat mapping and vulnerability detection

Discovering entry points narrows the focus for threat mapping and vulnerability detection. An entry point is essential to a trace. It is important to unearth where this variable goes (execution flow) and its impact on the application.

The previous scan found a Request object entry in the application:

22 :    NameValueCollection nvc=Request.QueryString;

Running the script with the -t option will help to trace the variables. (For full coverage, trace it right through to the end, using all possible iterations).

D:\PYTHON\scancode>scancode.py -t details.aspx nvc
Tracing variable:nvc
   NameValueCollection nvc=Request.QueryString;
   String[] arr1=nvc.AllKeys;
        String[] sta2=nvc.GetValues(arr1[0]);

This assigned a value from nvc to sta2, so that also needs a trace:

D:\PYTHON\scancode>scancode.py -t details.aspx sta2
Tracing variable:sta2
        String[] sta2=nvc.GetValues(arr1[0]);
        pro_id=sta2[0];

Here's another iteration; tracing pro_id:

D:\PYTHON\scancode>scancode.py -t details.aspx pro_id
Tracing variable:pro_id
   String pro_id="";
        pro_id=sta2[0];
   String qry="select * from items where product_id=" + pro_id;
   response.write(pro_id);

Finally, this is the end of the trace. This example has shown multiple traces of a single page, but it is possible to traverse multiple pages across the application. Figure 3 shows the complete output.

vulnerability detection with tracing
Figure 3. Vulnerability detection with tracing

As the source code and figure show, there is no validation of input in the source. There is a SQL injection vulnerability:

String qry="select * from items where product_id=" + pro_id;

The application accepts pro_id and passes it as is to the SELECT statement. It is possible to manipulate this statement and inject SQL payload.

Similarly, another line exposes a cross-site scripting (XSS) vulnerability:

response.write(pro_id);

Throwing back the (unvalidated) pro_id to the browser provides a position for an attacker to inject JavaScript to be executed in the victim's browser.

The scripts -sG option executes the global search routine. This routine looks for file objects, cookies, exceptions, etc. Each has potential vulnerabilities, and this scan can help you to identify them and map them to the respective threats:

D:\shreeraj_docs\perlCR>scancode.py -sG details.aspx
Dependencies:
13 : 

Request Object Entry:
22 :    NameValueCollection nvc=Request.QueryString;

SQL Object Entry:
49 :    String qry="select * from items where product_id=" + pro_id;

SQL Object Entry:
50 :    SqlCommand mycmd=new SqlCommand(qry,conn);

Response Object Entry:
116 :    response.write(pro_id);

XSS Check:
116 :    response.write(pro_id);

Exception handling:
122 :    catch(Exception ex)

This code review approach takes minimal effort by detecting entry points, vulnerabilities, and variable tracing.

Mitigation and Countermeasure

After you have identified a vulnerability, the next step is to mitigate the threat. There are various ways to do this, depending on your deployment. For example, it's possible to mitigate SQL injection by adding a rule to the web application firewall to bypass a certain set of characters such as single and double quotes. The best way to mitigate this issue is by applying secure coding practices--providing proper input validation before consuming the variable at the code level. At the SQL level, it is important to use either prepared statements or stored procedures to avoid SQL SELECT statement injection. For mitigation of XSS vulnerabilities, it is imperative to filter out characters such as greater than (>) and less than (<) prior to serving any content to the end-client. These steps provide threat mitigation to the overall web application.

Conclusion

Code review is a very powerful tool for detecting vulnerabilities and getting to their actual source. This is the "whitebox" approach. Dependency determination, entry point identification, and threat mapping help detect vulnerability. All of these steps need architecture and code reviews. The nature of code is complex, so no single tool can meet all of your needs. As a professional, you need to write tools on the fly when doing code review and put them into action when the code base is very large. It is not feasible to go through each line of code.

In this scenario, one of the methods is to start with entry points, as discussed earlier in this article. You can build complex scripts or programs in any language to grab various patterns in voluminous source code and link them together. Tracing the variable or function is the key that can show up the entire traversal and greatly help in determining vulnerabilities.

Exhibit 1 - scancode.py

Shreeraj Shah is the founder of Blueinfy, a company that provides application security services.


Return to O'Reilly SysAdmin

Copyright © 2009 O'Reilly Media, Inc.