Web application source code, regardless of language or platform, is a major source of vulnerabilities. One CSI survey on vulnerability distribution suggests that 64% of vulnerabilities stem from programming errors and 36% from configuration issues. According to IBM labs, every 1,500 lines of code may contain at least one security issue. One of the challenges security professionals face when assessing and auditing web applications is identifying vulnerabilities while simultaneously performing a source code review.
Several languages are popular for web applications, including Active Server Pages (ASP), PHP, and JavaServer Pages (JSP). Every programmer has their own way of implementing and writing objects. Each of these languages exposes several APIs and directives to make a programmer's life easier. Unfortunately, a programming language cannot offer any guarantee of security. It is the programmer's responsibility to ensure that the code is secure against the various attack vectors it may be exposed to.
At the same time, it is imperative to have the developed code assessed from a security standpoint, externally or in-house, before deploying it on production systems. No single tool can find every vulnerability in the source code, given the customized nature of applications and the many ways in which programmers write code. Source code review requires a combination of tools and intellectual analysis to determine exposure. The source code may be voluminous, running into thousands or even millions of lines, and it is not possible to go through each line manually in a short time span. This is where tools come into play. A tool can only help gather information; it takes human intellect, applied with a security mindset, to link this information together. This dual approach is the one normally advocated for a source code review.
To demonstrate automated review, I present a sample web application written in ASP.NET, along with a sample Python script I wrote as a source code analysis tool. The same approach works for web applications written in any language, and you can write your own tool in any programming language.
I've divided my approach to a code review exercise into several logical steps, each with a specific objective:
Before starting a code review exercise, you must understand the entire architecture and dependencies of the code. This understanding gives you a better overview and sharper focus. A key objective of this phase is to identify clear dependencies and link them to the next phase. Figure 1 shows the overall architecture of the web shop in the case study under review.
Figure 1. Architecture for web application [webshop.example.com]
The application has several dependencies:
With this information in place, you are in a better position to understand the code. To reiterate, the entire application is coded in C# and is hosted on a web server running IIS. This is the target. The next step is to identify entry points to the application.
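Part of mapping dependencies can be automated by searching pages for include and import directives. The following is a minimal sketch, assuming dependencies appear either as server-side include comments (`<!--#include file="..."-->`) or as ASP.NET `<%@ Import Namespace="..." %>` directives; `find_dependencies` is a hypothetical helper, not part of scancode.py, and the sample page text (including `header.inc`) is made up.

```python
import re

# Hypothetical dependency patterns: server-side includes and ASP.NET Import directives.
INCLUDE_RE = re.compile(r'<!--\s*#include\s+(?:file|virtual)\s*=\s*"([^"]+)"\s*-->', re.IGNORECASE)
IMPORT_RE = re.compile(r'<%@\s*Import\s+Namespace\s*=\s*"([^"]+)"\s*%>', re.IGNORECASE)

def find_dependencies(source):
    """Return (line number, kind, target) for each include or import directive."""
    deps = []
    for linenum, line in enumerate(source.splitlines(), start=1):
        for m in INCLUDE_RE.finditer(line):
            deps.append((linenum, "include", m.group(1)))
        for m in IMPORT_RE.finditer(line):
            deps.append((linenum, "import", m.group(1)))
    return deps

# Made-up page header for illustration only.
sample = '''<%@ Page Language="C#" %>
<%@ Import Namespace="System.Data.SqlClient" %>
<!--#include file="header.inc"-->
'''
for dep in find_dependencies(sample):
    print(dep)
```

Each result points back to a concrete line, so the dependency list can feed directly into the next phase.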
The objective of this phase is to identify entry points to the web application. A web application can be accessed from various sources (Figure 2). It is important to evaluate every source; each has an associated risk.
Figure 2. Web application entry points
These entry points feed information into the application. The values they supply reach the database, LDAP servers, processing engines, and other components. If these values are not guarded, they can open up vulnerabilities in the application. The relevant entry points are:
HTTP_REFERER, etc). The ASPX application consumes this data through the Request object. During a code review exercise, look for this object's usage.
These are the important entry points to the application in the case study. Regular expressions make it possible to pull key patterns for this submitted data out of multiple files, and then trace and analyze them.
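As a minimal sketch of such a multi-file search (not the scancode.py implementation), the hypothetical `scan_lines` and `scan_entry_points` helpers below look for the standard ASP.NET Request collections (QueryString, Form, Cookies, ServerVariables, Headers, Params) and for indexer access such as `Request["id"]`. The regex and the `*.aspx*` file pattern, meant to catch both `.aspx` pages and `.aspx.cs` code-behind files, are assumptions.

```python
import re
from pathlib import Path

# Illustrative regex (an assumption, not the scancode.py pattern): matches
# Request.<Collection> for common ASP.NET collections and Request[...] indexers.
ENTRY_RE = re.compile(
    r"\bRequest\s*(?:\.\s*(QueryString|Form|Cookies|ServerVariables|Headers|Params)\b|\[)"
)

def scan_lines(name, lines):
    """Return (file name, line number, source kind, stripped line) for each Request use."""
    hits = []
    for linenum, line in enumerate(lines, start=1):
        m = ENTRY_RE.search(line)
        if m:
            hits.append((name, linenum, m.group(1) or "indexer", line.strip()))
    return hits

def scan_entry_points(root, pattern="*.aspx*"):
    """Apply scan_lines to every regular file under root whose name matches pattern."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            with open(path, encoding="utf-8", errors="replace") as f:
                hits.extend(scan_lines(str(path), f))
    return hits

sample = [
    "NameValueCollection nvc=Request.QueryString;\n",
    'string user = Request["user"];\n',
]
for hit in scan_lines("details.aspx", sample):
    print(hit)
```

Pointing `scan_entry_points` at the application's root directory returns the same kind of tuple for every matching file, which gives a single list of entry points to trace.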
scancode.py is a source code scanning utility: a simple Python script that automates the review process. The scanner has three functions, each with a specific objective:
The scanfile function scans the entire file for specific security-related regex patterns:
".*.[Rr]equest.*[^\n]\n" # Look for request object calls
".*.select .*?[^\n]\n|.*.SqlCommand.*?[^\n]\n" # Look for SQL execution points
".*.FileStream .*?[^\n]\n|.*.StreamReader.*?[^\n]\n" # Look for file system access
".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n" # Look for cookie and session information
"<!--.*?#include.*?-->" # Look for dependencies in the application
".*.[Rr]esponse.*[^\n]\n" # Look for response object calls
".*.write.*[^\n]\n" # Look for information going back to browser
".*catch.*[^\n]\n" # Look for exception handling
The scan4request function scans the file for entry points to the application via the ASP.NET Request object. Essentially, it runs the pattern ".*.[Rr]equest.*[^\n]\n" against each line of the file.
The scan4trace function helps analyze how a variable traverses the file. Pass it the name of a variable and it returns the list of lines where that variable is used. This function is the key to detecting application-level vulnerabilities.
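The source of scan4trace is not shown here, so the following is only a minimal sketch of the idea, assuming it reports every line that mentions the variable as a whole word. `trace_variable` is a hypothetical name, and the sample lines are taken from the details.aspx trace output shown later.

```python
import re

def trace_variable(lines, variable):
    """Return (line number, stripped line) for every line that mentions variable.

    Word-boundary anchors keep a search for "pro_id" from also hitting "pro_ids".
    """
    pattern = re.compile(r"\b" + re.escape(variable) + r"\b")
    return [(n, line.strip()) for n, line in enumerate(lines, start=1) if pattern.search(line)]

code = [
    "NameValueCollection nvc=Request.QueryString;",
    "String arr1=nvc.AllKeys;",
    "String sta2=nvc.GetValues(arr1);",
    "pro_id=sta2;",
]
for n, text in trace_variable(code, "nvc"):
    print(n, ":", text)
```

Using `re.escape` means variable names containing regex metacharacters are matched literally.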
Using the program is easy; it takes several switches to activate the previously described functions.
D:\PYTHON\scancode>scancode.py
Cannot parse the option string correctly
Usage:
scancode -<flag> <file> <variable>
flag -sG : Global match
flag -sR : Entry points
flag -t : Variable tracing
Variable is only needed for -t option
Examples:
scancode.py -sG details.aspx
scancode.py -sR details.aspx
scancode.py -t details.aspx pro_id
D:\PYTHON\scancode>
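The switch handling behind this usage text is not shown, so here is a hypothetical reconstruction that accepts the three flags, requires a variable name only for -t, and rejects anything else with the message the tool prints. `parse_args` and `VALID_FLAGS` are illustrative names.

```python
# Hypothetical reconstruction of the scancode.py command-line handling.
VALID_FLAGS = ("-sG", "-sR", "-t")
ERROR = "Cannot parse the option string correctly"

def parse_args(argv):
    """Turn argv (without the script name) into (flag, file, variable).

    Raises ValueError on a malformed command line.
    """
    if len(argv) < 2 or argv[0] not in VALID_FLAGS:
        raise ValueError(ERROR)
    flag, path = argv[0], argv[1]
    if flag == "-t":
        if len(argv) != 3:  # -t is the only flag that takes a variable name
            raise ValueError(ERROR)
        return flag, path, argv[2]
    if len(argv) != 2:
        raise ValueError(ERROR)
    return flag, path, None

print(parse_args(["-t", "details.aspx", "pro_id"]))
print(parse_args(["-sR", "details.aspx"]))
```

In a script, you would call `parse_args(sys.argv[1:])`, print the usage text when it raises `ValueError`, and otherwise dispatch to the scan function that matches the flag.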
The scanner script first imports Python's regex module:
import re
Importing this module makes it possible to run regular expressions against the target file:
p = re.compile(".*.[Rr]equest.*[^\n]\n")
This line defines a regular expression, in this case a search for the Request object. With this compiled regex, the match() method tests each line of the file against the pattern:
m = p.match(line)
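Because `match()` only matches at the start of the string, the leading `.*.` in the pattern is what lets it find Request anywhere on the line. A quick check (a sketch for illustration, not part of scancode.py) shows two edge cases this design produces, along with a `search()`-based alternative.

```python
import re

# The scanner's pattern, compiled exactly as in the article.
p = re.compile(".*.[Rr]equest.*[^\n]\n")

print(bool(p.match("NameValueCollection nvc=Request.QueryString;\n")))  # True
print(bool(p.match("Request.Form\n")))  # False: ".*." needs at least one character before "Request"
print(bool(p.match("x=Request.Form")))  # False: the pattern also demands a trailing newline

# A search()-based alternative that depends on neither line position nor a newline.
q = re.compile("[Rr]equest")
print(bool(q.search("Request.Form")))  # True
```

In practice, the first case rarely matters because code is usually indented, but the second means a Request call on the last line of a file with no trailing newline would be missed.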
Now use scancode.py with the -sR switch to scan the details.aspx file for possible entry points in the target code. Running it on the details.aspx page produces the following results:
D:\PYTHON\scancode>scancode.py -sR details.aspx
Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;
This is the entry point to the application: the place where the code stores QueryString information in a NameValueCollection.
Here is the function that grabs this information from the code:
import re

def scan4request(file):
    # Compile the Request-object pattern once instead of on every line
    p = re.compile(".*.[Rr]equest.*[^\n]\n")
    print('Request Object Entry:')
    with open(file, "r") as infile:
        for linenum, line in enumerate(infile, start=1):
            m = p.match(line)
            if m:
                print(linenum, ":", m.group())
The code snippet opens the file and grabs Request object usage with a specific regex pattern. The same approach can capture all other entry points. For example, here's a snippet that identifies cookie- and session-related entry points:
# Look for cookie and session management
p = re.compile(".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n")
m = p.match(line)
if m:
    print('Session Object Entry:')
    print(linenum, ":", m.group())
After locating these entry points to the application, you need to trace them and search for vulnerabilities.
Discovering entry points narrows the focus for threat mapping and vulnerability detection. Each entry point is the starting point of a trace: it is important to unearth where each variable goes (its execution flow) and how it affects the application.
The previous scan found a Request object entry in the application:
22 : NameValueCollection nvc=Request.QueryString;
Run the script with the -t option to trace a variable. (For full coverage, follow the trace right through to the end, repeating it for every variable the value flows into.)
D:\PYTHON\scancode>scancode.py -t details.aspx nvc
Tracing variable:nvc
NameValueCollection nvc=Request.QueryString;
String arr1=nvc.AllKeys;
String sta2=nvc.GetValues(arr1);
The value of nvc flows into sta2, so that variable also needs a trace:
D:\PYTHON\scancode>scancode.py -t details.aspx sta2
Tracing variable:sta2
String sta2=nvc.GetValues(arr1);
pro_id=sta2;
Here's another iteration, tracing pro_id:
D:\PYTHON\scancode>scancode.py -t details.aspx pro_id
Tracing variable:pro_id
String pro_id="";
pro_id=sta2;
String qry="select * from items where product_id=" + pro_id;
response.write(pro_id);
This is the end of the trace. This example traced a single page in several passes, but the same technique can traverse multiple pages across the application. Figure 3 shows the complete output.
Figure 3. Vulnerability detection with tracing
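The manual iterations above can be automated with a worklist: trace a variable, add every variable it is assigned into, and repeat until nothing new appears; then flag SQL and response calls that use any of those variables. This is a swapped-in taint-propagation sketch, not how scancode.py works. `ASSIGN_RE`, `SINK_RE`, `trace_flow`, and `tainted_sinks` are hypothetical, and the patterns assume simple one-statement-per-line C# assignments.

```python
import re

# Simplistic, assumed pattern for assignments such as "String sta2=nvc.GetValues(arr1);"
# or "pro_id=sta2;". Group 1 is the assigned name, group 2 the right-hand side.
ASSIGN_RE = re.compile(r"^\s*(?:[\w.<>\[\]]+\s+)?(\w+)\s*=(?!=)\s*(.+?);")

# Assumed sinks: SQL text or SqlCommand construction, and Response.Write calls.
SINK_RE = re.compile(r"select |SqlCommand|\.write\(", re.IGNORECASE)

def uses(var):
    """Regex that finds var as a whole word."""
    return re.compile(r"\b" + re.escape(var) + r"\b")

def trace_flow(lines, start):
    """Follow start through assignments; return every variable its value reaches."""
    tainted, worklist = {start}, [start]
    while worklist:
        pattern = uses(worklist.pop())
        for line in lines:
            m = ASSIGN_RE.match(line)
            if m and pattern.search(m.group(2)) and m.group(1) not in tainted:
                tainted.add(m.group(1))
                worklist.append(m.group(1))
    return tainted

def tainted_sinks(lines, tainted):
    """Return (line number, stripped line) for sinks that use a tainted variable."""
    return [
        (n, line.strip())
        for n, line in enumerate(lines, start=1)
        if SINK_RE.search(line) and any(uses(v).search(line) for v in tainted)
    ]

page = [
    "NameValueCollection nvc=Request.QueryString;",
    "String arr1=nvc.AllKeys;",
    "String sta2=nvc.GetValues(arr1);",
    'String pro_id="";',
    "pro_id=sta2;",
    'String qry="select * from items where product_id=" + pro_id;',
    "response.write(pro_id);",
]
tainted = trace_flow(page, "nvc")
print(sorted(tainted))
for n, text in tainted_sinks(page, tainted):
    print(n, ":", text)
```

The worklist replaces the repeated `-t` runs: each newly tainted variable is traced once, so the loop ends even when assignments form cycles.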
As the source code and figure show, the input is never validated, which leaves a SQL injection vulnerability:
String qry="select * from items where product_id=" + pro_id;
The application accepts pro_id and passes it as is to the SELECT statement, so an attacker can manipulate the statement by injecting a SQL payload. For example, a pro_id value of 1 OR 1=1 produces a WHERE clause that is true for every row.
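To see the effect without a SQL Server instance, the sketch below uses Python's built-in sqlite3 module as a stand-in backend; the table rows and the `lookup_vulnerable` name are made up for illustration. It builds the query exactly the way the C# line does, by string concatenation.

```python
import sqlite3

# In-memory stand-in for the webshop's "items" table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (product_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "pen"), (2, "book"), (3, "lamp")])

def lookup_vulnerable(pro_id):
    # Same construction as the C# line: raw input concatenated into the SELECT.
    qry = "select * from items where product_id=" + pro_id
    return conn.execute(qry).fetchall()

print(lookup_vulnerable("2"))         # [(2, 'book')]
print(lookup_vulnerable("2 OR 1=1"))  # the injected OR clause matches every row
```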
Similarly, another line exposes a cross-site scripting (XSS) vulnerability:
response.write(pro_id);
Writing the unvalidated pro_id value straight back to the browser allows an attacker to inject script into the page.
Running the script with the -sG option executes the global search routine. This routine looks for file objects, cookies, exceptions, and so on. Each of these can harbor vulnerabilities, and this scan helps you identify them and map them to the respective threats:
D:\shreeraj_docs\perlCR>scancode.py -sG details.aspx
Dependencies:
13 :
Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;
SQL Object Entry:
49 : String qry="select * from items where product_id=" + pro_id;
SQL Object Entry:
50 : SqlCommand mycmd=new SqlCommand(qry,conn);
Response Object Entry:
116 : response.write(pro_id);
XSS Check:
116 : response.write(pro_id);
Exception handling:
122 : catch(Exception ex)
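A global scan like this amounts to running a labeled table of patterns over every line. The following is a minimal sketch under stated assumptions: the labels mirror the output above, but the simplified `search()`-based patterns, the `[Ss]ession` widening, and the `CHECKS`/`global_scan` names are mine rather than scancode.py's.

```python
import re

# Labeled checks loosely following the scanfile pattern list; simplified for search().
CHECKS = [
    ("Dependencies", re.compile(r"<!--.*?#include.*?-->")),
    ("Request Object Entry", re.compile(r"[Rr]equest")),
    ("SQL Object Entry", re.compile(r"select |SqlCommand")),
    ("File System Access", re.compile(r"FileStream|StreamReader")),
    ("Session/Cookie Entry", re.compile(r"HttpCookie|[Ss]ession")),
    ("Response Object Entry", re.compile(r"[Rr]esponse")),
    ("XSS Check", re.compile(r"\.write\(", re.IGNORECASE)),
    ("Exception handling", re.compile(r"\bcatch\b")),
]

def global_scan(lines):
    """Return {label: [(line number, stripped line), ...]} for every check that fires."""
    report = {}
    for n, line in enumerate(lines, start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                report.setdefault(label, []).append((n, line.strip()))
    return report

sample = [
    "NameValueCollection nvc=Request.QueryString;",
    'String qry="select * from items where product_id=" + pro_id;',
    "SqlCommand mycmd=new SqlCommand(qry,conn);",
    "response.write(pro_id);",
    "catch(Exception ex)",
]
report = global_scan(sample)
for label, hits in report.items():
    print(label + ":")
    for n, text in hits:
        print(n, ":", text)
```

Keeping the checks in one list makes it easy to add a new pattern during a review without touching the scanning loop.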
This approach keeps the effort of a code review to a minimum: detect entry points, trace variables, and identify vulnerabilities.
After you have identified a vulnerability, the next step is to mitigate the threat. There are various ways to do this, depending on your deployment. For example, you can mitigate SQL injection by adding a web application firewall rule that blocks a certain set of characters, such as single and double quotes. The best way to mitigate this issue, however, is to apply secure coding practices: validate input properly before consuming the variable at the code level. At the SQL level, use either prepared statements or stored procedures to prevent injection into the SELECT statement. To mitigate XSS vulnerabilities, it is imperative to filter out or encode characters such as greater than (>) and less than (<) before serving any content to the end client. Together, these steps mitigate threats across the overall web application.
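Here is a minimal Python sketch of the two code-level fixes, again using sqlite3 as a stand-in for the real database. It assumes product IDs are plain ASCII digits, binds the value as a query parameter rather than concatenating it, and uses `html.escape`, which encodes `<`, `>`, `&`, and quotes rather than stripping them. `lookup_safe` and `render` are hypothetical names.

```python
import html
import re
import sqlite3

# In-memory stand-in for the "items" table; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (product_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "pen"), (2, "book")])

def lookup_safe(pro_id):
    # Input validation: this sketch assumes product IDs are plain ASCII digits.
    if not re.fullmatch(r"[0-9]+", pro_id):
        raise ValueError("product_id must be numeric")
    # Prepared-statement style: the value is bound to "?" and never spliced into the SQL text.
    return conn.execute("select * from items where product_id = ?", (int(pro_id),)).fetchall()

def render(value):
    # Output encoding: html.escape turns <, >, & and quotes into HTML entities.
    return "<p>" + html.escape(value) + "</p>"

print(lookup_safe("2"))
print(render("<script>alert(1)</script>"))
```

Calling `lookup_safe("2 OR 1=1")` raises `ValueError` before any SQL runs. Even without that check, the parameter binding would treat the whole string as a single value rather than as SQL.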
Code review is a very powerful tool for detecting vulnerabilities and tracing them to their actual source. This is the "whitebox" approach. Dependency determination, entry point identification, and threat mapping help detect vulnerabilities, and all of these steps require both architecture and code reviews. Code is complex by nature, so no single tool can meet all of your needs. As a professional, you need to write tools on the fly during a code review and put them into action when the code base is very large, since it is not feasible to go through every line of code by hand.
In this scenario, one effective method is to start with entry points, as discussed earlier in this article. You can build complex scripts or programs in any language to grab various patterns in voluminous source code and link them together. Tracing a variable or function reveals its entire traversal and greatly helps in determining vulnerabilities.
Shreeraj Shah is the founder of Blueinfy, a company that provides application security services.
Copyright © 2009 O'Reilly Media, Inc.