Background and Literature Survey
This chapter provides the background needed to understand SQL injection: how it works and what its consequences are. It then reviews the literature on techniques for detecting SQL injection.
2.1. SQL Injection
SQL injection is a technique that exploits a security vulnerability in the database layer of an application [1]. To exploit such a vulnerability, an attacker must have access to a parameter that the web application passes through to the database. By appending malicious SQL commands to the parameter, the attacker causes the web application to send a malicious query to the database server, which then executes it. Embedding one programming or scripting language inside another is a main reason SQL injection attacks succeed. A query can become vulnerable when its inputs are improperly validated [2].
2.1.1. Example of SQL injection attack
The following login example illustrates a SQL injection attack. A user logs in to a web application with the user name "ahmad" and the password "provided_password". The server script sends the following query to the database server: SELECT * FROM accounts WHERE username='ahmad' AND password='provided_password'. If the user exists, the query returns one or more rows. The script checks the number of rows in the result and, if it is greater than zero, allows the user to log in.
Now consider an attack scenario. A malicious user enters ahmad' OR 1=1 -- as the username and notneeded as the password. The SQL query sent to the database server therefore has the following structure: SELECT * FROM accounts WHERE username='ahmad' OR 1=1 --' AND password='notneeded'. The sequence -- marks the beginning of a comment in SQL, so the database server ignores the remainder of the query after it. It executes the following query: SELECT * FROM accounts WHERE username='ahmad' OR 1=1, in which the condition username='ahmad' OR 1=1 is always true.
After executing the query, the database server returns every row of the accounts table, and the user is logged in without supplying a valid username or password. Fig. 1 and Fig. 2 illustrate a normal web application process and one subjected to SQLI, respectively.
Fig. 1. Normal web application process [3]
Fig. 2. Malicious input process [3]
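The login scenario above can be reproduced with a minimal sketch. It assumes an in-memory SQLite database and hypothetical helper names (make_db, login_vulnerable, login_parameterized) that do not come from the cited works. It contrasts a query built by string concatenation with one that binds the inputs as parameters.

```python
import sqlite3

def make_db():
    """Create an in-memory database holding one hypothetical account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (username TEXT, password TEXT)")
    conn.execute("INSERT INTO accounts VALUES ('ahmad', 'provided_password')")
    return conn

def login_vulnerable(conn, username, password):
    # String concatenation: the user input becomes part of the SQL text.
    query = ("SELECT * FROM accounts WHERE username='" + username +
             "' AND password='" + password + "'")
    return len(conn.execute(query).fetchall()) > 0

def login_parameterized(conn, username, password):
    # Placeholders: the user input is bound as data and never parsed as SQL.
    query = "SELECT * FROM accounts WHERE username=? AND password=?"
    return len(conn.execute(query, (username, password)).fetchall()) > 0

conn = make_db()
attack_user = "ahmad' OR 1=1 --"
print(login_vulnerable(conn, attack_user, "notneeded"))
print(login_parameterized(conn, attack_user, "notneeded"))
```

With the concatenated query the injected username logs in without a valid password, whereas the parameterized version treats the whole string as a (nonexistent) user name.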
2.2. Categories, Intent and Impact of SQL injection
SQL injection can be classified from different perspectives, including the attacker's intent and the attack's impact. Understanding the real intent and impact of a SQL injection attack is necessary to estimate the danger it poses. Both intent and impact differ from one attacker to another.
2.2.1. Main categories
SQL injection attacks can be divided into three main categories: inband, out-of-band and inferential.
1- Inband: This is the simplest method. Information is extracted through the same channel used for the attack. For example, the list of users appears in the current page.
2- Out-of-band: The extracted information is sent back to the attacker through another channel, such as email.
3- Inferential: Also known as blind injection. No data is sent back directly to the attacker; instead, the attacker infers the behavior of the web application by trying different attacks [2].
2.2.2. Impact of SQL Injection
A SQL injection attack is carried out by supplying data that includes SQL fragments from an external source; the application then uses this data to construct a SQL query dynamically. The impact and consequences of SQL injection attacks can be classified as follows [4]:
1- Confidentiality: Loss of confidentiality is a major problem with SQL injection attacks. A successful attack exposes sensitive and critical information to unauthorized users.
2- Integrity: A successful SQL injection attack allows an external source to make unauthorized modifications, such as altering or even deleting information in the target database.
3- Authentication: Improper validation of user names and passwords allows an unauthenticated attacker to connect to the affected database or application as an authenticated user, without prior knowledge of the password or even the user name.
4- Authorization: Successful exploitation of a SQL injection vulnerability allows the attacker to change authorization information and gain elevated privileges if that information is stored in the attacked database.
It is very hard to detect a SQL injection before its impact is felt. In most scenarios, the attacker performs unauthorized activity through valid user credentials. Malicious modifications of the web application's existing SQL queries, which access critical sections of the affected database, are executed using inherent features of the database application.
2.2.3. Attack Intention
The intention behind a SQL injection attack depends on the goal that the threat agent tries to achieve by performing a successful attack.
Identifying Injectable Parameters [5]: These are the parameters of the web application that server-side program logic uses directly to construct SQL statements. Such parameters are vulnerable to SQL injection attacks (SQLIAs). To launch a successful attack, the attacker must first find which parameters are vulnerable to SQL injection.
Performing database fingerprinting [5]: A database fingerprint is the information that identifies a specific type and version of database system. For an attack to succeed, the attacker must first find out the type and version of the database deployed by the web application and then craft malicious SQL input accordingly.
Determining database schema [5]: The schema defines the structure of the database system: the tables, the fields in each table, and the relationships between fields and tables. The attacker uses the database schema to compose a correct subsequent attack that extracts or modifies data in the database.
Bypassing Authentication [5]: Authentication is the mechanism a web application uses to verify that a user is who he or she claims to be. Matching a user name and password against those stored in the database is the most common authentication mechanism for web applications. Bypassing authentication enables an attacker to impersonate another application user and gain unauthorized access.
Extracting Data [5]: In most cases, the data used by web applications is highly sensitive and valuable to threat agents. Attacks intended to extract data are the most common type of SQL injection attack.
Adding or Modifying Data [5]: Adding or modifying data can benefit the attacker in many ways; for instance, an attacker can pay much less for an online purchase by altering the price of a product in the database.
Performing denial of service [13]: A denial-of-service attack can block the database and the web server. Attacks that lock or drop database tables also fall under this category.
Evading detection [5]: This category refers to attack techniques that are employed to avoid auditing and detection by system protection mechanisms.
Executing Remote Commands [5]: Remote commands are executable code resident on the compromised database server. Remote command execution allows an attacker to run arbitrary commands on the database. These commands can be stored procedures and functions accessible to database users.
Performing privilege escalation [5]: Privileges are a set of rights or permissions associated with users. Privilege escalation allows an attacker to gain unauthorized access to a particular asset by associating a higher-privilege set of rights with the current user or by impersonating a user who has higher privileges.
Downloading and Uploading Files: Downloading files from a compromised database server enables an attacker to view file content stored on the server. If the target web application resides on the same host, sensitive data such as configuration information and source code will be disclosed too. Uploading files to a compromised database server enables an attacker to store malicious code, such as a Trojan or a worm, on the server.
2.3. Types of SQL injection
After discussing the impact and intent of SQL injection, it is important to understand the different types of SQLI. These types are the means by which an attack is carried out through SQL queries. The types of SQL injection are described in several related surveys [6] [7] [5] [3] [8].
Fig. 3. Summary of SQLI types [3]
1- Tautologies
This type of SQL injection attack works by making the WHERE clause always true, which bypasses the condition inside the SQL statement. A SQL tautology is a statement that is always true. Attackers also add inline comment signs so that the remaining part of the statement is ignored, returning the largest possible result with the fewest conditions [8]. The following example illustrates the difference between a normal query and an injected one.
For example, consider the normal query to be:
SELECT * FROM table_users WHERE username = 'admin' AND password = 'root'
After injection, the query is executed as:
SELECT * FROM table_users WHERE username = 'admin' AND password = '' OR 1=1 --'
Fig. 4. Tautology example
2- Illegal/Logically Incorrect Queries
Collecting important information about the database schema and structure is the main goal of this kind of attack. The attacker tries to inject statements that cause a syntax, type conversion, or logical error in the database.
Syntax errors can be used to identify injectable parameters [5].
For example, consider the following query: SELECT * FROM employee WHERE name = '' UNION SELECT SUM(username) FROM users -- ' AND password = '';
The query tries to sum the username column of the users table, which requires converting the username values to integers; this is not a valid type conversion. Hence, the database server returns an error message that contains the name of the database and information about the column.
3- UNION Query
The UNION operator in SQL combines the results of two independent queries. In a union query attack, the attacker uses UNION to extract data from other tables by injecting a second SELECT query and unioning it with the original SQL statement. In this way, the attacker forces the database to return results from tables other than the one named in the legitimate SQL query.
For example, the original URL looks like:
http://www.example.com/news.php?newsid=378.
The URL is manipulated as:
http://www.example.com/news.php?newsid=378 UNION SELECT CreditcardNo, PinNo FROM CreditCardTable.
As a result of the injection, the query looks as follows:
SELECT NewsTitle, NewsBody FROM News WHERE NewsID = 378 UNION SELECT CreditcardNo, PinNo FROM CreditCardTable;
Consequently, the database returns two columns whose content is the union of the results of the first and second queries.
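The union query attack can be illustrated with a minimal sketch. It assumes an in-memory SQLite database populated with made-up rows; the fetch_news helper is hypothetical and stands in for the news.php script, which is not shown in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE News (NewsID INTEGER, NewsTitle TEXT, NewsBody TEXT);
    CREATE TABLE CreditCardTable (CreditcardNo TEXT, PinNo TEXT);
    INSERT INTO News VALUES (378, 'Title', 'Body');
    INSERT INTO CreditCardTable VALUES ('4111-0000-0000-0000', '1234');
""")

def fetch_news(newsid_param):
    # Vulnerable: the raw request parameter is pasted into the query text.
    query = "SELECT NewsTitle, NewsBody FROM News WHERE NewsID = " + newsid_param
    return conn.execute(query).fetchall()

normal = fetch_news("378")
injected = fetch_news("378 UNION SELECT CreditcardNo, PinNo FROM CreditCardTable")
print(normal)
print(injected)
```

The injected call returns the legitimate news row together with the credit card row, because both SELECT statements produce two columns and UNION merges their results.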
4- Piggy-Backed Queries
In this type of attack, the attacker injects an independent query, and after a successful attack the second query runs after the original one. This attack differs from the UNION attack in that the queries are not combined; they are completely independent. The attack is called piggy-backed because the secondary query is sent to the database under the cover of the first query.
The following query is an example for this type:
SELECT * FROM news WHERE year='2013' AND author=''; DROP TABLE users -- ' AND type='public'
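A minimal sketch of the piggy-backed query above, assuming an in-memory SQLite database and a hypothetical table_exists helper. Python's sqlite3 executescript() is used here as a stand-in for a database interface that accepts several statements in one call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (year TEXT, author TEXT, type TEXT);
    CREATE TABLE users (name TEXT);
""")

def table_exists(name):
    """Return True if a table with this name is present in the database."""
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type='table' AND name=?",
        (name,)).fetchone()
    return row[0] == 1

# Attacker-controlled value for the author field.
author = "'; DROP TABLE users -- "
query = ("SELECT * FROM news WHERE year='2013' AND author='" + author +
         "' AND type='public'")

existed_before = table_exists("users")
# Both the original SELECT and the injected DROP TABLE are executed.
conn.executescript(query)
exists_after = table_exists("users")
print(existed_before, exists_after)
```

The attack depends on the interface allowing stacked statements; sqlite3's execute() method, for instance, rejects input that contains more than one statement.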
5- Stored Procedures
Stored procedures are predefined sets of SQL statements designed to perform a specific task. Some database systems ship with built-in stored procedures for interacting with the operating system. Poorly written stored procedures are also vulnerable to SQL injection, and an attacker can execute them to achieve malicious goals. If the attacker can execute the database's predefined stored procedures, he will also be able to run commands on the operating system of the server machine (privilege escalation).
Consider the following example:
CREATE PROCEDURE DBO @userName varchar2, @pass varchar2 AS EXEC("SELECT * FROM user WHERE id='" + @userName + "' and password='" + @pass + "'"); GO
This scheme is also vulnerable to attacks such as piggybacked queries.
6- Inference
In this type of attack, the attacker injects SQL and observes the differences in what the web application returns. Attacks of this type target well-secured web applications that give no usable feedback through error messages. In this situation, the attacker injects commands into the site and then observes how the function or response of the website changes. By carefully noting when the site behaves the same and when its behavior changes, the attacker can deduce not only whether certain parameters are vulnerable, but also additional information about the values in the database. In essence, the attack proceeds by asking the database a series of questions [5]. Two main techniques are categorized as inference attacks: timing attacks and blind injections.
1- Timing Attacks
A timing attack lets the attacker gather information from a database by observing timing delays in the database's responses. Using an if-then statement, the attacker causes the SQL engine to execute a long-running query or a time delay statement depending on the injected logic. The attack is similar to blind injection: the attacker measures the time the page takes to load to determine whether the injected condition is true. WAITFOR is a keyword placed along one of the branches, which causes the database to delay its response by a specified time.
For example: declare @s varchar(8000) select @s = db_name() if (ascii(substring(@s, 1, 1)) & (power(2, 0))) > 0 waitfor delay '0:0:5'. The database pauses for five seconds if the first bit of the first byte of the name of the current database is 1. The injected code thus produces a delay in the response time when the condition is true. The attacker can then ask a series of further questions about this character. As this example shows, the information is extracted from the database through a vulnerable parameter.
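The bit-by-bit inference behind the timing attack can be sketched as follows. This is a local simulation under stated assumptions: no database or network is involved, SECRET_DB_NAME is a made-up value, and response_is_delayed is a hypothetical stand-in for sending the injected statement and measuring whether the reply took about five seconds.

```python
SECRET_DB_NAME = "shop"  # hypothetical value that the attacker cannot read directly

def response_is_delayed(position, bit):
    """Simulate the injected IF ... WAITFOR DELAY probe.

    Returns True when the given bit of the character at `position`
    (1-based, as in SQL SUBSTRING) is set, i.e. when the server would pause.
    """
    char_code = ord(SECRET_DB_NAME[position - 1])
    return (char_code & (2 ** bit)) > 0

def recover_character(position):
    # Ask one yes/no question per bit and rebuild the character code.
    code = 0
    for bit in range(8):
        if response_is_delayed(position, bit):
            code |= 1 << bit
    return chr(code)

def recover_name(length):
    # The length is assumed to be known here; it could be inferred the same way.
    return "".join(recover_character(p) for p in range(1, length + 1))

print(recover_name(4))
```

Each character costs eight probes, which is why such attacks are slow but still feasible when no other feedback channel exists.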
2- Blind Injections
In this type of attack, useful information for exploiting the backend database is collected by inferring from the page's replies after asking the server a series of true/false questions. It is very similar to a normal SQL injection [14], [15]. However, when an attacker attempts to exploit the application, rather than receiving a useful error message, the attacker gets a generic page specified by the developer. This makes exploiting a potential SQL injection vulnerability more difficult, but not impossible: the attacker can still gain access to sensitive data by asking a series of true and false questions through SQL statements.
For example, the original URL is:
http://victim/listproducts.asp?cat=books.
The resulting query for this URL is:
SELECT * FROM PRODUCTS WHERE category='books'.
After manipulation, the URL looks like:
http://victim/listproducts.asp?cat=books' or '1'='1.
The resulting query after manipulation is:
SELECT * FROM PRODUCTS WHERE category='books' or '1'='1'.
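The true/false probing can be sketched against a small in-memory SQLite table. The table contents and the list_products helper are assumptions made for illustration, and the two "and" probes are not part of the example above; they are added to show how a true and a false condition produce visibly different pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTS (name TEXT, category TEXT);
    INSERT INTO PRODUCTS VALUES ('SQL Primer', 'books');
    INSERT INTO PRODUCTS VALUES ('Desk Lamp', 'home');
""")

def list_products(cat):
    # Vulnerable page logic: the 'cat' parameter is concatenated into the query.
    query = "SELECT * FROM PRODUCTS WHERE category='" + cat + "'"
    return conn.execute(query).fetchall()

baseline = list_products("books")
always_true = list_products("books' or '1'='1")
true_probe = list_products("books' and '1'='1")
false_probe = list_products("books' and '1'='2")
print(len(baseline), len(always_true), len(true_probe), len(false_probe))
```

A true probe reproduces the baseline page and a false probe returns an empty one, so the attacker learns the truth value of any injected condition from the page content alone.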
7- Alternate Encodings
Alternate encoding is not an independent type of attack; it is a technique mostly used alongside other SQL injection techniques to prevent the security systems of the web application or network infrastructure from detecting the attack. In other words, it is used only as a cover that lets other attacks evade intrusion detection systems (IDS).
In the following example, the pin field is injected with the string "0; exec(0x73687574646f776e)", and the resulting query is: SELECT accounts FROM users WHERE login='' AND pin=0; exec(char(0x73687574646f776e)).
This example uses the char() function and ASCII hexadecimal encoding. The char() function takes the hexadecimal encoding of one or more characters and returns the actual characters. The stream of numbers in the second part of the injection is the ASCII hexadecimal encoding of the attack string. The database translates this encoded string into the shutdown command when it is executed.
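The decoding step can be checked with a short sketch. The helper names and the keyword blacklist are hypothetical; the filter is a deliberately naive illustration of a signature check that inspects only the raw text, not a model of any particular IDS.

```python
def decode_hex_literal(literal):
    """Decode a 0x-prefixed hexadecimal literal into ASCII text."""
    return bytes.fromhex(literal[2:]).decode("ascii")

def naive_filter_blocks(payload, blacklist=("shutdown", "drop", "union")):
    """Hypothetical keyword filter that looks only at the raw request text."""
    lowered = payload.lower()
    return any(word in lowered for word in blacklist)

encoded_payload = "0; exec(char(0x73687574646f776e))"
print(decode_hex_literal("0x73687574646f776e"))
print(naive_filter_blocks("0; shutdown"))
print(naive_filter_blocks(encoded_payload))
```

The plain-text payload is caught by the keyword check, while the encoded form passes because the keyword only appears after the database decodes it.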
2.4. Taxonomy and Approaches for detecting SQL injection
Fig. 6. Approaches of Detection of SQLI
There are many approaches for detecting SQL injection. Researchers have formulated SQL injection as an information flow integrity problem.
1- Static analysis
Static analysis finds weaknesses and malicious code in the system's source code before it reaches the execution stage [10, 12]. It has generally been one of the most widely used approaches for detecting or preventing SQLIAs [9].
Static analysis techniques, such as flow-sensitive analysis, context-sensitive analysis, alias analysis, and interprocedural dependency analysis, are used to identify input sources and data sinks (database access points) and to check whether every flow from a source to a sink is subject to an input validation and/or input sanitization routine.
These approaches suffer from one or more of the following limitations: they do not precisely model the semantics of such routines, do not consider input validation using predicates, fail to specify vulnerability patterns, or require user intervention to state the taintedness of external or library functions that inputs pass through. All these limitations could result in false negatives or positives [10].
2- Dynamic analysis
Dynamic analysis detects specific types of attack, which must be identified in advance, without requiring changes to the development lifecycle or access to the system's source code. It tracks the events of the system during execution and detects whether any attack is happening while the system runs [9].
Dynamic analysis, unlike static analysis, can locate vulnerabilities of SQL injection attacks without making any adjustments to web applications.
3- Static and dynamic analysis
In this type of analysis, researchers combine the two aforementioned techniques to create a more effective and reliable solution, aiming for higher quality with faster development and testing processes [9].
Recently, researchers have been exploring the use of static analysis in conjunction with runtime validation to detect instances of SQLIAs. Some researchers have proposed the use of parse trees to detect malicious user input, which requires a developer to manually modify new and existing code. Others have used an automaton construction technique to defend against SQLIAs [11].
4- Other approaches, such as black-box testing and tainting
5- Machine learning and soft computing techniques
2.4.1. Static Analysis
The framework proposed in [12] uses static analysis combined with automated reasoning. It operates directly on the source code of the application to prevent tautology-based SQLIAs. Static analysis first obtains all the queries that the program can generate. The framework then applies an algorithm to the resulting automaton to check whether it contains a tautology [13]. This technique verifies that the SQL queries generated by the application do not contain a tautology [14].
Advantages of this framework:
It is effective for SQL injections that insert tautology in SQL queries.
Disadvantages of this framework:
1- It cannot detect other types of SQL injection attacks; it detects and prevents only tautology-based SQLIAs, which are just one of the many kinds of SQLIAs that our technique addresses.
2- It does not precisely handle some of the complex string operations, and its conservative assumptions might result in false positives [10].
JDBC Checker [15] statically checks the type correctness of dynamically generated SQL queries. JDBC Checker can detect SQL injection vulnerabilities caused by improper type checking of the user inputs.
Disadvantage: This technique would not catch more general forms of SQL injection attacks, because most of these attacks consist of syntactically correct and type-correct queries [14].
SAFELI (Static Analysis Framework for discovering SQL Injection) [16] is intended to identify SQL injection attacks at compile time. SAFELI statically inspects the Microsoft Intermediate Language (MSIL) bytecode of an ASP.NET web application using symbolic execution. Because it analyzes the code itself, it can identify subtle vulnerabilities that black-box vulnerability scanners cannot discover. This static analysis tool has two main advantages: it performs white-box static analysis, and it uses a hybrid constraint solver. For the white-box static analysis, the approach works on the bytecode and deals mainly with strings. For the hybrid constraint solver, the method implements an efficient string analysis tool that can handle Boolean, integer and string variables [17] [13]. The main drawback of this technique is that it can discover SQL injection attacks only in Microsoft-based products.
2.4.2. Dynamic Analysis
CANDID [18] is a dynamic candidate evaluation method for the automatic prevention of SQL injection attacks. The tool dynamically mines the programmer-intended query structure for any input and detects attacks by comparing it against the structure of the actual query issued. Hence, it removes the need to modify the application manually to create prepared statements [17]. CANDID's natural and simple approach proves effective for discovering SQL injection attacks. Although the tool is shown to be powerful in some cases, it fails in many others; for example, it is inefficient when dealing with external functions [3].
2.4.3. Combined Static and Dynamic Analysis
SQLCheck [19] checks SQL queries at runtime to determine whether they follow a model of expected SQL queries, that is, whether the input queries conform to the expected ones defined by the programmer. The model is expressed as a context-free grammar that only accepts legitimate queries. A secret key is used to delimit user inputs in the SQL queries. Thus, the security of the approach relies on attackers not being able to discover the key. Additionally, this approach requires the application developer to rewrite code to manually insert the secret keys into dynamically generated SQL queries [14]. It is an efficient approach; however, once an attacker discovers the key, it becomes vulnerable. Furthermore, it also needs to be tested with online Web applications [3].
AMNESIA [20] is a model-based technique that combines static and dynamic analysis for the detection and prevention of SQLIAs. In the static phase, AMNESIA uses static analysis to build models of the SQL queries that an application can legally generate at each point of access to the database. In the dynamic phase, AMNESIA intercepts all SQL queries before they are sent to the database and checks each query against the statically built models. Queries that violate the model are identified as SQL injection attacks. The accuracy of AMNESIA depends on that of the static analysis. Unfortunately, certain types of code obfuscation and/or query generation techniques make this step less precise and result in both false positives and false negatives.
SQLGuard [21] checks at runtime whether SQL queries conform to a model of the expected queries. SQLGuard tool detects SQLIAs during application runtime by comparing the parse tree of an intended SQL query before and after the inclusion of user supplied input [22] .
Disadvantages of this approach:
1- SQLGuard requires the application developer to rewrite code to use a special intermediate library [14].
2- The approach is ineffective if the user supplied input does not appear at the leaf of the tree.
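The idea of comparing the query structure before and after user input is included can be approximated with a minimal sketch. This is a strong simplification and not SQLGuard's implementation: it compares flat token skeletons rather than parse trees, and the tokenizer, the build_query helper and the placeholder input "x" are all assumptions of the sketch.

```python
import re

# Simplified SQL tokenizer: string literals, comments, numbers, words, symbols.
TOKEN_RE = re.compile(r"""
    '(?:[^']|'')*'      # single-quoted string literal
  | --.*                # comment running to the end of the line
  | \d+                 # number
  | \w+                 # keyword or identifier
  | [^\s\w]             # any other single symbol
""", re.VERBOSE)

def skeleton(query):
    """Reduce a query to its structure; literal values become generic markers."""
    shape = []
    for tok in TOKEN_RE.findall(query):
        if tok.startswith("'") and len(tok) >= 2 and tok.endswith("'"):
            shape.append("STR")
        elif tok.isdigit():
            shape.append("NUM")
        elif tok.startswith("--"):
            shape.append("COMMENT")
        else:
            shape.append(tok.upper())
    return shape

def build_query(username, password):
    return ("SELECT * FROM accounts WHERE username='" + username +
            "' AND password='" + password + "'")

def is_injection(username, password):
    intended = skeleton(build_query("x", "x"))  # structure with harmless input
    actual = skeleton(build_query(username, password))
    return intended != actual

print(is_injection("ahmad", "secret"))
print(is_injection("ahmad' OR 1=1 --", "notneeded"))
```

Benign input only changes literal values, so the skeleton is unchanged; the injected input adds OR, a comparison and a comment, which alters the structure and is flagged.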
SQLUnitGen [23] is short for "SQL Injection Testing Using Static and Dynamic Analysis". The solution uses static analysis together with unit testing to assess the effectiveness of the SQL injection filter in an application. Static analysis is used to track user inputs to the point of query generation. The core of the tool is based on JCrasher, which the authors modified to create the test cases for the attack [22]. It is a very efficient approach; however, once an attacker discovers the key, it becomes vulnerable. Furthermore, it also needs to be tested with online Web applications.
2.4.4. Machine Learning and Soft Computing Techniques
Valeur et al. [24] have proposed an intrusion detection system capable of detecting a variety of SQL injection attacks. Their approach uses multiple statistical models to build profiles of normal access to the database. As with most learning-based anomaly detection techniques, the system requires a training phase prior to detection. The system contains a parser that processes each input SQL query and returns a sequence of tokens. Each token has a flag that indicates whether the token is a constant or not. A feature vector is thus created by extracting all tokens marked as constants. A profile is then a collection of statistical models and a mapping that dictates which features are associated with which models. The main problem of this technique besides the false positives and negatives is its execution and storage overhead, since multiple statistical models are maintained for each pair of template query and application.
In [25], the authors proposed a SQLI detection technique for adversarial environments based on K-Centres (KC). The number and the centers of the clusters in KC are adjusted according to unseen SQL statements in the adversarial environment, in which the types of attack change after a period of time, so that the method adapts to different kinds of attack. The experimental results show that the method achieves satisfactory SQLIA detection in the adversarial environment. Its main drawback is that it must receive the true label of each statement after classification [26].
The concept of pattern classifiers to detect injection attacks and protect web applications is introduced in [27]. The system captures the parameters of HTTP requests and converts them into numeric attributes, which include the length of each parameter and the number of keywords it contains. Using these attributes, the system classifies the parameters with a Bayesian classifier to judge whether they are injection patterns. A main drawback is that the system depends on a limited set of feature types.
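The conversion of request parameters into numeric attributes can be sketched as follows. The keyword list and the extract_features name are assumptions made for illustration and are not taken from [27]; the classifier itself is omitted, and only the two attributes named above (parameter length and keyword count) are computed.

```python
import re

# Illustrative keyword list; the list used in the cited work is not given here.
SQL_KEYWORDS = {"select", "union", "or", "and", "drop", "insert",
                "update", "delete", "exec", "from", "where"}

def extract_features(parameter):
    """Map one request parameter to the pair (length, keyword_count)."""
    words = re.findall(r"[a-z]+", parameter.lower())
    keyword_count = sum(1 for w in words if w in SQL_KEYWORDS)
    return (len(parameter), keyword_count)

print(extract_features("ahmad"))
print(extract_features("ahmad' OR 1=1 --"))
```

Feature vectors of this kind would then be passed to the Bayesian classifier, which learns how the attributes are distributed for benign and malicious parameters.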
The major contribution of the work in [28] is the proposition of the correlation approach to SQLIA detection and the genetic algorithm applied to SQLIA detection task. In this work, the authors try to prove that correlating several sources of information (sensors) and then performing reasoning on the correlated information, allows improving results of cyber attacks detection. However, the complexity of correlating tools represents the main drawback of this algorithm.
The use of Artificial Neural Networks (ANN), a form of biologically inspired computing, to detect SQL injection attacks is investigated in [29]. This research used Multilayer Feedforward Networks (MLN), whose advantages include the ability to learn and store empirical knowledge, the nonlinearity of the ANN, the ability to generalize solutions, the ability to adapt when the context changes, the computational performance, and the massively parallel structure of the ANN. The limitation of this approach is that it is based on the appearance of certain SQL keywords together with suspicious characters but does not keep the relative order between them. For this reason, if a normal signature contains many keywords and suspicious characters that often appear together in a SQL injection, it is highly likely to be misclassified even though the order of the keywords does not form a syntactically correct SQL statement. At the same time, this property could be considered an advantage for resisting evasion attempts. Another related work on ANN-based SQLI detection is introduced in [30]. However, it depends on a limited set of SQL patterns for training, so it is susceptible to generating false positives.
The author in [31] used TF-IDF to calculate token weights and evaluated the performance of three machine learning approaches: SVM, Naive Bayes and K-NN. This method has low computational time complexity but is susceptible to generating false positives [25]. Furthermore, Gene Expression Programming (GEP) for the detection of SQLI is discussed in [32]. At the beginning, chromosomes are generated randomly. Next, in each iteration of GEP, a linear chromosome is expressed in the form of an expression tree and executed. The fitness value is calculated and the termination condition is checked. To preserve the best solution of the current iteration, the best individual passes to the next iteration without modification. Next, programs are selected into a temporary population, where they are subjected to genetic operators with some probability. The new individuals in the temporary population constitute the current population. The classification accuracy obtained with GEP is high for SQL queries consisting of 10 to 15 tokens. For longer statements, the averaged FP and FN rates are about 23%.
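The TF-IDF weighting of query tokens can be sketched as follows. This is one common variant (term frequency normalized by document length, inverse document frequency as a natural logarithm); the exact formula, tokenizer and data used in [31] are not given in this chapter, so the whitespace tokenizer and the two sample queries are assumptions.

```python
import math
from collections import Counter

def tokenize(query):
    # Whitespace tokenizer; a simplification chosen for this sketch.
    return query.lower().split()

def tf_idf(documents):
    """Return one {token: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            tok: (cnt / len(tokens)) * math.log(n_docs / doc_freq[tok])
            for tok, cnt in counts.items()
        })
    return weights

queries = [
    "select * from users where id = 1",
    "select * from users where id = 1 or 1 = 1",
]
weights = tf_idf(queries)
print(weights[1]["or"], weights[1]["select"])
```

Tokens shared by every query, such as select, receive zero weight, while a token that appears only in the injected query, such as or, receives a positive weight that a classifier can use.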
Among the approaches relevant to the proposed framework is the genetic algorithm for detection of SQLI proposed in [33]. In this technique, levels of SQLI are detected using template matching. The ultimate goal of the genetic algorithm is to optimize the matching rules for the SQLI queue in the template library. These rules can identify attack sequences whose attack intentions are relatively obvious. The rules in the template library have the form IF (condition) THEN (execution), where the condition is that the attack sequence matches a rule in the template library (20-bit binary sequence matching). However, this algorithm relies on template sequences to define a SQLIA. Therefore, the system fails to detect attacks whose sequences are not included in the template library.
2.4.5. Other techniques
• Hash technique
The hash value approach is adopted in [34] to improve the user authentication mechanism, using hash values of the user name and password. The SQLIPA (SQL Injection Protector for Authentication) prototype was developed to test the framework. The user name and password hash values are calculated at runtime when the particular user account is first created, and they are stored in the user account table. Although the proposed framework was tested on a small set of sample data and had an overhead of 1.3 ms, it requires further improvement to reduce the overhead time. It also needs to be tested with a larger amount of data [3].
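The hash-based authentication idea can be sketched as follows. This is not the SQLIPA implementation: the table layout, the function names and the choice of SHA-256 are assumptions of the sketch, since the chapter does not state which hash function or schema [34] uses.

```python
import hashlib
import sqlite3

def h(text):
    """Hex digest of the input; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (username TEXT, user_hash TEXT, pass_hash TEXT)")

def create_account(username, password):
    # Hash values are computed once, when the account is created.
    conn.execute("INSERT INTO accounts VALUES (?, ?, ?)",
                 (username, h(username), h(password)))

def login(username, password):
    # Only hex digests reach the query text, so quotes or comment markers
    # typed by the user cannot change the structure of the query.
    query = ("SELECT * FROM accounts WHERE user_hash='" + h(username) +
             "' AND pass_hash='" + h(password) + "'")
    return len(conn.execute(query).fetchall()) > 0

create_account("ahmad", "provided_password")
print(login("ahmad", "provided_password"))
print(login("ahmad' OR 1=1 --", "notneeded"))
```

Because the injected string is hashed before it is placed in the query, it simply fails to match any stored hash value instead of altering the WHERE clause.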
• Parse Tree Validation Approach
Buehrer et al. [35] adopted the parse tree framework. They compared the parse tree of a particular statement at runtime with that of its original statement. Execution of the statement is stopped unless a match is found. This method was tested on a student Web application using SQLGuard. Although this approach is efficient, it has two major drawbacks: additional computational overhead and reliance on listing of inputs only (black or white lists) [3].
• Reverse proxy and MD5 algorithm
Hidhaya et al. [36] developed a method to detect SQL injection that uses a reverse proxy and the MD5 algorithm to check user input for SQL injection, and grammar expression rules to check URLs for SQL injection. The source code of the application is not changed, and the detection and mitigation of the attack are fully automated. By increasing the number of proxy servers, the web application can handle any number of requests without noticeable delay while still being protected from SQL injection attacks. In future work, the focus will be on optimizing the system and removing the vulnerable points in the application itself [6].
• Black Box Testing
Huang and colleagues [37] propose WAVES, a black-box technique for testing Web applications for SQL injection vulnerabilities. The technique uses a Web crawler to identify all points in a Web application that can be used to inject SQLIAs. It then builds attacks that target such points based on a specified list of patterns and attack techniques. WAVES then monitors the application's response to the attacks and uses machine learning techniques to improve its attack methodology. This technique improves over most penetration-testing techniques by using machine learning approaches to guide its testing. However, like all black-box and penetration testing techniques, it cannot provide guarantees of completeness [37].
2.5. Conclusion
A study of the different types of SQL injection attack and the approaches for detecting them shows that there is no optimal technique, and many problems remain in the existing approaches. Our proposed framework addresses a subset of these problems: the performance and overhead imposed on the system, the capability to learn injection types, the flexibility of detection rules and their customization to different injection types, and the uncertainty and fuzziness of input data.