Module 3: Challenges


CSE 365 - Spring 2025.

Now the pace ramps up... The first of this week's two modules is Welcome to the Web. This is a one-week module, which means that both the checkpoint and the full challenges are due at the end of the week.

And, again, we're covering two modules, though both are introductory:

  1. We'll learn about web communications themselves in a refreshed Talking Web.
  2. Then we'll move on to learning about SQL, one of the technologies behind databases, in Studying SQL.

NOTE: Two more challenges will launch in Talking Web by the end of Monday. Check back on Tuesday morning to see if there are any new ones you still need to do!

PLEASE be careful about your time management this week. This entire module, plus the Web Security checkpoint, are both due at the end of the week. Don't delay!


NEED HELP? The official way to get help is via our discord! Start the challenge that you need help with, and then use the /help command! That will get you pointed in the right direction.

QUESTIONS ON GRADING / DUE DATES? Check the grades page!


Challenges

Obviously, as you're accessing this website in your web browser, this isn't your first HTTP request. But it's your first HTTP request for a pwn.college challenge! Run /challenge/server, fire up Firefox in the dojo workspace (you'll need to use the GUI Desktop for this!), and visit the URL that it's listening on for the flag!

Awesome, you got the hang of the basic process. There's one more thing you need to do, though: you must read and understand the source code of the challenge! Web servers route HTTP requests to different endpoints: http://challenge.localhost/pwn might go to the endpoint that handles the request path /pwn, and http://challenge.localhost/college might go to the endpoint that handles the request path /college. This challenge has a randomly-chosen endpoint name. You must read the code in /challenge/server, understand it, and figure out which endpoint to visit in the browser!


Confused? Our web servers are implemented using the flask library. Read their documentation to build up understanding of the code, or experiment with it!

HTTP is the HyperText Transfer Protocol. HyperText, named in the techno-optimism of the late 20th century, is text that carries additional data regarding how it should be understood, not just what it means. In modern times, this is done through a variety of means: HTTP is used to transport lots of different types of resources, and your web browser combines them to construct the websites that you see and interact with. The oldest of these is the HyperText Markup Language, or HTML.

HTML describes, in a way that the browser can interpret, the elements that should (initially) appear on a web page. We'll dive into HTML subtleties in later modules, but here, we'll practice piercing the veil of the website and looking at the HTML behind it all. You'll need to, as before, find the endpoint and access it in the in-dojo browser. However, the HTML sent over will hide the flag. You'll need to figure out how to view the Page Source of the HTML, rather than the rendered result, to access this hidden data.


HINT: click Firefox's Hamburger menu (≡), then go to More Tools.

HTTP facilitates the transfer of both data (e.g., the HTML that /challenge/server sends you) and metadata (data about the data). The latter is sent via headers: fields in an HTTP request or response that give additional instructions to the server or browser. In this case, the flag is in a header. Can you find it?


HINT: you can inspect headers using Firefox's Web Developer Tools (≡, then More Tools). The Network tab of the tools displays all of the HTTP connections (you might need to reload the page after opening the Web Developer Tools for the connection to show up). Each of these connections has a Headers subtab, which shows headers that your browser sent alongside its request (Request Headers) and the headers that it received alongside the response (Response Headers). Find the flag header there!

You've learned how to HTTP (though, of course, you've probably been HTTPing for most of your life!). Now, let's learn how to really HTTP. The HTTP protocol itself, as in the exact data that is sent over the network, is actually surprisingly human-readable and human-writable. In this challenge, you'll learn to write it. This challenge requires you to use a program called "netcat" (command name: nc), which is a simple program that communicates over a network connection. Netcat's basic usage involves two arguments: the hostname (where the server is listening, such as www.google.com for Google), and the port (the standard HTTP port is 80).

When it starts up, netcat connects to the server and gives you a raw channel to communicate with it. You'll be talking directly with the web server, with no intermediary! How cool is that?

Recall the lectures, find the format of an HTTP request, and make a GET request to the / endpoint (we'll do more endpoints later) to get the flag!


HINT: Can't tell if netcat is connecting or not? Use the -v flag to turn on some verbosity!

HINT: Typed your GET request and nothing happens after you hit Enter? HTTP requests are terminated by two newlines. Try hitting Enter again!
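To see exactly which bytes you end up typing into netcat, here is a minimal Python sketch that assembles a GET request as text. The helper name build_get_request is made up for illustration, and the Host line is an assumption at this point: HTTP/1.1 expects one, and later challenges in this module cover it. Note that the HTTP specification ends each line with a carriage return plus newline, which is why the request finishes with a blank line.

```python
def build_get_request(path: str, host: str) -> bytes:
    """Return the raw bytes of a minimal HTTP/1.1 GET request (illustrative helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the header HTTP/1.1 expects on every request
        "",                      # empty line: end of the headers...
        "",                      # ...so the request ends with \r\n\r\n
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("/", "challenge.localhost")
print(request.decode("ascii"))
```

Typing the same lines into nc by hand, followed by an extra Enter, sends an equivalent request.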

A thought... Until this moment, have you ever truly HTTPed?

Okay, you got the basics of netcat down. Now make a GET request to a specific path! As always, check the /challenge/server code to understand more.

Next, we'll practice making HTTP requests with one of the most common commandline tools for HTTP: curl. Unlike netcat, curl is made specifically for HTTP, and you don't have to write raw HTTP commands. Instead, you must learn the right program options to achieve what you want. Here, you must simply make a GET request to the right endpoint!

Finally, we'll learn the fourth tool in our HTTP toolbox: Python's requests library. Along with the browser, this will likely be one of the two most heavily used tools in your HTTP toolbox. Requests lets you script complex web interactions, and this will be necessary to pull off tricky hacks later. For now, things are simple: pull up Python, import requests, and GET the flag!

Unfortunately, most of the modern internet runs on the infrastructure of a handful of companies, and a given server run by these companies might be responsible for serving up websites for dozens of different domain names. How does the server decide which website to serve? The Host header.

The Host header is a request header sent by the client (e.g., browser, curl, etc), typically equal to the domain name in the URL being requested. When you go to https://pwn.college, your browser automatically sets the Host header to pwn.college, and thus our server knows to give you the pwn.college website, rather than something else.

Until now, the challenges you've been interacting with have been Host-agnostic. Now they start checking. Set the right Host header and get the flag!
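One way to picture what a Host-checking server does is the sketch below. It is not the code in /challenge/server: the SITES table and both function names are hypothetical, and it only shows the general idea of reading the Host header out of a raw request and using it to choose a website.

```python
from typing import Optional

# Hypothetical virtual hosts served by a single machine.
SITES = {
    "pwn.college": "the pwn.college website",
    "example.test": "some other website",
}


def parse_host(raw_request: str) -> Optional[str]:
    """Extract the Host header value from raw HTTP request text."""
    header_block = raw_request.split("\r\n\r\n", 1)[0]
    for line in header_block.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":       # header names are case-insensitive
            return value.strip()
    return None


def pick_site(raw_request: str) -> str:
    """Choose which site to serve based on the Host header."""
    return SITES.get(parse_host(raw_request), "default site")


print(pick_site("GET / HTTP/1.1\r\nHost: pwn.college\r\n\r\n"))
```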

Now, let's learn to set the Host header in curl! Read its man page to figure out how to set headers.

And, finally, you can learn how Hosts are actually sent over the network in netcat. This might be a bit trickier. You can actually use curl as a source of information here! Curl's -v option causes it to print out the exact headers it's sending over (and the ones it receives!). Observe it, copy that with netcat, and get the flag!

Recall how HTTP requests contain fields separated by spaces? For example: GET /solve HTTP/1.1. What if the path (e.g., instead of /solve) has spaces inside it? This is a reasonable thing to happen, as these paths often reference directories, and directories may have spaces in their names!

Left to their own devices, spaces would mess up the HTTP request. Consider an HTTP server trying to make sense of GET /solve my challenge HTTP/1.1. A clever server might be able to deal with it, but it's not impossible that a version that simply reads one word at a time would read my instead of HTTP/1.1 and panic!

To avoid such situations, URLs are encoded using URL Encoding. This is a simple encoding compared to what you've seen in Dealing with Data. Any tricky characters (such as spaces) are simply hex-encoded, with a % plopped in front of them. Of course, because % thus becomes a tricky character in itself, it must also be encoded. In the above example, /solve my challenge would become /solve%20my%20challenge, as the hex value of the ASCII space character is 0x20.

Anyways, now we'll practice. We stuck spaces in the endpoints. Can you still get the flag?


INFO: You'll find that you need to encode URLs with curl as well (though we won't make you jump through that hoop), in the exact same way. Python's requests library, however, will automatically urlencode things for you. So useful!
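If you want to check an encoding by hand, Python's standard library can do it for you. This is a small sketch using urllib.parse; the example paths are just the ones from the text above plus one made-up string containing a percent sign.

```python
from urllib.parse import quote, unquote

path = "/solve my challenge"
encoded = quote(path)               # "/" is left alone by default; each space becomes %20
percent = quote("100% done")        # the % itself has to be encoded, as %25
decoded = unquote(encoded)          # decoding reverses the process

print(encoded)
print(percent)
print(decoded)
```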

Like a function call in a programming language or a command execution on the shell, HTTP requests can include parameters. GET requests send parameters alongside the path in the URL, in a part of the URL called the Query String. In this challenge, you'll learn how to craft this query string. Read the challenge source to understand what parameter you need, and send it over! You can use any client you want: the process is basically the same in all of them.


SECURITY NOTE: It's tempting to think of HTTP parameters as similar to parameters to a function call. However, keep in mind: when you're writing C or Python or Java code, an attacker (typically) can't just call random functions in your program with random parameters. But with HTTP, they can. They can just make HTTP requests wherever they want! This has caused quite a few security issues...

Of course, you can pass in multiple parameters; you just need to separate each of them with &: what=pwn&where=college. Try it now, in netcat.
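As a rough illustration of how a query string is put together and taken apart, here is a sketch using Python's urllib.parse. The parameter names are the what and where from the example above; the URL is only a placeholder and not necessarily what the challenge expects.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"what": "pwn", "where": "college"}
query = urlencode(params)                        # joins name=value pairs with &
url = f"http://challenge.localhost/?{query}"     # the query string follows the ?

# A server does the reverse: it splits the query string back into parameters.
parsed = parse_qs(urlsplit(url).query)

print(url)
print(parsed)
```

In a raw request typed into netcat, the same query string simply rides along after the path on the request line.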

Specifying multiple HTTP parameters in curl is a bit of a special case, because & means something special in the shell (it launches a command in the background), and if you're not careful, the shell will trip over your &! Make sure to put the whole URL, including the query string, in quotes to avoid this situation. Try that now.

HTTP GET requests are typically used for retrieving data, and the parameters typically represent data identifiers and various customizations for its retrieval and display. Storing data is usually done with an HTTP POST request. In the old days, POST requests typically resulted from people filling out and submitting HTML forms. This still occurs, but there are plenty of other ways POST requests are created (some of which we'll cover later).

For now, let's practice the oldie and goodie. http://challenge.localhost has a form for you. Fill it out in the browser and submit it for the flag!

Now, let's try this with curl. Look at the man page to figure out how to make a POST request (HINT: the most relevant option is -d).


NOTE: Remember what we said about attackers being able to trigger whatever HTTP requests they wanted? Note how this challenge doesn't even have any functionality to make the form, but you can still hit it with the POST request!

Now, we try this with netcat. This is MUCH harder, and the format is somewhat archaic for historical reasons. Check out the simplest URL-encoded form submission example from Mozilla, and adapt it for your use case.
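The general shape of a URL-encoded form submission can be sketched in Python as follows. The helper name and the field name are invented for illustration; read the challenge source for the real ones. The part that tends to trip people up is that the body comes after the blank line, and Content-Length must match its size in bytes.

```python
from urllib.parse import urlencode


def build_post_request(path: str, host: str, fields: dict) -> bytes:
    """Return the raw bytes of a form-style POST request (illustrative helper)."""
    body = urlencode(fields).encode("ascii")     # e.g. b"name=value"
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",          # size of the body in bytes
        "",                                      # blank line ends the headers
        "",
    ]).encode("ascii")
    return head + body


request = build_post_request("/", "challenge.localhost", {"name": "value"})
print(request.decode("ascii"))
```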

Now let's try this with requests! Look at the documentation to find out how to do this.

Now, try to make your browser do a POST request without the website providing a form. Hint: can you bring your own form to the table?

Let's play around with multiple form fields!

... and with netcat!

Sometimes, resources on the web move. A website might get redesigned, we might rename a pwn.college module, etc. In these (and other!) cases, the webserver can redirect clients to the new URL. This is done via a special HTTP response, as you'll discover here. Can you still find the flag?

Now, let's try curl. Curl has a very useful commandline option to automatically follow redirects. It's -L. Try it out, and see how easy this becomes!

And now, Python. Python's requests library automatically follows redirects, so this should be quite easy. Give it a try!

Include a cookie from HTTP response using curl

Include a cookie from HTTP response using nc

Include a cookie from HTTP response using python


HINT: If you aren't already using it, check out requests.Session!
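The cookie challenges above all rely on the same mechanism: the server hands out a cookie in a Set-Cookie response header, and the client sends it back in a Cookie request header. Here is a minimal sketch of that round trip using Python's standard http.cookies module. The cookie name and value are made up; the real ones come from the challenge's response.

```python
from http.cookies import SimpleCookie

# A made-up Set-Cookie header value, as a server might send it.
set_cookie_value = "session=abc123; Path=/"

jar = SimpleCookie()
jar.load(set_cookie_value)                       # parse the name, value, and attributes

# Only name=value pairs go back to the server; attributes like Path do not.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

print("Cookie: " + cookie_header)
```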

Make multiple requests in response to stateful HTTP responses using python

You've been staring at web server code all this time and figuring out how to speak to it. Now, let's learn to listen.

In this level, you will write a simple server that'll receive the request for the flag! Simply copy the server code from, say, the very first module, remove anything extra, and build a web server that'll listen on port 1337 (instead of 80 --- you can't listen on port 80 as a non-administrative user) and on hostname localhost. When you're ready, run /challenge/client, and it will launch an internal web browser and visit http://localhost:1337/ with the flag!

You've followed redirects --- now make one happen! Have your webserver redirect /challenge/client to the right location in /challenge/server. You'll need three terminal windows for this:

  1. The first terminal window runs /challenge/server, which listens on port 80 and prepares to give the flag.
  2. The second terminal window runs your webserver implementation, which listens on port 1337 and prepares to redirect the client.
  3. The third terminal window runs /challenge/client.

It's complex, but you can do it!
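For a feel of what a redirecting server looks like, here is a self-contained sketch built on Python's standard http.server instead of flask, so it runs without extra packages. Everything specific in it is an assumption: TARGET is a placeholder rather than the real location, and the demo listens on a throwaway port on 127.0.0.1 and talks to itself, whereas the challenge expects port 1337.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder destination; the real one comes from reading /challenge/server.
TARGET = "http://challenge.localhost/hypothetical-endpoint"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)                  # a 3xx status marks a redirect
        self.send_header("Location", TARGET)     # tells the client where to go next
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass                                     # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)   # port 0: pick any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# http.client does not follow redirects, so we see the raw response.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status, location = response.status, response.getheader("Location")
response.read()
conn.close()
server.shutdown()
server.server_close()

print(status, location)
```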

In the beginning of the web, HTML, though Hyper, was pretty static. It described its layouts, and that was it. Sometime in the 1990s, the movers and shakers of the internet thought "What if web pages could execute logic?", and JavaScript was born.

JavaScript is a programming language that allows web pages to dynamically make decisions and carry out actions. It is, hands down (and unfortunately, because it's terrible) the most important programming language out there (though luckily not the most used), and try as we might to avoid it (did we mention that it's terrible), we have to account for it in any discussion of web security.

HTML specifies JavaScript to be executed through the <script> tag. This tag tells the browser that what is inside that tag is JavaScript, and the browser executes it. There are many resources online for how to write script tags, and how to write JavaScript, and we'll leave their finding as an exercise for you, the learner. Here, we'll practice something very specific: using JavaScript to redirect the browser to a different web page.

As previously, the client browser will print out the page it receives, but it will start by going to http://challenge.localhost/~hacker/solve.html. Here, we harken back to the olden days of shared servers: http://challenge.localhost/~hacker/anything will be served out of the public_html subdirectory of your home directory! Create a /home/hacker/public_html/solve.html, write the JavaScript you need to redirect the browser, and get the flag!


HINT: The JavaScript object you want is window.location. You can assign a string to it to redirect the browser to a new location.

HINT: Debugging this can be tricky with the built-in browser. Try it using the dojo's Firefox! You can't get the final flag with it, but you can at least tell if your redirect is working!

JavaScript can do many things in the context of the web page, and can, thus, lead to unexpected situations and security compromises. You'll explore some of these situations in the Web Security module, but we'll lay the groundwork here.

In this level, /challenge/client will no longer print the web page, and /challenge/server will not serve up an HTML page of the flag, but a JavaScript script that sets a global flag variable to the value of the flag. You'll need to make a web page to include this script in your page (we'll leave it up to you to find the documentation for this --- hint: src is involved) and then create another script to somehow exfiltrate this information. Exfiltration is the art of smuggling sensitive data out right under the nose of its owners: in this case, /challenge/client and /challenge/server. Your JavaScript running on your page, of course, has access to the flag variable, but you'll need to somehow communicate it out to the world. This can be done in a few different ways, but probably the easiest is to redirect (using your window.location trick from before!) the client browser to a URL that contains the flag (similar to how the client leaked it to you a few levels ago), and have that request go to somewhere where you can see the URL log (such as the log of /challenge/server or your own webserver!).

This sounds like a lot, but it's eminently doable. Our reference HTML solution file is just 150 bytes long! As before, remember: you can debug your solutions using your own browser (and can run it as root in practice mode to be able to include the flag script!).

Now, the hard part begins... Oftentimes, what you need to exfiltrate is other data accessible to your JavaScript on the website, but you often have to make HTTP requests to retrieve it. In modern JavaScript, HTTP requests are made using the fetch() function. It works roughly as follows:

fetch("http://google.com").then(response => response.text()).then(website_content => ???);

The ???, of course, is the code that you want to execute on the website contents. This API looks so absolutely insane because JavaScript is insane, but also because it actually has a hard problem to solve. It has to execute logic in an environment where network errors, CPU load, laptop suspending and resuming, firewalls, and other crazy things can interfere with the loading and operation of the resources that it depends on. The above code uses JavaScript "promises", which is a complex programming pattern that lets you write logic that will be executed on data that is not yet available, when that data finally does become available. The .then() is the part of the promise that specifies what will be eventually executed. Here, the flow is roughly as follows:

  1. fetch() returns a promise and starts to fetch http://google.com. This might take a while, might never succeed, or might succeed immediately. At any rate, it initially returns a promise object that has a then() member function that will run when the response is available.
  2. The response becomes available and the promised code executes. This code takes the promised response and calls response.text(), which retrieves the full text contents returned by http://google.com. Because this might take a while to load fully, this also returns a promise, and that promise also has a .then() method that we can specify code for.
  3. Finally, all the content is available and our final promised code runs. This can be anything, but for most of our purposes, this is where we exfiltrate our data like you did in previous challenges.
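If promises feel alien, it may help to see the same "run this later, when the data exists" idea in Python. The sketch below is only an analogy using concurrent.futures, not the JavaScript API: slow_download is a stand-in that fakes a network fetch, and add_done_callback plays roughly the role that .then() plays above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_download(url: str) -> str:
    """Stand-in for a network fetch; it just waits and returns fake content."""
    time.sleep(0.05)
    return f"contents of {url}"


results = []

with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns immediately with a Future, much like fetch() returns a promise.
    future = pool.submit(slow_download, "http://example.test/")
    # The callback runs later, once the result is actually available.
    future.add_done_callback(lambda done: results.append(done.result().upper()))
    registered_early = len(results) == 0         # nothing has run yet at this point

# Leaving the with-block waits for the download, and therefore the callback.
print(results)
```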

This can be insanely hard to understand and debug. Please be ready to debug this in Firefox in practice mode.

In this level, the flag is no longer nicely wrapped in JavaScript. It's just boring old text. You'll need to fetch it and exfiltrate it to score. Good luck!

Of course, as with any GET request, you can add some parameters. Try that out now!

And, naturally, we can use fetch() to make POST requests. This lets our JavaScript pretend to submit forms and so on, which is pretty neat! Let's practice that in this level. You can look up how to pass advanced arguments to fetch() on your own, but we'll give you some hints for some things that should appear in your JavaScript verbatim:

  • {
  • method: "POST"
  • body
  • new URLSearchParams
  • }

Good luck!


NOTE: There are many ways to send POST parameters. In this module, we covered the sending of form data, but other types exist as well, and all have different ways of accessing them via flask. Make sure you're sending form data in your POST, not something else; otherwise, our server (the way it's implemented) won't see it!

This challenge will be the start of your SQL journey. In this challenge, and throughout this module, we'll use a SQL engine called SQLite. SQLite is an extremely lightweight SQL engine that, rather than using a complex SQL server process to host databases, simply interacts with database files directly. This makes it very convenient to prototype applications on, and we use it for almost all our SQL needs in the challenges on pwn.college, but you wouldn't want to use it for, say, a production website... In the challenge file (/challenge/sql), you'll notice our use of SQLite via the TemporaryDB class. Feel free to ignore the inner workings of that class --- we simply use it as a wrapper to execute SQL queries and get results. Focus on the rest of the code!

This challenge will start with a very simple query. The query we'll learn is SELECT. You can use SELECT to (😎) select data from tables in your database. Its basic syntax is SELECT what FROM where, where what and where are things you specify. The where, typically, is a database table, and the what are the columns you want the query to fetch. If you don't want to worry about the column to SELECT, you can do SELECT *!

Read the code to understand the layout of the database you're querying, and select the flag!
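To experiment with SELECT outside the challenge, you can poke at an in-memory SQLite database from Python. This is a sketch with a made-up notes table; the table and column names in the challenge will differ, so read its code.

```python
import sqlite3

db = sqlite3.connect(":memory:")                 # throwaway database, nothing on disk
db.execute("CREATE TABLE notes (id INTEGER, content TEXT)")   # hypothetical table
db.executemany("INSERT INTO notes VALUES (?, ?)", [(1, "hello"), (2, "world")])

all_rows = db.execute("SELECT * FROM notes").fetchall()        # every column
contents = db.execute("SELECT content FROM notes").fetchall()  # one named column

print(all_rows)
print(contents)
```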


NOTE: This challenge, and the other challenges in the series, will try to link to relevant SQLite documentation. This documentation can be rather dry and dense. Feel free to use other resources as well. There are LOTS of SQL guides on the internet: the only reason we made this one is to give an accelerated guide for the parts of SQL learners will need for pwn.college challenges!

Any non-trivial database will have enough data in it that one must be selective (🥁) about what you access. Luckily, the SELECT query can be filtered with the WHERE clause! This challenge will require you to filter your data, because now there's lots of junk in the database!

The challenge links to the SQLite documentation for the WHERE clause, and we'd like you to go and read it. The TLDR, to get you started, is that you can append WHERE condition to your query, where condition is some expression you specify, like some_column < 10 (for integer comparisons) or some_column = 'pwn' (for string comparisons) or the like.

You'll need to analyze the code to understand what differentiates the flag from the junk data, and then query on it! Hint: it's the new column we added. Can you make the right filter and filter your data to just the flag?
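Here is a small sketch of WHERE in action, again against a made-up table in an in-memory SQLite database. The items table and its columns are hypothetical and unrelated to the challenge's schema; the two queries mirror the integer and string comparisons mentioned above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (kind INTEGER, label TEXT, value TEXT)")  # hypothetical
db.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "junk", "aaa"), (7, "keep", "bbb"), (12, "junk", "ccc")],
)

# Integer comparison: only rows whose kind is below 10.
small = db.execute("SELECT value FROM items WHERE kind < 10").fetchall()

# String comparison: SQL string literals use single quotes.
keep = db.execute("SELECT value FROM items WHERE label = 'keep'").fetchall()

print(small)
print(keep)
```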

You've probably been using SELECT * because of our subliminal suggestion a few challenges ago. This challenge will force you to choose a single column. SELECT it by name and get the flag!

Here, we'll randomly tag the flag. Can you still filter it out?


HINT: It might be easier to exclude the garbage data with your filter rather than include the flag data.

Of course, you can also filter using string values. Here, the flag tag is a string. Can you still get the flag?

Let's move on to more advanced filtering. We got rid of the flag tag in this challenge, and you'll need to filter on the actual values of the flag data! Luckily, SQLite (and all SQL engines in general) provide some functions for working with strings, and you'll use the substr function here. substr(some_column, start, length) extracts length characters starting from start (the first character is at position 1, not 0 as it would be in a sane language) of column some_column. You can use the result of this anywhere the query accepts expressions, such as in the WHERE clause to compare the resulting value against a string as in the previous challenge!

Functionality like substr isn't just for filtering: you can also SELECT expressions such as these (in place of or in addition to where you typically specify columns)! This is super handy when you don't want (or, in the case of this challenge, cannot retrieve) all the data, but just want the result of some computation on your data. In this case, the challenge will simply not let you read the whole flag. Can you read it piecemeal?
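Both uses of substr, as a filter and as a selected expression, can be tried out in a quick sketch like this one. The secrets table and its contents are made up; note the 1-based positions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE secrets (value TEXT)")  # hypothetical table
db.executemany("INSERT INTO secrets VALUES (?)", [("apple-pie",), ("berry-tart",)])

# substr in WHERE: keep rows whose first five characters are 'apple'.
starts = db.execute(
    "SELECT value FROM secrets WHERE substr(value, 1, 5) = 'apple'"
).fetchall()

# substr in SELECT: return three characters starting at position 7.
piece = db.execute(
    "SELECT substr(value, 7, 3) FROM secrets WHERE substr(value, 1, 5) = 'apple'"
).fetchall()

print(starts)
print(piece)
```

Reading a long value in chunks is then a matter of repeating the second query with different start positions.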

So far, our WHERE conditions have been pretty simple. This challenge complicates it somewhat by injecting decoy data into your database. Luckily, the flag tag is back.

You'll need to filter on both the flag tag and the flag value. Analogous to other programming languages, you can join together conditional expressions with boolean operators such as AND and OR. Craft a powerful expression and filter the flag from the decoys!
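Combining conditions looks roughly like the sketch below. The entries table, the tag values, and the data are all invented for illustration; the point is only how AND and OR narrow or widen the set of matching rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (tag TEXT, value TEXT)")  # hypothetical table
db.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("real", "alpha-1"), ("real", "beta-2"), ("decoy", "alpha-3")],
)

# AND: both conditions must hold, so only one row survives.
both = db.execute(
    "SELECT value FROM entries WHERE tag = 'real' AND substr(value, 1, 5) = 'alpha'"
).fetchall()

# OR: either condition is enough, so more rows match.
either = db.execute(
    "SELECT value FROM entries WHERE tag = 'decoy' OR value = 'beta-2'"
).fetchall()

print(both)
print(either)
```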

You've been able to rely on your WHERE clause to filter things down to exactly one result, but in this challenge, we've taken away the flag tags that you relied on to filter out decoy flags! Luckily, simple SQL queries tend to return data in the order that it was inserted into the database, and the real flag was inserted before the decoy flags (but after some of the garbage data). All you need is to LIMIT your query to just 1 result, and that result should be your flag! The challenge links you to the LIMIT documentation if you need it!
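A quick sketch of LIMIT, using a made-up log table. One caveat worth knowing: SQL makes no promise about row order unless you add ORDER BY, so the insertion-order behavior shown here is what SQLite does in practice for a simple table scan, not a guarantee of the language.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE log (value TEXT)")      # hypothetical table
db.executemany("INSERT INTO log VALUES (?)", [("first",), ("second",), ("third",)])

# LIMIT 1 stops after the first row the query produces.
first_row = db.execute("SELECT value FROM log LIMIT 1").fetchall()

# LIMIT applies after WHERE, so you get the first row that passes the filter.
first_match = db.execute(
    "SELECT value FROM log WHERE value != 'first' LIMIT 1"
).fetchall()

print(first_row)
print(first_match)
```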

In actual security scenarios, there are times where the attacker lacks certain information, such as the names of tables that they want to query! Luckily, every SQL engine has some way to query metadata about tables (though, confusingly, every engine does this differently!). SQLite uses a special sqlite_master table, in which it stores information about all other tables. Can you figure out the name of the table that contains the flag, and query it?
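To see what sqlite_master contains, you can create a couple of tables in an in-memory database and query it. The table names below are invented; in the challenge, the interesting name is whatever the server generated.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visible (x INTEGER)")           # hypothetical tables
db.execute("CREATE TABLE hidden_7f3a (secret TEXT)")

# sqlite_master has one row per table (and index, view, ...), including its name
# and the SQL statement that created it.
names = [row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
schema = db.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'hidden_7f3a'"
).fetchone()[0]

print(names)
print(schema)
```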

