In this second module we will dive into data handling, web communication, and SQL basics.

Questions and Discussions (Discord)



Introduction to Module 2

Connect with SSH

Link your SSH key, then connect with: ssh hacker@pwn.college

Dealing with Data

Programs communicate with each other by exchanging data in a variety of formats over a variety of communication channels. Learning about this at the same time as learning security concepts can be overwhelming, so this module prepares you for the latter by covering the former.

In this module, you will learn the different ways data is reasoned about by programs. In the future, this will help you carefully craft that data to break the recipient program's security!

Let's start your journey through encodings with something simple. This program takes a password, but you have no way to know what it is... unless you READ it!

In most cybersecurity analysis settings, you will be analyzing software that you did not write, like this program. Thus, the very first skill you will learn in this module is to read software to understand what data it wants you to send. We'll start with this trivial Python program.

The program lives in /challenge/runme, and will request a tricky password before it gives you the flag. It's going to be the simplest program you read in your journey, as it just reads data over standard input and makes one simple check.

Read the program, understand the Python, and make the program give you the flag!

Once more into the breach, dear hacker! Just to make sure you get the idea.

The previous challenges were quite simple, as is this one. But it does one thing slightly differently: it does not ignore the Enter that you press on the terminal when entering your password. This causes your entered_password to contain a newline, and since correct_password has no newline, the comparison fails!

This sort of stuff --- errant delimiters in data --- happens ALL the time and can lead to crazy amounts of lost time. There are a few ways to get around it in this level:

  1. Look into ways to terminate your terminal input without pressing Enter. This is super searchable online!
  2. Recall, from the Linux Luminarium, how to redirect an echo (with arguments to disable newlines) to the stdin of /challenge/runme.
  3. Create a file without a newline, and remember your Linux Luminarium to redirect the file to stdin of /challenge/runme.

Good luck!
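To see why the stray newline matters, here is a minimal sketch of the comparison. The names correct_password and entered_password follow the description above, but the password value is made up for illustration and the real challenge code may differ.

```python
# Hypothetical password, not the one used by the challenge.
correct_password = b"hunter2"

# What arrives on stdin when you type the password and press Enter.
entered_password = b"hunter2\n"

# The trailing newline makes the two byte strings differ.
with_newline_matches = entered_password == correct_password
print(with_newline_matches)  # False

# The same bytes without the newline compare equal.
payload = b"hunter2"
without_newline_matches = payload == correct_password
print(without_newline_matches)  # True
```

Since the program itself does not strip the newline, the fix has to happen on the sending side, which is what all three approaches listed above achieve.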

Let's explore some other ways programs might take security-relevant input. Here, the program does not read the password from the terminal. Can you still crack it?

Here's another slight twist on it. Can you still get it?

Now, life must get complex. You may have noticed the b letters in front of the password constants throughout this module. Python has two types of string-like constants: strings (specified as "asdf") and bytes (specified as b"asdf"). Let's talk about bytes in this level.

Bytes are what is actually stored in your computer's memory. As you might know, computers think in binary: just a bunch of ones and zeroes. For historical reasons, we group these ones and zeroes ("bits") in groups of 8, and each group of 8 is called a "byte". This number is purely arbitrary: early computers (pre-1960s or so) didn't have this grouping at all, or had other arbitrary groupings. It is very feasible for there to be an alternate universe in which a byte is 16, 32, or really any number of bits (though for math reasons, it'll likely remain a power-of-2).

A single binary digit (bit) can represent two values (0 and 1), two bits can represent four values (00, 01, 10, and 11), three bits can represent eight values (000, 001, 010, 011, 100, 101, 110, 111), and four bits can represent sixteen values. Comparatively, a single decimal digit can represent 10 values (from 0 to 9). Ten values are represented by roughly log2(10) == 3.3219... bits, and you get weird situations like binary 1001 being decimal 9, but binary 1100 (still 4 binary digits) being 12 (two decimal digits!). Another way of expressing this digit desynchronization between decimal and binary is that decimal does not have clean bit boundaries.

The lack of bit boundaries makes reasoning about the relationship between decimal and binary complex. For example, it is hard to spot-translate numbers between decimal and binary in general: we can work out that 97 is 1100001, but it's hard to see that at a glance.

It's much easier to spot-translate between bases that have more alignment between digits. For example, a single hexadecimal (base 16) digit can represent 16 values (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f): the same number of values that binary can represent in 4 digits! This allows us to have a super simple mapping:

Hex  Binary  Decimal
0    0000    0
1    0001    1
2    0010    2
3    0011    3
4    0100    4
5    0101    5
6    0110    6
7    0111    7
8    1000    8
9    1001    9
a    1010    10
b    1011    11
c    1100    12
d    1101    13
e    1110    14
f    1111    15

This mapping from a hex digit to 4 bits is something that's easily memorizable (most important: memorize 1, 2, 4, and 8, and you can quickly derive the rest). Better yet, two hex digits are 8 bits, which is one byte! Unlike decimal, where you'd have to memorize 16 mappings for 4 bits and 256 mappings for 8 bits, with hexadecimal, you only have to memorize 16 mappings for 4 bits, and those same 16 mappings cover 8 bits, since a byte is just two hexadecimal digits concatenated! Some examples:

Hex  Binary     Decimal
00   0000 0000  0
0e   0000 1110  14
3e   0011 1110  62
e3   1110 0011  227
ee   1110 1110  238

Now you're starting to see the beauty. This gets even more obvious when you expand beyond one byte of input, but we'll let you find that out through future challenges!
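If you'd rather derive the table than memorize it, the following sketch builds the nibble mapping and translates a byte one hex digit at a time. The helper name byte_to_bits is ours, purely for illustration.

```python
# Build the hex-digit-to-4-bits table shown above.
nibbles = {format(n, "x"): format(n, "04b") for n in range(16)}
print(nibbles["e"])  # 1110

def byte_to_bits(hex_byte: str) -> str:
    """Translate a two-digit hex byte one nibble at a time."""
    return " ".join(nibbles[digit] for digit in hex_byte.lower())

print(byte_to_bits("3e"))  # 0011 1110
print(byte_to_bits("e3"))  # 1110 0011

# Python agrees with the decimal column of the table.
print(int("e3", 16))  # 227
```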

Now, let's talk about notation. How do you differentiate 11 in decimal, 11 in binary (which equals 3 in decimal), and 11 in hex (which equals 17 in decimal)? For numerical constants, Python's notation is to prepend binary data with 0b, hexadecimal with 0x, and keep decimal as is, resulting in 11 == 0b1011 == 0xb, 3 == 0b11 == 0x3, and 17 == 0b10001 == 0x11. But for bytes, as in this challenge, you can specify them using escape sequences. The escape sequence starts with \x and is followed by two hex digits, resulting in a single byte with that value being put in the bytes constant!
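A quick way to convince yourself of this notation is to try it in a Python shell. The byte values below are arbitrary examples, not anything taken from the challenge.

```python
# The same numbers written in decimal, binary, and hex.
assert 11 == 0b1011 == 0xb
assert 3 == 0b11 == 0x3
assert 17 == 0b10001 == 0x11

# In a bytes constant, each \x escape contributes exactly one byte.
data = b"\x11\x0b\x03"
print(len(data))   # 3
print(list(data))  # [17, 11, 3]
```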

Armed with this knowledge, go and hex the challenge and get the flag!


Fun facts: Some other Pythonisms that might be useful:

  • If you print(n) a number or convert it to a string with str(n), the number will be represented in base 10.
  • You can get a hexadecimal string representation of a number using hex(n).
  • You can get a binary string representation of a number using bin(n).
  • Converting a string to a number with int(s) will read it as a base 10 number by default.
  • You can specify a different base to use with a second argument: int(s, 16) will interpret the string as hex, int(s, 2) will interpret it as binary.
  • You can try to auto-identify the number base using int(s, 0), which requires a prefix on the string (0b for binary, 0x for hex, nothing for decimal).
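The conversions in this list can be tried out together. This is a small sketch using the value 227 from the earlier table.

```python
n = 227

# Number to string, in three bases.
as_decimal = str(n)
as_hex = hex(n)
as_binary = bin(n)
print(as_decimal, as_hex, as_binary)  # 227 0xe3 0b11100011

# String to number, with an explicit base.
print(int("227"))          # 227
print(int("e3", 16))       # 227
print(int("11100011", 2))  # 227

# Base 0 picks the base from the prefix on the string.
print(int("0xe3", 0), int("0b11100011", 0), int("227", 0))  # 227 227 227
```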

You're not limited to two hex digits! Like decimal numbers, you can add arbitrary amounts of them to represent more and more bytes. Every two hex digits are one additional byte. One hex digit, for those curious, is called a nibble (har har!), but this is not used when specifying data. We almost always work with data on the byte level, not less.

What you will do in this level is hex encode arbitrary data. That is, you will figure out what value you want your data to have at the end, encode that value in hex, and send the hex bytes. Wow, your first encoding!
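As a sketch of what hex encoding looks like in Python, the built-in bytes.hex() and bytes.fromhex() methods convert in both directions. The target value here is a made-up example, not what the challenge expects.

```python
# Hypothetical value we want the program to end up with.
target = b"pwn!"

# Hex-encode: two hex digits per byte.
encoded = target.hex()
print(encoded)       # 70776e21
print(len(encoded))  # 8

# Decoding the hex gives back the original bytes.
decoded = bytes.fromhex(encoded)
print(decoded)  # b'pwn!'
```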

Now, let's decode some hex, rather than encoding it. Can you figure out what the program needs?


NOTE: One of the toughest parts of this challenge is to send raw binary data to its stdin. There are a few ways to do this:

  1. Write a python script to output data to stdout and pipe that to the challenge's stdin! This would involve using the raw byte interface to stdout: sys.stdout.buffer.write().
  2. Write a python script to run the challenge and interact with it directly. Our recommendation is to use pwntools for this: import pwn, p = pwn.process("/challenge/runme"), p.write(), and p.readall(). A pwn.college alumnus has created an awesome pwntools cheat sheet that you may reference.
  3. For an increasingly hacky solution, echo -e -n "\xAA\xBB" will print out bytes to stdout that you can pipe.
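Here is a minimal sketch of feeding raw bytes to another program's stdin using only the standard library. The child process is a stand-in we made up so the example runs anywhere; it is not /challenge/runme, and the two payload bytes are arbitrary.

```python
import subprocess
import sys

# Two raw bytes (0xaa and 0xbb), not text.
payload = b"\xaa\xbb"

# Stand-in for a program that reads raw bytes from stdin (NOT the real
# challenge): it reports what it received as hex so we can verify that
# nothing was mangled on the way.
stand_in = [sys.executable, "-c", "import sys; print(sys.stdin.buffer.read().hex())"]

result = subprocess.run(stand_in, input=payload, capture_output=True, check=True)
received = result.stdout.decode().strip()
print(received)  # aabb
```

If you go with the first approach instead, a script that calls sys.stdout.buffer.write(payload) emits the same raw bytes, which you can then pipe into the target program.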

How many bases can you hold in your head? Here, we explore binary encoding of input.
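As a rough illustration of binary encoding (assuming 8 bits per byte written as the characters 0 and 1, which may not match the exact format this challenge wants), you can convert in both directions like this:

```python
data = b"Hi"

# Encode: each byte becomes eight '0'/'1' characters.
bits = "".join(format(byte, "08b") for byte in data)
print(bits)  # 0100100001101001

# Decode: read the string back eight characters at a time.
decoded = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
print(decoded)  # b'Hi'
```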

Now it's your turn!

Now, let's talk about strings! In Python, strings are meant for human consumption. A string is a sequence of characters that a human might write down, read, speak, and dream. This includes things like letters of the alphabet but also things like 🕴️. When one thinks about how many different letters of different alphabets and different emoji and so on there are, it's clear that there are thousands of different options for each character of a string.

The representation of a human-readable character as a bunch of bytes in memory is yet another Encoding (here, used as a noun). A character, such as 🐉, is "encoded" (here, used as a verb) by bytes in memory, and those bytes "decode" to that character. In this use of "encoding" and "decoding", the data actually stays the same, but its interpretation changes: the encoding is applied by, say, your commandline terminal to translate the bytes being sent by the program into the characters and emoji you see on the screen.

In Python, you convert a str to its equivalent bytes by doing my_string.encode(). If you have a bunch of bytes that you want to interpret as a string, you can do my_bytes.decode(). But how are string characters mapped to byte values?

Back in its early days (say, pre-2000), when computing was less international and people still typed :-) instead of 🙂, people didn't really worry about the limited number of characters that a single byte could represent. Thus, early encodings simply encoded each character to be a single byte, with a resulting limit of 256 possible characters. Because early computing was predominantly US- and Western Europe-based, the most popular such encoding, specifically designed to represent characters in the Latin alphabet with various byte values, was ASCII, dating back to 1963 (ancient history by computing standards!).

ASCII is pretty simple: every character is one byte, uppercase letters are 0x40+letter_index (e.g., A is 0x41, F is 0x46, and Z is 0x5a), lowercase letters are 0x60+letter_index (a is 0x61, f is 0x66, and z is 0x7a), and numbers (yes, the numeric characters you're seeing are not bytes of those values, they are ASCII-encoded number characters) are 0x30+number, so 0 is 0x30 and 7 is 0x37. Useful special characters are sprinkled around the mapping as well: forward slash (/) is 0x2f, space is 0x20, and newline is 0x0a. Because early computing pioneers were making stuff up as they went along, some of the ASCII characters aren't really characters: 0x07 is a bell; it literally makes your terminal beep when it is "printed" out! Other "control characters" do other wacky things: 0x08, for example, deletes the last character on the terminal instead of being a character itself.
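You can check these values yourself with Python's ord() and chr(). The helper upper_letter below is a hypothetical name used only for this sketch.

```python
# Letters and digits sit at fixed offsets in the ASCII table.
assert ord("A") == 0x41 and ord("F") == 0x46 and ord("Z") == 0x5a
assert ord("a") == 0x61 and ord("f") == 0x66 and ord("z") == 0x7a
assert ord("0") == 0x30 and ord("7") == 0x37

# Slash, space, and newline as bytes.
special = "/ \n".encode("ascii")
print(special.hex())  # 2f200a

def upper_letter(index: int) -> str:
    """Return the index-th uppercase letter (1 is A, 26 is Z)."""
    return chr(0x40 + index)

print(upper_letter(6))  # F
```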

Byte values below 0x80 (128), considered "standard ASCII", were pretty universally defined even for non-English countries. You can see this whole standard ASCII definition with man ascii! You can also use standard ASCII in python to encode strings: my_string.encode("ascii"). But beware, standard ASCII doesn't define values of 0x80 and above, so if you decode bytes that have those values, you will get an exception! This, for example, won't work: b"\x80".decode("ascii").

Values of 0x80 and above ("extended ASCII") were used by different countries for their own characters, leading to some chaos due to colliding byte values. In the US, the typical "extended ASCII" encoding was called Latin 1, and it defined a character for each of the 256 possible byte values. This is useful for us because we can use "latin1" to easily convert between Python's bytes and strings, including: b"\x80".decode("latin1").
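The difference between the two codecs is easy to see in practice. This sketch decodes the same byte with "ascii" and with "latin1", then round-trips every possible byte value through latin1.

```python
raw = b"\x80"

# Standard ASCII stops at 0x7f, so this byte cannot be decoded.
ascii_failed = False
try:
    raw.decode("ascii")
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)  # True

# latin1 maps every byte value to exactly one character and back.
text = raw.decode("latin1")
assert text.encode("latin1") == raw

all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin1").encode("latin1")
print(round_trip == all_bytes)  # True
```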

In this challenge, we want you to give us ASCII-encoded hex values (fun fact: specifying byte values in hexadecimal is called "hex encoding"!), and we'll match them against the password. Good luck!


NOTE: As you read the challenge to understand what value you need to send, you'll notice that some parts of the bytes constant specified for correct_password look ... weird. Each byte in correct_password represents a byte in memory, but these bytes often still carry useful, human-relevant information. Printing every byte with escape sequences, though it would be valid, would not be as useful for humans, even if bytes aren't really meant for human consumption. Thus, the Python developers decided to represent bytes as ... standard ASCII! Python bytes are specified using ASCII characters, with the weirder "non-printable" ones (e.g., anything 0x80 or above, and a few others) specified using \x escape sequences. This can work for normal characters as well: \x41 happily encodes A. Some other special characters have their own bespoke escape sequences: for example, \n encodes a newline character (equivalent to \x0a). You can see other escape sequences in man ascii. Because \ starts an escape sequence, Python (and other languages that use the escape sequence concept, which is almost everything) must specify an actual backslash as an escape sequence as well (specifically, \\ encodes a \ byte with a value of 0x5c).
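To make this concrete, here is a small sketch of how Python displays bytes. The byte values are arbitrary examples chosen to cover a printable character, a high byte, a newline, and a backslash.

```python
# Escapes and plain ASCII are two spellings of the same bytes.
assert b"\x41\x42\x43" == b"ABC"
assert b"\x0a" == b"\n"
assert b"\x5c" == b"\\"

# When printing, Python uses ASCII where it can and escapes the rest.
mixed = bytes([0x41, 0x80, 0x0a, 0x5c])
shown = repr(mixed)
print(shown)       # b'A\x80\n\\'
print(len(mixed))  # 4 bytes, however long the printed form looks
```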

Okay, that was a lot. Go put it into practice!

Okay, now that we understand how bytes are presented to us humans, we can play around with more encoding practice! Let's multi-encode our bytes!

Once computing went international and emojis were added, people needed to be able to use more than 256 possible characters at a time. In the modern era, this has been largely solved by the UTF-8 encoding. UTF-8 is a specific multi-byte encoding of Unicode, a global standardized character set containing essentially all characters known to humanity, plus the fun emoji that you know and love. There are many ways to encode Unicode, and UTF-8 is one of them. Unicode (character set) is to UTF-8 (encoding) as English (character set) is to standard ASCII (encoding).

Conveniently, UTF-8 is backwards-compatible with standard ASCII (i.e., standard ASCII byte values represent the same character in UTF-8 as in ASCII), but it uses more than one byte (up to four) to represent characters outside that range. This gives UTF-8 an enormous number of character options: it can encode well over 1,000,000 distinct characters!

UTF-8 is (by default) how Python's strings are specified, so you can do stuff like my_string = "💥". You can convert that into the actual byte representation (as it's stored in bytes in memory) by doing my_string.encode("utf-8") which, in the case of the emoji in question, results in the bytes b'\xf0\x9f\x92\xa5'. Those four bytes represent that emoji in UTF-8.
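The following sketch walks through that conversion and also shows the hex form of the resulting bytes, using the same emoji as above.

```python
my_string = "💥"

# One character for a human, four bytes in memory.
encoded = my_string.encode("utf-8")
print(len(my_string))  # 1
print(len(encoded))    # 4
print(encoded)         # b'\xf0\x9f\x92\xa5'

# Hex-encoding those bytes gives two hex digits per byte.
print(encoded.hex())  # f09f92a5

# ASCII characters keep their single-byte values in UTF-8.
assert "A".encode("utf-8") == "A".encode("ascii") == b"\x41"
```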

In this challenge, you will learn to craft emoji bytes. We want you to create raw bytes representing UTF-8 emoji, hex-encode them, and send those hex values to us. Can you do it?


DOJO NOTE: Due to a bug with unicode displays in the GUI Desktop terminal, we recommend that you use the VSCode Workspace for this challenge (and any other emoji-dependent challenges!).

UTF-8 is the current king of encodings. It is used, for example, by the vast majority of websites on the internet.

But it's not the only game in town. Outside of the web, other encodings are present in significant numbers. For various (misguided) technical reasons, Windows systems often use a different Unicode encoding: UTF-16. This encoding represents the same Unicode characters using different byte values! Needless to say, this leads to much confusion, and occasionally, security vulnerabilities.

A common way encoding mixups lead to security vulnerabilities is by incorrectly decoding data to perform security checks, then correctly (and differently) decoding it later to actually carry out security-sensitive actions. If security checks are performed on bad data, then dangerous data can be missed.
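Here is a deliberately simplified sketch of that pattern. Both functions are hypothetical and are not taken from the challenge: the check looks at the raw bytes as if they were single-byte text, while the later use decodes them as UTF-16.

```python
def naive_filter(data: bytes) -> bool:
    """Hypothetical check: accept input unless its bytes contain b'flag'."""
    return b"flag" not in data

def later_use(data: bytes) -> str:
    """Hypothetical consumer that treats the same input as UTF-16."""
    return data.decode("utf-16-le")

# In UTF-16 (little endian), each ASCII character is followed by a zero byte.
payload = "flag".encode("utf-16-le")
print(payload)  # b'f\x00l\x00a\x00g\x00'

passed_check = naive_filter(payload)
interpreted = later_use(payload)
print(passed_check)  # True: the check never sees the word
print(interpreted)   # flag: the later decode does
```

The check and the use disagree about what the bytes mean, so the "dangerous" word slips through.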

This is the case in this challenge. Can you get the flag?


DOJO NOTE: Due to a bug with unicode displays in the GUI Desktop terminal, we recommend that you use the VSCode Workspace for this challenge (and any other emoji-dependent challenges!).

So far, we have seen a few types of encodings: UTF-8, UTF-16, extended ASCII (latin-1), and hex encoding. These encodings translate data, whether that's a concept such as a 🎈 emoji character or an actual byte in memory, into other bytes. What happens when you mess with the encoded data? Nothing good! In UTF-8, 🎈 encodes to:

hacker@dojo:~$ ipython
In [1]: "🎈".encode("utf-8")
Out[1]: b'\xf0\x9f\x8e\x88'

If we mess with the resulting bytes, and then decode them, we would (of course) get something different:

In [2]: b'\xf0\x9f\x8e\xaa'.decode("utf-8")
Out[2]: '🎪'

In [3]: b'\xf0\x9f\x8e\x42'.decode("utf-8")
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[3], line 1
----> 1 b'\xf0\x9f\x8e\x42'.decode("utf-8")

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-2: invalid continuation byte

The first modification resulted in a different emoji, and the second one errored out. Depending on the encoding, not all byte values can be decoded properly! For UTF-8, this is due to a complex algorithm to specify the data. For hex encoding, this is due to only numbers 0 through 9 and letters A through F being valid in hexadecimal!

All this being said, any encoding can be messed with to some extent, as we can see in the first example above. When a security flaw allows data to be corrupted, this can enable an attacker to carefully transform data to their purposes. We'll learn about how to protect data from this later in pwn.college, but for now, let's practice this concept by seeing what happens when we mess with some hex!
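The same two outcomes show up with hex-encoded data. In this sketch, one tampered value still decodes (to something different), while another is rejected because it is not valid hex at all.

```python
original = "🎈".encode("utf-8").hex()
print(original)  # f09f8e88

# Change the last byte's hex digits: still valid hex, different emoji.
tampered = original[:-2] + "aa"
changed = bytes.fromhex(tampered).decode("utf-8")
print(changed)  # 🎪

# Introduce characters outside 0-9 and a-f: the hex decoding itself fails.
invalid_hex_rejected = False
try:
    bytes.fromhex("f09f8ezz")
except ValueError:
    invalid_hex_rejected = True
print(invalid_hex_rejected)  # True
```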

ASCII and UTF-8 are encodings meant for very specific data: text (or text-like characters). Hex encoding is more general, and you can apply it to any data. The reason that we might use an encoding like hex is to transfer information via some medium where it is hard to write arbitrary binary data, such as a piece of paper or certain communication protocols. However, it is horribly inefficient: it doubles the size of the data by outputting two ASCII hex digits for every byte!

Hex is inefficient for the same reason that it is convenient: each hex digit conveys only 4 bits, and since each output character takes 8 bits to store (in ASCII), the data size doubles. Luckily, we can increase the efficiency of an encoding by increasing the number of bits we can convey per output character.

This is where base64 comes in. The name "base64" comes from the fact that each output character is one of 64 possible characters. The exact set can vary, but the standard base64 encoding uses an "alphabet" of the uppercase letters A through Z, the lowercase letters a through z, the digits 0 through 9, and the + and / symbols. This results in 64 total output symbols, so each symbol can take one of 2**6 (2 to the power of 6) values and thus carries 6 bits of data. That means that to encode a single byte (8 bits) of input, you need more than one base64 output character. In fact, you need two: one that encodes the first 6 bits and one that encodes the remaining 2 (with 4 bits of that second output character being unused). To mark these unused bits, base64 encoded data appends an = for every two unused bits. For example:

hacker@dojo:~$ echo -n A | base64
QQ==
hacker@dojo:~$ echo -n AA | base64
QUE=
hacker@dojo:~$ echo -n AAA | base64
QUFB
hacker@dojo:~$ echo -n AAAA | base64
QUFBQQ==
hacker@dojo:~$

As you can see, 3 bytes (3*8 == 24 bits) encode precisely into 4 base64 characters (4*6 == 24 bits).
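The same experiment can be repeated with Python's base64 module. The last few lines are our own addition: they compare the output size against hex for an arbitrary 300-byte input.

```python
import base64

# Same inputs as the shell session above.
results = {data: base64.b64encode(data) for data in (b"A", b"AA", b"AAA", b"AAAA")}
for data, encoded in results.items():
    print(data, "->", encoded)

# Decoding reverses the process.
assert base64.b64decode(b"QUFB") == b"AAA"

# Size comparison for 300 arbitrary bytes: hex doubles, base64 grows by a third.
blob = bytes(300)
hex_size = len(blob.hex())
b64_size = len(base64.b64encode(blob))
print(hex_size, b64_size)  # 600 400
```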

base64 is a popular encoding because it can represent any data without using "tricky" characters such as newlines, spaces, quotes, semicolons, unprintable special characters, and so on. Such characters can cause trouble in certain scenarios, and base64-encoding the data avoids this nicely.

You've also explored other "base" encodings: base2 is binary, and base16 is hex!

Now, go and decode your way to the flag!


HINT: You can use Python's base64 module (note: the base64 decoding functions in this module consume and return Python bytes) or the base64 command line utility to do this!

FUN FACT: The flag data in pwn.college{FLAG} is actually base64-encoded ciphertext. You're well on the way to being able to build something like the dojo!

Needless to say, you can also _en_code things to base64! Go do that now!

Security-oblivious developers often use encoding-based obfuscation in lieu of encryption. This sort of obfuscation typically fails to prevent determined hackers from accessing the data in question, especially once they read the software logic implementing it. Put yourself in the shoes of such a hacker, and get this flag.

Remember: "Good Artists Copy, Great Artists Steal!" When you're doing security analysis and need to interact with bespoke software, ripping the implementations of custom communication protocols out of that software is a good way to reach interoperability. Give it a try here!

Can you go further?

Talking Web

HTTP (Hypertext Transfer Protocol) is the lingua franca of the open Internet: the common tongue through which web applications, servers, and clients communicate. This module delves deep into the intricate skills of crafting, decoding, and manipulating HTTP requests and responses. By the end of this journey, you won't be solely reliant on your web browser to make HTTP requests on your behalf. You'll master the skills to speak directly with web servers, opening a new world of potential.

You will learn about:

  • Headers: Metadata fields that carry vital information about the request or response.
  • Paths: The specific locations or resources you're aiming to access.
  • Arguments: Data points that can alter or dictate the behavior of your request.
  • Form Data: Data transferred from web forms.
  • JSON: A popular data interchange format that's lightweight and human-readable.
  • Cookies: Small data fragments stored on the user's computer, crucial for session management and tracking.
  • Redirects: Methods web services use to direct your browser from one location to another.

As you push through these challenges, you won't be hacking blind:

hacker@talking-web-level-1:~$ /challenge/run
* Serving Flask app 'run'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://challenge.localhost:80
Press CTRL+C to quit

This output, made available through the challenge, directs you into the core of the web server's activities. Don't ignore it: the server's responses are often hints, meant to nudge you towards the right path when it is unclear.

Obviously, as you're accessing this website in your web browser, this isn't your first HTTP request. But it's your first HTTP request for a pwn.college challenge! Run /challenge/server, fire up Firefox in the dojo workspace (you'll need to use the GUI Desktop for this!), and visit the URL that it's listening on for the flag!

Awesome, you got the hang of the basic process. There's one more thing you need to do, though: you must read and understand the source code of the challenge! Web servers route HTTP requests to different endpoints: http://challenge.localhost/pwn might go to the endpoint that handles the request path /pwn, and http://challenge.localhost/college might go to the endpoint that handles the request path /college. This challenge has a randomly-chosen endpoint name. You must read the code in /challenge/server, understand it, and figure out which endpoint to visit in the browser!


Confused? Our web servers are implemented using the flask library. Read their documentation to build up understanding of the code, or experiment with it!

HTTP is the HyperText Transfer Protocol. HyperText, named in the techno-optimism of the late 20th century, is text that carries additional data regarding how it should be understood, not just what it means. In modern times, this is done through a variety of means: HTTP is used to transport lots of different types of resources, and your web browser combines them to construct the websites that you see and interact with. The oldest of these is the HyperText Markup Language, or HTML.

HTML describes, in a way that the browser can interpret, the elements that should (initially) appear on a web page. We'll dive into HTML subtleties in later modules, but here, we'll practice piercing the veil of the website and looking at the HTML behind it all. You'll need to, as before, find the endpoint and access it in the in-dojo browser. However, the HTML sent over will hide the flag. You'll need to figure out how to view the Page Source of the HTML, rather than the rendered result, to access this hidden data.


HINT: click Firefox's Hamburger menu (≡), then go to More Tools.

HTTP facilitates the transfer of both data (e.g., the HTML that /challenge/server sends you) and metadata (data about the data). The latter is sent via headers: fields in an HTTP request or response that give additional instructions to the server or browser. In this case, the flag is in a header. Can you find it?


HINT: you can inspect headers using Firefox's Web Developer Tools (≡, then More Tools). The Network tab of the tools displays all of the HTTP connections (you might need to reload the page after opening the Web Developer Tools for the connection to show up). Each of these connections has a Headers subtab, which shows headers that your browser sent alongside its request (Request Headers) and the headers that it received alongside the response (Response Headers). Find the flag header there!

You've learned how to HTTP (though, of course, you've probably been HTTPing for most of your life!). Now, let's learn how to really HTTP. The HTTP protocol itself, as in the exact data that is sent over the network, is actually surprisingly human-readable and human-writable. In this challenge, you'll learn to write it. This challenge requires you to use a program called "netcat" (command name: nc), which is a simple program that communicates over a network connection. Netcat's basic usage involves two arguments: the hostname (where the server is listening, such as www.google.com for Google), and the port (the standard HTTP port is 80).

When it starts up, netcat connects to the server and gives you a raw channel to communicate with it. You'll be talking directly with the web server, with no intermediary! How cool is that?

Recall the lectures, find the format of an HTTP request, and make a GET request to the / endpoint (we'll do more endpoints later) to get the flag!


HINT: Can't tell if netcat is connecting or not? Use the -v flag to turn on some verbosity!

HINT: Typed your GET request and nothing happens after you hit Enter? HTTP requests are terminated by two newlines. Try hitting Enter again!
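To show what a raw request looks like on the wire, here is a self-contained Python sketch that plays both sides: a throwaway local server from the standard library (a stand-in we made up, not /challenge/server) and a plain socket that sends the same text you would type into netcat. Note the empty line that ends the request.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in server: replies with the path that was requested."""

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The request line, one header per line, then an empty line to finish.
request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)

with socket.create_connection(("127.0.0.1", port)) as conn:
    conn.sendall(request)
    response = b""
    while chunk := conn.recv(4096):
        response += chunk

server.shutdown()
server.server_close()

status_line = response.split(b"\r\n", 1)[0]
print(status_line.decode())
print(response.decode().split("\r\n\r\n", 1)[1])  # you asked for /
```

The response has the same shape as the request: a first line, headers, an empty line, and then the body.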

A thought... Until this moment, have you ever truly HTTPed?

Okay, you got the basics of netcat down. Now make a GET request to a specific path! As always, check the /challenge/server code to understand more.

Next, we'll practice making HTTP requests with one of the most common commandline tools for HTTP: curl. Unlike netcat, curl is made specifically for HTTP, and you don't have to write raw HTTP commands. Instead, you must learn to use the right program options to achieve what you want. Here, you must simply make a GET request to the right endpoint!

Finally, we'll learn the fourth tool in our HTTP toolbox: Python's requests library. This and the browser will likely be the two most heavily used tools in your HTTP toolbox. Requests lets you script complex web interactions, and this will be necessary to pull off tricky hacks later. For now, things are simple: pull up Python, import requests, and GET the flag!

Unfortunately, most of the modern internet runs on the infrastructure of a handful of companies, and a given server run by these companies might be responsible for serving up websites for dozens of different domain names. How does the server decide which website to serve? The Host header.

The Host header is a request header sent by the client (e.g., browser, curl, etc), typically equal to the domain name entered in the HTTP request. When you go to https://pwn.college, your browser automatically sets the Host header to pwn.college, and thus our server knows to give you the pwn.college website, rather than something else.
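The sketch below shows the Host header being set by hand and read on the server side. It uses only Python's standard library (http.client rather than requests) so that it runs anywhere, and both the echo server and the hostname example.test are made up for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HostEcho(BaseHTTPRequestHandler):
    """Stand-in server: reports which Host header the client sent."""

    def do_GET(self):
        body = f"Host was: {self.headers['Host']}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), HostEcho)
threading.Thread(target=server.serve_forever, daemon=True).start()

# We connect to 127.0.0.1, but claim to be asking for example.test.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/", headers={"Host": "example.test"})
reply = conn.getresponse().read().decode()
conn.close()

server.shutdown()
server.server_close()

print(reply)  # Host was: example.test
```

The address you connect to and the Host you send are independent, which is exactly what lets one server host many websites.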

Until now, the challenges you've been interacting with have been Host-agnostic. Now they start checking. Set the right Host header and get the flag!

Now, let's learn to set the Host header in curl! Read its man page to figure out how to set headers.

And, finally, you can learn how Hosts are actually sent over the network in netcat. This might be a bit trickier. You can actually use curl as a source of information here! Curl's -v option causes it to print out the exact headers it's sending over (and the ones it receives!). Observe it, copy that with netcat, and get the flag!

Recall how HTTP requests contain fields separated by spaces? For example: GET /solve HTTP/1.1. What if the path (e.g., instead of /solve) has spaces inside it? This is a reasonable thing to happen, as these paths often reference directories, and directories may have spaces in their names!

Left to their own devices, spaces would mess up the HTTP request. Consider an HTTP server trying to make sense of GET /solve my challenge HTTP/1.1. A clever server might be able to deal with it, but it's not impossible that a version that simply reads one word at a time would read my instead of HTTP/1.1 and panic!

To avoid such situations, URLs are encoded using URL Encoding. This is a simple encoding compared to what you've seen in Dealing with Data. Any tricky characters (such as spaces) are simply hex-encoded, with a % plopped in front of them. Of course, because % thus becomes a tricky character in itself, it must also be encoded. In the above example, /solve my challenge would become /solve%20my%20challenge, as the hex value of the ASCII space character is 0x20.
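
If you'd like to check an encoding by hand, Python's standard library can do it for you. This is a small sketch using urllib.parse; the example strings are our own.

```python
from urllib.parse import quote, unquote

path = "/solve my challenge"

# quote() percent-encodes tricky characters; "/" is left alone by default.
encoded = quote(path)
print(encoded)

# "%" itself is a tricky character, so it gets encoded too (as %25).
percent_example = quote("100% done")
print(percent_example)

# unquote() reverses the encoding.
decoded = unquote(encoded)
print(decoded)
```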

Anyways, now we'll practice. We stuck spaces in the endpoints. Can you still get the flag?


INFO: You'll find that you need to encode URLs with curl as well (though we won't make you jump through that hoop), in the exact same way. Python's requests library, however, will automatically urlencode things for you. So useful!

Like a function call in a programming language or a command execution on the shell, HTTP requests can include parameters. GET requests send parameters alongside the path in the URL, in a part of the URL called the Query String. In this challenge, you'll learn how to craft this query string. Read the challenge source to understand what parameter you need, and send it over! You can use any client you want: the process is basically the same in all of them.


SECURITY NOTE: It's tempting to think of HTTP parameters as similar to parameters to a function call. However, keep in mind: when you're writing C or Python or Java code, an attacker (typically) can't just call random functions in your program with random parameters. But with HTTP, they can. They can just make HTTP requests wherever they want! This has caused quite a few security issues...

Of course, you can pass in multiple parameters; you just need to separate each of them with &: what=pwn&where=college. Try it now, in netcat.
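
As a quick illustration, Python's urllib.parse can build and take apart a query string like this one. The /some/endpoint path below is a placeholder, not a real challenge endpoint.

```python
from urllib.parse import parse_qs, urlencode

params = {"what": "pwn", "where": "college"}

# urlencode() joins key=value pairs with "&" and URL-encodes the values.
query = urlencode(params)
print(query)

# The query string follows the path, separated by "?".
url = "http://challenge.localhost/some/endpoint?" + query  # placeholder path
print(url)

# parse_qs() is roughly what a server does on the receiving end.
parsed = parse_qs(query)
print(parsed)
```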

Specifying multiple HTTP parameters in curl is a bit of a special case, because & means something special in the shell (it launches a command in the background), and if you're not careful, the shell will trip over your &! Make sure to put the whole URL, including the query string, in quotes to avoid this situation. Try that now.

HTTP GET requests are typically used for retrieving data, and the parameters typically represent data identifiers and various customizations for its retrieval and display. Storing data is usually done with an HTTP POST request. In the old days, POST requests typically resulted from people filling out and submitting HTML forms. This still occurs, but there are plenty of other ways POST requests are created (some of which we'll cover later).

For now, let's practice the oldie and goodie. http://challenge.localhost has a form for you. Fill it out in the browser and submit it for the flag!

Now, let's try this with curl. Look at the man page to figure out how to make a POST request (HINT: the most relevant option is -d).


NOTE: Remember what we said about attackers being able to trigger whatever HTTP requests they wanted? Note how this challenge doesn't even have any functionality to make the form, but you can still hit it with the POST request!

Now, we try this with netcat. This is MUCH harder, and is somewhat archaic for historical reasons. Check out the simplest URL-encoded form submission example from Mozilla, and adapt it for your use case.
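
For orientation, here is a rough Python sketch of what a URL-encoded form submission looks like on the wire. The build_post_request helper, the /submit path, the host name, and the field name are all invented for this example; check the challenge source for the real ones.

```python
from urllib.parse import urlencode


def build_post_request(path: str, host: str, fields: dict) -> bytes:
    """Assemble a raw form-encoded POST request (hypothetical helper)."""
    body = urlencode(fields)  # e.g. "name=pwn+college"
    head = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # the server reads exactly this many bytes
        "",                              # blank line: headers end here
        "",
    ]
    return ("\r\n".join(head) + body).encode("ascii")


request = build_post_request("/submit", "example.localhost", {"name": "pwn college"})
print(request.decode("ascii"))
```

Note that the parameters travel in the body rather than in the URL, and that the Content-Length header must match the body length, or the server may wait for more data or cut the body short.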

Now let's try this with requests! Look at the documentation to find out how to do this.

Now, try to make your browser do a POST request without the website providing a form. Hint: can you bring your own form to the table?

Let's play around with multiple form fields!

... and with netcat!

Sometimes, resources on the web move. A website might get redesigned, we might rename a pwn.college module, etc. In these (and other!) cases, the webserver can redirect clients to the new URL. This is done via a special HTTP response, as you'll discover here. Can you still find the flag?

Now, let's try curl. Curl has a very useful commandline option to automatically follow redirects. It's -L. Try it out, and see how easy this becomes!

And now, Python. Python's requests library automatically follows redirects, so this should be quite easy. Give it a try!

Include a cookie from an HTTP response in your next request, using curl.

Include a cookie from an HTTP response in your next request, using nc.
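
As a hedged illustration of the mechanics: the server hands out a cookie in a Set-Cookie response header, and the client echoes the name and value back in a Cookie request header. The cookie name and value below are invented for the example.

```python
from http.cookies import SimpleCookie

# A made-up Set-Cookie header value, as a server might send it.
set_cookie_value = "session=abc123; Path=/; HttpOnly"

# Parse it; attributes such as Path and HttpOnly are kept separately
# from the cookie's name and value.
jar = SimpleCookie()
jar.load(set_cookie_value)

# Only name=value pairs are sent back to the server.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)
print(cookie_header)
```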

Include a cookie from an HTTP response in your next request, using Python.


HINT: If you aren't already using it, check out requests.Session!

Make multiple requests in response to stateful HTTP responses, using Python.

You've been staring at web server code all this time and figuring out how to speak to it. Now, let's learn to listen.

In this level, you will write a simple server that'll receive the request for the flag! Simply copy the server code from, say, the very first module, remove anything extra, and build a web server that'll listen on port 1337 (instead of 80 --- you can't listen on port 80 as a non-administrative user) and on hostname localhost. When you're ready, run /challenge/client, and it will launch an internal web browser and visit http://localhost:1337/ with the flag!

You've followed redirects --- now make one happen! Have your webserver redirect /challenge/client to the right location in /challenge/server. You'll need three terminal windows for this:

  1. The first terminal window runs /challenge/server, which listens on port 80 and prepares to give the flag.
  2. The second terminal window runs your webserver implementation, which listens on port 1337 and prepares to redirect the client.
  3. The third terminal window runs /challenge/client.

It's complex, but you can do it!
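
To show the shape of a redirect, here is a minimal sketch that uses Python's built-in http.server instead of the server code from the challenges. The TARGET URL is a placeholder, and the sketch binds to a free port (port 0) on 127.0.0.1 so that it can run anywhere; the level itself asks for port 1337.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder destination; the real one is described in /challenge/server.
TARGET = "http://challenge.localhost/hypothetical-path"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A redirect is a 3xx status code plus a Location header.
        self.send_response(302)
        self.send_header("Location", TARGET)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Act as a client that does NOT follow redirects, to see the raw response.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
response = conn.getresponse()
status, location = response.status, response.getheader("Location")
conn.close()

server.shutdown()
server.server_close()
print(status, location)
```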

In the beginning of the web, HTML, though Hyper, was pretty static. It described its layouts, and that was it. Sometime in the 1990s, the movers and shakers of the internet thought "What if web pages could execute logic?", and JavaScript was born.

JavaScript is a programming language that allows web pages to dynamically make decisions and carry out actions. It is, hands down (and unfortunately, because it's terrible), the most important programming language out there (though luckily not the most used), and try as we might to avoid it (did we mention that it's terrible?), we have to account for it in any discussion of web security.

HTML specifies JavaScript to be executed through the <script> tag. This tag tells the browser that what is inside that tag is JavaScript, and the browser executes it. There are many resources online for how to write script tags, and how to write JavaScript, and we'll leave their finding as an exercise for you, the learner. Here, we'll practice something very specific: using JavaScript to redirect the browser to a different web page.

As previously, the client browser will print out the page it receives, but it will start by going to http://challenge.localhost/~hacker/solve.html. Here, we harken back to the olden days of shared servers: http://challenge.localhost/~hacker/anything will be served out of the public_html subdirectory of your home directory! Create a /home/hacker/public_html/solve.html, write the JavaScript you need to redirect the browser, and get the flag!


HINT: The JavaScript object you want is window.location. You can assign a string to it to redirect the browser to a new location.

HINT: Debugging this can be tricky with the built-in browser. Try it using the dojo's Firefox! You can't get the final flag with it, but you can at least tell if your redirect is working!

JavaScript can do many things in the context of the web page, and can, thus, lead to unexpected situations and security compromises. You'll explore some of these situations in the Web Security module, but we'll lay the groundwork here.

In this level, /challenge/client will no longer print the web page, and /challenge/server will not serve up an HTML page of the flag, but a JavaScript script that sets a global flag variable to the value of the flag. You'll need to make a web page that includes this script (we'll leave it up to you to find the documentation for this --- hint: src is involved) and then create another script to somehow exfiltrate this information. Exfiltration is the art of smuggling sensitive data out right under the nose of its owners: in this case, /challenge/client and /challenge/server. Your JavaScript running on your page, of course, has access to the flag variable, but you'll need to somehow communicate it out to the world. This can be done in a few different ways, but probably the easiest is to redirect (using your window.location trick from before!) the client browser to a URL that contains the flag (similar to how the client leaked it to you a few levels ago), and have that request go to somewhere where you can see the URL log (such as the log of /challenge/server or your own webserver!).

This sounds like a lot, but it's eminently doable. Our reference HTML solution file is just 150 bytes long! As before, remember: you can debug your solutions using your own browser (and can run it as root in practice mode to be able to include the flag script!).

Now, the hard part begins... Oftentimes, what you need to exfiltrate is other data accessible to your JavaScript on the website, but you often have to make HTTP requests to retrieve it. In modern JavaScript, HTTP requests are made using the fetch() function. It works roughly as follows:

fetch("http://google.com").then(response => response.text()).then(website_content => ???);

The ???, of course, is the code that you want to execute on the website contents. This API looks so absolutely insane because JavaScript is insane, but also because it actually has a hard problem to solve. It has to execute logic in an environment where network errors, CPU load, laptop suspending and resuming, firewalls, and other crazy things can interfere with the loading and operation of the resources that it depends on. The above code uses JavaScript "promises", which is a complex programming pattern that lets you write logic that will be executed on data that is not yet available, when that data finally does become available. The .then() is the part of the promise that specifies what will be eventually executed. Here, the flow is roughly as follows:

  1. fetch() returns a promise and starts to fetch http://google.com. This might take a while, might never succeed, or might succeed immediately. At any rate, it initially returns a promise object that has a then() member function that will run when the response is available.
  2. The response becomes available and the promised code executes. This code takes the promised response and calls response.text(), which retrieves the full text contents returned by http://google.com. Because this might take a while to load fully, this also returns a promise, and that promise also has a .then() method that we can specify code for.
  3. Finally, all the content is available and our final promised code runs. This can be anything, but for most of our purposes, this is where we exfiltrate our data like you did in previous challenges.

This can be insanely hard to understand and debug. Please be ready to debug this in Firefox in practice mode.

In this level, the flag is no longer nicely wrapped in JavaScript. It's just boring old text. You'll need to fetch it and exfiltrate it to score. Good luck!

Of course, as with any GET request, you can add some parameters. Try that out now!

And, naturally, we can use fetch() to make POST requests. This lets our JavaScript pretend to submit forms and so on, which is pretty neat! Let's practice that in this level. You can look up how to pass advanced arguments to fetch() on your own, but we'll give you some hints for some things that should appear in your JavaScript verbatim:

  • {
  • method: "POST"
  • body
  • new URLSearchParams
  • }

Good luck!


NOTE: There are many ways to send POST parameters. In this module, we covered the sending of form data, but other types exist as well, and all have different ways of accessing them via flask. Make sure you're sending form data in your POST, not something else; otherwise, our server (the way it's implemented) won't see it!

SQL Playground

Modern society runs on the internet, and the internet runs on databases. Databases hold massive amounts of data on everything from your pwn.college scores (yes, we have a database!) to all of Wikipedia to less important things such as your credit score. If you can describe it, it exists in a database somewhere.

Databases come in all shapes and sizes, but arguably the most common ones, and definitely the most traditional ones, store data entries in structured tables. These Structured tables can be Queried using a specialized Language called the Structured Query Language, or SQL (typically pronounced like "sequel").

The (mis)use of SQL leads to all sorts of potential security issues, as we'll explore later on this platform. For now, this module will teach you (or, rather, force you to learn) SQL through a series of challenges that will expose you to the parts of the language that will become relevant later.

Welcome to the SQL playground.

This challenge will be the start of your SQL journey. In this challenge, and throughout this module, we'll use a SQL engine called SQLite. SQLite is an extremely lightweight SQL engine that, rather than using a complex SQL server process to host databases, simply interacts with database files directly. This makes it very convenient to prototype applications on, and we use it for almost all our SQL needs in the challenges on pwn.college, but you wouldn't want to use it for, say, a production website... In the challenge file (/challenge/sql), you'll notice our use of SQLite via the TemporaryDB class. Feel free to ignore the inner workings of that class --- we simply use it as a wrapper to execute SQL queries and get results. Focus on the rest of the code!

This challenge will start with a very simple query. The query we'll learn is SELECT. You can use SELECT to (😎) select data from tables in your database. Its basic syntax is SELECT what FROM where, where what and where are things you specify. The where, typically, is a database table, and the what are the columns you want the query to fetch. If you don't want to worry about the column to SELECT, you can do SELECT *!
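
Here is a small sketch of SELECT against a throwaway in-memory SQLite database, using Python's sqlite3 module directly rather than the challenge's TemporaryDB wrapper. The notes table and its columns are made up for the example.

```python
import sqlite3

# An in-memory database that disappears when the script exits.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (title TEXT, body TEXT)")
db.execute("INSERT INTO notes VALUES ('greeting', 'hello world')")

# SELECT * fetches every column of every row.
all_columns = db.execute("SELECT * FROM notes").fetchall()
print(all_columns)

# Naming a column fetches just that column.
one_column = db.execute("SELECT body FROM notes").fetchall()
print(one_column)
```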

Read the code to understand the layout of the database you're querying, and select the flag!


NOTE: This challenge, and the other challenges in the series, will try to link to relevant SQLite documentation. This documentation can be rather dry and dense. Feel free to use other resources as well. There are LOTS of SQL guides on the internet: the only reason we made this one is to give an accelerated guide for the parts of SQL learners will need for pwn.college challenges!

Any non-trivial database will have enough data in it that you must be selective (🥁) about what you access. Luckily, the SELECT query can be filtered with the WHERE clause! This challenge will require you to filter your data, because now there's lots of junk in the database!

The challenge links to the SQLite documentation for the WHERE clause, and we'd like you to go and read it. The TLDR, to get you started, is that you can append WHERE condition to your query, where condition is some expression you specify, like some_column < 10 (for integer comparisons) or some_column = 'pwn' (for string comparisons) or the like.
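
The two kinds of comparison mentioned above can be tried out on a toy table. The items table and its contents are invented for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (name TEXT, weight INTEGER)")
db.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("feather", 1), ("brick", 25), ("pwn", 7)],
)

# Integer comparison: keep only the rows whose weight is below 10.
light = db.execute("SELECT name FROM items WHERE weight < 10").fetchall()
print(light)

# String comparison: string literals use single quotes in SQL.
named = db.execute("SELECT weight FROM items WHERE name = 'pwn'").fetchall()
print(named)
```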

You'll need to analyze the code to understand what differentiates the flag from the junk data, and then query on it! Hint: it's the new column we added. Can you make the right filter and filter your data to just the flag?

You've probably been using SELECT * because of our subliminal suggestion a few challenges ago. This challenge will force you to choose a single column. SELECT it by name and get the flag!

Here, we'll randomly tag the flag. Can you still filter it out?


HINT: It might be easier to exclude the garbage data with your filter rather than include the flag data.

Of course, you can also filter using string values. Here, the flag tag is a string. Can you still get the flag?

Let's move on to more advanced filtering. We got rid of the flag tag in this challenge, and you'll need to filter on the actual values of the flag data! Luckily, SQLite (and all SQL engines in general) provide some functions for working with strings, and you'll use the substr function here. substr(some_column, start, length) extracts length characters starting from start (the first character is at position 1, not 0 as it would be in a sane language) of column some_column. You can use the result of this anywhere the query accepts expressions, such as in the WHERE clause to compare the resulting value against a string as in the previous challenge!
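
A short sketch of substr in a WHERE clause, again on made-up data (the words table is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE words (word TEXT)")
db.executemany(
    "INSERT INTO words VALUES (?)",
    [("pwn.college",), ("sqlite",), ("python",)],
)

# substr(word, 1, 3) is the first three characters (positions start at 1).
matches = db.execute(
    "SELECT word FROM words WHERE substr(word, 1, 3) = 'pwn'"
).fetchall()
print(matches)
```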

Functionality like substr isn't just for filtering: you can also SELECT expressions such as these (in place of or in addition to where you typically specify columns)! This is super handy when you don't want (or, in the case of this challenge, cannot retrieve) all the data, but just want the result of some computation on your data. In this case, the challenge will simply not let you read the whole flag. Can you read it piecemeal?
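
One way to read a value piecemeal is to SELECT a substr expression repeatedly with a moving start position. The sketch below does this on an invented secrets table; the chunk size of 4 is arbitrary, and the real challenge may impose different limits.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE secrets (value TEXT)")
db.execute("INSERT INTO secrets VALUES ('abcdefghij')")

CHUNK = 4  # arbitrary piece size for this demo
pieces = []
start = 1  # SQL string positions start at 1

while True:
    # SELECT an expression instead of a plain column.
    (piece,) = db.execute(
        "SELECT substr(value, ?, ?) FROM secrets", (start, CHUNK)
    ).fetchone()
    if not piece:  # an empty string means we ran past the end
        break
    pieces.append(piece)
    start += CHUNK

recovered = "".join(pieces)
print(pieces, recovered)
```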

So far, our WHERE conditions have been pretty simple. This challenge complicates things somewhat by injecting decoy data into your database. Luckily, the flag tag is back.

You'll need to filter on both the flag tag and the flag value. Analogous to other programming languages, you can join together conditional expressions with boolean operators such as AND and OR. Craft a powerful expression and filter the flag from the decoys!
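
Combining conditions might look like the following sketch. The entries table, its tag column, and the values are placeholders, not the challenge's actual layout.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (tag INTEGER, value TEXT)")
db.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [(1, "real-thing"), (1, "decoy-a"), (0, "real-other")],
)

# Both conditions must hold for a row to be returned.
both = db.execute(
    "SELECT value FROM entries "
    "WHERE tag = 1 AND substr(value, 1, 4) = 'real'"
).fetchall()
print(both)

# With OR, a row matching either condition is returned.
either = db.execute(
    "SELECT value FROM entries "
    "WHERE tag = 0 OR substr(value, 1, 5) = 'decoy'"
).fetchall()
print(either)
```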

You've been able to rely on your WHERE clause to filter things down to exactly one result, but in this challenge, we've taken away the flag tags that you relied on to filter out decoy flags! Luckily, simple SQL queries tend to return data in the order that it was inserted into the database, and the real flag was inserted before the decoy flags (but after some of the garbage data). All you need is to LIMIT your query to just 1 result, and that result should be your flag! The challenge links you to the LIMIT documentation if you need it!
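
A sketch of LIMIT on invented data follows. Strictly speaking, SQL does not promise any row order without an ORDER BY clause, so the second query also shows how to make the insertion order explicit using SQLite's built-in rowid.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE log (msg TEXT)")
db.executemany(
    "INSERT INTO log VALUES (?)",
    [("junk",), ("target-1",), ("target-2",)],
)

# Two rows match the filter, but LIMIT 1 returns only one of them.
first = db.execute(
    "SELECT msg FROM log WHERE substr(msg, 1, 6) = 'target' LIMIT 1"
).fetchall()
print(first)

# Ordering by rowid makes "first inserted" explicit rather than implied.
first_ordered = db.execute(
    "SELECT msg FROM log WHERE substr(msg, 1, 6) = 'target' "
    "ORDER BY rowid LIMIT 1"
).fetchall()
print(first_ordered)
```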

In actual security scenarios, there are times where the attacker lacks certain information, such as the names of tables that they want to query! Luckily, every SQL engine has some way to query metadata about tables (though, confusingly, every engine does this differently!). SQLite uses a special sqlite_master table, in which it stores information about all other tables. Can you figure out the name of the table that contains the flag, and query it?
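
To see what sqlite_master holds, you can query it like any other table. The hidden_table_42 name below is invented for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hidden_table_42 (secret TEXT)")

# sqlite_master has one row per table (and index, view, ...), including
# the name and the original CREATE statement.
tables = db.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)
```

The sql column is handy because it also reveals the column names of a table you know nothing about.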
