pwn.college

Let's learn to write text!

Unsurprisingly, your program writes text to the screen by invoking a system call. Specifically, this is the write system call, and its syscall number is 1. However, you also need to tell write, via its parameters, what data to write and where to write it to.

You may remember, from the Practicing Piping module of the Linux Luminarium dojo, the concept of File Descriptors (FDs). As a reminder, each process starts out with three FDs:

FD 0: Standard Input is the channel through which the process takes input. For example, your shell uses Standard Input to read the commands that you input.
FD 1: Standard Output is the channel through which processes output normal data, such as the flag when it is printed to you in previous challenges or the output of utilities such as ls.
FD 2: Standard Error is the channel through which processes output error details. For example, if you mistype a command, the shell will output, over standard error, that this command does not exist.

It turns out that, in your write system call, this is how you specify where to write the data to! The first (and only) parameter to your exit system call was your exit code (mov rdi, 42), and the first (but, in this case, not only!) parameter to write is the file descriptor. If you want to write to standard output, you would set rdi to 1. If you want to write to standard error, you would set rdi to 2. Super simple!

This leaves us with what to write. Now, you could imagine a world where we specify what to write through yet another register parameter to the write system call. But these registers don't fit a ton of data, and to write out a long story like this challenge description, you'd need to invoke the write system call multiple times. Relatively speaking, this has a lot of performance cost --- the CPU needs to switch from executing the instructions of your program to executing the instructions of Linux itself, do a bunch of housekeeping computation, interact with your hardware to get the actual pixels to show up on your screen, and then switch back. This is slow, and so we try to minimize the number of times we invoke system calls.

Of course, the solution to this is to write multiple characters at the same time. The write system call does this by taking two parameters for the "what": a where (in memory) to start writing from and a how many characters to write. These parameters are passed as the second and third parameters to write. In the kinda-C syntax that we learned from strace, this would be:

write(file_descriptor, memory_address, number_of_characters_to_write)

For a more concrete example, if you wanted to write 10 characters starting from some memory address to standard output (file descriptor 1), this would be:

write(1, memory_address, 10);

Wow, that's simple! Now, how do we actually specify these parameters?

We'll pass the first parameter of a system call, as we reviewed above, in the rdi register.
We'll pass the second parameter via the rsi register. The agreed-upon convention in Linux is that rsi is used as the second parameter to system calls.
We'll pass the third parameter via the rdx register. This is the most confusing part of this entire module: rdi (the register holding the first parameter) has such a similar name to rdx that it's really easy to mix up and, unfortunately, the naming is this way for historic reasons and is here to stay. Oh well... It's just something we have to be careful about. Maybe a mnemonic like "rdi is the initial parameter while rdx is the xtra parameter"? Or just think of it as having to keep track of different friends with similar names, and you'll be fine.

And, of course, the syscall number of write goes into rax itself: 1. Other than the rdi vs rdx confusion, this is really easy!

Now, you know how to set the system call number and how to set the rest of the registers. But where in memory is the data you need to write?

In this challenge, your program is invoked with a command-line argument, something like:

/tmp/your-program H

Recall that when a program is run with arguments, the stack stores pointers to each argument. These are addresses stored in memory: [rsp+16] doesn't contain the argument text directly --- it contains the address where that text lives.

So, to get the memory address of the first argument, you simply load the pointer from the stack, as you've done before!

mov rsi, [rsp+16]

This puts the memory address of the first argument's text into rsi --- exactly what write needs as its second parameter!

Your program will be invoked with a single character as its first argument. Call write to write that single character (for now! We'll do multiple-character writes later) to standard output, and we'll give you the flag!
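Putting the pieces above together, a minimal sketch of this single-character write could look like the following. It assumes the GNU assembler in Intel syntax (the .intel_syntax noprefix line) and a _start entry point; adjust both if your setup differs.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rdi, 1              # first parameter: file descriptor 1 (standard output)
    mov rsi, [rsp+16]       # second parameter: address of the first argument's text
    mov rdx, 1              # third parameter: number of characters to write
    mov rax, 1              # syscall number of write
    syscall                 # invoke write(1, <first argument>, 1)
```

Nothing follows the syscall here, so after the character is written the CPU carries on into whatever bytes come next, and the program crashes.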

Okay, our previous solution wrote output but then crashed. In this level, you will write output, and then not crash!

We'll do this by invoking the write system call, and then invoking the exit system call to cleanly exit the program. How do we invoke two system calls? Just like you invoke two instructions! First, you set up the necessary registers and invoke write, then you set up the necessary registers and invoke exit!

Your previous solution had 5 instructions (loading the first argument's address from the stack, setting rdi, setting rdx, setting rax, and syscall). This one should have those 5, plus three more for exit (setting rdi to the exit code, setting rax to syscall index 60, and syscall). For this level, let's exit with exit code 42!
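The eight instructions described above might be laid out like this. It is a sketch under assumptions: GNU assembler, Intel syntax, and a _start entry point.

```asm
.intel_syntax noprefix
.global _start
_start:
    # write(1, <first argument>, 1)
    mov rsi, [rsp+16]       # address of the first argument's text
    mov rdi, 1              # file descriptor 1: standard output
    mov rdx, 1              # write one character
    mov rax, 1              # syscall number of write
    syscall

    # exit(42)
    mov rdi, 42             # exit code
    mov rax, 60             # syscall number of exit
    syscall
```

Each system call gets its own register setup followed by its own syscall instruction. Note that rax has to be set again for exit, because write left its return value there.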

Okay, we have one thing left for this run of challenges. You've written out a single byte, and now we'll practice writing out multiple bytes. In this level, the flag itself is passed as the first argument to your program! Can you write all 64 characters of it to stdout?

Hint: The only thing you should have to change compared to your previous solution is the value in rdx!

You now know how to output data to stdout using write. But how does your program receive input data? It reads it from stdin!

Like write, read is a system call that shunts data around between file descriptors and memory, and its syscall number is 0. In read's case, it reads some number of bytes from the provided file descriptor and stores them in memory. The C-style syntax is the same as write:

read(0, some_address, 5);

This will read 5 bytes from file descriptor 0 (stdin) into memory starting from some_address. So, if you type in (or pipe in) HELLO HACKERS into stdin, the above read call would result in the following memory configuration:

     Address     │ Contents
+───────────────────────────+
│ some_address   │ 48       │
│ some_address+1 │ 45       │
│ some_address+2 │ 4c       │
│ some_address+3 │ 4c       │
│ some_address+4 │ 4f       │
+───────────────────────────+

What are those numbers?? They are hexadecimal representations of ASCII-encoded letters. If those words don't make sense, please run through the first half or so of the Dealing with Data module and then come back here!

In this level, we will combine read with our previous write abilities. The flag will be piped into your program's stdin --- 128 bytes of it. Your program should:

1. first read 128 bytes from stdin to your program's memory
2. write those 128 bytes from that memory location to stdout
3. finally, exit with the exit code 42.

But what address should you use? You need somewhere that's valid and writable, and you already know about one such place: the stack! The rsp register points to the top of the stack, and there's plenty of writable space there. So you can just use rsp as your memory address: mov rsi, rsp.

DEBUGGING: Having trouble? Recall the Introspection module! Build your program and run it with strace to see what's happening at the system call level, or run it in gdb to inspect the values of registers and memory to see what's unexpected.

REMEMBER: You've basically already written steps 2 and 3 (though in the previous challenges, you loaded rsi from [rsp+16] --- here, you'll set it to rsp directly with mov rsi, rsp!). All you have to do is add step 1!
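The three steps could be sketched as follows, assuming the GNU assembler in Intel syntax with a _start entry point. The stack at rsp serves as the buffer, so the data that was there at startup (such as the argument count) gets overwritten, which is fine for a program that no longer needs it.

```asm
.intel_syntax noprefix
.global _start
_start:
    # read(0, rsp, 128)
    mov rdi, 0              # file descriptor 0: standard input
    mov rsi, rsp            # store the bytes starting at the top of the stack
    mov rdx, 128            # number of bytes to read
    mov rax, 0              # syscall number of read
    syscall

    # write(1, rsp, 128)
    mov rdi, 1              # file descriptor 1: standard output
    mov rsi, rsp            # same buffer that read just filled
    mov rdx, 128            # number of bytes to write
    mov rax, 1              # syscall number of write
    syscall

    # exit(42)
    mov rdi, 42
    mov rax, 60
    syscall
```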

In the previous level, you knew the input was exactly 128 bytes, so you could read 128 and write 128. Real input is rarely so tidy: often, you don't know up front how many bytes are coming.

Luckily, read tells you. When a system call returns, Linux places its result in rax. For read, that result is the number of bytes it actually read. Ask it to read 128 bytes but only 50 are available, and it reads those 50 and leaves 50 in rax.

So the idiom is: read into your buffer using a count comfortably larger than you expect, then write back exactly the number of bytes read returned. The only missing piece is that you need to move read's return value (rax) into write's size argument (rdx):

mov rdx, rax

Make sure to do this before clobbering rax with the syscall number of write!

This time the flag is piped in without padding. read it, write back exactly what you read, and exit with code 42 to get the flag!
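Here is a sketch of that idiom, assuming the GNU assembler in Intel syntax with a _start entry point. The count of 256 is an arbitrary illustrative choice for "comfortably larger", not a value taken from the challenge, and there is no error handling if read fails.

```asm
.intel_syntax noprefix
.global _start
_start:
    # read(0, rsp, 256): ask for more than we expect to receive
    mov rdi, 0              # file descriptor 0: standard input
    mov rsi, rsp            # buffer at the top of the stack
    mov rdx, 256            # assumed upper bound on the input size
    mov rax, 0              # syscall number of read
    syscall                 # rax now holds the number of bytes actually read

    # write(1, rsp, <bytes read>)
    mov rdx, rax            # save read's return value before rax is overwritten
    mov rdi, 1              # file descriptor 1: standard output
    mov rsi, rsp            # same buffer
    mov rax, 1              # syscall number of write
    syscall

    # exit(42)
    mov rdi, 42
    mov rax, 60
    syscall
```

The ordering of mov rdx, rax and mov rax, 1 is the whole trick: swap them and write would be told to output exactly 1 byte.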

So far, your program has only interacted with stdin and stdout, but what about files on disk? To access a file, you first need to open it using the open system call.

The open system call (syscall number 2) takes a pointer to a filename string and returns a brand-new file descriptor referring to that file:

open("/flag", 0);

The second argument holds flags that select the access mode and other options for opening the file; 0 (the value of O_RDONLY) requests the default: read-only.

The registers for open follow the same convention:

Register │ Purpose
rax      │ 2 (syscall number for open)
rdi      │ pointer to the filename string in memory
rsi      │ 0 (read-only)

When open returns, rax contains the new file descriptor (fd) number. Recall that file descriptor 0 is stdin, file descriptor 1 is stdout, and file descriptor 2 is stderr. Other files that are open are just represented by other file descriptors, incrementing from 3 onwards! You'll use this fd as the first argument to read, just like you did for stdin earlier, but this time read will read from your file.

How do you get a pointer to the filename? In this level, the path to the flag (/flag) will be passed as the first argument to your program, so the string is already in memory. You already know how to load its address: mov rdi, [rsp+16].

Your program should:

1. Load a pointer to the filename (stored at [rsp+16], the first argument) into rdi.
2. Specify the default of read access for the second argument (set rsi to 0).
3. open it (syscall 2).
4. read from the returned fd into memory. The fd open returned is in rax; move it to rdi for read's first argument (do this before you set the syscall number for read, which overwrites rax!). Read a comfortably large count --- the flag is shorter.
5. write to stdout exactly the number of bytes read returned (mov rdx, rax, just like in read-exact).
6. exit with code 42 (syscall 60).
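The steps above can be sketched as the following program, assuming the GNU assembler in Intel syntax with a _start entry point. The read count of 256 is an illustrative assumption, and the sketch does not check whether open or read failed.

```asm
.intel_syntax noprefix
.global _start
_start:
    # open(<first argument>, 0)
    mov rdi, [rsp+16]       # pointer to the filename (the first argument)
    mov rsi, 0              # flags: 0 means read-only
    mov rax, 2              # syscall number of open
    syscall                 # rax now holds the new file descriptor

    # read(fd, rsp, 256)
    mov rdi, rax            # move the fd into place before rax is overwritten
    mov rsi, rsp            # buffer at the top of the stack
    mov rdx, 256            # assumed upper bound on the file size
    mov rax, 0              # syscall number of read
    syscall                 # rax now holds the number of bytes read

    # write(1, rsp, <bytes read>)
    mov rdx, rax            # write exactly as many bytes as read returned
    mov rdi, 1              # file descriptor 1: standard output
    mov rsi, rsp            # same buffer
    mov rax, 1              # syscall number of write
    syscall

    # exit(42)
    mov rdi, 42
    mov rax, 60
    syscall
```

The filename pointer is loaded from the stack before read overwrites that region, which is why using rsp as the buffer is safe here.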

DEBUGGING: Having trouble? Use strace to see your system calls in action --- it will show you exactly what arguments each syscall receives and what it returns. If open is returning -1, double-check your filename pointer. If read returns 0, the file descriptor from open might be wrong.

In the previous level, the filename was passed as an argument to your program. But what if you need to open a file whose path you already know? There are many ways to hardcode strings in your code, but for the purposes of the type of code you'll be writing in pwn.college, we are going to go with a "hacker" way designed more for use during software exploitation than real software development. We will hardcode the filename string directly into your program by writing it onto the stack, byte by byte!

The open syscall needs a pointer to the filename, so you need the bytes / f l a g stored somewhere in memory. You already know a writable memory address: rsp (the stack). You can write each character one byte at a time:

mov BYTE PTR [rsp], '/'
mov BYTE PTR [rsp+1], 'f'
mov BYTE PTR [rsp+2], 'l'
mov BYTE PTR [rsp+3], 'a'
mov BYTE PTR [rsp+4], 'g'
mov BYTE PTR [rsp+5], 0

A few things to note here:

BYTE PTR: When you write to a memory address like [rsp] using an immediate value (a number or character), the CPU doesn't know how many bytes you intend to write --- one? two? eight? BYTE PTR is a size directive that tells the assembler "I mean exactly one byte." Without it, the assembler won't know what you want and will refuse to assemble the instruction.
Single quotes: In assembly, a single-quoted character like 'f' represents that character's one-byte ASCII value. So 'f' is just a convenient way of writing 0x66, and '/' is 0x2f.
The null byte: The last byte we write is 0 --- a special null byte. This is how Linux knows where a string ends: it reads bytes starting from the pointer you give it and stops when it hits a 0 byte. Without it, open would keep reading past "flag" into whatever else is on the stack, and you'd be trying to open a file with a nonsense name!

After writing these bytes, rsp points to the null-terminated string "/flag", ready to pass to open.
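One way to convince yourself that the bytes landed where you expect, before involving open at all, is to write them straight back out. The sketch below does only that sanity check, assuming the GNU assembler in Intel syntax with a _start entry point. It is a debugging aid, not the challenge flow.

```asm
.intel_syntax noprefix
.global _start
_start:
    # build the null-terminated string "/flag" at the top of the stack
    mov BYTE PTR [rsp], '/'
    mov BYTE PTR [rsp+1], 'f'
    mov BYTE PTR [rsp+2], 'l'
    mov BYTE PTR [rsp+3], 'a'
    mov BYTE PTR [rsp+4], 'g'
    mov BYTE PTR [rsp+5], 0

    # write(1, rsp, 5): print the five visible characters, not the null byte
    mov rdi, 1
    mov rsi, rsp
    mov rdx, 5
    mov rax, 1
    syscall

    # exit(42)
    mov rdi, 42
    mov rax, 60
    syscall
```

Running it should print /flag with no trailing newline. Once that looks right, the same rsp can be handed to open as the filename pointer.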

Your turn! This time, no arguments are passed to your program. You must construct the filename yourself.

Your program should:

1. Write "/flag\0" onto the stack byte by byte using mov BYTE PTR [rsp+N], ...
2. open it (syscall 2): rdi = rsp (the string you just wrote), rsi = 0
3. read from the returned fd into memory (syscall 0), using a comfortably large count
4. write to stdout exactly the number of bytes read returned (syscall 1)
5. exit with code 42 (syscall 60)

DEBUGGING: Having trouble? Use strace to trace your syscalls. If open returns -1, your string pointer or encoding might be off. Try x/s $rsp in gdb to see what string is actually on the stack.

In the previous level, you created the "/flag" filename by writing each byte onto the stack. That is a useful technique, but it is frustrating to write and hard to reason about (imagine trying to spot a typo in a long sentence written this way!). This challenge will show you a better way.

Luckily, your assembly can also contain bytes that are not meant to execute. For example, if you put those bytes after your final exit syscall, your program will have exited before the CPU ever reaches them. The bytes will still live in your program's memory, but will not crash your program by being interpreted as instructions.

For strings, the assembler gives you a convenient directive to specify these bytes:

_start:
    ...
    mov rax, 60
    syscall           // exit!
path:
    .asciz "/flag"    // never executed, but still there!

The .asciz directive emits the bytes of the string along with the terminating zero byte that Linux expects at the end of a filename. The path: label marks where those bytes start. In later challenges, when you see a compiled binary load a pointer to a stored string, you are seeing the same idea from the other side: the bytes are stored in the program, and an instruction computes their address at runtime.

That leaves one problem: to pass this path into the open syscall, you need to set its address in rdi. In the old days, programs would always be loaded to the same address in memory, and so you could hardcode this, as so:

_start:
    ...
    mov rdi, path    // this would tell the assembler to store the address of `path` in rdi
    ...
path:
    .asciz "/flag"

Unfortunately, THIS DOES NOT WORK in cybersecurity contexts! Modern software is compiled, for security reasons that we will cover in the Yellow belt, so that it can be loaded anywhere in memory. This means that, when the software is assembled, the assembler doesn't know the right address. While normal applications can fix this up when the program is loaded, modern CPUs offer a different solution: Instruction Pointer Relative Addressing.

On 64-bit x86, the instruction pointer (rip) is a register that always contains the address of the next instruction your CPU will execute. However, it is not a normal register, in the sense that its usage is more limited than something like rdi (note that this is a quirk of x86; many other architectures let you directly access their instruction pointer). That being said, 64-bit x86 does allow you to use addresses relative to rip for memory reads and writes. For example:

_start:
    ...
    mov rdi, [rip+path]
    ...
path:
    .asciz "/flag"

THIS IS STILL NOT WHAT WE WANT! Why? Because it reads the 8 bytes at [rip+path] into rdi rather than putting the address of those bytes into rdi. rdi would end up holding the values '/', 'f', 'l', and so on, but the open syscall needs the address and not the values.

Luckily, there is an instruction that is almost a read, but instead does put the address that would have been read into rdi (or whatever other register). That instruction is load effective address (the word effective here refers to the CPU figuring out all the calculations it needs to do, such as adding an offset to the instruction pointer in this case):

_start:
    ...
    lea rdi, [rip+path]
    ...
path:
    .asciz "/flag"

This puts the address of the "/flag" string into rdi, rather than loading the contents of the string into rdi. Think of path as the address where the first byte of "/flag\0" lives: mov copies bytes from there, while lea copies the address so the kernel can walk those bytes until the null byte.

Now, a quick note about the math here: though we write [rip+path] above, what actually gets added to rip is the delta in addresses between rip (which, again, is pointing to the instruction after lea) and the "/flag" string. It's a weird syntax, and yet another little quirk of x86.

Use this in this challenge to set the path passed to open. Your program should open the stored filename, read from the returned fd into memory, write back exactly the number of bytes read returned (mov rdx, rax, as in read-exact), and exit with code 42. The new parts are:

Store the filename with .asciz after your code.
Load the filename address for open with lea rdi, [rip + path].
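Combined with the open, read, write, and exit sequence from the earlier levels, the two new parts might fit together like this. It is a sketch assuming the GNU assembler in Intel syntax with a _start entry point. The read count of 256 is an illustrative assumption, and failures of open or read are not checked.

```asm
.intel_syntax noprefix
.global _start
_start:
    # open(path, 0)
    lea rdi, [rip+path]     # address of the stored string, not its contents
    mov rsi, 0              # flags: 0 means read-only
    mov rax, 2              # syscall number of open
    syscall                 # rax now holds the new file descriptor

    # read(fd, rsp, 256)
    mov rdi, rax            # move the fd into place before rax is overwritten
    mov rsi, rsp            # buffer at the top of the stack
    mov rdx, 256            # assumed upper bound on the file size
    mov rax, 0              # syscall number of read
    syscall

    # write(1, rsp, <bytes read>)
    mov rdx, rax            # write exactly as many bytes as read returned
    mov rdi, 1
    mov rsi, rsp
    mov rax, 1
    syscall

    # exit(42)
    mov rdi, 42
    mov rax, 60
    syscall
path:
    .asciz "/flag"          # never executed: the program exits before reaching it
```

Compared with building the string on the stack, the filename now reads as a single line of text, and a typo in it is much easier to spot.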

Run /challenge/check with your program, read the flag, and write it to stdout!

Output and Input

Computing 101.

System Calls

Writing Output

Chaining Syscalls

Writing Strings

Reading Data

Reading Exactly

Opening Files

Hardcoding the Filename

RIP-Relative Strings