Computer Memory


Computing 101

Wow, you are a budding x86 assembly programmer! You've set registers, triggered system calls, and wrote your first program that cleanly exits. Now, we have one more big concept for you: memory.

You, as (presumably) a human being, have Short Term Memory and Long Term Memory. When performing specific computation, your brain loads information you've previously learned into your short term memory, then acts on that information, then eventually puts new resulting information into your long-term memory. Societally, we also invented other, longer-term forms of storage: oral histories, journals, books, and wikipedia. If there's not enough space in your long-term memory for some information, or the information is not important to commit to long-term memory, you can always go and look it up on wikipedia, have your brain stick it into long-term memory, and pull it into your short-term memory when you need it later.

This multi-level heirarchy of information access from "small but accessible" (your short term memory, which is right there when you need it but only stores 5 to 9 pieces of information to "large but slow" (remembering stuff from your massive long-term memory) to "massive but absolutely glacial" (looking stuff up on wikipedia) is actually the foundation of the Memory Hierarchy of modern computing. We've already learned about the "small but accessible" part of this in the previous module: those are registers, limited but FAST.

More spacious than even all the registers put together, but much much MUCH slower to access, is computer memory, and this is what we'll dig into with this module, giving you a glimpse into another level of the memory hierarchy.


Lectures and Reading


Challenges

As seen by your program, computer memory is a huge place where data is housed. Like houses on a street, every part of memory has a numeric address, and like houses on a street, these numbers are (mostly) sequential. Modern computers have enormous amounts of memory, and the view of memory of a typical modern program actually has large gaps (think: a portion of the street that hasn't had houses built on it, and so those addresses are skipped). But these are all details: the point is, computers store data, mostly sequentially, in memory.

In this level, we will practice accessing data stored in memory. How might we do this? Recall that to move a value into a register, we did something like:

mov rdi, 31337

After this, the value of rdi is 31337. Cool. Well, we can use the same instruction to access memory! There is another format of the command that, instead, uses the second parameter as an address to access memory! Consider that our memory looks like this:

  Address │ Contents
+────────────────────+
│ 31337   │ 42       │
+────────────────────+

To access the memory contents at memory address 31337, you would can do:

mov rdi, [31337]

When the CPU executes this instruction, it of course understands that 31337 is an address, not a raw value. If you think of the instruction as a person telling the CPU what to do, and we stick with our "houses on a street" analogy, then instead of just handing the CPU data, the instruction/person points at a house on the street. The CPU will then go to that address, ring its doorbell, open its front door, drag the data that's in there out, and put it into rdi. Thus, the 31337 in this context is the memory address and serves to point to the data stored at that memory address. After this instruction executes, the value stored in rdi will be 42!

Let's put this into practice! I've stored a secret number at memory address 133700, as so:

  Address │ Contents
+────────────────────+
│ 133700  │ ???      │
+────────────────────+

You must retrieve this secret number and use it as the exit code for your program. To do this, you must read it into rdi, whose value, if you recall, is the first parameter to exit and is used as the exit code. Good luck!

You look like you need just a tiny bit more practice. In this level, we put the secret value at 123400 instead of 133700, as so:

  Address │ Contents
+────────────────────+
│ 123400  │ ???      │
+────────────────────+

Go load it into rdi and exit with that as the exit code!

Did you prefer to access memory at 133700 or at 123400? Your answer might say something about your personality, but it's not super relevent from a technical perspective. In fact, in most cases, you don't deal with actual memory addresses when writing programs at all!

How is this possible? Well, typically, memory addresses are stored in registers, and we use the values in the registers to point to data in memory! Let's start with this memory configuration:

  Address │ Contents
+────────────────────+
│ 133700  │ 42       │
+────────────────────+

And consider this assembly snippet:

mov rdi, 133700

Now, what you have is the following situation:

  Address │ Contents
+────────────────────+
│ 133700  │ 42       │◂┐
+────────────────────+ │
                       │
 Register │ Contents   │
+────────────────────+ │
│ rdi     │ 133700   │─┘
+────────────────────+

Here, rdi now holds a value that corresponds with the address of the data that want to load! No, we load it:

mov rdi, [rax]

How interesting! Here, we are accessing memory, but instead of specifying a fixed address like 133700 during the dereference, we're using the value stored in rax as the memory address. By containing the memory address, rax is a pointer that points to the data we want to access! When we use rax in lieu of directly specifying the address that it stores to access the memory address that it references, we call this dereferencing the pointer. In the above example, we dereference rax to load the data it points to (the value 42 at address 133700) into rdi. Neat!

This also drives home another point: these registers are general purpose! Just because we've been using rax as the syscall index in our challenges so far doesn't mean that it can't have other uses as well. Here, it's used as a pointer to our secret data in memory.

Similarly, the data in the registers doesn't have an implicit purpose. If rax contains the value 133700 and we write mov rdi, [rax], the CPU uses the value as a memory address to dereference. But if we write mov rdi, rax in the same conditions, the CPU just happily puts 133700 into rdi. To the CPU, data is data; it only becomes differentiated when it's used in different ways.

In this challenge, we've initialized rax to contain the address of the secret data we've stored in memory. Dereference rax to the secret data into rdi and use it as the exit code of the program to get the flag!

In the previous level, you dereferenced rax to read data into rdi. The interesting thing here is that our choice of rax was pretty arbitrary. We could have used any other pointer, even rdi itself! Nothing stops you from dereferencing a register to overwrite its own content with the dereferenced value!

For example, here is us doing this exact thing with rax. I've annotated each line with comments:

mov [133700], 42
mov rax, 133700  # after this, rax will be 133700
mov rax, [rax]   # after this, rax will be 42

Throughout this snippet, rax goes from being used as a pointer to being used to hold the data that's been read from memory. The CPU makes this all work!

In this challenge, you'll explore this concept. Rather than initializing rax, as before, we've made rdi the pointer to the secret value! You'll need to dereference it to load that value into rdi, then exit with that value as the exit code. Good luck!

So now you can dereference pointers in memory like a pro! But pointers don't always point directly at the data you need. Sometimes, for example, a pointer might point to a collection of data (say, an entire book), and you'll need to reference partway into this collection for the specific data you need.

For example, if your pointer (say, rdi) points to a sequence of numbers in memory, as so:

  Address │ Contents
+────────────────────+
│ 133700  │ 50       │◂┐
│ 133701  │ 42       │ │
│ 133702  │ 99       │ │
│ 133703  │ 14       │ │
+────────────────────+ │
                       │
 Register │ Contents   │
+────────────────────+ │
│ rdi     │ 133700   │─┘
+────────────────────+

If you want the second number of that sequence, you could do:

mov rax, [rdi+1]

Wow, super simple! In memory terms, we call these number slots bytes: each memory address represents a specific byte of memory. The above example is accessing memory 1 byte after the memory address pointed to by rax. In memory terms, we call this 1 byte difference an offset, so in this example, there is an offset of 1 from the address pointed to by rdi.

Let's practice this concept. As before, we will initialize rdi to point at the secret value, but not directly at it. This time, the secret value will have an offset of 8 bytes from where rdi points, something analogous to this:

  Address │ Contents
+────────────────────+
│ 31337   │ 0        │◂┐
│ 31337+1 │ 0        │ │
│ 31337+2 │ 0        │ │
│ 31337+3 │ 0        │ │
│ 31337+4 │ 0        │ │
│ 31337+5 │ 0        │ │
│ 31337+6 │ 0        │ │
│ 31337+7 │ 0        │ │
│ 31337+8 │ ???      │ │
+────────────────────+ │
                       │
 Register │ Contents   │
+────────────────────+ │
│ rdi     │ 31337    │─┘
+────────────────────+

Of course, the actual memory address is not 31337. We'll choose it randomly, and store it in rdi. Go dereference rdi with offset 8 and get the flag!

Pointers can get even more interesting! Imagine that your friend lives in a different house on your street. Rather than remembering their address, you might write it down, and store the paper with their house address in your house. Then, to get data from your friend, you'd need to point the CPU at your house, have it go in there and find the friend's address, and use that address as a pointer to their house.

Similarly, since memory addresses are really just values, they can be stored in memory, and retrieved later! Let's explore a scenario where we store the value 133700 at the address 123400, and store the value 42 at the address 133700. Consider the following instructions:

mov rdi, 123400    # after this, rdi becomes 133700
mov rax, [rdi]     # here we dereference rdi, reading 42 into rax!

Wow! This storing of addresses is extremely common in programs. Addresses and data are stored, loaded, moved around, and, sometimes, mixed up with each other! When that happens, security issues can arise, and you'll romp through many such issues during your pwn.college journey.

For now, let's practice dereferencing an address stored in memory. I'll store a secret value at a secret address, then store that secret address at the address 567800. You must read the address, dereference it, get the secret value, and then exit with it as the exit code. You got this!

In the last few levels, you have:

  • Used an address that we told you (in one level, 133700, and in another, 123400) to load a secret value from memory.
  • Used an address that we put into rax for you to load a secret value from memory.
  • Used an address that we told you (in the last level, 567800) to load the address of a secret value from memory into a register, then used that register as a pointer to retrieve the secret value from memory!

Let's put those last two together. In this challenge, we stored our SECRET_VALUE in memory at the address SECRET_LOCATION_1, then stored SECRET_LOCATION_1 in memory at the address SECRET_LOCATION_2. Then, we put SECRET_ADDRESS_2 into rax! The result looks something like this, using 133700 for SECRET_LOCATION_1 and 123400 for SECRET_LOCATION_2 (not, in the real challenge, these values will be different and hidden from you!):

     Address │ Contents
   +────────────────────+
 ┌─│ 133700  │ 123400   │◂┐
 │ +────────────────────+ │
 └▸│ 123400  │ 42       │ │
   +────────────────────+ │
                          │
                          │
                          │
    Register │ Contents   │
   +────────────────────+ │
   │ rdi     │ 133700   │─┘
   +────────────────────+

Here, you will need to perform two memory reads: one dereferencing rax to read SECRET_LOCATION_1 from the location that rax is pointing to (which is SECRET_LOCATION_2), and the second one dereferencing whatever register now holds SECRET_LOCATION_1 to read SECRET_VALUE into rdi, so you can use it as the exit code!

That sounds like a lot, but you've done basically all of this already. Go put it together!

Okay, let's stretch that to one more depth! We've added an additional level of indirection in this challenge, so now you'll need three dereferences to find the secret value. Something like this:

     Address │ Contents
   +────────────────────+
 ┌─│ 133700  │ 123400   │◂──┐
 │ +────────────────────+   │
 └▸│ 123400  │ 100000   │─┐ │
   +────────────────────+ │ │
   │ 100000  │ 42       │◂┘ │
   +────────────────────+   │
                            │
                            │
    Register │ Contents     │
   +────────────────────+   │
   │ rdi     │ 133700   │───┘
   +────────────────────+

As you can see, we'll place the first address that you must dereference into rdi. Go get the value!


30-Day Scoreboard:

This scoreboard reflects solves for challenges in this module after the module launched in this dojo.

Rank Hacker Badges Score