Cryptography


CSE 365 - Fall 2024

Solve various cryptography challenges ranging from decoding base64 data to performing a simplified TLS handshake.


Lectures and Reading


Challenges

Strangely enough, we'll start our crypto journey with the humble Exclusive Or (XOR) operator. An XOR is one of the most common bitwise operators that you will encounter in your security journey, especially in cryptography. A couple of terms to unpack here...

Bitwise. Computers think in binary! That is, they conceptualize numbers in base 2, so something like 9 is expressed as 1001. An XOR operates on one pair of bits at a time, resulting in in 1 if the bits are different (one is 1 and the other is 0) or 0 if they are the same (both 1 or both 0). It is then applied to every bit pair independently, and the results are concatenated. For example, decimal 9 (1001) XORed with decimal 5 (0101) results in 1100 (decimal 12).

Cryptography. Why is XOR so common in crypto? In cryptography, it is common because it is self-inverse! That is (using ^ for XOR here, which is consistent with many programming languages), 5 ^ 9 == 12, and 12 ^ 9 == 5. If the number 9 is a key only known to you and me, I can send you messages by XORing them with 9, and you can recover the message with XORing them with 9 as well! Obviously, we can achieve this property with me adding 9 and you subtracting 9, without using XOR, but this requires more complex circuitry and extra bits (e.g., to handle "carrying the 1" in 1111 + 0001 == 10000), whereas XOR does not have this problem (1111 ^ 0001 == 1110).

In this level, you will learn to XOR! We'll give you a shared key, XOR a secret number with it, and expect you to recover the number.


HINT: Use Python's ^ operator to XOR integers!

Reasoning about binary data (base 2) using decimal (base 10) numbers is confusing, because decimal does not have clean bit boundaries. That is, a single decimal digit can represent 10 values (from 0 to 9). A single binary digit (bit) can represent two values (0 and 1), two bits can represent four values (00, 01, 10, and 11), three bits can represent eight values (000, 001, 010, 011, 100, 101, 110, 111), and four bits can represent sixteen values. Ten values are represented by roughly log2(10) == 3.3219... bits, and you get weird situations like binary 1001 being decimal 9, but binary 1100 (still 4 binary digits) being 12 (two decimal digits!). This variability makes it hard to spot-translate numbers between decimal and binary in general: we can work out that 97 is 110001, but it's hard to see that at a glance.

It's much easier to spot-translate between bases that have more alignment between digits. For example, a single hexadecimal (base 16) digit can represent 16 values (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f): the same number of values that binary can represent in 4 digits! This allows us to have a super simple mapping:

Hex Binary Decimal
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
a 1010 10
b 1011 11
c 1100 12
d 1101 13
e 1110 14
f 1111 15

This mapping from a hex digit to 4 bits is something that's easily memorizable (most important: memorize 1, 2, 4, and 8, and you can quickly derive the rest). Better yet, two hex digits is 8 bits, which is one byte! Unlike decimal, where you'd have to memorize 16 mappings for 4 bits and 256 mappings for 8 bits, with hexadecimal, you only have to memorize 16 mappings for 4 bits and the same amount of mappings for 8 bits, since it's just two hexadecimal digits concatenated! Some examples:

Hex Binary Decimal
00 0000 0000 0
0e 0000 1110 14
3e 0111 1110 62
e3 1110 0111 227
ee 1110 1110 238

Now you're starting to see the beauty. This gets even more obvious when you expand beyond one byte of input, but we'll let you find that out through future challenges!

Now, let's talk about notation. How do you differentiate 11 in decimal, 11 in binary (which equals 3 in decimal), and 11 in hex (which equals 17 in decimal)? Python's notation is to prepend binary data with 0b, hexadecimal with 0x, and keep decimal as is, resulting in 11 == 0b1011 == 0xb, 3 == 0b11 == 0x3, and 17 == 0b10001 == 0x11.

Some other Pythonisms that might be useful:

  • If you print(n) a number or convert it to a string with str(n), the number will be represented in base 10.
  • You can get a hexadecimal string representation of a number using hex(n).
  • You can get a binary string representation of a number using bin(n).
  • Converting a string to a number with int(s) will read it as a base 10 number by default.
  • You can specify a different base to use with a second argument: int(s, 16) will interpret the string as hex, int(s, 2) will interpret it as binary.
  • You can try to auto-identify the number base using int(s, 0), which requires a prefex on the string (0b or binary, 0x for hex, nothing for decimal).

Armed with this knowledge, go and hex the challenge and get the flag!

Computers might think in binary, but on your terminal, you see letters! How does this happen? Through encodings.

An encoding is a mapping from some sequence of binary digits (bits) to a letter. The mapping itself is just something made up by some people somewhere, and there have been many such mappings throughout history. For example, the mapping that powers the modern internet, including the all-important emojis that you send to your friends and earn by completing pwn.college dojos, is UTF-8. UTF-8 describes how one or more bytes (each byte is 8 bits) corresponds to a character. Luckily for our purposes as an English-language educational resource, the encoding of the Latin alphabet in UTF-8 is quite simple, and harkens back to another encoding standard called ASCII.

ASCII is, by computing standards, an ancient encoding, dating back to 1963. Modern ASCII is pretty simple: every character is one byte (8 bits), uppercase letters are 0x40+letter_index (e.g., A is 0x41, F is 0x46, and Z is 0x5a), lowercase letters are 0x60+letter_index (a is 0x61, f is 0x66, and z is 0x7a), and numbers (yes, the numeric characters you're seeing are not bytes of those values, they are ASCII-encoded number characters) are 0x30+number, so 0 is 0x30 and 7 is 0x37. Special characters are sprinkled around the mapping as well: forward slash (/ is 0x2f), space is 0x20, and newline is 0x0a. You can see the whole ASCII table with man ascii!

One cool thing is that, since ASCII puts byte values to characters, we can do operations like XOR! This has obvious implications for cryptography.

In this level, we'll explore these implications byte by byte. The challenge will give you one letter a time, along with a key to "decrypt" (XOR) the letter with. You give us the result of the XOR. For example:

hacker@dojo:~$ /challenge/run
Challenge number 0...
- Encrypted Character: A
- XOR Key: 0x01
- Decrypted Character?

How would you approach this? You can man ascii and find the entry for A:

Oct   Dec   Hex   Char
──────────────────────
101   65    41    A

So A is 0x41 in hex. You would XOR that with 0x01 The result here would be: 0x41 ^ 0x01 == 0x40, and, according to man ascii:

Oct   Dec   Hex   Char
──────────────────────
100   64    40    @

It's the @ character!

hacker@dojo:~$ /challenge/run
Challenge number 0...
- Encrypted Character: A
- XOR Key: 0x01
- Decrypted Character? @
Correct! Moving on.

Now it's your turn! Can you XOR things up and get the flag?

Okay, now you know how to XOR ASCII characters. This is a critical step as we build up to our first cryptosystem, but now, we need to XOR entire ASCII strings! Let's try this.

Like Python provides the ^ operator to XOR integers, a Python library called PyCryptoDome provides a function called strxor to XOR two strings of characters together. You can import it in Python using from Crypto.Util.strxor import strxor.

XORing two strings is done byte by byte, just like XORing two bytes is done bit by bit. So, to draw on an earlier example:

hacker@dojo:~$ python
>>> from Crypto.Util.strxor import strxor
>>> strxor(b"AAA", b"16/")
b'pwn'

You can verify this yourself with the ASCII table: A ^ 1 is p, A ^ 6 is w, and A ^ / is n. We just decrypted the ciphertext AAA with the key 16/ to retrieve the plaintext pwn.

In this challenge, you'll do this several times in a row: like the previous challenge, but with strings! Good luck!


CAVEAT: What are these bs prepended to the quotes? Python's default string representation (e.g., "AAA") is Unicode, and unlike, say, the Latin alphabet, Unicode encompasses all characters known to humanity (including the Latin alphabet)! This means a single character can have thousands of different values (when this text was written, Unicode encompassed 154,998 characters!), from "A" to "💩".

Unfortunately, a single byte of 8 bits can only hold 2**8 == 256 different values, which is enough for ASCII (not that many letters/numbers/etc in the Latin alphabet), but not enough for Unicode. Unicode is encoded using different encodings, such as the UTF-8 we mentioned earlier. UTF-8 is designed to be backwards-compatible with ASCII "A" is just 0x41, something like "💩" is four bytes: f0 9f 92 a9!

Basically, ASCII is to The Latin Alphabet as UTF-8 is to Unicode, and in the same way that the Latin alphabet is a subset of Unicode, ASCII is a subset of UTF-8. Wild.

Anyways, Python's normal strings (and, typically, input() you get from the terminal) are Unicode, but some functions, such as strxor, consume and produce bytes. You can specify them directly, as I did above, by prepending your quotes with b (for bytes) and using ASCII or hex encoding (e.g., b"AAA" and b"A\x41\x41" are equivalent), or you can encode a Unicode string into bytes using UTF-8, as such: "AAA".encode() == b"AAA" or "💩".encode() == b"\xf0\x9f\x92\xa9". You can also decode the resulting bytes back into Unicode strings: b"AAA".decode() == "AAA" or b"\xf0\x9f\x92\xa9".decode() == "💩".

This is further complicated by the fact that UTF-8 can't turn any arbitrary bytes into Unicode. For example, b'\xb0'.decode() raises an exception. You can fix this by abandoning the default UTF-8 and using a pre-Unicode non-encoding encoding like "latin"/ISO-8859-1, from the ancient days of computing, as so: b'\xb0'.decode('latin'). While ISO-8859-1 originally predated Unicode, its Python implementation converts to Unicode strings. However, keep in mind that this encoding is different from UTF-8: b"\xb0".encode('latin").decode() == b'\xc2\xb0'. You must, instead, be consistent and decode and encode with the same encoding: b"\xb0".encode('latin").decode(latin1) == b"\xb0".

Anyways, all this sounds terrifying, but it's mostly a warning for the future. For this level, we VERY carefully chose the characters so that you don't run into these issues.

CAUTION: Python's strings-vs-bytes situation is terrible and will byte (haha!) you eventually. There's no way to avoid pitfalls --- they still get us after years and years of using Python, so you will just have to learn to pick yourself up, brush yourself off, fix your code, and carry on. With enough experience under your belt, you will improve from losing entire freaking days to bugs caused by string/bytes mixups to merely entire freaking hours.

In this challenge you will decode base64-encoded data. Despite base64 data appearing "mangled", it is not an encryption scheme. It is an encoding, much like ASCII. It is simply a popular way of encoding raw bytes.

The name "base64" comes from the fact that there are 64 characters used in the output. These can actually vary, but the standard base64 encoding uses an "alphabet" of the uppercase letters A through Z, the lowercase letters a through z, the digits 0 through 9, and the + and / symbols. This results in 64 total output symbols, and each symbol can encode 2**6 (2 to the power of 6) possible input symbols, or 6 bits of data. That means that two encode a single byte (8 bits) of input, you need more than one base64 output character. In fact, you need two: one that encodes the first 6 bytes and one that encodes the remaining 2 (with 4 bytes of that second output character being unused). To mark these unused bytes, base64 encoded data appends an = for every two unused bytes. For example:

hacker@dojo:~$ echo -n A | base64
QQ==
hacker@dojo:~$ echo -n AA | base64
QUE=
hacker@dojo:~$ echo -n AAA | base64
QUFB
hacker@dojo:~$ echo -n AAAA | base64
QUFBQQ==
hacker@dojo:~$

As you can see, 3 bytes (3*8 == 24 bits) encode precisely into 4 base64 characters (4*6 == 24 bits).

base64 is a popular encoding because it can represent any data without using "tricky" characters such as newlines, spaces, quotes, semicolons, unprintable special characters, and so on. You might recall that such characters can cause trouble in certain scenarios, and base64-encoding the data avoids this nicely. For the purposes of this module, base64 will be used to give you random and encrypted data, all of which is terribly unprintable and much easier to input and output as base64.

Other popular encodings are base2 (typically just 0 and 1), base10 (typically 0 through 9), base16 (0 through 9 and A through F). This last one is especially interesting: a single character in base16 encodes 4 bits, making a single byte of binary data representable using two base16 characters. This is used all over computing: for example, the %20 that you've seen in URL encoding is base16 for the ASCII space character. In binary, ASCII's space is 0010 0000, which translates to 20 in base16.

Anyways, that was a long tangent. Go run /challenge/run and decode the flag!


HINT: You can use Python's base64 module (note: the base64 decoding functions in this module consume and return Python bytes) or the base64 command line utility to do this!

FUN FACT: The flag data in pwn.college{FLAG} is actually base64-encoded ciphertext. You're well on the way to being able to build something like the dojo!

In this challenge you will decrypt a secret encrypted with a one-time pad. Although simple, this is the most secure encryption mechanism, if a) you can securely transfer the key and b) you only ever use the pad once. It's also the most simple encryption mechanism: you simply XOR the bits of the plaintext with the bits of the key one by one!

This challenge encrypts the flag with a one-time pad and then gives you the key. Luckily, a one-time pad is a symmetric cryptosystem: that is, you use the same key to encrypt and to decrypt, so you have everything you need to decrypt the flag!


Fun fact: the One-time Pad is the only cryptosystem that humanity has been able to prove is perfectly secure. If you securely transfer the key, and you only use it for one message, it cannot be cracked even by attackers with infinite computational power! We have not been able to make this proof for any other cryptosystem.

The previous challenge gave you the one time pad to decrypt the ciphertext. If you did not know the one time pad, and it was only ever used for one message, the previous challenge would be unsolvable! In this level, we'll explore what happens if the latter condition is violated. You don't get the key this time, but we'll let you encrypt as many messages as you want. Can you decrypt the flag?


Hint: think deeply about how XOR works, and consider that it is a distributative, commutative, and associative operation...

Hint: we recommend writing your solution in Python and using the strxor function that we use in the challenge! It makes life much simpler.

So, One Time Pads fail when you reuse them. This is suboptimal: given how careful one has to be when transferring keys, it would be better if the key could be used for more than just a single message!

Enter: the Advanced Encryption Standard, AES. AES is relatively new: coming on the scene in 2001. Like a One-time Pad, AES is also symmetric: the same key is used to encrypt and decrypt. Unlike a One-time Pad, AES maintains security for multiple messages encrypted with the same key.

In this challenge you will decrypt a secret encrypted with Advanced Encryption Standard (AES).
AES is what is called a "block cipher", encrypting one plaintext "block" of 16 bytes (128 bits) at a time. So AAAABBBBCCCCDDDD would be a single block of plaintext that would be encrypted into a single block of ciphertext.

AES must operate on complete blocks. If the plaintext is shorter than a block (e.g., AAAABBBB), it will be padded to the block size, and the padded plaintext will be encrypted.

Different AES "modes" define what to do when the plaintext is longer than one block. In this challenge, we are using the simplest mode: "Electronic Codebook (ECB)". In ECB, each block is encrypted separately with the same key and simply concatenated together. So if you are encrypting something like AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH, it will be split into two plaintext blocks (AAAABBBBCCCCDDDD and EEEEFFFFGGGGHHHH), encrypted separately (resulting, let's imagine, in UVSDFGIWEHFBFFCA and LKXBFVYASLJDEWEU), then concatenated (resulting the ciphertext UVSDFGIWEHFBFFCALKXBFVYASLJDEWEU).

This challenge will give you the AES-encrypted flag and the key used to encrypt it. We won't learn about the internals of AES, in terms of how it actually encrypts the raw bytes. Instead, we'll learn about different applications of AES, and how they break down in practice. If you're interested in learning about AES internals, we can highly recommend CryptoHack, an amazing learning resource that focuses on the nitty gritty details of crypto!

Now, go decrypt the flag and score!


HINT: We use the PyCryptoDome library to implement the encryption in this level. You'll want to read its documentation to figure out how to implement your decryption!

Though the core of the AES crypto algorithm is thought to be secure (not proven to be, though: no one has managed to do that! But no one has managed to significantly break the crypto in the 20+ years of its use, either), this core only encrypts 128-bit (16 byte) blocks at a time. To actually use AES in practice, one must build a cryptosystem on top of it.

In the previous level, we used the AES-ECB cryptosystem: an Electronic Codebook Cipher where every block is indendently encrypted by the same key. This system is quite simple but, as we will discover here, extremely susceptible to a certain class of attack.

Cryptosystems are held to very high standard of ciphertext indistinguishability. That is, an attacker that lacks the key to the cryptosystem should not be able to distinguish between pairs of ciphertext based on the plaintext that was encrypted. For example, if the attacker looks at ciphertexts UVSDFGIWEHFBFFCA and LKXBFVYASLJDEWEU, and is able to determine that the latter was produced from the plaintext EEEEFFFFGGGGHHHH (or, in fact, figure out any information about the plaintext at all!), the cryptosystem is considered broken. This property must hold even if the attacker already knows part or all of the plaintext, a setting known as the Known Plaintext Attack, or can even control part or all of the plaintext, a setting known as the Chosen Plaintext Attack!

ECB is susceptible to both known and chosen plaintext attack. Because every block is encrypted with the same key, with no other modifications, an attacker can observe identical ciphertext across different blocks that have identical plaintext. Moreover, if the attacker can choose or learn the plaintext associated with some of these blocks, they can carefully build a mapping from known-plaintext to known-ciphertext, and use that as a lookup table to decrypt other matching ciphertext!

In this level, you will do just this: you will build a codebook mapping from ciphertext to chosen plaintext, then use that to decrypt the flag. Good luck!


HINT: You might find it helpful to automate interactions with this challenge. You can do so using the pwntools Python package. Check out this pwntools cheatsheet from a fellow pwn.college student!

Okay, now we'll try that attack in a slightly more realistic scenario. Can you remember your SQL to carry out the attack and recover the flag?


HINT: Remember that you can make select return chosen plaintext by doing SELECT 'my_plaintext'!

Okay, now let's complicate things slightly to increase the realism. It's rare that you can just craft queries for the plaintext that you want. However, it's less rare that you can isolate the tail end of some data into its own block, and in ECB, this is bad news. We'll explore this concept in this challenge, replacing your ability to query substrings of the flag with just an ability to encrypt some bytes off the end.

Show us that you can still solve this!


HINT: Keep in mind that, once you recover some part of the end of the flag, you can build a new codebook with additional prefixes of the known parts, and repeat the attack on the previous byte!

Okay, now let's complicate things slightly. It's not so common that you can just chop off the end of interesting data and go wild. However, much more common is the ability to prepend chosen plaintext to a secret before it's encrypted. If you carefully craft the prepended data so that it pushes the end of the secret into a new block, you've just successfully isolated it, accomplishing the same as if you were chopping it off!

Go ahead and do that in this challenge. The core attack is the same as before, it just involves more data massaging.


HINT: Keep in mind that a typical pwn.college flag is somewhere upwards of 50 bytes long. This is four blocks (three full and one partial), and the length can vary slightly. You will need to experiment with how many bytes you must prepend to push even one of the end characters to its own block.

HINT: Keep in mind that blocks are 16 bytes long! After you leak the last 16 bytes, you'll be looking at the second-to-last block, and so on.

The previous challenge ignored something very important: padding. AES has a 128-bit (16 byte) block size. This means that input to the algorithm must be 16 bytes long, and any input shorter than that must be padded to 16 bytes by having data added to the plaintext before encryption. When the ciphertext is decrypted, the result must be unpadded (e.g., the added padding bytes must be removed) to recover the original plaintext.

How to pad is an interesting question. For example, you could pad with null bytes (0x00). But what if your data has null bytes at the end? They might be erroneously removed during unpadding, leaving you with a plaintext different than your original! This would not be good.

One padding standard (and likely the most popular) is PKCS7, which simply pads the input with bytes all containing a value equal to the number of bytes padded. If one byte is added to a 15-byte input, it contains the value 0x01, two bytes added to a 14-byte input would be 0x02 0x02, and the 15 bytes added to a 1-byte input would all have a value 0x0f. During unpadding, PKCS7 looks at the value of the last byte of the block and removes that many bytes. Simple!

But wait... What if exactly 16 bytes of plaintext are encrypted (e.g., no padding needed), but the plaintext byte has a value of 0x01? Left to its own devices, PKCS7 would chop off that byte during unpadding, leaving us with a corrupted plaintext. The solution to this is slightly silly: if the last block of the plaintext is exactly 16 bytes, we add a block of all padding (e.g., 16 padding bytes, each with a value of 0x10). PKCS7 removes the whole block during unpadding, and the sanctity of the plaintext is preserved at the expense of a bit more data.

Anyways, the previous challenge explicitly disabled this last case, which would have the result of popping in a "decoy" ciphertext block full of padding as you tried to push the very first suffix byte to its own block. This challenge pads properly. Watch out for that "decoy" block, and go solve it!


NOTE: The full-padding block will only appear when the last block of plaintext perfectly fills 16 bytes. It'll vanish when one more byte is appended (replaced with the padded new block containing the last byte of plaintext), but will reappear when the new block reaches 16 bytes in length.

This is the miniboss of AES-ECB-CPA. You don't get an easy way to build your codebook anymore: you must build it in the prefix. If you pad your own prefixed data yourself, you can control entire blocks, and that's all you need! Other than that, the attack remains the same. Good luck!


NOTE: Keep in mind that, unlike the previous levels, this level takes data in base64!

Okay, time for the AES-ECB-CPA final boss! Can you carry out this attack against an encrypted secret storage web server? Let's find out!

Okay, hopefully we agree that ECB is a bad block cipher mode. Let's explore one that isn't so bad: Cipher Block Chaining (CBC). CBC mode encrypts blocks sequentially, and before encrypting plaintext block number N, it XORs it with the previous ciphertext block (number N-1). When decrypting, after decrypting ciphertext block N, it XORs the decrypted (but still XORed) result with the previous ciphertext block (number N-1) to recover the original plaintext block N. For the very first block, since there is no "previous" block to use, CBC cryptosystems generate a random initial block called an Initialization Vector (IV). The IV is used to XOR the first block of plaintext, and is transmitted along with the message (often prepended to it). This means that if you encrypt one block of plaintext in CBC mode, you might get two blocks of "ciphertext": the IV, and your single block of actual ciphertext.

All this means that, when you change any part of the plaintext, those changes will propagate through to all subsequent ciphertext blocks because of the XOR-based chaining, preserving ciphertext indistinguishability for those blocks. That will stop you from carrying out the chosen-plaintext prefix attacks from the last few challenges. Moreover, every time you re-encrypt, even with the same key, a new (random) IV will be used, which will propagate changes to all of the blocks anyways, which means that even your sampling-based CPA attacks from the even earlier levels will not work, either.

Sounds pretty good, right? The only relevant disadvantage that CBC has over EBC is that encryption has to happen sequentially. With ECB, you could encrypt, say, only the last part of the message if that's all you have to send. With CBC, you must encrypt the message from the beginning. In practice, this does not tend to be a problem, and ECB should never be used over CBC.

This level is just a quick look at CBC. We'll encrypt the flag with CBC mode. Go and decrypt it!

CBC-based cryptosystems XOR the previous block's ciphertext to recover the plaintext of a block after decryption. This done for many reasons, including:

  1. This XOR is what separates it from ECB mode, and we've seen how fallible ECB is.
  2. If it XORed the plaintext of the previous block instead of the ciphertext, the efficacy would be dependent on the plaintext itself (for example, if the plaintext was all null bytes, the XOR would have no effect). Aside from reducing the chaining effectiveness, this could leak information about the plaintext (big no no in cryptosystems)!
  3. If it XORed the plaintext of the previous block instead of the ciphertext, the "random access" property of CBC, where the recipient of a message can decrypt starting from any block, would be lost. The recipient would have to recover the previous plaintext, for which they would have to recover the one before that, and so on all the way to the IV.

Unfortunately, in situations where the message could be modified in transit (think: Intercepting Communications), a crafty attacker could directly influence the resulting decrypted plaintext of block N by XORing carefully-chosen values into the ciphertext of block N-1. This would corrupt block N-1 (because it would decrypt to garbage), but depending on the specific situation, this might be acceptable. Moreover, doing this to the IV allows the attacker to XOR the plaintext of the first block without corrupting any block!

In security terms, CBC preserves (imperfectly, as we'll see in the next few challenges) Confidentiality, but does not preserve Integrity: the messages can be tampered with by an attacker!

We will explore this concept in this level, where a task dispatcher will dispatch encrypted tasks to a task worker. Can you force a flag disclosure?

So now you can modify AES-CBC encrypted data without knowing the key! But you got lucky: sleep and flag! were the same length. What if you want to achieve a different length?


HINT: Don't forget about the padding! How does the padding work?

So you can manipulate the padding... If you messed up somewhere along the lines of the previous challenge and created an invalid padding, you might have noticed that the worker crashed with an error about the padding being incorrect!

It turns out that this one crash completely breaks the Confidentiality of the AES-CBC cryptosystem, allowing attackers to decrypt messages without having the key. Let's dig in...

Recall that PKCS7 padding adds N bytes with the value N, so if 11 bytes of padding were added, they have the value 0x0b. During unpadding, PKCS7 will read the value N of the last byte, make sure that the last N bytes (including that last byte) have that same value, and remove those bytes. If the value N is bigger than the block size, or the bytes don't all have the value N, most implementations of PKCS7, including the one provided by PyCryptoDome, will error.

Consider how careful you had to be in the previous level with the padding, and how this required you to know the letter you wanted to remove. What if you didn't know that letter? Your random guesses at what to XOR it with would cause an error 255 times out of 256 (as long as you handled the rest of the padding properly, of course), and the one time it did not, by known what the final padding had to be and what your XOR value was, you can recover the letter value! This is called a Padding Oracle Attack, after the "oracle" (error) that tells you if your padding was correct!

Of course, once you remove (and learn) the last byte of the plaintext, the second-to-last byte becomes the last byte, and you can attack it! And when you recover the entire last block, you can simply discard it, making the second-to-last block the last block.

So, what are you waiting for? Go recover the flag!


HINT: You'll need to slightly adjust this attack for the 16th byte of a block, since there is no padding at all initially, but I trust in your ability to do so!

HINT: The previous challenges had just one ciphertext block, and you messed with its decryption by changing the IV. This level has multiple blocks. Keep in mind that to mess with the decryption of block N, you must modify ciphertext N-1. For the first block, this is the IV, but not for the rest!

FUN FACT: The only way to prevent a Padding Oracle Attack is to avoid having a Padding Oracle. Depending on the application, this can be surprisingly tricky: a failure state is hard to mask completely from the user/attacker of the application, and for some applications, the padding failure is the only source of an error state! Moreover, even if the error itself is hidden from the user/attacker, it's often inferrable indirectly (e.g., by detecting timing differences between the padding error and padding success cases).

You're not going to believe this, but... a Padding Oracle Attack doesn't just let you decrypt arbitrary messages: it lets you encrypt arbitrary data as well! This sounds too wild to be true, but it is. Think about it: you demonstrated the ability to modify bytes in a block by messing with the previous block's ciphertext. Unfortunately, this will make the previous block decrypt to garbage. But is that so bad? You can use a padding oracle attack to recover the exact values of this garbage, and mess with the block before that to fix this garbage plaintext to be valid data! Keep going, and you can craft fully controlled, arbitrarily long messages, all without knowing the key! When you get to the IV, just treat it as a ciphertext block (e.g., plop a fake IV in front of it and decrypt it as usual) and keep going! Incredible.

Now, you have the knowledge you need to get the flag for this challenge. Go forth and forge your message!


FUN FACT: Though the Padding Oracle Attack was discovered in 2002, it wasn't until 2010 that researchers figured out this arbitrary encryption ability. Imagine how vulnerable the web was for those 8 years! Unfortunately, padding oracle attacks are still a problem. Padding Oracle vulnerabilities come up every few months in web infrastructure, with the latest (as of time of writing) just a few weeks ago!

So, you now (hopefully!) understand the use of AES and the various hurdles, but there has been one thing that we have not considered. If person A (commonly refered to as Alice) wants to encrypt some data and send it to person B (commonly refered to as Bob) using AES, they must first agree on a key. If Alice and Bob see each other in person, one might write the key down and hand it to the other. But this rarely happens --- typically, the key must be established remotely, with Alice and Bob on either end of a (not yet encrypted!) network connection. In these common cases, Alice and Bob must securely generate a key even if they are being eavesdropped upon (think: network sniffing)! Fun fact: typically, the eavesdropper is referred to as Eve.

An "oldie but goodie" algorithm for generating a secret key on a non-secret communication channel is the Diffie-Hellman Key Exchange! DHKE uses the power of mathematics (specifically, Finite Fields) to come up with a key. Let's take it step by step:

  1. First, Alice and Bob agree on a large prime number p to define their Finite Field (e.g., all further operations occur modulo p: a context where numbers go from 0 to p-1, and then loop around), along with a root g, and exchange them in the open, content to let Eve see them.
  2. Then, Alice and Bob each generate a secret number (a for Alice's and b for Bob's). These numbers are never shared.
  3. Alice computes A = (g ** a) mod p (g to the a power modulo p) and Bob computes B = (g ** b) mod p. Alice and Bob exchange A and B in the open.
  4. At this point, Eve will have p, g, A, and B, but will be unable to recover a or b. If it wasn't for the finite field, recovering a and b would be trivial via a logarithm-base-g: log_g(A) == a and log_g(B) == b. However, this does not work in a Finite Field under a modulo because, conceptually, we have no efficient way to determine how many times the g ** a computation "looped around" from p-1 to 0, and this is needed to compute the logarithm. This logarithm-in-a-finite-field problem is called the Discrete Logarithm, and there is no efficient way to solve this without using a quantum computer. Quantum computers' ability to solve this problem is the most immediate thing that makes them so dangerous to cryptography.
  5. Alice calculates s = (B ** a) mod p, and since B was (g ** b) mod p, this results in s = ((g ** b) ** a) mod p or, applying middle school math, s = (g ** (b*a)) mod p. Bob calculates s = (A ** b) mod p, and since A was (g ** a) mod p, this results in s = (g ** (a*b)) mod p. Since a*b == b*a, the s values computed by both Bob and Alice are equal!
  6. Eve cannot compute s because Eve lacks a or b. Eve could compute A ** B == g ** a ** g ** b, which reduces to something like g ** (a*(g**b)) and doesn't get Eve any closer to s! Eve could also compute A * B == (g ** a) * (g ** b) == g ** (a+b), but again, this is not the s == g ** (a*b) that Bob and Alice arrived at. Eve is out of luck!

Because A and B are public, they are termed public keys, with a and b being private keys. Furthermore, you may noticed in this level that the prime number p that we use is hardcoded and, in fact, there are recommended DHKE for many bitsizes. The standardization of these primes allows Alice and Bob to just publish A and B (though, in practice, p is also transmitted to support the use of different ps in certain scenarios).

In this challenge you will perform a Diffie-Hellman key exchange. Good luck!

You might have noticed that DH doesn't actually allow you to encrypt data directly: all it does is facilitate the generation of the same secret value for both Alice and Bob. This value cannot be chosen, what Alice and Bob get for s is uniquely determined by the values of a, b, p, and g!

This single-secret nature isn't necessarily a drawback of DHKE. That's just what it's for: letting you exchange a secret for further use.

So how do Alice and Bob actually exchange information using DHKE? Well, the hint is in the name: Diffie-Hellman Key Exchange. That secret value, of course, can be used as a key for, e.g., a symmetric cipher, and information can be encrypted with that cipher between Alice and Bob!

Armed with your knowledge of DHKE, you will now build your first cryptosystem that resembles something real! You'll use DHKE to negotiate an AES key, and the challenge will use that key to encrypt the flag. Decrypt it, and win!

Diffie-Hellman allow Alice and Bob to generate a single (but uncontrolled) shared secret with no pre-shared secret information. Next, we'll learn about another cryptosystem, RSA (Rivest–Shamir–Adleman), that allows Alice and Bob to generate arbitrary amounts of controlled messages, with no pre-shared secret information!

RSA uses a clever interaction of modular exponentiation (which you've experienced in DH) and Euler's Theorem to give Bob or Alice asymmetric control over an entire finite field. Alice generates two primes, p and q, and keeps them secret, then multiplies them to create n = p*q, which Alice publishes to define a Finite Field modulo n. Euler's Theorem and knowledge of p and q gives Alice, and only Alice, full abilities within this specific field (which is a difference from DH, where all actors have equal capabilities in the field!).

Euler's Theorem tells us that operations in an exponent of a field modulo p*q (e.g., the e*d of m**(e*d) mod n) take place in the field of (p-1)*(q-1). The why of this theorem is some advanced math stuff that, to be honest, few people understand, but the results are interesting. Computing (p-1)*(q-1) is trivial for Alice (armed with knowledge of p and q) but impossible to anyone else (assuming that p and q are large), because the human race lacks an efficient algorithm to factor products of large prime numbers!

Recall that e*d in the exponent of m**(e*d) mod n? For any e, knowing (p-q)*(q-1) allows Alice to compute a d such that e*d == 1. While this seems silly, it is the core of RSA. Alice chooses a number e (typically fairly small to reduce computation costs, but not too small to cause certain security issues) and computes the corresponding multiplicative inverse d. This leads to encryption of plaintext m (m**e mod n == c) and decryption! c**d mod n == (m**e)**d mod n == m**(e*d) mod n == m**1 mod n == m. Rather than DH's single and uncontrolled s, RSA messages m can be chosen arbitrarily (up to the size of n, as the field is unable to represent larger numbers).

RSA is asymmetric. Alice shares n and e as the public key, and keeps d as the private key. Knowing n and e, Bob can encrypt messages and send them to Alice, and only Alice can decrypt them. Since e*d == d*e, Alice can use d to encrypt messages, but anyone can decrypt them, since e is public. This might sound silly, but it is useful for, e.g., attesting that a given message could come only from Alice, since knowledge of d is required for this.

To respond to Bob, Alice would need Bob's own public key, which would be Bob's n (different from Alice's n, using Bob's own secret p and q!) and e (which is typically the same smallest-safe value, currently 65537 but subject to change as new attacks are found).

In this challenge you will decrypt a secret encrypted with RSA (Rivest–Shamir–Adleman). You will be provided with both the public key and private key this time, to get a feel for how all this works. Go for it!

Alice's superpower under modulo n comes from knowledge of p and q, and, thus, the ability to compute the multiplicative inverse of e in the exponent. One worry of everyone who uses RSA is that their n will get factored, and attackers will gain p and q.

This is not an unreasonable worry. While we believe that factoring is hard, we have no actual proof that it is. It is not outside of the realm of possibility that, tomorrow, Euler 2.0 will publish an algorithm for doing just this. However, we do know that functional quantum computers can factor: Euler 2.0 (actually, Peter Shor) already came up with the algorithm! When quantum computers get to a sufficient power level, RSA is cooked.

In this challenge, we give you the quantum computer (or, at least, we give you n's factors)! Use them to decrypt the flag that we encrypted with RSA (Rivest–Shamir–Adleman).

So by using d, Alice can encrypt data that (because n and e are in the public key) anyone can decrypt... This might seem silly, but it actually enables a capability that we haven't yet seen in the module: the ability to attest to multiple people that a message came from Alice. This can serve as a sort of cryptographic version of a pen-and-ink signature and, in fact, it is called a signature!

This level will explore one application (and pitfall) of RSA signatures. Recall that c == m**e mod n, and recall from middle school that (x**e)*(y**e) == (x*y)**e. This holds just as well in mod n, and you can probably see the issue here...

This level gives you a signing oracle. Go use it to craft a flag command!

As you saw, raw RSA signatures are a bad idea, as they can be forged. In practice, what people sign are cryptographic hashes of things. A hash is a one-way function that takes an arbitrary amount of input (e.g., bytes or gigabytes or more) and outputs a short (e.g., 32 bytes) of output hash. Any changes in the input to the hash will diffuse all over the resulting cryptographic hash in a way that is not reversible.

Thus, secure hashes are a good representation for the original data: if Alice signs a hash of a message, that message can be seen as being signed as well. Better yet, since hashes are not controllably reversible or modifiable, an attacker being able to modify a hash does not allow them to forge a signature on a new message.

The bane of cryptographic hashing algorithms is collision. If an attacker can craft two messages that hash to the same thing, the security of any system that depends on the hash (such as the RSA signature scheme described above) might be compromised. For example, consider that the security of bitcoin depends fully on the collision resistance of SHA256...

While full collisions of SHA256 don't exist, some applications use partial hash verification. This is not a great practice, as it makes it easier to brute-force a collision.

In this challenge you will do just that, hashing data with a Secure Hash Algorithm (SHA256). You will find a small hash collision. Your goal is to find data, which when hashed, has the same hash as the secret. Only the first 3 bytes of the SHA256 hash will be checked.

In this challenge you will hash data with a Secure Hash Algorithm (SHA256). You will compute a small proof-of-work. Your goal is to find response data, which when appended to the challenge data and hashed, begins with 2 null-bytes.

In this challenge you will complete an RSA challenge-response. You will be provided with both the public key and private key.

In this challenge you will complete an RSA challenge-response. You will provide the public key.

In this challenge you will work with public key certificates. You will be provided with a self-signed root certificate. You will also be provided with the root private key, and must use that to sign a user certificate.

In this challenge you will perform a simplified Transport Layer Security (TLS) handshake, acting as the server. You will be provided with Diffie-Hellman parameters, a self-signed root certificate, and the root private key. The client will request to establish a secure channel with a particular name, and initiate a Diffie-Hellman key exchange. The server must complete the key exchange, and derive an AES-128 key from the exchanged secret. Then, using the encrypted channel, the server must supply the requested user certificate, signed by root. Finally, using the encrypted channel, the server must sign the handshake to prove ownership of the private user key.


30-Day Scoreboard:

This scoreboard reflects solves for challenges in this module after the module launched in this dojo.

Rank Hacker Badges Score