Nibbling on Numbers


Computing 101.

A byte is eight bits; half of one (four bits) is a nibble. This module nibbles at numbers from the bit up: how a pile of bits comes to mean a positive or a negative number in binary, hexadecimal, and decimal. Understanding this is critical to truly knowing how the CPU processes bits into something more meaningful!


As you know, bytes are what is actually stored in your computer's memory. As you might also know, computers think in binary: just a bunch of ones and zeroes. For historical reasons, we express these ones and zeroes ("bits") in groups of 8, and each group of 8 (a "byte"). This number is purely arbitrary: early computers (pre-1960s or so) didn't have this grouping at all, or had other arbitrary groupings. It is very feasible for there to be an alternate universe in which a byte is 16, 32, or really any numbers of bits (though for math reasons, it'll likely remain a power-of-2).

A single binary digit (bit) can represent two values (0 and 1), two bits can represent four values (00, 01, 10, and 11), three bits can represent eight values (000, 001, 010, 011, 100, 101, 110, 111), and four bits can represent sixteen values. Comparatively, a single decimal digit can represent 10 values (from 0 to 9). Ten values are represented by roughly log2(10) == 3.3219... bits, and you get weird situations like binary 1001 being decimal 9, but binary 1100 (still 4 binary digits) being 12 (two decimal digits!). Another way of expressing this digit desynchronization between decimal and binary is that decimal does not have clean bit boundaries.

The lack of bit boundaries makes reasoning about the relationship between decimal and binary complex. For example, it is hard to spot-translate numbers between decimal and binary in general: we can work out that 97 is 1100001, but it's hard to see that at a glance.

It's much easier to spot-translate between bases that have more alignment between digits. For example, a single hexadecimal (base 16) digit can represent 16 values (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f): the same number of values that binary can represent in 4 digits! This allows us to have a super simple mapping:

Hex Binary Decimal
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
a 1010 10
b 1011 11
c 1100 12
d 1101 13
e 1110 14
f 1111 15

This mapping from a hex digit to 4 bits is something that's easily memorizable (most important: memorize 1, 2, 4, and 8, and you can quickly derive the rest). Better yet, two hex digits is 8 bits, which is one byte! Unlike decimal, where you'd have to memorize 16 mappings for 4 bits and 256 mappings for 8 bits, with hexadecimal, you only have to memorize 16 mappings for 4 bits and the same amount of mappings for 8 bits, since it's just two hexadecimal digits concatenated! Some examples:

Hex Binary Decimal
00 0000 0000 0
0e 0000 1110 14
3e 0011 1110 62
e3 1110 0011 227
ee 1110 1110 238

Now you're starting to see the beauty. This gets even more obvious when you expand beyond one byte of input, but we'll let you find that out through future challenges!

Now, let's talk about notation. How do you differentiate 11 in decimal, 11 in binary (which equals 3 in decimal), and 11 in hex (which equals 17 in decimal)? For numerical constants, we sometimes prepend binary data with 0b, hexadecimal with 0x, and keep decimal as is, resulting in 11 == 0b1011 == 0xb, 3 == 0b11 == 0x3, and 17 == 0b10001 == 0x11.

Computers store data as bytes, which are made of bits. The registers you're used to so far, such as rax, are 64 bits wide, meaning that they can hold values from 0 (all 0 bits) to 2^64-1 (all 1 bits). But what if you wanted to store negative numbers?

Early approaches to storing negative numbers included the use of a sign bit: a positive decimal 5 might be stored as the byte 00000101, while a negative 5 would be 10000101. This makes sense, but it has a very important problem: math.

Deep inside your computer's CPU is a subcomponent called an Arithmetic Logical Unit (ALU), which is responsible for arithmetic. This critical component needs to be very optimized, so the less complexity is needed to, say, add units, the better. At the same time, the math needs to check out. This leads to two issues:

  1. A signed bit system has two different values for zero: 00000000 (positive zero) and 10000000 (negative zero). This is terrible for many reasons, including having to complicate the ALU.
  2. In a signed bit system, different arithmetic algorithms need to be used for signed versus unsigned numbers. You can add 00000010 (decimal 2) and 00000001 (decimal 1) easily with normal binary math, but 10000010 (decimal -2) and 00000001 (decimal 1) need to instead be modeled partially as subtractive.

This is not great. Luckily, some smart minds came up with a brilliant scheme called Twos Complement. Twos complement solves both problems by modeling half of the bit space as negative in a way compatible with unsigned arithmetic. Consider this subtraction:

  00000001    a positive 1
- 00000010    a negative 2

There aren't enough bits to service this subtraction, so we introduce a borrow bit:

  1 00000001    a positive 1 (with a borrow bit)
- 0 00000010    a negative 2
  ----------
- 0 11111111    the result of the normal *sign-agnostic* subtraction

So 11111111 is -1: add it to 1 and they cancel, with the leftover bit falling off the end of the byte and vanishing. This has a few advantages:

  1. The top bit is still the sign bit: if it's set, the value is negative! Very easy to test.
  2. There is only one zero: 00000000.
  3. Arithmetic works exactly normally for both signed and unsigned numbers.

The downside is the slight human complexity: the actual magnitude (e.g., value after the sign) of a negative number is the unsigned byte value minus 256. In the above 11111111, it's 255 - 256 = -1. A bit complex, but you'll get the idea eventually. Interestingly, if you treat the value as unsigned, you can happily use it as 255 in the arithmetic you want, and everything will still work out!

Now, put this to use. This challenge will force you to understand twos-complement on bytes. Do it, and get the flag!

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

You've done two's complement one byte at a time, but nothing about it is special to 8 bits. It works at any bit-width!

In this level, we'll practice twos complement on 16-bit values. Remember, rax and its friends are 64 bits wide, and they're also twos-complement, so you have a ways to scale up!

A 16-bit value can be as big as 2^16-1 when unsigned (65535), but if you interpret that binary value (1111111111111111) as a signed integer, you will -1! To interpret 1000000000000 (unsigned 32768) as signed, you must subtract 65536 from it, resulting in -32768, which is the smalllest signed value that can be expressed in 16 bits (with the largest signed value, 0111111111111111 being 32767).

This might be starting to get slightly confusing and, indeed, the different maximum values of signed versus unsigned numbers lead to all sorts of bugs and security vulnerabilities!

The size of these numbers makes manual math difficult, and we don't expect you to do these conversions in your head. Our advice: do the unsigned conversion using a tool, then do the twos complement subtraction if it's signed. One tool you can use is the Python programming language. You don't have to program anything in it yet (though we'll get there), just use its interactive mode as a calculator. For example, to convert the binary 1001111101011100, you can do:

hacker@dojo$ ipython
In [1]: 0b1001111101011100
Out[1]: 40796

In [2]: 40796 - 65536
Out[2]: -24740

In [3]: exit
hacker@dojo$

A few notes:

  • In [1] is the input prompt for the ipython interactive python interpreter, and Out [1] is the result of the expression entered into In [1].
  • 0b is Python's prefix to differentiate numbers written in binary from decimal (e.g., 0b101 is 5 and 101 is 101).

Now, go get the flag!

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

Let's go wider! We'll try 4 bytes, 32 bits. The maximum unsigned value of this, 11111111 11111111 11111111 11111111, is 2**32-1. or 4,294,967,295. The maximum signed value, 01111111 11111111 11111111 11111111, is 2**31-1, which is 2,147,483,647, and the minimum signed value, 10000000 0000000 0000000 0000000 0000000, is -2**31, which is -2,147,483,648.

Anyways, go tackle this one and get the flag!

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

The first three levels had you read bits as a signed number. Now let's go the other way: given a number, write its two's-complement bits.

For a zero-or-positive number, that's just its plain binary, padded with leading zeros to fill the byte. For a negative number, recall the rule from the very first level --- the n-bit two's-complement pattern of a negative value is the same bits as the unsigned number (value + 2ⁿ). In a byte (8 bits), that means adding 256:

  -5   ->   -5 + 256 = 251   ->   11111011
  42   ->                         00101010

(Equivalently: flip the bits of +5 and add one, which gives you the same answer, 11111011. Either way of thinking about it works.)

Run /challenge/encode. It gives you numbers (some negative, some not) and you write each one as 8-bit two's-complement binary. Encode them all to get the flag.

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

Time to put the reading above into practice.

The key fact: a single hex digit is exactly 4 bits, so a byte --- 8 bits --- is just two hex digits. To turn a byte from binary into hex, split its 8 bits into two groups of 4 and look each group up:

11100011     the byte, in binary
1110 0011    the byte, in binary, split into two groups of 4 bits
 e    3      each group of 4 bits -> one hex digit
->  0xe3     the hex!

Run /challenge/convert. It shows you bytes in binary; write each one back in hexadecimal. (You can include the 0x prefix or leave it off --- both work.) Get them all and you'll have the flag.

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

Like decimal numbers, you can add arbitrary amounts of them to represent more and more bytes. Every two hex digits are one additional byte. One hex digit, for those curious, is called a nibble (har har!), but this is not used when specifying data. We almost always work with data on the byte level, not less.

We'll practice this in this challenge. Split each byte's 8 bits into two groups of 4, turn each group into its hex digit, and write the bytes left to right. For example:

1110 0011   0101 1010   1001 0000
 e    3      5    a      9    0
-> 0xe35a90

Now it's your turn: run /challenge/convert and go earn that flag!

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

So far you've encoded binary data into hex. Now let's go the other way and decode it: that is, turn hex back into the bits it stands for.

It's the same mapping, just run in reverse. Each hex digit becomes its 4 bits, written out in order. Since a byte is two hex digits, decoding one byte means expanding two hex digits into 8 bits:

0x   0      a
     0000   1010
->   00001010

This is exactly what a program does when it receives hex-encoded data: it turns each pair of hex digits back into the byte it represents before working with it. Write out the full 8 bits, leading zeros and all --- each hex digit is always exactly 4 of them.

Run /challenge/convert and get the flag!

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

Of course, the same value can be interpreted/reasoned about in multiple ways. We'll explore that here across the four interpretations we've studied (unsigned decimal, signed decimal, hex, and twos-complement binary). Get every conversion right, one after another, to earn the flag.

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

30-Day Scoreboard:

This scoreboard reflects solves for challenges in this module after the module launched in this dojo.

Rank Hacker Badges Score