Numbers as Strings


Computing 101.

You've learned to read and write registers, walk through memory, and package your code as a function inside a shared library. Now let's put those skills together and build some real algorithms.

Throughout this module, we'll gradually build on our solutions until we have code that addresses a very common program need: turning text into numbers. Along the way, we'll learn how to write reusable assembly code, how text and numbers relate to each other, and how to reason about algorithms (and their failings!). You'll grow in knowledge, and in flags!


Computers receive a lot of their input as text. When you pass 12345 to a program as a command-line argument, your code doesn't receive the number 12345 --- it receives five separate ASCII bytes: '1', '2', '3', '4', '5'. The CPU can't add or multiply that text directly; first, someone has to turn those characters into the numbers they represent. That job is traditionally done by a function called atoi (ASCII to integer) --- and we'll build it from the ground up, starting here with a single digit.

The key insight is how digits are encoded. You've seen ASCII in prior levels, and we'll talk about ASCII numbers (the text encoding of numerical values) here. In ASCII, the character '0' is the byte 0x30, '1' is 0x31, and so on up to '9' at 0x39. The digits are consecutive, so the value of a digit character is simply the character minus '0':

'7'  ->  0x37 - 0x30  =  7

In this level, you must implement a function that converts a text string containing one digit into the number. Your function (which must be called atoi_digit) receives a pointer in rdi to a single digit character, and must return that digit's value (0 through 9) in rax.
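If it helps to see the arithmetic outside of assembly, here is a minimal C sketch of the same contract (pointer in, digit value out). It is only a model for checking your understanding, not the assembly the level asks for.

```c
/* C model of atoi_digit: read the byte s points at and subtract '0' (0x30). */
long atoi_digit(const char *s)
{
    return (unsigned char)*s - '0';   /* e.g. '7' (0x37) - 0x30 = 7 */
}
```

In assembly, the dereference corresponds to loading one byte from the address in rdi, and the subtraction to a single sub.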

Now, since we're making an atoi_digit function instead of solve, you'll need a .global atoi_digit so the challenge can find it! Then, build it into a shared library and hand it to the grader:

hacker@dojo:~$ as -o your-solve.o your-solve.s
hacker@dojo:~$ ld -shared -o your-solve.so your-solve.o
hacker@dojo:~$ /challenge/check your-solve.so

Decode the digit, return its value, and grab the flag.

Connect with SSH

Link your SSH key, then connect with: ssh [email protected]

You can decode one digit with atoi_digit. A two-digit number is just two of those, combined by place value: in "42", the 4 is in the tens place and the 2 is in the ones place, so the value is 4 * 10 + 2 = 42. This is the algorithm we'll use to compute it in this level.

Here, you'll write two functions:

  • atoi_digit(s) --- exactly as before: the value of the single digit at s. You can (and should!) reuse your solution from the previous challenge.
  • atoi(s) --- takes a pointer to a two-character number and returns its value, by decoding each character with atoi_digit and combining them as first * 10 + second.
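As a rough C model of how the two functions fit together (the name atoi_two_digits is made up for this sketch, to avoid clashing with the C library's own atoi):

```c
/* C model of the level's atoi_digit: one character -> its value. */
static long atoi_digit(const char *s)
{
    return (unsigned char)*s - '0';
}

/* Hypothetical model of the two-digit atoi: first * 10 + second. */
long atoi_two_digits(const char *s)
{
    long tens = atoi_digit(s);       /* decode the first character  */
    long ones = atoi_digit(s + 1);   /* decode the second character */
    return tens * 10 + ones;         /* "42" -> 4 * 10 + 2 = 42     */
}
```

Notice that the result of the first call has to survive the second call. In C the compiler takes care of that; in assembly, that is your job.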

Both are real functions the grader calls, so both must follow the calling convention. That means that if you use any callee-saved registers, you must properly restore them before returning. And, since you're also calling atoi_digit from atoi, you must be careful to properly handle any caller-saved registers as well.

As before, each function takes its argument in rdi and returns its result in rax.

Now, how do you multiply? x86's multiply instruction is imul. It comes in a few forms, but we'll use it the way we used add: imul rax, 10 multiplies rax by 10 in place (rax = rax * 10), so scaling the tens digit up by a place is a single instruction. Of course, imul works on registers other than rax: for example, imul rbx, 10 multiplies rbx by 10.

One more thing! In your assembly, you will need a .global atoi_digit and a .global atoi (along with the functions themselves, implemented at those labels) so that the grader can find them. Build and submit as before, with both atoi_digit and atoi functions:

hacker@dojo:~$ /challenge/check your-solve.so

Multiply by ten, add the ones, and the flag is yours.

Your two-digit atoi did first * 10 + second. What if there are more than two digits? Of course, you'd keep a running total, and for each new digit do total = total * 10 + digit. That repetition is a loop, which you've read before, but will write here!

Read the digits left to right:

"123":
  total = 0
  '1':  total =  0*10 + 1  =   1
  '2':  total =  1*10 + 2  =  12
  '3':  total = 12*10 + 3  = 123

You would do this until the end of the string, which, by the convention of the C programming language (and used here), is represented by a byte with a value of 0x00 (that is, binary 00000000 or decimal value 0). Note that this is distinct from the character '0', which, again, has a value of 0x30 (binary 00110000).

So, your loop is: look at the next byte, if it's 0, jump beyond the loop (look back at the Looping challenge for reference), otherwise multiply the total by 10, call atoi_digit to convert the digit, and add it to the total, then loop back to the head of the loop. Easy!
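The loop just described can be modeled in C roughly like this. The name atoi_loop is hypothetical, and the digit conversion is written inline rather than as a call to atoi_digit, purely to keep the sketch short.

```c
/* Hypothetical C model of the digit loop: total = total * 10 + digit. */
long atoi_loop(const char *s)
{
    long total = 0;
    while (*s != 0) {                      /* 0x00 ends the string; '0' is 0x30 */
        total = total * 10 + (*s - '0');   /* shift one place, add the new digit */
        s++;                               /* move on to the next byte */
    }
    return total;
}
```

An empty string never enters the loop, so the model returns 0 for it.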

Your atoi receives a pointer to the string in rdi and must return the integer value in rax. Loop the digits, return the number, and score!


Debugging: This can be tricky to get right. To debug this challenge, our advice is to add a _start to your code that fakes the call, like so:

.global _start
_start:
    push 0x333231   // "123" on the stack -- little-endian, so 0x31 ('1') is the first byte, and the high zero bytes terminate it
    mov rdi, rsp    // a pointer to that string, as the first argument to atoi
    int3            // this is optional, if you want gdb to break here without having to set a breakpoint!
    call atoi       // there we go!

    mov rdi, rax    // atoi's result comes back in rax; exit with it so you can read it back with `echo $?`
    mov rax, 60     // exit
    syscall

Assemble and link it as a normal executable (no -shared --- this version has an entry point), then load it in gdb:

hacker@dojo:~$ as -o debug.o debug.s
hacker@dojo:~$ ld -o debug debug.o
hacker@dojo:~$ gdb ./debug
(gdb) run

Execution stops at your int3, and from there you can step through with the techniques you learned in Software Introspection, watching rdi walk the string and your running total build up in rax, until things work!

Your atoi handles positive numbers. But numbers can be negative, and a negative number arrives with a leading minus sign: the string "-42" is the four bytes '-', '4', '2', NUL.

Extend your converter to handle that sign. If the very first character is '-' (ASCII 0x2d), remember that the result should come out negative, step past the sign, and convert the digits that follow exactly as before. Then negate your total at the end. Of course, a positive number has no sign character, so it should still convert just like the previous level.

There are two ways you can negate a number: neg rax turns a register into its negative, and imul rax, -1 does the same. Pick the one you like!
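Here is one way the sign handling could look as a C model (atoi_signed is an illustrative name, not something the grader looks for): remember the sign, skip it, convert as before, and negate at the end.

```c
/* Hypothetical C model of atoi with an optional leading '-'. */
long atoi_signed(const char *s)
{
    int negative = 0;
    long total = 0;

    if (*s == '-') {          /* 0x2d: remember the sign and step past it */
        negative = 1;
        s++;
    }
    while (*s != 0) {         /* same digit loop as the previous level */
        total = total * 10 + (*s - '0');
        s++;
    }
    return negative ? -total : total;   /* the "neg rax" step */
}
```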

Real input is messy. A number embedded in a larger string isn't always followed by a tidy NUL --- it might be followed by a space, a letter, a comma, or anything else: "42abc", "100 200", "7,".

A proper atoi reads digits until it sees something that isn't a digit, then stops, whatever that non-digit is (including 0x00). Instead of "stop at the 0 byte", the rule becomes "stop at the first byte that isn't '0'-'9'".

A handy one-shot test for a character c: compute c - 0x30, then check whether the result is in the range 0-9 using an unsigned comparison. Anything that isn't a digit (punctuation, letters, a space) falls outside this range. So does the 0x00 byte: subtracting 0x30 from it gives a negative two's-complement number, which is a very large number when interpreted as an unsigned value.
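A small C model of that test, assuming we work on a single byte (in your assembly the wraparound happens at whatever register width you use, but the idea is the same). The function name is made up for the sketch.

```c
#include <stdint.h>

/* Returns 1 if c is '0'..'9', else 0, using one unsigned comparison. */
int is_digit_byte(uint8_t c)
{
    uint8_t shifted = (uint8_t)(c - 0x30);  /* bytes below '0' wrap to large values */
    return shifted <= 9;                    /* unsigned compare, like cmp + ja      */
}
```

For example, the 0x00 byte becomes 0xD0 (208) after the subtraction, and ':' (0x3a, the byte right after '9') becomes 10, so both fail the check.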

To do an unsigned check, use the ja instruction, which stands for "jump if above": it jumps if, in the last comparison, the left operand was greater than the right one when both are treated as unsigned. You must do the cmp first (again, look back earlier in this dojo), and then:

ja  done

...

done:
   ret

Otherwise, keep your solution from the prior level: a leading '-' still means negative, math still works as you expect, etc, and you get the flag when you solve it!

Until now, you've been writing a loadable library: a function the challenge loaded and called for you. This time, you'll write a whole program --- one that starts at _start, runs on its own, and exits when it's done.

Your program gets the number as a command-line argument. When a program starts, the stack holds its arguments: argc (the count) sits at [rsp], and the argument pointers follow it --- argv[0] (the program's own name) at [rsp + 8], and argv[1] (the first real argument) at [rsp + 16]. So the number you want is the string pointed to by [rsp + 16].
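To picture that layout, here is a hedged C model that treats the initial stack as an array of 8-byte slots. Both the array and the function name are inventions for this sketch; it assumes a 64-bit system, where a pointer fills exactly one slot.

```c
#include <stdint.h>

/* Hypothetical model: slot[0] stands for [rsp], slot[1] for [rsp + 8], etc. */
const char *first_argument(const uintptr_t *slot)
{
    /* slot[0] holds argc, slot[1] holds argv[0], slot[2] holds argv[1]. */
    return (const char *)slot[2];   /* the pointer stored at [rsp + 16] */
}
```

Note the two steps: first load the pointer out of the slot, then use that pointer as the string address you hand to atoi.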

Read it, convert it with your atoi, and hand the value back the way a program does: instead of returning it in rax, exit with it as your exit code, using the exit syscall with the value in rdi. An exit code is a single byte, so the number you're given will be between 0 and 255.

This time, assemble and link it as a normal program (no -shared), then submit it:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ /challenge/check prog

Convert the argument, exit with its value, and score!

You've taken text input and converted it to a number, but real programs also have to output numbers as text. This inverse of atoi is called itoa (integer to ASCII). Here, we'll start building it the same way, first with one digit, then moving on!

In the reverse of atoi, a digit's character is just its value plus '0' (0x30). So if '7' (the ASCII character) became 7 (the value) by subtracting 0x30, then the same way, 7 (the value) would become '7' (the ASCII character) by adding 0x30.

7  ->  7 + 0x30  =  0x37  =  '7'

We'll start with itoa_digit. Your itoa_digit gets a value in rdi (a single digit, 0-9) and returns its ASCII character in rax. Remember to .global itoa_digit so the challenge can find it.
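Modeled in C, the function is the mirror image of atoi_digit. This is just a sketch of the arithmetic, not the assembly you'll submit.

```c
/* C model of itoa_digit: a value 0-9 becomes its ASCII character. */
char itoa_digit(long value)
{
    return (char)(value + '0');   /* e.g. 7 + 0x30 = 0x37 = '7' */
}
```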

Build and submit it as a library, add 0x30, return the character, and score!

One digit was easy. A two-digit number like 42 needs splitting into its tens (4) and ones (2) --- and splitting is division: 42 / 10 = 4 (the quotient), and 42 % 10 = 2 (the remainder).

x86 gives you both results from one div, but div is a fussy instruction worth learning carefully. div rcx divides the 128-bit value formed by concatenating rdx and rax (written rdx:rax) by rcx, leaving the quotient in rax and the remainder in rdx. Three things follow from that:

  • It divides rdx:rax, not just rax, so you must clear rdx first (xor rdx, rdx) --- otherwise div treats leftover garbage as the high half of your number (and may crash).
  • The divisor comes from a register, not an immediate, so load the 10 into one (e.g., mov rcx, 10; div rcx).
  • You don't control the dividend: it's always rdx:rax.
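To see why the rdx step matters, here is a rough C model of the 64-bit div. It relies on the unsigned __int128 extension found in GCC and Clang, and model_div is a made-up name; the -1 return stands in for the divide error the real instruction raises when the divisor is 0 or the quotient doesn't fit in rax.

```c
#include <stdint.h>

/* Hypothetical model of "div divisor": divides rdx:rax, updates both.
 * Returns 0 on success, -1 where the real instruction would fault. */
int model_div(uint64_t *rax, uint64_t *rdx, uint64_t divisor)
{
    if (divisor == 0)
        return -1;
    unsigned __int128 dividend = ((unsigned __int128)*rdx << 64) | *rax;
    unsigned __int128 quotient = dividend / divisor;
    if (quotient > UINT64_MAX)
        return -1;                          /* quotient too big for rax */
    *rdx = (uint64_t)(dividend % divisor);  /* remainder */
    *rax = (uint64_t)quotient;              /* quotient  */
    return 0;
}
```

With rax = 42 and rdx = 0, dividing by 10 gives rax = 4 and rdx = 2. Leave a stray 1 in rdx and the dividend is really 2^64 + 42, so the quotient is nowhere near 4; leave a 10 there and the quotient no longer fits at all.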

After the div, rax holds the tens and rdx holds the ones. Turn each into a character the way itoa_digit did (add 0x30) and store the two of them.

Write itoa(value, buf), which we'll call from the challenge. This function should take a value (10-99) in rdi and a pointer to the "output" buffer in rsi. You should split the number in rdi into two characters (using div and then your old itoa_digit function) and write these two characters to that buffer. Then return the number of characters written (in this case, 2) in rax. Remember to .global itoa.
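As a C model of that contract (the name itoa_two_digits is illustrative, and / and % stand in for the two results of a single div):

```c
#include <stddef.h>

/* Hypothetical C model of the two-digit itoa for values 10-99. */
size_t itoa_two_digits(unsigned long value, char *buf)
{
    unsigned long tens = value / 10;   /* what div leaves in rax */
    unsigned long ones = value % 10;   /* what div leaves in rdx */
    buf[0] = (char)(tens + '0');       /* first character: the tens digit */
    buf[1] = (char)(ones + '0');       /* second character: the ones digit */
    return 2;                          /* number of characters written */
}
```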

Writing characters. Your itoa_digit function from the last level returned the result (in rax), and you didn't have to deal with writing it to a buffer. Now, you do. Your actual character is one byte (8 bits), whereas the register you're holding it in is 64 bits (8 bytes) long. You just want the lowest ("least significant") byte, and you can directly access it through partial register aliases, depending on the register:

register    least significant byte
rax         al
rbx         bl
rcx         cl
rdx         dl
rsi         sil
rdi         dil
rbp         bpl
rsp         spl
r8          r8b
r9          r9b
r10         r10b
r11         r11b
r12         r12b
r13         r13b
r14         r14b
r15         r15b

So, if your character is in rax, and the buffer is pointed to by rsi, you'll need to do mov [rsi], al.

This is tricky, but do it carefully, and the flag is your reward!


Debugging: This can be tricky to get right. To debug this challenge, our advice is to add a _start to your code, like so:

.global _start
_start:
    mov rdi, 42     // you'll pass 42 as the first argument to your function
    push 0          // this pushes eight 0 bytes to the stack, clearing what will be your output buffer
    mov rsi, rsp    // the output buffer as the second argument to itoa
    int3            // this is optional, if you want gdb to break here without having to set a breakpoint!
    call itoa       // there we go!

    mov rax, 60     // exit cleanly, like a cultured individual
    syscall

Assemble and link it as a normal executable (no -shared --- this version has an entry point), then load it in gdb:

hacker@dojo:~$ as -o debug.o debug.s
hacker@dojo:~$ ld -o debug debug.o
hacker@dojo:~$ gdb ./debug
(gdb) run

Execution stops at your int3, and from there you can step through with the techniques you learned in Software Introspection, looking at memory on the stack, registers, etc, until things work!

Your two-digit itoa always wrote two characters. But 7 isn't 07, and 0 isn't 00: a number is normally written with no leading zeros, in only as many digits as it actually needs. In this level, we'll strip the leading zeros from our translation on the path to a nice itoa!

We'll still deal with values 99 and less, so a single div is enough. Divide by 10 as before: the quotient is the tens digit, the remainder is the ones. If the quotient is 0, there is no tens digit, and we can drop the leading zero and output just the remainder. For example:

7:   7 / 10 = 0 rem 7   ->  quotient 0, so write just "7"
42:  42 / 10 = 4 rem 2  ->  quotient 4, so write "42"

One value still needs care: 0 itself. Its quotient is 0 too, but writing "nothing" is wrong, so we write a single '0'.

The rest is the same: Write itoa(value, buf) for value in 0-99: write its decimal text (no leading zeros) to buf, and return how many characters you wrote (1 or 2) in rax.
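One possible shape for this, sketched in C with a made-up function name: write the tens digit only when the quotient is nonzero, and always write the ones digit.

```c
#include <stddef.h>

/* Hypothetical C model of itoa for 0-99 with no leading zero. */
size_t itoa_upto_99(unsigned long value, char *buf)
{
    unsigned long tens = value / 10;   /* quotient from the div  */
    unsigned long ones = value % 10;   /* remainder from the div */
    size_t n = 0;

    if (tens != 0)
        buf[n++] = (char)(tens + '0'); /* skipped for 0-9 */
    buf[n++] = (char)(ones + '0');     /* always written, so 0 becomes "0" */
    return n;                          /* 1 or 2 */
}
```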


Note: The "check if the quotient is 0" test will be useful in the next level, where we'll finally support longer numbers! Keep it in mind!

Now any length. It's the same div-by-10 step as last level, just repeated: each div peels off the lowest digit (the remainder) and shrinks the number (the quotient), and you keep going until the quotient reaches 0 --- however many digits that takes.

123:  123 % 10 = 3,  123 / 10 = 12
       12 % 10 = 2,   12 / 10 = 1
        1 % 10 = 1,    1 / 10 = 0   (stop)

But notice the catch: the digits come out backwards --- ones first (3, 2, 1), the reverse of how you write them (1, 2, 3). So you can't just append them as you go. The usual fixes: stash each digit as it comes and write them out in reverse (the stack is perfect for this --- push them as they fall out, pop them to write, and LIFO reverses them for free), or write them into the buffer from the back toward the front.

And 0 is the same special case you handled last level: the loop runs zero times for it, so write a plain "0" yourself.

Write itoa(value, buf) for any non-negative value (in rdi, buffer in rsi): write its decimal digits to the buffer and return how many you wrote, in rax. Remember to .global itoa.
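Here is a C model of the "stash, then write in reverse" approach. The local array plays the role of the stack, and the function name is invented for the sketch; it is one of the two fixes mentioned above, not the only valid one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of itoa for any non-negative 64-bit value. */
size_t itoa_unsigned(uint64_t value, char *buf)
{
    char stack[20];              /* a 64-bit value has at most 20 decimal digits */
    size_t depth = 0, n = 0;

    if (value == 0) {            /* the loop below would run zero times */
        buf[0] = '0';
        return 1;
    }
    while (value != 0) {         /* digits fall out lowest first */
        stack[depth++] = (char)(value % 10 + '0');   /* "push" */
        value /= 10;
    }
    while (depth > 0)            /* "pop": last in, first out */
        buf[n++] = stack[--depth];
    return n;
}
```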

Build and submit as before:

hacker@dojo:~$ as -o your-solve.o your-solve.s
hacker@dojo:~$ ld -shared -o your-solve.so your-solve.o
hacker@dojo:~$ /challenge/check your-solve.so

Reverse the digits, return the length, and score!

Your itoa handles non-negative numbers. But a sum can be negative (your atoi reads negative numbers, after all), and a negative number is written with a leading -.

The trick is to peel the sign off first, then let the work you already did handle the rest:

  • If the input value is negative, write a '-', move your buffer pointer one past it (e.g., add rsi, 1), and neg the input value to get its magnitude.
  • You can cmp rdi, 0 to compare, and then jl is_negative (jl jumps if, in the preceding cmp, the left operand was less than the right one as signed values).
  • Run your existing digit loop on that (now non-negative) magnitude.
  • The total length is the digits you wrote, plus 1 for the sign.
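Put together, the steps above might look like this in C. Both function names are hypothetical, and the magnitude is computed in unsigned arithmetic so the model behaves like neg even for the most negative 64-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Digit loop from the previous level, as a C model. */
static size_t write_digits(uint64_t v, char *buf)
{
    char tmp[20];
    size_t depth = 0, n = 0;
    do {                                   /* runs once even for 0 */
        tmp[depth++] = (char)(v % 10 + '0');
        v /= 10;
    } while (v != 0);
    while (depth > 0)
        buf[n++] = tmp[--depth];           /* reverse into the buffer */
    return n;
}

/* Hypothetical C model of the signed itoa. */
size_t itoa_signed(int64_t value, char *buf)
{
    size_t sign = 0;
    uint64_t magnitude = (uint64_t)value;

    if (value < 0) {
        buf[0] = '-';                      /* write the sign */
        sign = 1;                          /* digits start one byte later */
        magnitude = 0 - magnitude;         /* two's-complement negate, like neg */
    }
    return sign + write_digits(magnitude, buf + sign);
}
```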

A non-negative number has no sign, so it still prints exactly as before.

Extend itoa(value, buf) to handle negative values too. The calling convention is the same: value argument in rdi, buffer argument in rsi, total length returned in rax. Remember to .global itoa.

Build and submit as before:

hacker@dojo:~$ as -o your-solve.o your-solve.s
hacker@dojo:~$ ld -shared -o your-solve.so your-solve.o
hacker@dojo:~$ /challenge/check your-solve.so

Handle the sign, return the length, and score!

The boss: put it all together.

Read the numbers from argv, convert each one with your atoi, add them into a total, turn that total back into text with your itoa, and write it to standard output.

  • Walk argv[1] through argv[argc - 1].
  • argc is at [rsp].
  • The argv pointers start at [rsp + 8] and are each 8 bytes long, so if rdi holds the address of one of them (e.g., after mov rdi, rsp; add rdi, 8), add rdi, 8 moves it to the next one.
  • Run atoi on each, summing as you go.

The numbers, or even the overall sum, might be negative, which is exactly why your atoi and itoa handle the sign. Then itoa the total into a scratch buffer (e.g., on the stack) and write that many bytes to file descriptor 1.
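For a bird's-eye view of the whole program, here is a C sketch of the flow. To keep it short, it swaps in the C library's strtol and snprintf where your own atoi and itoa would go; sum_arguments is a name invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sums argv[1] .. argv[argc - 1] and renders the total as text in out.
 * Returns the number of characters written (not counting the NUL). */
int sum_arguments(int argc, char **argv, char *out, size_t out_size)
{
    long total = 0;
    for (int i = 1; i < argc; i++)
        total += strtol(argv[i], NULL, 10);        /* stand-in for your atoi */
    return snprintf(out, out_size, "%ld", total);  /* stand-in for your itoa */
}
```

The returned length is exactly the byte count you would then pass to the write syscall, along with file descriptor 1 and the buffer address.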

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ /challenge/check prog

Sum them, convert the total, print it, and you're done!


Debugging: Don't forget about gdb! Insert int3 or set breakpoints in gdb, stepi through the instructions, and try to deeply understand any failures so that you can fix them!

You can turn text into a number with your atoi, and a number back into text with your itoa. Now you'll use both at once to build a small command-line calculator. It reads an expression from argv --- a left operand, an operator, and a right operand --- as in prog 6 + 7. The operands are strings you already know how to handle: atoi each one. The operator is the new piece: a single character you branch on to decide what to compute, quitting on any operator you don't support. Then itoa the result and write it to standard output, exactly as your summing program did. We'll add one operator at a time --- addition, then subtraction, then multiplication, then the bitwise operators, then division and modulo, and finally the unary operators --- each reusing the atoi and itoa you've already written.

You can read a number from text with your atoi, and write one back to text with your itoa. Now you'll put both to work in a single program: a calculator.

A calculator reads an expression like 6 + 7 and prints the answer. We'll hand you that expression in three pieces, on the command line:

prog 6 + 7

So argv[1] is the left operand ("6"), argv[2] is the operator ("+"), and argv[3] is the right operand ("7"). Each one is a string, just like the arguments you've already been reading off the stack. Recall that argc sits at [rsp], and the argument pointers follow: argv[0] at [rsp + 8], argv[1] at [rsp + 16], argv[2] at [rsp + 24], and argv[3] at [rsp + 32].

The operand strings are easy: atoi each one to get its value, exactly as before. The operator is the new piece. It's a string too, but a one-character one, so the character you care about is its first byte: argv[2][0]. Load that pointer and read the byte it points at, and you have the operator as a single character to branch on.

For this level there's only one operator to handle, '+':

"6"   ->  atoi  ->   6
"7"   ->  atoi  ->   7
6  +  7  =  13
13   ->  itoa  ->  "13"

But a real operator might be anything the user typed, and you only know how to add. So check the operator byte first: if it's '+', do the addition; if it's anything else, you don't support it, so quit by exiting with a nonzero code (no answer to print). That refusal is part of the job --- recognizing the one operator you handle, and bailing out on the rest.

When the operator is '+': atoi both operands, add them, itoa the sum into a scratch buffer, and write those bytes to file descriptor 1 (standard output). Reserve that buffer on the stack (a sub rsp, 0x80 makes room, and rsp then points at it), the same way you stored the /flag string on the stack back in Hello, Hackers. Then exit cleanly with code 0.
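The "check the operator first, refuse everything else" logic can be modeled in C like this. The function name is made up, and strtol stands in for your atoi; the return value plays the role of the exit code.

```c
#include <stdlib.h>

/* Hypothetical model of the '+'-only calculator.
 * Returns 0 and stores the sum, or 1 for an unsupported operator. */
int calc_plus_only(char **argv, long *result)
{
    char op = argv[2][0];            /* first byte of the operator string */

    if (op != '+')
        return 1;                    /* not supported: quit with nonzero */
    *result = strtol(argv[1], NULL, 10) + strtol(argv[3], NULL, 10);
    return 0;                        /* result is ready for itoa and write */
}
```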

This is a whole program, so assemble and link it as one (no -shared), then submit it:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 6 + 7
13
hacker@dojo:~$ /challenge/check prog

Read the operands, dispatch on the operator, print the sum, and score!

Now, let's teach the calculator to subtract!

Add a second branch on the operator byte: if it's '-', sub the right operand from the left instead of adding. Everything else is the same dispatch you already wrote --- '+' adds, '-' subtracts, and any other operator still makes you quit with a nonzero exit code.

The one thing to watch: a difference can be negative.

"3"   ->  atoi  ->   3
"10"  ->  atoi  ->  10
3  -  10  =  -7
-7   ->  itoa  ->  "-7"

That's exactly the case your signed itoa already handles --- it writes the leading '-' and the magnitude for you. So feed the difference straight into the same itoa you've been using, and the sign takes care of itself.

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 3 - 10
-7
hacker@dojo:~$ /challenge/check prog

Add the subtract branch, print the signed result, and score!

Now, let's teach the calculator to multiply!

Multiplication has its own instruction: imul. Just as you used add for '+' and sub for '-', you'll use imul for '*'. You've already met it back in atoi-two-digits, where imul rax, 10 scaled your running total by ten; here it multiplies your two operands the same way.

"6"   ->  atoi  ->   6
"7"   ->  atoi  ->   7
6  *  7  =  42
42   ->  itoa  ->  "42"

Add a third branch on the operator byte: if it's '*', imul the operands; '+' and '-' work as before, and any other operator still makes you quit with a nonzero exit code.

One shell wrinkle: * is special to the shell (it expands to filenames), so quote it when you run your program by hand --- ./prog 6 '*' 7 or ./prog 6 "*" 7. The character your program receives is still a plain '*'.

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 6 '*' 7
42
hacker@dojo:~$ /challenge/check prog

Add the multiply branch, print the product, and score!

Now, let's teach the calculator the bitwise operators!

Every operator so far has been arithmetic. The next three combine their operands bit by bit, and you've met all of them back in assembly-assortment:

  • ^ is XOR: each result bit is 1 when exactly one input bit is 1.
  • | is OR: each result bit is 1 when either input bit is 1.
  • & is AND: each result bit is 1 only when both input bits are 1.

Add three more branches to your dispatch, alongside '+', '-', and '*': '^' uses xor, '|' uses or, and '&' uses and. Any operator you still don't recognize makes you quit with a nonzero exit code.

A bitwise result is just a 64-bit number, so you print it like every other answer: feed it to your signed itoa and write the text.

"12"  ->  atoi  ->  12     (0000 1100)
"10"  ->  atoi  ->  10     (0000 1010)
12  ^  10  =  6            (0000 0110)
12  |  10  =  14           (0000 1110)
12  &  10  =  8            (0000 1000)
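With six operators, the dispatch is easiest to picture as a switch. This C model uses an invented function name and assumes the arithmetic results fit in 64 bits; each case corresponds to one compare-and-branch on the operator byte in your assembly.

```c
#include <stdint.h>

/* Hypothetical model of the binary dispatch.
 * Returns 0 and stores the answer, or 1 for an unknown operator. */
int apply_binary(char op, int64_t a, int64_t b, int64_t *out)
{
    switch (op) {
    case '+': *out = a + b; return 0;   /* add  */
    case '-': *out = a - b; return 0;   /* sub  */
    case '*': *out = a * b; return 0;   /* imul */
    case '^': *out = a ^ b; return 0;   /* xor  */
    case '|': *out = a | b; return 0;   /* or   */
    case '&': *out = a & b; return 0;   /* and  */
    default:  return 1;                 /* unsupported: exit nonzero */
    }
}
```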

Two of these are special to the shell --- | pipes commands together and & runs one in the background --- so quote them when you run your program by hand (^ is fine bare):

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 12 '|' 10
14
hacker@dojo:~$ /challenge/check prog

Add the three bitwise branches, and score!

Every operator so far has been binary --- two operands with an operator between them. The last two are unary: a single operand, with the operator in front.

  • - negates: - 5 is -5.
  • ~ flips every bit (bitwise NOT): ~ 5 is -6, because two's complement makes ~x equal -x - 1.

The new idea is telling the two shapes apart. A binary call passes three arguments after the program name (prog A OP B); a unary call passes two (prog OP A). So the argument count decides which you're reading, and you already know where it lives: argc sits at [rsp]. Branch on it first:

  • argc == 4: the binary dispatch you already wrote (operator in argv[2]).
  • argc == 3: the new unary dispatch, operator in argv[1] and operand in argv[2].

This split is exactly what lets - mean two things: binary - subtracts (12 - 5 = 7), unary - negates (- 5 = -5). The argument count tells them apart.

For the unary operators: neg the operand for -, and not it for ~. Print the result with your signed itoa, like any other answer.

- 5   ->  neg  ->  -5
~ 5   ->  not  ->  -6
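A C model of the two new pieces, with hypothetical names: the unary dispatch itself, and a helper that says which argv slot holds the operator for a given argc. It assumes the operand is not the most negative 64-bit value, which C cannot negate safely.

```c
#include <stdint.h>

/* Hypothetical model of the unary dispatch.
 * Returns 0 and stores the answer, or 1 for an unknown operator. */
int apply_unary(char op, int64_t a, int64_t *out)
{
    switch (op) {
    case '-': *out = -a; return 0;   /* neg */
    case '~': *out = ~a; return 0;   /* not: equals -a - 1 */
    default:  return 1;              /* unsupported: exit nonzero */
    }
}

/* Which argv index holds the operator: 2 for binary, 1 for unary. */
int operator_index(int argc)
{
    if (argc == 4) return 2;         /* prog A OP B */
    if (argc == 3) return 1;         /* prog OP A   */
    return -1;                       /* neither shape */
}
```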

Add the argc split and the two unary branches; an unrecognized unary operator quits, just like a binary one. The shell expands a bare ~ to your home directory, so quote it (a bare - needs no quoting):

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog - 5
-5
hacker@dojo:~$ ./prog '~' 5
-6
hacker@dojo:~$ /challenge/check prog

Split on the argument count, handle both unary operators, and score!
