Numbers as Strings

Computing 101.

You've learned to read and write registers, walk through memory, and package your code as a function inside a shared library. Now let's put those skills together and build some real algorithms. When you need to debug one of these .so submissions, refer back to Writing from a Shared Library for the shared-library debugging pattern.

Through this module, we'll gradually build on our solutions until we create code that addresses a very common program need: turning text into numbers. Along the way, we'll learn how to write reusable assembly code, how text and numbers relate to each other, and how to reason about algorithms (and their failings!). You'll grow in knowledge, and in flags!

Computers receive a lot of their input as text. When you pass 12345 to a program as a command-line argument, your code doesn't receive the number 12345 --- it receives five separate ASCII bytes: '1', '2', '3', '4', '5'. The CPU can't add or multiply that text directly; first, someone has to turn those characters into the numbers they represent. That job is traditionally done by a function called atoi (ASCII to integer) --- and we'll build it from the ground up, starting here with a single digit.

The key insight is how digits are encoded. You've seen ASCII in prior levels, and we'll talk about ASCII numbers (the text encoding of numerical values) here. In ASCII, the character '0' is the byte 0x30, '1' is 0x31, and so on up to '9' at 0x39. The digits are consecutive, so the value of a digit character is simply the character minus '0':

'7'  ->  0x37 - 0x30  =  7

In this level, you must implement a function that converts a text string containing one digit into the number. Your function (which must be called atoi_digit) receives a pointer in rdi to a single digit character, and must return that digit's value (0 through 9) in rax.
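
As a reference model only (Python, not the assembly you will submit), the subtraction can be sketched like this. The bytes object stands in for the memory rdi points at, and the range check is an assumption added for illustration, since the level only ever passes a digit.

```python
def atoi_digit(s: bytes) -> int:
    """Model of atoi_digit: s stands in for the memory rdi points at."""
    value = s[0] - 0x30          # '0' is 0x30, so '7' (0x37) becomes 7
    # Assumption for illustration: reject anything outside '0'-'9'.
    if not 0 <= value <= 9:
        raise ValueError("not an ASCII digit")
    return value


print(atoi_digit(b"7"))  # prints 7
```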

Now, since we're making an atoi_digit function instead of solve, you'll need a .global atoi_digit so the challenge can find it! Then, build it into a shared library and hand it to the grader:

hacker@dojo:~$ as -o your-solve.o your-solve.s
hacker@dojo:~$ ld -shared -o your-solve.so your-solve.o
hacker@dojo:~$ /challenge/check your-solve.so

Decode the digit, return its value, and grab the flag.

You can decode one digit with atoi_digit. A two-digit number is just two of those, combined by place value: in "42", the 4 is in the tens place and the 2 is in the ones place, so the value is 4 * 10 + 2 = 42. This is the algorithm we'll use to compute it in this level.

Here, you'll write two functions:

atoi_digit(s) --- exactly as before: the value of the single digit at s. You can (and should!) reuse your solution from the previous challenge.
atoi(s) --- takes a pointer to a two-character number and returns its value, by decoding each character with atoi_digit and combining them as first * 10 + second.

Both are real functions the grader calls, so both must follow the calling convention. That means that if you use any callee-saved registers, you must properly restore them before returning. And, since you're also calling atoi_digit from atoi, you must be careful to properly handle any caller-saved registers as well.

As before, each function takes its argument in rdi and returns its result in rax.

Now, how do you multiply? x86's multiply instruction is imul. It comes in a few forms, but we'll use it the way we used add: imul rax, 10 multiplies rax by 10 in place (rax = rax * 10), so scaling the tens digit up by a place is a single instruction. Of course, imul works on registers other than rax: for example, imul rbx, 10 multiplies rbx by 10.
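
A rough Python model of the two-function structure, assuming the input is exactly two ASCII digits. The slicing stands in for passing a pointer to each character; your assembly version also has to follow the calling convention, which this sketch does not model.

```python
def atoi_digit(s: bytes) -> int:
    return s[0] - 0x30           # same single-digit decode as the previous level


def atoi(s: bytes) -> int:
    """Model of the two-digit atoi: s holds exactly two ASCII digits."""
    tens = atoi_digit(s[0:1])    # first character
    ones = atoi_digit(s[1:2])    # second character
    return tens * 10 + ones      # multiply by 10 (imul), then add


print(atoi(b"42"))  # prints 42
```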

One more thing! In your assembly, you will need a .global atoi_digit and a .global atoi (along with the functions themselves, implemented at those labels) so that the checker can find them. Build and submit as before, with both the atoi_digit and atoi functions:

hacker@dojo:~$ /challenge/check your-solve.so

Multiply by ten, add the ones, and the flag is yours.

Your two-digit atoi did first * 10 + second. What if there are more than two digits? Of course, you'd keep a running total, and for each new digit do total = total * 10 + digit. That repetition is a loop, which you practiced in Writing Loops and will adapt here.

Read the digits left to right:

"123":
  total = 0
  '1':  total =  0*10 + 1  =   1
  '2':  total =  1*10 + 2  =  12
  '3':  total = 12*10 + 3  = 123

You would do this until the end of the string, which, by the convention of the C programming language (and used here), is represented by a byte with a value of 0x00 (that is, binary 00000000 or decimal value 0). Note that this is distinct from the character '0', which, again, has a value of 0x30 (binary 00110000).

So, your loop is: look at the next byte; if it's 0, jump beyond the loop (look back at Writing Loops for reference); otherwise convert the digit just like atoi_digit did, multiply the total by 10, add the digit, advance to the next byte, and loop back to the head of the loop.
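
The loop can be sketched in Python as a reference model. The bytes object (with an explicit trailing zero byte) stands in for the string in memory, and the index i plays the role of the pointer you advance.

```python
def atoi(s: bytes) -> int:
    """Model of the looping atoi: s is a NUL-terminated run of ASCII digits."""
    total = 0
    i = 0                        # plays the role of the pointer walking the string
    while s[i] != 0x00:          # stop at the zero byte, not at the character '0' (0x30)
        digit = s[i] - 0x30
        total = total * 10 + digit
        i += 1                   # advance to the next byte
    return total


print(atoi(b"123\x00"))  # prints 123
```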

Your atoi receives a pointer to the string in rdi and must return the integer value in rax. Loop the digits and return the number.

Debugging: This can be tricky to get right. To debug this challenge, our advice is to add a _start to your code that fakes the call, like so:

.global _start
_start:
    push 0x333231   # "123" on the stack -- little-endian, so 0x31 ('1') is the first byte, and the high zero bytes terminate it
    mov rdi, rsp    # a pointer to that string, as the first argument to atoi
    int3            # optional: makes gdb stop here without a breakpoint (remove it to run outside gdb, where it kills the program with SIGTRAP)
    call atoi       # there we go!

    mov rdi, rax    # atoi's result comes back in rax; exit with it so you can read it back with `echo $?`
    mov rax, 60     # exit
    syscall

Assemble and link it as a normal executable (no -shared --- this version has an entry point), then load it in gdb:

hacker@dojo:~$ as -o debug.o debug.s
hacker@dojo:~$ ld -o debug debug.o
hacker@dojo:~$ gdb ./debug
(gdb) run

Execution stops at your int3, and from there you can step through with the techniques you learned in Software Introspection, watching rdi walk the string and your running total build up in rax, until things work!

Your atoi handles positive numbers. But numbers can be negative, and a negative number arrives with a leading minus sign: the string "-42" is the four bytes '-', '4', '2', NUL.

Extend your converter to handle that sign. If the very first character is '-' (ASCII 0x2d), remember that the result should come out negative, step past the sign, and convert the digits that follow exactly as before. Then negate your total at the end. Of course, a positive number has no sign character, so it should still convert just like the previous level.

There are two ways you can negate a number: neg rax turns a register into its negative, and imul rax, -1 does the same. Pick the one you like!
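
A Python reference model of the sign handling, assuming the same NUL-terminated input as the previous level. The negative flag is a hypothetical name for whatever register or stack slot you use to remember the sign.

```python
def atoi(s: bytes) -> int:
    """Model of the signed atoi: optional leading '-', digits, then a NUL byte."""
    i = 0
    negative = s[0] == 0x2D      # 0x2d is '-'
    if negative:
        i += 1                   # step past the sign
    total = 0
    while s[i] != 0x00:
        total = total * 10 + (s[i] - 0x30)
        i += 1
    return -total if negative else total   # the negation at the very end


print(atoi(b"-42\x00"), atoi(b"42\x00"))  # prints -42 42
```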

Real input is messy. A number embedded in a larger string isn't always followed by a tidy NUL --- it might be followed by a space, a letter, a comma, or anything else: "42abc", "100 200", "7,".

A proper atoi reads digits until it sees something that isn't a digit, then stops, whatever that non-digit is (including 0x00). Instead of "stop at the 0 byte", the rule becomes "stop at the first byte that isn't '0'-'9'".

A handy one-shot test for a character c: compute c - 0x30, then check whether the result is in the range 0-9 using an unsigned comparison. Anything that isn't a digit (punctuation, letters, a space, even the 0 byte) falls outside this range. Bytes below '0' are the subtle case: subtracting 0x30 from them gives a negative two's-complement number, which is a very large number when interpreted as an unsigned value.

To do an unsigned check, use the ja instruction, which stands for "jump if above": it jumps if, in the last comparison, the left operand was greater than the right one as unsigned numbers. You must do the cmp first (again, look back earlier in this dojo), and then:

ja  done

...

done:
   ret

Otherwise, keep your solution from the prior level: a leading '-' still means negative, math still works as you expect, etc, and you get the flag when you solve it!
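
To see why one unsigned comparison is enough, here is a Python model. MASK64 imitates a 64-bit register (Python integers don't wrap on their own), and is_digit is a hypothetical helper name.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF      # models a 64-bit register


def is_digit(c: int) -> bool:
    """One-comparison digit test: (c - '0') viewed as unsigned must be <= 9."""
    diff = (c - 0x30) & MASK64   # wraps around like a 64-bit subtraction
    return diff <= 9             # the case where ja would NOT jump


def atoi(s: bytes) -> int:
    i = 0
    negative = s[0] == 0x2D
    if negative:
        i += 1
    total = 0
    while is_digit(s[i]):        # stops at NUL, space, letters, punctuation, ...
        total = total * 10 + (s[i] - 0x30)
        i += 1
    return -total if negative else total


print(atoi(b"42abc\x00"), atoi(b"100 200\x00"), atoi(b"-7,\x00"))  # prints 42 100 -7
```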

Until now, you've been writing a loadable library: a function the challenge loaded and called for you. This time, you'll write a whole program --- one that starts at _start, runs on its own, and exits when it's done.

Your program gets the number as a command-line argument. When a program starts, the stack holds its arguments: argc (the count) sits at [rsp], and the argument pointers follow it --- argv[0] (the program's own name) at [rsp + 8], and argv[1] (the first real argument) at [rsp + 16]. So the number you want is the string pointed to by [rsp + 16].

Read it, convert it with your atoi, and hand the value back the way a program does: instead of returning it in rax, exit with it as your exit code, using the exit syscall with the value in rdi. An exit code is a single byte, so the number you're given will be between 0 and 255.

This time, assemble and link it as a normal program (no -shared), then submit it:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ /challenge/check prog

Convert the argument and exit with its value.

You've taken text input and converted it to a number, but real programs also have to output numbers as text. This inverse of atoi is called itoa (integer to ASCII). Here, we'll start building it the same way, first with one digit, then moving on!

In the reverse of atoi, a digit's character is just its value plus '0' (0x30). So if '7' (the ASCII character) became 7 (the value) by subtracting 0x30, then the same way, 7 (the value) would become '7' (the ASCII character) by adding 0x30.

7  ->  7 + 0x30  =  0x37  =  '7'

We'll start with itoa_digit. Your itoa_digit gets a value in rdi (a single digit, 0-9) and returns its ASCII character in rax. Remember to .global itoa_digit so the challenge can find it.
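
As a tiny Python reference model (not the assembly you will submit), the function takes a digit value and returns the ASCII code:

```python
def itoa_digit(value: int) -> int:
    """Model of itoa_digit: digit value 0-9 in, ASCII code out."""
    return value + 0x30          # 7 + 0x30 = 0x37, which is '7'


print(hex(itoa_digit(7)), chr(itoa_digit(7)))  # prints 0x37 7
```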

The previous level was a whole executable. This level returns to the shared-library workflow from the earlier atoi functions:

hacker@dojo:~$ as -o your-solve.o your-solve.s
hacker@dojo:~$ ld -shared -o your-solve.so your-solve.o
hacker@dojo:~$ /challenge/check your-solve.so

Add 0x30, return the character, and claim the flag.

One digit was easy. A two-digit number like 42 needs splitting into its tens (4) and ones (2) --- and splitting is division: 42 / 10 = 4 (the quotient), and 42 % 10 = 2 (the remainder).

x86 gives you both results from one div, but div is a fussy instruction worth learning carefully. div rcx divides the 128-bit value formed by concatenating rdx and rax (written rdx:rax, with rdx as the high half) by rcx, leaving the quotient in rax and the remainder in rdx. Three things follow from that:

It divides rdx:rax, not just rax, so you must clear rdx first (xor rdx, rdx) --- otherwise div treats leftover garbage as the high half of your number (and may crash).
The divisor comes from a register, not an immediate, so load the 10 into one (e.g., mov rcx, 10; div rcx).
You don't control the dividend: it's always rdx:rax.
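
The three points above can be checked against a small Python model of unsigned div. The function name and the Python exceptions are stand-ins: on real hardware both error cases are a CPU divide error that crashes the program.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def div(rdx: int, rax: int, divisor: int):
    """Model of unsigned div: returns (new rax, new rdx) = (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    dividend = (rdx << 64) | rax             # the 128-bit value rdx:rax
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:                    # quotient does not fit in rax
        raise OverflowError("divide error")
    return quotient, remainder


print(div(0, 42, 10))   # rdx cleared first: prints (4, 2)
print(div(1, 42, 10))   # a stale 1 in rdx: the dividend is really 2**64 + 42
```

The second call shows the failure mode: the answer is no longer 4 and 2, because the leftover rdx became the high half of the dividend.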

After the div, rax holds the tens and rdx holds the ones. Turn each into a character the way itoa_digit did (add 0x30) and store the two of them.

Write itoa(value, buf), which we'll call from the challenge. This function should take a value (10-99) in rdi and a pointer to the "output" buffer in rsi. Split the number in rdi with div, convert the two digits as above, and write their characters to that buffer. Then return the number of characters written (in this case, 2). Remember to .global itoa.

Writing characters. Your itoa_digit function from the last level returned the result (in rax), and you didn't have to deal with writing it to a buffer. Now, you do. Your actual character is one byte (8 bits), whereas the register you're holding it in is 64 bits (8 bytes) long. You just want the last ("least significant") byte, and you can directly access it through partial register aliases, depending on the register:

register	least significant byte
`rax`	`al`
`rbx`	`bl`
`rcx`	`cl`
`rdx`	`dl`
`rsi`	`sil`
`rdi`	`dil`
`rbp`	`bpl`
`rsp`	`spl`
`r8`	`r8b`
`r9`	`r9b`
`r10`	`r10b`
`r11`	`r11b`
`r12`	`r12b`
`r13`	`r13b`
`r14`	`r14b`
`r15`	`r15b`

So, if your character is in rax, and the buffer is pointed to by rsi, you'll need to do mov [rsi], al.
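
Putting the split and the byte writes together, a Python reference model of this level's itoa might look like the following. The bytearray stands in for the buffer rsi points at, and each element assignment corresponds to a one-byte store such as mov [rsi], al.

```python
def itoa(value: int, buf: bytearray) -> int:
    """Model of the two-digit itoa: value is 10-99, buf is the output buffer."""
    tens, ones = divmod(value, 10)   # div leaves the quotient in rax, the remainder in rdx
    buf[0] = tens + 0x30             # first character: the tens digit
    buf[1] = ones + 0x30             # second character: the ones digit
    return 2                         # number of characters written


buf = bytearray(8)                   # eight zero bytes of scratch space
n = itoa(42, buf)
print(n, bytes(buf[:n]))             # prints 2 b'42'
```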

This is tricky, but do it carefully, and the flag is your reward!

Debugging: This can be tricky to get right. To debug this challenge, our advice is to add a _start to your code, like so:

.global _start
_start:
    mov rdi, 42     # you'll pass 42 as the first argument to your function
    push 0          # this pushes eight 0 bytes to the stack, clearing what will be your output buffer
    mov rsi, rsp    # the output buffer as the second argument to itoa
    int3            # this is optional, if you want gdb to break here without having to set a breakpoint!
    call itoa       # there we go!

    mov rdi, 0      # exit code 0 (itoa may have changed rdi)
    mov rax, 60     # exit cleanly, like a cultured individual
    syscall

Assemble and link it as a normal executable (no -shared --- this version has an entry point), then load it in gdb:

hacker@dojo:~$ as -o debug.o debug.s
hacker@dojo:~$ ld -o debug debug.o
hacker@dojo:~$ gdb ./debug
(gdb) run

Execution stops at your int3, and from there you can step through with the techniques you learned in Software Introspection, looking at memory on the stack, registers, etc, until things work! You can also debug the native harness that loads your .so:

hacker@dojo:~$ gdb --args /challenge/harness your-solve.so 42
(gdb) run

The first argument after /challenge/harness is your library, and the second is the stand-in number passed to itoa.

Your two-digit itoa always wrote two characters. But 7 isn't 07, and 0 isn't 00: a number is normally written with no leading zeros, in exactly as many digits as it needs. In this level, we'll strip the leading zeros from our translation on the path to a nice itoa!

We'll still deal with values 99 and less, so a single div is still enough. Divide by 10 as before: the quotient is the tens digit, the remainder is the ones. If the quotient is 0, there is no tens digit, and we can drop the leading zero and output just the remainder. For example:

7:   7 / 10 = 0 rem 7   ->  quotient 0, so write just "7"
42:  42 / 10 = 4 rem 2  ->  quotient 4, so write "42"

One value still needs care: 0 itself. Its quotient is 0 too, but writing "nothing" is wrong, so we write a single '0'.

The rest is the same: Write itoa(value, buf) for value in 0-99: write its decimal text (no leading zeros) to buf, and return how many characters you wrote (1 or 2) in rax.
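
One way to model this in Python, as a sketch rather than the expected solution: the ones digit is always written, so 0 comes out as "0" without a separate branch.

```python
def itoa(value: int, buf: bytearray) -> int:
    """Model of itoa for 0-99 with no leading zero; returns the length written."""
    tens, ones = divmod(value, 10)
    n = 0
    if tens != 0:                    # only emit a tens digit when there is one
        buf[n] = tens + 0x30
        n += 1
    buf[n] = ones + 0x30             # the ones digit is always written
    return n + 1


buf = bytearray(8)
for v in (0, 7, 42):
    n = itoa(v, buf)
    print(v, bytes(buf[:n]))         # prints 0 b'0', then 7 b'7', then 42 b'42'
```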

Note: The "check if the quotient is 0" test will be useful in the next level, where we'll finally support longer numbers! Keep it in mind!

Now any length. It's the same div-by-10 step as last level, just repeated: each div peels off the lowest digit (the remainder) and shrinks the number (the quotient), and you keep going until the quotient reaches 0 --- however many digits that takes.

123:  123 % 10 = 3,  123 / 10 = 12
       12 % 10 = 2,   12 / 10 = 1
        1 % 10 = 1,    1 / 10 = 0   (stop)

But notice the catch: the digits come out backwards --- ones first (3, 2, 1), the reverse of how you write them (1, 2, 3). So you can't just append them as you go. The usual fixes: stash each digit as it comes and write them out in reverse (the stack is perfect for this --- push them as they fall out, pop them to write, and LIFO reverses them for free), or write them into the buffer from the back toward the front.

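And 0 needs the same care as last level. If your loop tests the value before dividing, it runs zero times for 0, so write a plain "0" yourself; a loop that divides first and tests the quotient afterward produces that single digit on its own.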

Write itoa(value, buf) for any non-negative value (in rdi, buffer in rsi): write its decimal digits to the buffer and return how many you wrote, in rax. Remember to .global itoa.
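
Here is a Python reference model of the push-then-pop approach. The list named stack stands in for the machine stack. This sketch tests the quotient after each division, so 0 yields one digit without a separate case; a loop that tests before dividing would need the explicit "0".

```python
def itoa(value: int, buf: bytearray) -> int:
    """Model of itoa for any non-negative value; returns the number of digits written."""
    stack = []                            # stands in for the machine stack
    while True:
        value, digit = divmod(value, 10)  # quotient shrinks the number, remainder is a digit
        stack.append(digit + 0x30)        # push: the ones digit goes in first
        if value == 0:                    # tested after dividing
            break
    n = 0
    while stack:
        buf[n] = stack.pop()              # pop: the most significant digit comes out first
        n += 1
    return n


buf = bytearray(32)
n = itoa(123, buf)
print(n, bytes(buf[:n]))                  # prints 3 b'123'
```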

Build and submit as before:

hacker@dojo:~$ as -o your-solve.o your-solve.s
hacker@dojo:~$ ld -shared -o your-solve.so your-solve.o
hacker@dojo:~$ /challenge/check your-solve.so

Reverse the digits and return the length.

Your itoa handles non-negative numbers. But a sum can be negative (your atoi reads negative numbers, after all), and a negative number is written with a leading -.

The trick is to peel the sign off first, then let the work you already did handle the rest:

If the input value is negative, write a '-', move your buffer pointer one past it (e.g., add rsi, 1), and neg the input value to get its magnitude.
You can cmp rdi, 0 to compare, and then jl is_negative (jl jumps if, in the preceding cmp, the left operand was less than the right one as signed numbers).
Run your existing digit loop on that (now non-negative) magnitude.
The total length is the digits you wrote, plus 1 for the sign.

A non-negative number has no sign, so it still prints exactly as before.

Extend itoa(value, buf) to handle negative values too. The calling convention is the same: value argument in rdi, buffer argument in rsi, total length returned in rax. Remember to .global itoa.
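
The sign-peeling steps can be sketched in Python as a reference model. The index n plays both roles from the list above: it is the advancing buffer position and, at the end, the total length.

```python
def itoa(value: int, buf: bytearray) -> int:
    """Model of the signed itoa; returns the total length, sign included."""
    n = 0
    if value < 0:                         # cmp rdi, 0 then jl is_negative
        buf[n] = 0x2D                     # write '-'
        n += 1                            # advance past the sign
        value = -value                    # neg: keep only the magnitude
    digits = []
    while True:
        value, digit = divmod(value, 10)
        digits.append(digit + 0x30)
        if value == 0:
            break
    for ch in reversed(digits):           # digits were produced ones-first
        buf[n] = ch
        n += 1
    return n


buf = bytearray(32)
n = itoa(-7, buf)
print(n, bytes(buf[:n]))                  # prints 2 b'-7'
```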

Build and submit as before:

hacker@dojo:~$ as -o your-solve.o your-solve.s
hacker@dojo:~$ ld -shared -o your-solve.so your-solve.o
hacker@dojo:~$ /challenge/check your-solve.so

Handle the sign and return the length.

The boss: put it all together.

Read the numbers from argv, convert each one with your atoi, add them into a total, turn that total back into text with your itoa, and write it to standard output. With no number arguments, print 0.

argc is at [rsp]; the 8-byte argv pointer entries begin at [rsp + 8], and [rsp + 16] is the entry for argv[1].
Keep a register pointing to each table entry from argv[1] through argv[argc - 1].
Dereference that table entry to obtain the string pointer that atoi expects.
Advance the register by 8 bytes after each number.

Remember, atoi is a function call, so any loop state you still need afterward must be preserved according to the calling convention rules you practiced earlier.

The numbers, or even the overall sum, might be negative, which is exactly why your atoi and itoa handle the sign. Then itoa the total into a scratch buffer (e.g., on the stack) and write that many bytes to file descriptor 1.
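
The overall flow, modeled in Python. sum_args is a hypothetical name, the list stands in for the argv table on the stack, and str() stands in for your itoa. The level text doesn't say whether a trailing newline is expected, so none is added here.

```python
def atoi(s: str) -> int:
    """Signed atoi model: optional '-', then digits up to the first non-digit."""
    negative = s.startswith("-")
    total = 0
    for ch in (s[1:] if negative else s):
        if not "0" <= ch <= "9":
            break
        total = total * 10 + (ord(ch) - 0x30)
    return -total if negative else total


def sum_args(argv: list) -> bytes:
    """Bytes the program would write to fd 1; argv[0] is the program name."""
    total = 0
    for arg in argv[1:]:                  # argv[1] through argv[argc - 1]
        total += atoi(arg)
    return str(total).encode()            # stand-in for itoa into a scratch buffer


print(sum_args(["prog", "5", "-12", "3"]))  # prints b'-4'
```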

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ /challenge/check prog

Sum them, convert the total, print it, and you're done!

Debugging: Don't forget about gdb! Insert an int3 or set a breakpoint in gdb, stepi through the instructions, and try to deeply understand any failures so that you can fix them!

The Calculator

You can turn text into a number with your atoi, and a number back into text with your itoa. Now you'll use both at once to build a small command-line calculator. It reads an expression from argv --- a left operand, an operator, and a right operand --- as in prog 6 + 7. The operands are strings you already know how to handle: atoi each one. The operator is the new piece: a single character you branch on to decide what to compute, quitting on any operator you don't support. Then itoa the result and write it to standard output, exactly as your summing program did. We'll add one operator group at a time --- addition, then subtraction, then multiplication, then the bitwise operators, and finally the unary operators --- each reusing the atoi and itoa you've already written.

You can read a number from text with your atoi, and write one back to text with your itoa. Now you'll put both to work in a single program: a calculator.

A calculator reads an expression like 6 + 7 and prints the answer. We'll hand you that expression in three pieces on the command line:

prog 6 + 7

So argv[1] is the left operand ("6"), argv[2] is the operator ("+"), and argv[3] is the right operand ("7"). Each one is a string, just like the arguments you've already been reading off the stack. Recall that argc sits at [rsp], and the argument pointers follow: argv[0] at [rsp + 8], argv[1] at [rsp + 16], argv[2] at [rsp + 24], and argv[3] at [rsp + 32].

The operand strings are easy: atoi each one to get its value, exactly as before. The operator is the new piece. It's a string too, but a one-character one, so the character you care about is its first byte: argv[2][0]. Load that pointer and read the byte it points at, and you have the operator as a single character to branch on.

For this level there's only one operator to handle, '+':

"6"   ->  atoi  ->   6
"7"   ->  atoi  ->   7
6  +  7  =  13
13   ->  itoa  ->  "13"

But a real operator might be anything the user typed, and you only know how to add. So check the operator byte first: if it's '+', do the addition; if it's anything else, you don't support it, so quit by exiting with a nonzero code (no answer to print). That refusal is part of the job --- recognizing the one operator you handle, and bailing out on the rest.

When the operator is '+': atoi both operands, add them, itoa the sum into a scratch buffer, and write those bytes to file descriptor 1 (standard output). Reserve that buffer on the stack (a sub rsp, 0x80 makes room, and rsp then points at it), the same way you stored the /flag string on the stack back in Hello, Hackers. Then exit cleanly with code 0.
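
As a Python reference model of this level's behavior: calc is a hypothetical name, int() and str() stand in for your atoi and itoa, and the exit code 1 is just one possible nonzero value.

```python
def calc(argv: list):
    """Returns (stdout_bytes, exit_code) for argv = [prog, left, op, right]."""
    op = argv[2][0]                       # the operator is the first byte of argv[2]
    if op != "+":
        return b"", 1                     # unsupported: print nothing, exit nonzero
    result = int(argv[1]) + int(argv[3])  # int() stands in for atoi
    return str(result).encode(), 0        # str() stands in for itoa; exit code 0


print(calc(["prog", "6", "+", "7"]))      # prints (b'13', 0)
```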

This is a whole program, so assemble and link it as one (no -shared), then submit it:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 6 + 7
13
hacker@dojo:~$ /challenge/check prog

Read the operands, dispatch on the operator, and print the sum.

Now, let's teach the calculator to subtract!

Add a second branch on the operator byte: if it's '-', sub the right operand from the left instead of adding. Everything else is the same dispatch you already wrote --- '+' adds, '-' subtracts, and any other operator still makes you quit with a nonzero exit code.

The one thing to watch: a difference can be negative.

"3"   ->  atoi  ->   3
"10"  ->  atoi  ->  10
3  -  10  =  -7
-7   ->  itoa  ->  "-7"

That's exactly the case your signed itoa already handles --- it writes the leading '-' and the magnitude for you. So feed the difference straight into the same itoa you've been using, and the sign takes care of itself.

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 3 - 10
-7
hacker@dojo:~$ /challenge/check prog

Add the subtract branch and print the signed result.

Now, let's teach the calculator to multiply!

Multiplication has its own instruction: imul. Just as you used add for '+' and sub for '-', you'll use imul for '*'. You've already met it back in atoi-two-digits, where imul rax, 10 scaled the tens digit by ten; here it multiplies your two operands the same way (for example, imul rax, rbx computes rax = rax * rbx).

"6"   ->  atoi  ->   6
"7"   ->  atoi  ->   7
6  *  7  =  42
42   ->  itoa  ->  "42"

Add a third branch on the operator byte: if it's '*', imul the operands; '+' and '-' work as before, and any other operator still makes you quit with a nonzero exit code.

One shell wrinkle: * is special to the shell (it expands to filenames), so quote it when you run your program by hand --- ./prog 6 '*' 7 or ./prog 6 "*" 7. The character your program receives is still a plain '*'.

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 6 '*' 7
42
hacker@dojo:~$ /challenge/check prog

Add the multiply branch and print the product.

Now, let's teach the calculator the bitwise operators!

Every operator so far has been arithmetic. The next three combine their operands bit by bit, and you've met all of them back in assembly-assortment:

^ is XOR: each result bit is 1 when exactly one input bit is 1.
| is OR: each result bit is 1 when at least one input bit is 1.
& is AND: each result bit is 1 only when both input bits are 1.

Add three more branches to your dispatch --- '^' → xor, '|' → or, '&' → and --- alongside '+', '-', and '*'. Any operator you still don't recognize makes you quit with a nonzero exit code.

A bitwise result is just a 64-bit number, so you print it like every other answer: feed it to your signed itoa and write the text.

"12"  ->  atoi  ->  12     (0000 1100)
"10"  ->  atoi  ->  10     (0000 1010)
12  ^  10  =  6            (0000 0110)
12  |  10  =  14           (0000 1110)
12  &  10  =  8            (0000 1000)
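
The worked values above can be checked with a short Python model of the six-way dispatch. The table and the calc name are illustrative only; in assembly this is a chain of compare-and-jump branches. Python integers aren't limited to 64 bits, so very large results would differ from the real program.

```python
import operator

BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "^": operator.xor,
    "|": operator.or_,
    "&": operator.and_,
}


def calc(left: int, op: str, right: int):
    """Returns the result, or None for an operator the calculator does not support."""
    fn = BINARY_OPS.get(op)
    return None if fn is None else fn(left, right)


for op in "^|&":
    print(12, op, 10, "=", calc(12, op, 10))   # results: 6, 14, 8
```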

Two of these are special to the shell --- | pipes commands together and & runs one in the background --- so quote them when you run your program by hand (^ is fine bare):

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 12 '|' 10
14
hacker@dojo:~$ /challenge/check prog

Add the three bitwise branches.

Every operator so far has been binary --- two operands with an operator between them. The last two are unary: a single operand, with the operator in front.

- negates: - 5 is -5.
~ flips every bit (bitwise NOT): ~ 5 is -6, because two's complement makes ~x equal -x - 1.

The new idea is telling the two shapes apart. A binary call passes three arguments after the program name (prog A OP B); a unary call passes two (prog OP A). So the argument count decides which you're reading, and you already know where it lives: argc sits at [rsp]. Branch on it first:

argc == 4: the binary dispatch you already wrote (operator in argv[2]).
argc == 3: the new unary dispatch, operator in argv[1] and operand in argv[2].

This split is exactly what lets - mean two things: binary - subtracts (12 - 5 = 7), unary - negates (- 5 = -5). The argument count tells them apart.

For the unary operators: neg the operand for -, and not it for ~. Print the result with your signed itoa, like any other answer.

- 5   ->  neg  ->  -5
~ 5   ->  not  ->  -6
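
A Python model of the argc split, with only the '-' branch kept on the binary side to keep it short. calc is a hypothetical name, int() and str() stand in for atoi and itoa, and quitting on any other argument count is an assumption.

```python
def calc(argv: list):
    """Returns (stdout_bytes, exit_code); argv[0] is the program name."""
    argc = len(argv)
    if argc == 3:                         # unary shape: prog OP A
        op, a = argv[1][0], int(argv[2])
        if op == "-":
            result = -a                   # neg
        elif op == "~":
            result = ~a                   # not: equals -a - 1 in two's complement
        else:
            return b"", 1
    elif argc == 4:                       # binary shape: prog A OP B
        a, op, b = int(argv[1]), argv[2][0], int(argv[3])
        if op == "-":
            result = a - b                # same character, different meaning
        else:
            return b"", 1                 # other binary operators omitted from this sketch
    else:
        return b"", 1                     # assumption: any other shape quits
    return str(result).encode(), 0


print(calc(["prog", "-", "5"]), calc(["prog", "~", "5"]), calc(["prog", "12", "-", "5"]))
```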

Add the argc split and the two unary branches; an unrecognized unary operator quits, just like a binary one. The shell expands a bare ~ to your home directory, so quote it (- you can type straight):

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog - 5
-5
hacker@dojo:~$ ./prog '~' 5
-6
hacker@dojo:~$ /challenge/check prog

Split on the argument count and handle both unary operators.

printf

Programs rarely print just one fixed string. They print messages assembled from fixed words and changing values: answer: 13, hello, hacker, opened 3 files, and so on. You could write each message by hand, but then every new message shape needs its own copy loops, number conversions, and write calls. A format string is a compact recipe for building those messages. Ordinary characters in the recipe are printed as-is, while special markers say where the next value should go. The code that follows such a recipe is called a formatter. printf is the traditional name for a formatter that prints its result.

This mini-module will have you build printf. Your version will be special: it's yours! Many other versions of printf exist as well. For example, the C standard library, a collection of useful functions for applications written in the C programming language, includes an implementation, and your shell has one too:

hacker@dojo:~$ printf Hello
Hello
hacker@dojo:~$

We'll build up that idea in small steps: literal text, the ASCII newline byte, escaped syntax characters, decimal numbers, repeated values, strings, and finally arbitrary bytes.

You can already write bytes to standard output. Now you will put that in a loop and start building a small printf-style program.

The input is a format string in argv[1]. For this first level, there are no special markers yet. Every byte in the format string is ordinary text, so your job is to write those bytes to standard output.

argv[1]:  "score: "
output:   "score: "

You can write one byte at a time as you scan, or find a run of ordinary bytes and write the whole run at once. Either way, stop when you reach the format string's NUL byte, then exit cleanly.

Build and submit it as an executable:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 'score: '
score: hacker@dojo:~$ /challenge/check prog

Note that in the above, prog doesn't print a trailing newline, so the command prompt starts on the same line. That's okay --- the next level will teach your formatter how to write a newline.

When testing, be aware that the shell also has a built-in printf utility. If you name your program printf, make sure to run it via a path (e.g., ./printf) to avoid the built-in one.

Literal output can copy visible characters, but text also needs a way to name bytes that are awkward to type directly. A newline is the classic example: it moves the terminal to the next line instead of drawing a visible symbol.

You've seen ASCII before: it assigns byte values to text characters and text controls. The ASCII newline byte is 0x0a, which is decimal 10.

The common standard for writing special characters in a format string is the \ prefix, and a newline is written as \n. In this level, the two input bytes \n in the format string mean "write one output byte with value 0x0a". The backslash starts an escape sequence, and the next byte says which special byte to write.

argv[1]:  "hello\nworld"
output:   "hello"
          "world"

When your scan sees a backslash followed by n, skip both bytes and write one byte with value 0x0a. Keep supporting ordinary text.

When testing your program yourself, beware that some shell syntax can interpret \n before your program sees it. Use single quotes around the format string so your program receives the two bytes \ and n, as in ./prog 'hello\nworld'.

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 'hello\nworld'
hello
world
hacker@dojo:~$ /challenge/check prog

Turn \n into a newline byte.

Now your format string has syntax in it. Backslash starts escape sequences such as \n, and percent will start format markers such as %d.

That creates a practical problem: sometimes the output should contain a real backslash or a real percent sign. The usual formatter convention is to double the syntax byte. The two input bytes \\ write one output backslash byte, and the two input bytes %% write one output percent byte.

argv[1]:  "path\\file"
output:   "path\file"

argv[1]:  "progress: 100%%"
output:   "progress: 100%"

When your scan sees \\, skip both input bytes and write one backslash byte. When your scan sees %%, skip both input bytes and write one percent byte. Keep supporting ordinary text and \n escapes.
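
The scan loop so far, modeled in Python. render is a hypothetical name, and the end of the bytes object stands in for the NUL terminator. How to treat a lone backslash or percent followed by some other byte isn't specified here, so this sketch simply copies it through.

```python
def render(fmt: bytes) -> bytes:
    """Scan-loop model: literal bytes, backslash-n, doubled backslash, doubled percent."""
    out = bytearray()
    i = 0
    while i < len(fmt):
        pair = fmt[i:i + 2]
        if pair == b"\\n":                # backslash followed by n
            out.append(0x0A)              # one newline byte
            i += 2
        elif pair == b"\\\\":             # two backslashes become one
            out.append(0x5C)
            i += 2
        elif pair == b"%%":               # two percents become one
            out.append(0x25)
            i += 2
        else:
            out.append(fmt[i])            # ordinary byte: copy as-is
            i += 1
    return bytes(out)


print(render(rb"progress: 100%%"))        # prints b'progress: 100%'
```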

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 'progress: 100%%'
progress: 100%
hacker@dojo:~$ /challenge/check prog

Turn doubled syntax bytes into one literal byte.

Literal output, newline escapes, and escaped syntax give you the scan-and-write loop. Now add the first marker: %d.

The % byte says "this is a marker". The d byte says "take the next argument value, treat it as a signed decimal number, and print that number here." The argument values come from the command line, starting at argv[2].

argv[1]:  "value=%d"
argv[2]:  "-42"
output:   "value=-42"

When your scan reaches %d, skip both marker bytes, convert the next argv string with atoi, convert the resulting number back to text with signed itoa, and write those digits immediately. Then continue scanning the format string. For this level, the format string has at most one %d marker. Keep handling ordinary text, \n, \\, and %% as before.
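The text-to-number-to-text round trip can be sketched in C as below. This is only an illustration under a few assumptions: itoa_signed and format_one_decimal are hypothetical names, the C library's strtol stands in for the atoi you built earlier, and the escape handling from previous levels is left out to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: write value as signed decimal text into out.
 * Returns the number of bytes written (no terminating NUL is added). */
static size_t itoa_signed(long value, char *out)
{
    char tmp[24];
    size_t len = 0, n = 0;
    /* Work on the magnitude as unsigned so the most negative value is safe. */
    unsigned long mag = value < 0 ? 0UL - (unsigned long)value
                                  : (unsigned long)value;

    do {                        /* digits come out least significant first */
        tmp[len++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);

    if (value < 0)
        out[n++] = '-';
    while (len > 0)             /* reverse the digits into the output */
        out[n++] = tmp[--len];
    return n;
}

/* Hypothetical helper: copy fmt into out, replacing %d with arg's value. */
size_t format_one_decimal(const char *fmt, const char *arg, char *out)
{
    size_t n = 0;

    while (*fmt != '\0') {
        if (fmt[0] == '%' && fmt[1] == 'd') {
            long value = strtol(arg, NULL, 10);  /* stand-in for atoi */
            n += itoa_signed(value, out + n);
            fmt += 2;                            /* skip both marker bytes */
        } else {
            out[n++] = *fmt++;
        }
    }
    return n;
}
```

One visible effect of converting instead of copying the argument text: in this sketch an argument of 007 is printed as 7.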

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 'value=%d' -42
value=-42
hacker@dojo:~$ /challenge/check prog

Replace decimal markers as you scan.

One %d marker lets the format string include one changing number. Real messages often need more than one value.

Now support several %d markers in the same format string. Each %d consumes the next command-line value, so your program needs to remember which argv entry comes next.

argv[1]:  "opened %d files and skipped %d"
argv[2]:  "7"
argv[3]:  "3"
output:   "opened 7 files and skipped 3"

The first %d uses argv[2], the second uses argv[3], and so on. After printing one number, continue scanning the format string after that marker.
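The only new state is the index of the next argument. A hedged C sketch of that bookkeeping follows; format_decimals is a hypothetical name, and strtol and sprintf are used here as stand-ins for your own atoi and itoa routines.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: expand every %d in argv[1] using argv[2], argv[3], ...
 * Returns the number of bytes written. Assumes out is large enough, with one
 * spare byte for the NUL that sprintf appends after each number. */
size_t format_decimals(char **argv, char *out)
{
    const char *fmt = argv[1];
    int next = 2;               /* argv entry the next %d will consume */
    size_t n = 0;

    while (*fmt != '\0') {
        if (fmt[0] == '%' && fmt[1] == 'd') {
            long value = strtol(argv[next], NULL, 10);
            next++;             /* remember that this argument is used up */
            n += (size_t)sprintf(out + n, "%ld", value);
            fmt += 2;           /* continue scanning after the marker */
        } else {
            out[n++] = *fmt++;
        }
    }
    return n;
}
```

In assembly, the same idea usually means dedicating one register or stack slot to the argument index (or to a pointer into argv) and advancing it each time a marker is handled.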

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 'opened %d files and skipped %d' 7 3
opened 7 files and skipped 3
hacker@dojo:~$ /challenge/check prog

Consume the decimal arguments in order.

Now add %s, the marker for inserting a string. This is like %d, except the next argv value is already text, so you do not need atoi or itoa.

For %s, take the next command-line string, find its length, and write its bytes. For %d, keep doing the number conversion from the previous levels. Each marker consumes the next command-line value in order. The format string is the only string with formatter syntax. If the argument for %s contains bytes like \ or %, copy them literally instead of treating them as escapes or markers.

argv[1]:  "%s has %d flags"
argv[2]:  "hacker"
argv[3]:  "3"
output:   "hacker has 3 flags"

argv[1]:  "%s"
argv[2]:  "LITERAL\WITH\SLASHES"
output:   "LITERAL\WITH\SLASHES"

This means your program now tracks two positions: where you are in the format string, and which argv value should be consumed next.
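Here is one way the two positions might look in C. Treat it as a sketch: format_args is a hypothetical name, strtol and sprintf stand in for your atoi and itoa, and escape handling from earlier levels is omitted for brevity.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: expand %s and %d in argv[1] from argv[2] onward.
 * Returns the number of bytes written. Assumes out is large enough. */
size_t format_args(char **argv, char *out)
{
    const char *fmt = argv[1];  /* position 1: where we are in the format */
    int next = 2;               /* position 2: which argv value comes next */
    size_t n = 0;

    while (*fmt != '\0') {
        if (fmt[0] == '%' && fmt[1] == 's') {
            const char *arg = argv[next++];
            size_t len = strlen(arg);
            memcpy(out + n, arg, len);  /* copied as-is: no escapes, no markers */
            n += len;
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == 'd') {
            long value = strtol(argv[next++], NULL, 10);
            n += (size_t)sprintf(out + n, "%ld", value);
            fmt += 2;
        } else {
            out[n++] = *fmt++;
        }
    }
    return n;
}
```

The argument bytes never pass through the scanning branch, which is why a backslash or percent sign inside a %s argument comes out unchanged.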

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog '%s has %d flags' hacker 3
hacker has 3 flags
hacker@dojo:~$ /challenge/check prog

Consume values in order and write each piece on demand.

\n gives one named ASCII byte: newline, 0x0a. For arbitrary bytes, use a hex escape: \xNN.

You've seen hex before: two hex digits describe one byte. Here, the \x starts the escape, and the next two hex digits are the byte value to write.

format bytes:  "\x41"
hex value:     0x41
output byte:   "A"

Be careful about the conversion step: the format string contains ASCII characters, not numeric hex values yet. For \x4a, your program sees four input bytes: backslash, 'x', '4', and 'a'. After recognizing the \x, the conversion uses the ASCII bytes for '4' and 'a'. First convert each ASCII hex character into a 4-bit number, called a nibble. For '0' through '9', subtract the ASCII value of '0' to get 0 through 9. For 'a' through 'f', subtract the ASCII value of 'a' and add 10 to get 10 through 15; for 'A' through 'F', subtract the ASCII value of 'A' and add 10. Then put the first nibble in the high half of the byte (using left shift, which you learned earlier!) and the second nibble in the low half: (first << 4) | second.

format text:      \ x 4 a
nibble values:        4 10
combined byte:        (4 << 4) | 10 = 0x4a
output byte:          "J"
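The nibble arithmetic above can be checked with a small C sketch. The names hex_nibble and hex_byte are hypothetical, and returning -1 for a non-hex character is an assumption of this example; the level description does not say what to do with malformed escapes.

```c
/* Hypothetical helper: value 0..15 of one ASCII hex character,
 * or -1 if the character is not a hex digit (an assumption of this sketch). */
int hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: combine the two characters after \x into one byte.
 * Assumes both characters are valid hex digits. */
unsigned char hex_byte(char first, char second)
{
    /* first nibble goes in the high half, second nibble in the low half */
    return (unsigned char)((hex_nibble(first) << 4) | hex_nibble(second));
}
```

For example, hex_byte('4', 'a') gives 0x4a, the byte for "J", and hex_byte('2', 'a') gives 0x2a, the byte for "*".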

When your scan sees \xNN, convert the two hex digits into one byte and write that byte. Keep supporting ordinary text, %d, %s, \n, \\, and %%.

Build and submit as before:

hacker@dojo:~$ as -o prog.o prog.s
hacker@dojo:~$ ld -o prog prog.o
hacker@dojo:~$ ./prog 'byte=\x2a'
byte=*
hacker@dojo:~$ /challenge/check prog

Decode the hex byte escape and write it.
