At the very beginning of the binary, it asks us to input a 10 bytes string of which md5 is equal to 9f46a92422658f61a80ddee78e7db914… I tried searching this md5 on some online database but got no answer. Since it is also impossible to brute force an 80 bits input, I was totally stuck at this stage.
When I was going to give up, one of my teammates reminded me that the input is also being used in the latter part of the program. Maybe we could find other constraints to reduce the search space of the input. It turned out to be the right direction, so let’s see what the remaining program does.
Roughly speaking, there are two segments of data. They are both operated with some sort of “decryption” in sub_D1D with our input as the key and then copied to a mmaped memory with the execute permission in sub_1467. Finally, in sub_C45, we can choose option 3 to jump to the first segment of the decrypted data and execute them as instructions.
Now, we know that at least the first segment of data should be decrypted to legal x86 instructions. Let’s see what sub_D1D actually does.
The decompiled code is quite complicated, but we can see that the user’s input is only involved in some xor operations. Maybe we can find out what this code snippet does by observing the input and the output of it. To do this, I first patched the md5 checking and input '\x00'*10 to see what is the effect without user’s input through gdb.
If we observe the input and the output carefully, we would find that the code is only shuffling the input. Then, it is time to exam the effect of the user’s input. Using the same procedure, we can easily find that every byte of user’s input is xor with some fixed bytes of data, so I wrote a script to visualize this operation.
# Launch the process and gdb. Then, dump the data after decryption. for i in range(11): gdb = process(['gdb', '-q']) gdb.recvuntil(gdb_prompt, drop=True)
# Compare the original data with data decrypted by differnt keys. print' 0 1 2 3 4 5 6 7 8 9' for i in range(len(decrypted[0])): print decrypted[0][i], for j in range(1, 11): if decrypted[0][i] != decrypted[j][i]: print colored(decrypted[j][i], 'red'), else: print decrypted[j][i], print''
The output would looks like (only portion):
Data affected by the nth byte of user’s input are highlighted. Now, we know how the “decryption” works. If we also know what the decrypted code should be, we can derive the input inversely. Unfortunately, we have no clue about the decrypted code. However, it is reasonable to guess that the prologue and epilogue should be the standard push rbp; mov rbp, rsp; and leave; ret;. With only these bytes of information, we can establish several relations between bytes of user’s input and shrink the search space of md5 to a small range. (By the way, these relations suggest that the MSB of the xor of any two bytes of user’s input is 0, so I boldly guess that the input is printable.)
I wrote a C code to brute force the correct user’s input, which is $W337k!++y.
After providing the correct input, the challenge is still not finished.
***** hello? ***** >>> $W337k!++y - What kind of pet would you like to have? - Select the number of pet! 1. angelfish 2. bear 3. cat 4. dog 5. I don't want pets # number = 3 Did you choose a cat????? What type of cat would you prefer? '0' >>>
It reads 0x18 bytes to the return address just before returning, which means we have a three-gadget ROP. Since this is a PIE binary, and we don’t have any information leak, and the ROP chain is extremely short, it seems that we can basically do nothing. Nevertheless, the author of the program has deliberately put the exact three gadgets that we need in the second part of the data. Because the second part of the data is located in a mmaped memory of which address is known, we can easily use them to launch the shell.