Congratulations to Ed Robbins, Research Associate in the School of Computing, who has recently passed his PhD viva, with minor corrections. His PhD thesis was entitled ‘Solvers for Type Recovery and Decompilation of Binaries’.
The thesis made the case that powerful boolean satisfiability, Satisfiability Modulo Theory and Constraint Handling Rules solvers are perfectly suited to developing reverse engineering tools.
——————————————————————————–
Abstract
The last decade has seen the rise of powerful boolean satisfiability (SAT), Satisfiability Modulo Theory (SMT) and Constraint Handling Rules (CHR) solvers, that are able to solve a range of computationally hard problems. This thesis makes the case that these solvers are perfectly suited to developing principled, accurate and scalable reverse engineering tools.
It is first shown that type recovery can be formulated as SMT over a new theory, that of rational tree unification, and a solver straightforwardly implemented in Prolog by utilising reification. This approach is also shown to be well suited to implementation of fast and succinct (in terms of lines of code) SMT solvers for other theories, in particular Linear Real Arithmetic (LRA) and Integer Difference Logic (IDL). A framework for SMT solving in Prolog is constructed, and the solvers benchmarked. The IDL solver in particular performs very competitively.
Previous type reconstruction systems have relied entirely on empirical evaluation of the correctness of recovered types, by comparing against those found in the source code. The next part of this thesis focuses on overcoming this limitation by designing a type recovery system that in some sense guarantees correctness of recovered types. Like decompilation of binary code, type recovery from binaries is a one-to-many mapping. In type reconstruction this mapping is parameterised by a high level type system that dictates the possible recoverable types. To prove correctness of recovered types a decompiler was designed, for which type consistency dictates which of many possible decompilations is chosen for each machine instruction. Thus type recovery in fact generates a decompiled program, called a witness. By proving that the witness preserves the semantics of the binary, a strong correspondence between the recovered types and the binary is provided. However, relevance of both the witness and the types can only be ensured if the target language is itself type safe. Therefore a type-safe dialect of C, MinC (Minimal C), was designed. For simplicity, the machine language is an abstraction of x86, MinX. The decompilation is defined as an incomplete relation between MinX and MinC, with satisfying witness programs found using a CHR based solver. The system is evaluated against twenty programs that manipulate data structures.
The final piece of work in the thesis concerns optimisations for the principal operation of the octagon abstract domain. Octagons are closely related to integer differences, and concern linear inequalities of the form +/- x +/- y =< C, where x and y are variables and c is a constant. The primary operation of the domain is closure, which, given a set of octagonal constraints, makes all entailed constraints explicit. For example, given the constraints x – y =< c1 and y – z =< c2 closure will derive x – z =< c1 + c2. This is achieved by reusing the matrix representation normally used for differences alongside a modified shortest path algorithm. The thesis contributes by observing that the incremental operation of adding a single new constraint to an already closed system can only affect existing constraints in a limited number of ways. The correctness of the approach is proven, and experimental results provided that show that the new approach is almost an order of magnitude faster than existing algorithms.