Designing an object-oriented decompiler

Title: Designing an object-oriented decompiler – Decompilation support for Interactive Disassembler Pro

Decompilation, or reverse compilation, takes a computer program and produces high-level code that works like the original source code. This makes it easier to understand a computer program when source code is not available. However, there are very few tools for decompilation available today. This report describes the design and implementation of Desquirr, a decompilation plug-in for Interactive Disassembler Pro. Desquirr has an object-oriented design and performs basic decompilation of programs running on Intel x86 processors.

The low-level analysis uses knowledge about specialized compiler constructs, called idioms, to perform a more accurate decompilation. Desquirr implements data flow analysis, which converts primitive machine code instructions into code in a high-level language. The major part of the data flow analysis is Register Copy Propagation, which builds high-level expressions from primitive instructions. Control flow analysis, which restores high-level language constructs such as if/else and for loops, is not implemented.

A high-level representation of a piece of machine code contains the same information as an assembly language representation of the same machine code, but in a format that is easier to comprehend. High-level language expressions use symbols such as "*" and "+", where assembly language uses instructions such as "mul" and "add". Two small test cases that compare decompiled code with assembly language show promising results in reducing the amount of information needed to comprehend a program.
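To make the idea of Register Copy Propagation concrete, here is a minimal sketch in Python. It is not Desquirr's implementation (Desquirr is a plug-in for Interactive Disassembler Pro, and its classes are not shown here). The instruction tuples, the `propagate` function, and the variable names `a`, `b`, `c`, and `result` are all hypothetical. The sketch assumes straight-line code, two-operand instructions, and that no variable is overwritten between the point where a register loads it and the point where the register is used. A real decompiler has to check that condition.

```python
# Toy two-operand, x86-like instructions: (mnemonic, destination, source).
# All names here are illustrative assumptions, not Desquirr's data structures.
BINARY_OPS = {"add": "+", "sub": "-", "imul": "*"}
REGISTERS = {"eax", "ebx", "ecx", "edx"}


def propagate(instructions):
    """Fold register definitions into expressions.

    Emits one high-level statement for each store to a non-register operand.
    Assumes straight-line code with no intervening redefinition of variables.
    """
    regs = {}        # register name -> expression text it currently holds
    statements = []  # high-level statements produced so far

    def value(operand):
        # A register reads as the expression last assigned to it;
        # a variable name or constant reads as itself.
        return regs.get(operand, operand)

    for mnemonic, dst, src in instructions:
        if mnemonic == "mov":
            expr = value(src)
        elif mnemonic in BINARY_OPS:
            expr = f"({value(dst)} {BINARY_OPS[mnemonic]} {value(src)})"
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")

        if dst in REGISTERS:
            regs[dst] = expr                       # keep building the expression
        else:
            statements.append(f"{dst} = {expr};")  # store to a variable: emit it

    return statements


program = [
    ("mov", "eax", "a"),
    ("imul", "eax", "b"),
    ("add", "eax", "c"),
    ("mov", "result", "eax"),
]

for line in propagate(program):
    print(line)
```

In this sketch the four instructions collapse into a single assignment, because the intermediate values of `eax` are substituted into the expression that finally reaches `result`.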


Author: David Eriksson

Source: Blekinge Institute of Technology


1 Introduction
1.1 Reverse compilation
1.2 Goal and objectives
1.3 Outline for remaining chapters
2 Method
2.1 Make a basic decompiler for 32-bit 80386 machine code
2.2 Make the decompiler object-oriented
3 Data flow analysis
3.1 Overview
3.2 Static compared to dynamic analysis
3.3 Idioms
3.3.1 C calling convention
3.3.2 Memcpy an unknown number of bytes
3.3.3 Memcpy a constant number of bytes
3.3.4 Compare and set boolean
3.3.5 The question-mark-colon operator
4 Design of the decompiler
4.1 Data structures
4.2 Data structure example
4.3 Function calls
4.4 Analysis class
5 Results
5.1 Decompiling the Fibonacci calculation
5.2 Decompiling the palindrome test
5.3 Object-orientation of decompiler
6 Discussion
7 Conclusions
A Desquirr class reference
A.1 Node class hierarchy
A.2 Instruction class hierarchy
A.3 Expression class hierarchy
