Malware is a pervasive problem in distributed computer and network systems. Identification of malware variants provides great benefit in early detection. Control flow has been proposed as a characteristic that can be identified across variants, resulting in classification employing flowgraph based signatures. Static analysis is widely used to construct the signatures but can be ineffective if malware undergoes a code packing transformation to hide its real content. This thesis proposes a novel system, named Malwise, for malware classification using a fast application level emulator to reverse the code packing transformation, and two flowgraph matching algorithms to perform classification: exact flowgraph matching and approximate flowgraph matching. The exact flowgraph matching algorithm uses string based signatures of graph invariants, and is able to detect malware with near real-time performance. The approximate flowgraph matching algorithm is slower but more effective and uses the decompilation technique of structuring to generate string based signatures amenable to comparisons using the string edit distance. To demonstrate the effectiveness and efficiency of the automated unpacking and flowgraph based classification, we evaluate the system with synthetic malware and over 15,000 real samples. The evaluation shows our system is highly effective in terms of accuracy in revealing all a sample's hidden code, execution time for unpacking and classification, and accuracy in detection of malware variants.
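
To make the approximate matching idea concrete, the following is a minimal sketch of comparing two string based signatures with the string edit distance. It is an illustration under stated assumptions, not the Malwise implementation: the signature strings, the normalisation by the longer string's length, the function names, and the 0.9 threshold are all hypothetical choices made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings.

    Normalising by the longer length is an assumption of this sketch.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_variant(sig_a: str, sig_b: str, threshold: float = 0.9) -> bool:
    """Flag two signatures as variants above a hypothetical threshold."""
    return similarity(sig_a, sig_b) >= threshold


# Hypothetical signatures of structured control flow, invented for this
# example: W = loop, I = if, E = else, C = call, R = return.
sig_known = "W{I{C}E{C}}R"
sig_sample = "W{I{C}E{CC}}R"  # one extra call in the else branch
variant = is_variant(sig_known, sig_sample)
```

Because a small change to a procedure tends to produce a small change in its structured signature, an edit distance based score degrades gradually rather than failing outright, which is the property that distinguishes approximate matching from an exact comparison of signatures.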