In our system we first determine whether malware is packed using entropy analysis. In the next stage we have experimented with packer classification, an approach that identifies the type of packer used so that packer-specific unpacking strategies can be deployed. The concept is that some classes of packers can be unpacked using simpler and less computationally complex approaches. We apply malware classification techniques to the low entropy program code and data in the binary to determine the packer type. For packers that we don’t handle, we can decide to immediately mark the potential malware as suspicious.
Application Level Emulation
Although application level emulation is foiled by anti-emulation techniques used in malware, it can still successfully unpack a large number of malware samples. Application level emulation is sometimes easier to develop and more flexible than manually written static unpackers. It can also run in real time.
Static Unpacking Based on Packing Algorithm
One novel approach that we have experimented with avoids the problems of undecidable code analysis and imperfect code emulation. Our methodology is to use string matching to detect whether an off-the-shelf compression algorithm is used and, if so, which one, and then to run standard decompression tools on the identified blob of packed content. The advantage of this approach is that the system is not prone to anti-emulation or anti-static-analysis techniques. Widely used open source packing tools are commonly modified by malware authors, who add anti-analysis and anti-emulation code fragments. Because the underlying compression algorithm is unchanged, our system can still unpack these types of malware easily.
Detecting the Packing Algorithm
We detect packing algorithms by the constants that are present in the code. Many compression algorithms make heavy use of constants in tables or magic bytes for headers, and these can be identified using string matching. Antivirus software is already very good at string matching, and our approach employs the same types of algorithms.
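The constant-matching idea can be sketched as a plain byte-pattern search. This is a minimal illustration, not the signature set or matching engine used by the system described here: the `SIGNATURES` table and the `find_signatures` helper are hypothetical names, and the two patterns (the first four entries of the standard CRC-32 lookup table and the gzip header bytes) are only examples of the kind of constant such a scanner might look for.

```python
import struct

# Illustrative signatures only (an assumption of this sketch).
SIGNATURES = {
    # First four entries of the standard CRC-32 lookup table, little-endian.
    "crc32_table": struct.pack("<4I", 0x00000000, 0x77073096,
                               0xEE0E612C, 0x990951BA),
    # gzip magic bytes followed by the "deflate" method byte.
    "gzip_header": b"\x1f\x8b\x08",
}


def find_signatures(image: bytes, signatures=SIGNATURES):
    """Return {name: [offsets]} for every signature found in the image."""
    hits = {}
    for name, pattern in signatures.items():
        offsets = []
        pos = image.find(pattern)
        while pos != -1:
            offsets.append(pos)
            pos = image.find(pattern, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits
```

Short patterns such as a three-byte header will produce false positives on real binaries, so a practical scanner would likely prefer longer table constants or require several independent matches before naming an algorithm.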
Detecting the Packed Blob
To detect the packed blob, we make use of entropy analysis. By calculating entropy over a sliding window across the binary image, we can clearly identify the region of the image that is compressed. However, the precision is limited to the window size. We cannot determine the precise beginning of the packed content, but we have a good idea that the packed content begins in a particular window.
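A windowed entropy scan of this kind might look like the sketch below. The window size of 1024 bytes, the threshold of 7.0 bits per byte, and the function names are assumptions chosen for illustration; the text does not state the parameters actually used.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def high_entropy_windows(image: bytes, window: int = 1024,
                         step: int = 1024, threshold: float = 7.0):
    """Return start offsets of windows whose entropy exceeds the threshold."""
    return [off
            for off in range(0, len(image) - window + 1, step)
            if shannon_entropy(image[off:off + window]) > threshold]
```

The first offset returned marks the window in which the packed content probably begins. A smaller `step` gives overlapping windows and a finer estimate at the cost of more computation, while very small windows make the entropy estimate itself unreliable.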
Static Unpacking of Compressed Blobs
Given the window in which the packed blob begins, we attempt to decompress the data by brute force, trying every offset within that window. If we have chosen the correct offset, the decompression or decryption routine will succeed and give us a resultant image of significant size. If we have chosen the wrong offset, the decompression should abort because the data will be detected as corrupt. At this point, we have unpacked the malware and can pass the result to a classification tool, or to a human analyst for further investigation.
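The brute force search over offsets can be sketched as follows. This example assumes the blob is a zlib stream and uses Python's standard `zlib` module as a stand-in for whichever decompressor matches the detected algorithm; `brute_force_unpack` and the `min_size` cut-off are hypothetical names and values, not part of the system described above.

```python
import zlib


def brute_force_unpack(image: bytes, window_start: int, window_size: int,
                       min_size: int = 64):
    """Try zlib decompression at each offset in the window.

    Returns (offset, unpacked_bytes) for the first offset that yields a
    complete stream of at least min_size bytes, or None if none does.
    """
    end = min(window_start + window_size, len(image))
    for offset in range(window_start, end):
        decompressor = zlib.decompressobj()
        try:
            data = decompressor.decompress(image[offset:])
        except zlib.error:
            continue  # stream detected as corrupt: wrong starting offset
        # Require a fully terminated stream and a non-trivial result.
        if decompressor.eof and len(data) >= min_size:
            return offset, data
    return None
```

Requiring both a terminated stream and a minimum output size is one way to reject offsets that happen to decode a few bytes by chance; bytes that follow the end of the stream are ignored.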
Code packing is a tool used by malware authors that can hinder static analysis. If the goal of malware analysis is to have access to the malware’s real code, then unpacking is necessary. This may be required if malware samples are being grouped into families. Legitimate commercial software is sometimes packed as well, which means that without analysis of the hidden code, Antivirus would incorrectly label it as malicious. Unpacking code can be a challenging problem, and nontraditional packing techniques such as instruction virtualization are increasingly used by malware authors. It seems that the only safe solution for Antivirus is to perform packer detection and flag all such occurrences as potential malware. For legitimate software, white listing and co-ordination with Antivirus vendors may be the only secure way forward.
[Figure: flowchart. Node labels: Unknown Binary; Hash Database (White listed, Black listed, Unknown); Packer Detection (packed, Not packed); Packer Classification; Application Level Emulation; Static Unpacker based on Packer Algorithm; Malware Classification; Suspicious; Malicious; Benign. Edge labels: Unpacked; Unknown or can’t unpack.]
Figure 2. The proposed system.