Identifying Shared Software Components to Support Malware Forensics

Ruttenberg, B.¹, Miles, C.², Kellogg, L.², Notani, V.², Howard, M.¹, Ledoux, C.², Lakhotia, A.², and Pfeffer, A.¹

Presented at the 11th Conference on Detection of Intrusions and Malware & Vulnerability Assessment, Egham, England (July 2014)


Recent reports from the anti-malware industry indicate that similarity between malware code resulting from code reuse can aid in developing a profile of the attackers. We describe a method for identifying shared components in a large corpus of malware, where a component is a collection of code, such as a set of procedures, that implements a unit of functionality. We develop a general architecture for identifying shared components in a corpus using a two-stage clustering technique. While our method can be parameterized on any features extracted from a binary, our implementation uses features that abstract the semantics of blocks of instructions. In a rigorous, controlled experiment conducted independently by MITLL, our system identified shared components with extremely high accuracy. Our technique provides an automated method for finding functional relationships between malware code, which may be used to establish evolutionary relationships and aid in forensics.
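
The abstract does not spell out the two stages, so the following is only a minimal sketch of one way a two-stage clustering could surface shared components, not the authors' implementation. It assumes that stage one groups procedures whose feature sets are similar, and that stage two groups those procedure clusters by the set of samples in which they occur. The Jaccard measure, the single-link grouping, both thresholds, and all names (`jaccard`, `group`, `shared_components`) are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group(items, similar):
    """Single-link grouping with union-find: two items land in the same
    group if a chain of pairwise-similar items connects them.
    Returns a list of groups, each a list of indices into `items`."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if similar(items[i], items[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def shared_components(procedures, feature_threshold=0.5, sample_threshold=1.0):
    """procedures: list of (sample_id, procedure_name, feature_set).
    Both thresholds are hypothetical tuning knobs.
    Returns a list of (sorted sample ids, sorted (sample, procedure) pairs)."""
    # Stage 1 (assumed): cluster procedures by feature similarity.
    stage1 = group(
        procedures,
        lambda p, q: jaccard(p[2], q[2]) >= feature_threshold,
    )

    # Record which samples each procedure cluster appears in.
    sample_sets = [frozenset(procedures[i][0] for i in c) for c in stage1]

    # Stage 2 (assumed): cluster the procedure clusters by co-occurrence,
    # i.e. by how similar their sample sets are.
    stage2 = group(
        sample_sets,
        lambda s, t: jaccard(s, t) >= sample_threshold,
    )

    components = []
    for g in stage2:
        samples = set().union(*(sample_sets[k] for k in g))
        if len(samples) < 2:
            continue  # present in one sample only, so not shared
        procs = sorted(
            (procedures[i][0], procedures[i][1]) for k in g for i in stage1[k]
        )
        components.append((sorted(samples), procs))
    return components
```

Given procedures from several samples, `shared_components` returns one entry per candidate component: the samples that contain it and the procedures assigned to it. In practice the feature sets would come from whatever abstraction of instruction-block semantics the analysis provides; here they are plain Python sets.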


1 Charles River Analytics

2 Software Research Lab, University of Louisiana at Lafayette
