Hierarchical Management of Large-Scale Malware Data

Kellogg, L., Ruttenberg, B., O’Connor, A., Howard, M., and Pfeffer, A.

Presented at the IEEE International Conference on Big Data 2014 (IEEE BigData 2014), Washington, DC (October 2014)

As the pace at which new malware is generated accelerates, clustering and classifying newly discovered malware require new approaches to data management. We describe our Big Data approach to managing malware, which supports effective and efficient analysis of large and rapidly evolving sets of malware. The key element of our approach is a hierarchical organization that groups malware into families, maintains a rich description of the relationships among malware samples, and facilitates efficient online analysis of new samples as they are discovered. Using clustering evaluation metrics, we show that our system discovers malware families comparable to those produced by traditional hierarchical clustering algorithms, while scaling much better with the size of the data set. We also show the flexibility of our system with respect to substituting data representations, methods of comparing malware binaries, clustering algorithms, and other factors. Our approach will enable malware analysts and investigators to quickly understand and quantify changes in the global malware ecosystem.

For More Information

To learn more or request a copy of a paper (if available), contact Brian Ruttenberg.

(Please include your name, address, organization, and the paper reference. Requests without this information will not be honored.)