Malware Analysis and Attribution using Genetic Information (MAAGI)

The Situation

Cyber attacks, such as viruses, Trojans, and worms, are a growing threat to US missions and resources. To combat this threat, the Defense Advanced Research Projects Agency (DARPA) created the Cyber Genome program, which aims to develop revolutionary new cyber-forensic techniques to automate the discovery, identification, and characterization of malware variants.


(U.S. Air Force photo by Tech. Sgt. Cecilio Ricardo. Photo used with permission from the U.S. Air Force.)

The Charles River Analytics Solution

As part of the Cyber Genome program, Charles River developed and is refining MAAGI (Malware Analysis and Attribution using Genetic Information). In its current version, MAAGI combines ideas and techniques from biological evolution, reverse engineering of computer programs, and linguistics to rapidly identify the source and intent of new malware attacks.

MAAGI exploits the fact that malware authors often reuse code from one attack to the next while trying to conceal this reuse from defenders by changing the “surface” features of the malware. By discovering the essential “genetic” properties that are preserved from one malware sample to the next, MAAGI seeks to determine the lineage of each sample and uses that lineage to help characterize the source of the malware. Furthermore, by understanding the patterns of evolution in malware, MAAGI can be used to predict future malware development, anticipating potential attacks rather than merely reacting to them, as we do today.

MAAGI also uses methods from functional linguistics to identify the functional features and potential intent of malware, aspects that are especially likely to be preserved even when surface features change. MAAGI allows an analyst to view the evolution of malware on a gene-by-gene basis, as shown in the figure.
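The idea of recovering a lineage from preserved features can be illustrated with a small sketch. It is not MAAGI's actual algorithm, which this page does not describe. It assumes each sample has already been reduced to a set of feature labels, that samples are listed oldest first, and that Jaccard similarity is a reasonable stand-in for "genetic" closeness. The names `jaccard`, `infer_parents`, `min_similarity`, and the feature labels are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of features two samples have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def infer_parents(
    samples: List[Tuple[str, Set[str]]], min_similarity: float = 0.3
) -> Dict[str, Optional[str]]:
    """Guess a parent for each sample; `samples` must be ordered oldest first."""
    parents: Dict[str, Optional[str]] = {}
    for i, (name, feats) in enumerate(samples):
        best_name, best_score = None, min_similarity
        # Only earlier samples can be ancestors of the current one.
        for prev_name, prev_feats in samples[:i]:
            score = jaccard(feats, prev_feats)
            if score >= best_score:
                best_name, best_score = prev_name, score
        parents[name] = best_name  # None marks the start of a new lineage
    return parents


# Hypothetical feature labels; real samples would need an extraction step.
samples = [
    ("a", {"f1", "f2", "f3"}),
    ("b", {"f1", "f2", "f3", "f4"}),
    ("c", {"f1", "f2", "f4", "f5"}),
    ("x", {"z1", "z2"}),
]
parents = infer_parents(samples)
```

Here `c` is attached to `b` rather than `a` because it shares more features with `b`, while `x` shares nothing with the earlier samples and starts its own lineage. A real system would need features that survive surface changes such as repacking, which plain set overlap does not guarantee.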


Some of MAAGI's features include:

Visualization of Malware Lineages

  • Cluster similar malware
  • View local lineage for a cluster
  • See similarities and differences within a cluster
  • Sort features by type (e.g., files, header properties, imports, traces of DLL calls)
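Clustering similar malware, as listed above, could be approximated as follows. This is a minimal sketch, not the tool's method: it assumes samples are represented as sets of feature labels and groups any two samples whose Jaccard similarity reaches a threshold (single-link clustering via union-find). The function names, the threshold value, and the feature labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of features two samples have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_samples(
    features: Dict[str, Set[str]], threshold: float = 0.5
) -> List[Set[str]]:
    """Group samples linked, directly or transitively, by similarity >= threshold."""
    root = {name: name for name in features}

    def find(x: str) -> str:
        # Follow links to the cluster representative, shortening the path.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            root[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for name in features:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical samples described by import and header features.
features = {
    "s1": {"import:CreateFileA", "import:WriteFile", "header:packed"},
    "s2": {"import:CreateFileA", "import:WriteFile", "header:plain"},
    "s3": {"import:connect", "import:send"},
}
clusters = cluster_samples(features)
```

With these inputs `s1` and `s2` share two of four distinct features and fall into one cluster, while `s3` stays alone. Single-link grouping can chain loosely related samples together, so a stricter linkage may suit large collections better.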

Selection Options

  • Similarity display updates based on the malware selected
  • Ability to select multiple samples
  • Selecting a whole lineage cluster displays similarity/difference characteristics

Convenient Lineage Views

  • View samples and descendants, but not ancestors
  • View ancestors, but not descendants
  • View Local Lineage compared to other samples
  • Filter features by in-group and out-group occurrence
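The lineage views and the in-group/out-group filter listed above can be sketched over a simple parent map. This assumes a lineage is stored as a mapping from each sample to its single parent (or `None` for a root) and that features are sets of labels. The helper names, the `min_in` and `max_out` parameters, and the feature labels are hypothetical and are not taken from MAAGI.

```python
from typing import Dict, List, Optional, Set


def ancestors(parents: Dict[str, Optional[str]], name: str) -> List[str]:
    """Walk parent links upward, nearest ancestor first."""
    chain: List[str] = []
    current = parents.get(name)
    while current is not None and current not in chain:
        chain.append(current)
        current = parents.get(current)
    return chain


def descendants(parents: Dict[str, Optional[str]], name: str) -> Set[str]:
    """Collect every sample reachable by following child links downward."""
    found: Set[str] = set()
    frontier = [name]
    while frontier:
        node = frontier.pop()
        for child, parent in parents.items():
            if parent == node and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def filter_features(
    features: Dict[str, Set[str]],
    in_group: Set[str],
    min_in: float = 1.0,
    max_out: float = 0.0,
) -> Set[str]:
    """Keep features common inside the selected group and rare outside it."""
    out_group = set(features) - in_group
    candidates = set().union(*(features[n] for n in in_group))
    kept: Set[str] = set()
    for feat in candidates:
        in_frac = sum(feat in features[n] for n in in_group) / len(in_group)
        out_frac = (
            sum(feat in features[n] for n in out_group) / len(out_group)
            if out_group
            else 0.0
        )
        if in_frac >= min_in and out_frac <= max_out:
            kept.add(feat)
    return kept


# Hypothetical lineage and feature labels.
parents = {"a": None, "b": "a", "c": "b", "d": "a"}
features = {
    "a": {"mutex:x", "import:send"},
    "b": {"mutex:x", "import:send", "import:recv"},
    "c": {"mutex:x", "import:recv"},
    "d": {"import:send", "string:hello"},
}
up = ancestors(parents, "c")
down = descendants(parents, "a")
marker = filter_features(features, in_group={"a", "b", "c"})
```

In this example only `mutex:x` appears in every selected sample and in none of the others, so it is the one feature that distinguishes the group. Relaxing `min_in` or `max_out` would admit features that are merely frequent inside the group or occasional outside it.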


Search

  • Search for similar samples
  • Search for similar samples in other lineages
  • Substring search

The Benefit

MAAGI is an innovative approach to the Cyber Genome challenge of characterizing and predicting the evolution of malware. It supports detection and attribution of cyber attacks for both the defense and law enforcement communities.

By recognizing code and techniques from previous attacks, MAAGI enables quicker responses to cyber attacks. MAAGI is proactive in that it not only assesses attacks but also anticipates the properties of future attacks. Finally, MAAGI changes the economics of malware by making it harder for malware authors to reuse their code simply by changing its superficial features.


The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Distribution Statement “A” - Approved for Public Release, Distribution Unlimited.

