Wednesday 27 May 2015

Malware Identification by Statistical Opcode Analysis

The objective: This project determined the efficacy of statistical analysis of program assembly instruction (opcode) frequencies to identify Malware from Goodware.

Methods/Materials
Malware and Goodware binaries were obtained and a python script was created to extract opcode frequencies from specific parts of these files. Naive Bayes models and Kmeans based models were then trained using these executables. These models were tested using a different set of programs to determine their efficacy at identifying Malware from Goodware.

Results
The best Naive Bayes model had a recall of 1 for Malware and .8 for Goodware.

Conclusions/Discussion
Differences in opcode frequencies can differentiate Malware from Goodware. Certain instructions occur much more frequently in one group than in the other; these differences can be used to identify the two types of programs.
TThis project examines models that differentiate Malware from Goodware using the frequencies of program assembly instructions.

No comments:

Post a Comment