Turns out converting files into images is a highly effective way to detect malware


A branch of artificial intelligence called machine learning is all around us. Facebook employs it to help curate content (and target us with ads), Google uses it to filter millions of spam messages each day, and it's part of what enabled the OpenAI bot to beat the reigning Dota 2 champions last year in two out of three matches. There are seemingly endless uses. Adding one more to the pile, Microsoft and Intel have come up with a clever machine learning framework that is surprisingly accurate at detecting malware through a grayscale image conversion process.

Microsoft detailed the technology, which it calls static malware-as-image network analysis, or STAMINA, in a blog post (via ZDNet). It is a three-step process. In simple terms, the machine learning project starts out by taking binary files and converting them into two-dimensional images.
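To make that first step concrete, here is a minimal sketch of one way to turn raw bytes into a grayscale pixel grid. It is not Microsoft and Intel's code: the function name, the fixed width, and the zero-padding of the last row are all assumptions made for illustration.

```python
def bytes_to_grayscale_rows(data: bytes, width: int) -> list[list[int]]:
    """Lay out raw bytes as rows of pixel intensities (0 = black, 255 = white)."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)  # each byte is already an integer from 0 to 255
    remainder = len(pixels) % width
    if remainder:
        # Assumption: pad the final row with black pixels so the grid is rectangular.
        pixels.extend([0] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Illustrative bytes only; they begin with the "MZ" signature of Windows executables.
sample = b"MZ\x90\x00\x03\x00\x00\x00\x04\x00"
image = bytes_to_grayscale_rows(sample, width=4)
```

Each byte maps directly to one pixel, so a file of N bytes yields N pixels before any padding or resizing.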


The images are then fed into the framework. This second step relies on transfer learning, in which a model already trained on one task is reused as the starting point for another, so the algorithm builds on its existing knowledge rather than learning from scratch.
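One common form of transfer learning keeps a previously trained feature extractor frozen and trains only a small new classifier on top of it. The toy sketch below illustrates that idea in plain Python. It is not the STAMINA model: the hand-written "pretrained" filters, the four-pixel images, and every function name are assumptions made for illustration.

```python
import math

# Stand-in for weights learned earlier on a different task (hypothetical values).
# They stay frozen: nothing below modifies them.
PRETRAINED_FILTERS = ((1.0, 1.0, 0.0, 0.0),   # responds to a bright left half
                      (0.0, 0.0, 1.0, 1.0))   # responds to a bright right half


def extract_features(pixels):
    """Frozen layer: one weighted sum per filter, followed by ReLU."""
    return [max(0.0, sum(w * p for w, p in zip(filt, pixels)))
            for filt in PRETRAINED_FILTERS]


def train_head(features, labels, epochs=200, lr=0.1):
    """Train only the new logistic-regression head with plain SGD."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(pixels, weights, bias):
    """Return 1 or 0 using the frozen features plus the trained head."""
    z = sum(w * v for w, v in zip(weights, extract_features(pixels))) + bias
    return 1 if z > 0 else 0


# Made-up 4-pixel "images": label 1 = bright left half, 0 = bright right half.
images = [[0.9, 0.8, 0.1, 0.0], [1.0, 0.7, 0.0, 0.2],
          [0.1, 0.0, 0.9, 1.0], [0.0, 0.2, 0.8, 0.9]]
labels = [1, 1, 0, 0]
head_weights, head_bias = train_head([extract_features(i) for i in images], labels)
predictions = [predict(i, head_weights, head_bias) for i in images]
```

Only the head's two weights and one bias are learned here, which is why this style of training tends to be much cheaper than training a full network from scratch.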

Finally, the results are analyzed to see how effective the process was at detecting malware samples, how many it missed, and how many it incorrectly classified as malware (known as a false positive).
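The quantities named in that evaluation step can be computed from four counts. The sketch below uses the standard definitions of detection rate, miss rate, and false positive rate; the function name and the counts are hypothetical and are not figures from the study.

```python
def detection_metrics(true_pos, false_neg, false_pos, true_neg):
    """Summarize a malware detector from its confusion-matrix counts."""
    malware_total = true_pos + false_neg   # files that really are malware
    benign_total = false_pos + true_neg    # files that really are benign
    return {
        "detection_rate": true_pos / malware_total,       # malware correctly flagged
        "miss_rate": false_neg / malware_total,           # malware that slipped through
        "false_positive_rate": false_pos / benign_total,  # benign files wrongly flagged
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = detection_metrics(true_pos=990, false_neg=10, false_pos=25, true_neg=975)
```

With these made-up counts, 990 of 1,000 malware files are caught and 25 of 1,000 benign files are wrongly flagged.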

As part of the study, Microsoft and Intel sampled a dataset of 2.2 million files. Out of those, 60 percent were known malware files that were used to train the algorithm, and 20 percent were used to validate it. The remaining 20 percent were used to test the actual effectiveness of the scheme.
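A 60/20/20 split like the one described can be sketched in a few lines. This is a generic shuffle-and-slice approach, not necessarily how the researchers partitioned their data; the function name, the seed, and the placeholder items are assumptions.

```python
import random


def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle items, then slice them into training, validation, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for testing
    return train, val, test


# Placeholder items standing in for file identifiers.
train, val, test = split_dataset(range(1000))
```

At the article's scale of 2.2 million files, the same fractions work out to roughly 1.32 million files for training and 440,000 each for validation and testing.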

Applying STAMINA to the files, Microsoft says the method accurately detected and classified 99.07 percent of the malware files, with a 2.58 percent false positive rate. Those are stellar results.

"The results certainly encourage the use of deep transfer learning for the purpose of malware classification. It helps accelerate training by bypassing the search for optimal hyperparameters and architecture searches, saving time and compute resources in the process," Microsoft says.

STAMINA is not without its limitations. Part of the process entails resizing images to make the number of pixels manageable for an application like this. However, for deeper analysis and bigger size applications, Microsoft says the method "becomes less effective due to limitations in converting billions of pixels into JPEG images and then resizing them."
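To see why resizing costs information, consider a simple nearest-neighbor downscale, sketched below. This is a deliberately simplified stand-in rather than the resizing method used in STAMINA, and the function name and the tiny grid are assumptions made for illustration.

```python
def resize_nearest(rows, out_height, out_width):
    """Shrink a pixel grid by keeping one source pixel per output pixel."""
    in_height, in_width = len(rows), len(rows[0])
    return [
        [rows[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


# A 4x4 grid of made-up pixel values, reduced to 2x2.
grid = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
small = resize_nearest(grid, out_height=2, out_width=2)
```

Going from 16 pixels to 4 discards 12 of the original values. Because each byte of a file becomes one pixel, a file of around a gigabyte would start out at roughly a billion pixels, so the proportion thrown away when shrinking it to a manageable size is far larger.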

In other words, STAMINA works great for testing files in a lab, but requires some fine-tuning before it could feasibly be employed in a greater capacity. This probably means Windows Defender won't benefit from STAMINA right away, but perhaps sometime down the line it will.

Paul Lilly
