MalwareGuard: FireEye’s Machine Learning Model to Detect and Prevent Malware
FireEye’s mission is to relentlessly protect our customers and their data with innovative technology and expertise learned from the front lines of cyber attacks. When it comes to protecting our customers’ endpoints, FireEye Endpoint Security has helped to create the endpoint detection and response (EDR) market and is an industry leader. Over the past year, we have significantly broadened the scope of Endpoint Security by integrating anti-virus (AV) protection. The AV functionality added malware prevention to our existing detection suite, which includes the ExploitGuard behavioral detection capability, as well as our Indicators of Compromise (IOC) detections.
Today, we are very excited to add a new machine learning (ML) layer to this defense-in-depth endpoint strategy: MalwareGuard. With MalwareGuard, customers will be able to detect and prevent malware from executing. This feature addresses an important need by detecting new malware on day zero that traditional AV technology misses.
MalwareGuard predicts whether a Windows executable is likely malicious prior to execution, and can therefore prevent malware from even gaining a foothold. To take full advantage of this new detection capability, we will also be deploying MalwareGuard to our Network Security and Email Security solutions. The ML model’s static detection capability effectively complements the existing Multi-Vector Virtual Execution (MVX) dynamic analysis engine in those two solutions. Performing both static and dynamic analysis increases the chances of detecting and stopping malware, making it more difficult for an attacker to evade detection.
In this post, we’ll cover the model’s goals and how we went about building and evaluating our solution.
We had a few objectives when we began the process of building MalwareGuard:
- The current detection and prevention scope is Windows Portable Executable (PE) files (e.g., EXE, DLL, and SYS files). The PE file format is a structure containing the information needed by Windows to load the executable code. PE files continue to make up a large segment of the malware universe, as evidenced by the distribution of file types submitted to VirusTotal (Figure 1).
- The focus of the ML algorithm is to take a PE file as input and output a prediction on whether the file is benign or malicious.
- Scoring a PE file must occur in sub-second times and require minimal memory usage and processing power. The time constraint ensures we can prevent files deemed malicious from executing.
- The ML model should have configuration settings that enable users to select the desired sensitivity. While we assume most users will want an extremely low false positive rate, we want to support other “hunting” use cases as well, where the ML model would be more aggressive in what it deems malicious.
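To illustrate the sensitivity objective, the following minimal sketch shows one common way to turn a target false positive rate into a score threshold. It is not the MalwareGuard implementation: the function names, the profile names, and the sample scores are all hypothetical, and the model is assumed to output a score where higher means more likely malicious.

```python
def threshold_for_fpr(benign_scores, max_fpr):
    """Pick a threshold so that at most max_fpr of the benign scores exceed it.

    Assumption: a file is flagged only when its score is strictly greater
    than the returned threshold.
    """
    ranked = sorted(benign_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # benign files we tolerate flagging
    if allowed >= len(ranked):
        return float("-inf")  # every benign file may be flagged
    # Scores strictly above the (allowed + 1)-th highest benign score
    # number at most `allowed`, even when scores are tied.
    return ranked[allowed]


def is_flagged(score, threshold):
    return score > threshold


# Hypothetical model scores for known-benign validation files.
validation_benign = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.85, 0.97]

# Hypothetical sensitivity profiles mapped to a tolerated false positive rate.
SENSITIVITY_PROFILES = {"low_false_positive": 0.0, "hunting": 0.2}

thresholds = {
    name: threshold_for_fpr(validation_benign, fpr)
    for name, fpr in SENSITIVITY_PROFILES.items()
}
```

With these made-up numbers, a file scoring 0.9 would be flagged under the "hunting" profile but not under the stricter one, which is the trade-off the objective describes.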
FireEye is in an excellent position to achieve these goals. With our experience in responding to the most significant threats, we have access to a large and diverse population of malware. Going beyond commodity malware, FireEye has unsurpassed visibility into targeted and APT malware based on our Mandiant incident response (IR) engagements. In addition, the subject matter experts on the FireEye Labs Advanced Reverse Engineering (FLARE) team contribute extensive reverse engineering knowledge. This combination of experience and intelligence allows for creating a unique dataset. The partnership between data scientists and reverse engineers for creating MalwareGuard was important for producing the best possible model.
Our first step was to assemble a dataset and create the pipelines for ingesting streams of PE files and their associated metadata. We curated a comprehensive and unique collection of malware from internal and external sources covering many years of activity. To accomplish that, we took advantage of two FireEye resources: 1) The MVX dynamic analysis engine was used to identify and label a portion of the malware, and 2) The FLARE team’s analysis reports were ingested to leverage the time and work of our reverse engineers.
The real challenge we faced was creating a benign sample set. To do that, we started off by including samples with known, trusted provenance. Given the diversity of benign PE files, we discovered that we needed to augment the initial benign set with additional samples that could help capture more of the variability that FireEye sees in the wild. As of today, our dataset has grown to more than 300 million samples. When training a model, we divide the dataset into three groups (training, validation, and testing) based on when the sample was first seen. The training set consists of the oldest samples, followed by the validation set, and then the test set representing the most recent samples. Breaking up the data this way allows us to gauge how well a trained model would hold up over time.
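The time-based split described above can be sketched as follows. This is an illustration only: the function name, the `(sample_id, first_seen)` record layout, and the cutoff dates are assumptions, not details from the actual pipeline.

```python
from datetime import date


def split_by_first_seen(samples, train_end, validation_end):
    """Split (sample_id, first_seen) pairs into three chronological groups.

    Samples first seen before train_end go to training, those before
    validation_end go to validation, and the most recent go to testing.
    """
    train, validation, test = [], [], []
    for sample_id, first_seen in samples:
        if first_seen < train_end:
            train.append(sample_id)
        elif first_seen < validation_end:
            validation.append(sample_id)
        else:
            test.append(sample_id)
    return train, validation, test


# Hypothetical samples and cutoff dates.
samples = [
    ("sample_a", date(2016, 1, 1)),
    ("sample_b", date(2017, 6, 1)),
    ("sample_c", date(2018, 3, 1)),
]
train, validation, test = split_by_first_seen(
    samples, train_end=date(2017, 1, 1), validation_end=date(2018, 1, 1)
)
```

Because the test set contains only samples newer than anything the model saw during training, its results approximate how the model would fare against files that appear after deployment.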
The next step was to decide how to encode a PE file as input for an ML algorithm. We explored two different options:
- A traditional machine learning approach where we craft a set of features based on static analysis of the PE file.
- A deep learning approach where the raw bytes in the PE file are used as input to the algorithm.
Exploring the first option, we worked closely with FLARE’s subject matter experts to identify indicators and capture relevant characteristics of the data. Example features are:
- The amount of byte randomness (entropy) in the text section of a PE file
- The number of sections in a PE file
- The presence or absence of particular API calls
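As a rough illustration of the three example features, the sketch below computes Shannon entropy over a section's bytes and assembles a small feature vector. It assumes the PE file has already been parsed upstream, so the text section bytes, section count, and imported API names arrive as plain inputs. The function names and the watched API list are hypothetical and are not the features used in MalwareGuard.

```python
import math
from collections import Counter

# Illustrative choice of API names to check for; an assumption for this sketch.
WATCHED_APIS = ("VirtualAlloc", "CreateRemoteThread")


def shannon_entropy(data):
    """Return byte entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def build_feature_vector(text_section, section_count, imported_apis):
    """Encode the example features as a flat list of floats."""
    features = [shannon_entropy(text_section), float(section_count)]
    # One 0/1 indicator per watched API name.
    features.extend(1.0 if api in imported_apis else 0.0 for api in WATCHED_APIS)
    return features


# Made-up inputs: 256 distinct byte values give the maximum entropy of 8 bits.
vector = build_feature_vector(bytes(range(256)), 4, {"VirtualAlloc"})
```

High entropy in a code section is often associated with packed or encrypted content, which is one reason a feature like this can be informative.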
These features capture some of the information about the structure and content of the file, and therefore could be informative for helping predict whether the file is benign or malicious. The iterative process of creating these static features is referred to as feature engineering. Feature engineering stands in stark contrast with the second option we explored, deep learning. The input to the deep learning algorithm is just the sequence of bytes that make up the PE file. It is the deep learning algorithm’s job to transform the bytes into a useful representation that allows for distinguishing malicious from benign samples. This is a very active area of research, and not only within the information security community. We presented our findings with deep learning algorithms at an applied ML conference last year. Ultimately, we had success using both the traditional and deep learning options.
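To make the raw-byte input concrete, here is a minimal sketch of one common way to prepare a file for a byte-level neural network: truncate or pad the byte sequence to a fixed length, using a value outside the 0 to 255 range as padding. The fixed length, the padding value, and the function name are assumptions for illustration; the post does not describe the encoding actually used.

```python
PAD = 256  # Assumed padding token, chosen outside the valid byte range 0-255.


def encode_bytes(data, max_len):
    """Return a list of exactly max_len integers for a byte-level model."""
    sequence = list(data[:max_len])  # truncate long files
    sequence.extend([PAD] * (max_len - len(sequence)))  # pad short files
    return sequence


# A PE file starts with the bytes "MZ"; here a short prefix is padded to 5.
encoded = encode_bytes(b"MZ\x90", 5)  # [77, 90, 144, 256, 256]
```

No hand-crafted features are involved at this stage. The network receives only these integers and must learn its own representation of them.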
Next, we trained a variety of supervised learning models. This included random forests, gradient boosted trees, neural networks, and logistic regression models for the features identified by our subject matter experts (SMEs), as well as convolutional neural networks and recurrent neural networks for the deep learning case. A typical experiment consisted of training the model on the PE files in the training set while using the validation set for tuning the model’s hyper-parameters, then evaluating the trained model on the test set to determine the number of correct predictions, false positives, and false negatives. To compare the results from different models, we used the area under the ROC curve (AUC). The AUC statistic is a single number between 0 and 1. A model that always ranks benign files as more suspicious than malicious ones has an AUC of 0, a model that does no better than random guessing has an AUC of about 0.5, and a model that always ranks malicious files above benign ones has an AUC of 1. Our top performing malware detection models had a score of 0.9998. We selected a few of the best performing models for further evaluation.
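For readers unfamiliar with the AUC statistic, it can be read as the probability that a randomly chosen malicious file receives a higher score than a randomly chosen benign file, with ties counted as half. The sketch below computes it directly from that definition. It is a small pairwise illustration with made-up labels and scores, not the evaluation code used for MalwareGuard, and it would be too slow for a dataset of the size described above.

```python
def roc_auc(labels, scores):
    """Compute AUC as the fraction of (malicious, benign) pairs ranked correctly.

    labels: 1 for malicious, 0 for benign. scores: higher means more malicious.
    """
    malicious = [s for label, s in zip(labels, scores) if label == 1]
    benign = [s for label, s in zip(labels, scores) if label == 0]
    if not malicious or not benign:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for m in malicious:
        for b in benign:
            if m > b:
                wins += 1.0  # pair ranked correctly
            elif m == b:
                wins += 0.5  # tie counts as half
    return wins / (len(malicious) * len(benign))


# Made-up example: three of the four malicious/benign pairs are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Because AUC depends only on how files are ranked and not on any single cutoff, it lets models be compared before a sensitivity threshold is chosen.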
For the past year, we have been running several promising candidate models during Mandiant IR and Managed Defense engagements to gather performance metrics. We also created an internal service where teams at FireEye could submit their PE files for ML model scoring. To date, we have made predictions on more than 20 million new PE files during this evaluation. This provided an invaluable source of feedback that allowed us to do several things. We were able to identify and address gaps in our data collection, and evaluating the false positives and false negatives permitted some additional, targeted feature engineering. These are important steps in the iterative process of producing a robust machine learning model. The last step in our internal evaluation was selecting the final model from amongst the candidates, which we did again using the AUC statistic.
During the internal evaluation period, we also developed the infrastructure to support long-term tracking and maintenance for MalwareGuard. Our goal was and is to have real-time visibility into the model’s performance, with the expectation that model retraining could be done on demand when performance dips below a threshold. To meet this objective, we developed data pipelines for each phase of the ML process, which makes the system fully automatable.
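A retraining trigger of the kind described above might look like the sketch below. Everything here is an assumption made for illustration: the class name, the use of rolling accuracy as the tracked metric, the window size, and the threshold. The post does not say which metric or threshold is monitored.

```python
from collections import deque


class PerformanceMonitor:
    """Track recent prediction outcomes and signal when retraining is due."""

    def __init__(self, window, min_accuracy):
        self.outcomes = deque(maxlen=window)  # keeps only the newest results
        self.min_accuracy = min_accuracy

    def record(self, predicted_malicious, actually_malicious):
        """Store whether a prediction matched the later-confirmed label."""
        self.outcomes.append(predicted_malicious == actually_malicious)

    def needs_retraining(self):
        """Return True once a full window falls below the accuracy floor."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.min_accuracy


# Hypothetical settings: judge the last 4 confirmed verdicts, require 75%.
monitor = PerformanceMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(True, True), (False, True), (True, False), (False, False)]:
    monitor.record(predicted, actual)
retrain_now = monitor.needs_retraining()  # True: 2 of 4 were correct
```

In an automated pipeline, a signal like this could start the same data collection, training, and evaluation steps that were used to build the original model.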
FireEye customers using Endpoint Security, Network Security, or Email Security will benefit from MalwareGuard. Customers can expect a significant improvement in the ability to detect malware, including zero-day threats. For our Endpoint Security customers in particular, MalwareGuard is an important addition to our integrated, defense-in-depth solution. This solution consists of Indicators of Compromise detection, malware signature detection and prevention, ExploitGuard behavioral detection and blocking, and now a new layer with our machine learning malware detection and prevention in MalwareGuard.
While we are excited to reach this milestone announcement today, we are just as eager to improve MalwareGuard over time. The malware threat environment continues to evolve rapidly, and we believe that applying machine learning allows for detecting and stopping new malware faster than conventional signature-based approaches.