Mandiant Data Science Showcases Latest Security Machine Learning Research at CAMLIS ’21

Philip Tully, David Krisiloff
Nov 02, 2021

The Mandiant Data Science (MDS) team’s mission is to develop innovative machine learning solutions that apply Mandiant’s unique expertise and threat intelligence at scale for our customers. MDS is involved in many diverse projects delivered as part of the Mandiant Advantage SaaS platform, but we also present and publish cutting-edge research at the intersection of security and machine learning at leading industry and academic conferences. We are proud to announce that our team recently had four talks accepted at the Conference on Applied Machine Learning in Information Security (CAMLIS). CAMLIS brings together researchers and security practitioners to share technical findings. Here is how we will be contributing to the conference:

Thursday, November 4, 10:30 AM – 12:25 PM EDT (Malware Analysis Track)

Lightweight, Emulation-Assisted Malware Classification

Xigao Li, David Krisiloff, and Scott Coull

  • Summary: We set out to explore the practical considerations of using emulation to drive machine learning models for malware analysis applications. In particular, we examine speed/accuracy tradeoffs and potential synergies with static analysis methods. This work was completed as part of our summer research internship program.

Annotating Malware Disassembly Functions Using Neural Machine Translation

Sunil Vasisht, Philip Tully, and Jay Gibble

  • Summary: MDS teams up with Mandiant’s FLARE team to share novel applications of machine learning for predicting disassembly function names using code-to-sequence neural networks. The talk will show how to leverage neural machine translation NLP models to aid reverse engineers as they piece together code functionality while triaging complex malware samples.

Thursday, November 4, 4:05 PM – 4:50 PM EDT (Lightning Talks Track)

SOREL-20M: A Large-Scale Benchmark Dataset for Malicious PE Detection

Richard Harang and Ethan Rudd

  • Summary: This talk presents SOREL-20M, a new, publicly available malware dataset. It is the largest public labeled malware dataset in the world, and the talk will discuss its characteristics and the challenges associated with it. (Research conducted while both authors were at Sophos.)

Loss on Demand: Toward Discriminative-Generative Hybrid Models for Malware Classification Confidence

Ethan Rudd and David Krisiloff

  • Summary: Do you know how confident your malware predictions are? In this talk, we tackle the problem of quantifying classification confidence for malware detection models, using neural network losses that can be evaluated at inference time with a novel hybrid discriminative/generative model.

For those unable to attend CAMLIS in person, the talks will be livestreamed at no cost. If you find any of the presented material interesting, would like to develop data-driven tools to find evil, or are keen to work on multidisciplinary projects at the intersection of cyber security and machine learning, please consider joining the MDS team by applying to one of our job openings. We are currently hiring for Staff Data Scientist and Data Science 2022 Intern positions.