Detecting malicious campaigns in obfuscated JavaScript with scalable behavioral analysis

Oleksii Starov, Yuchen Zhou, and Jun Wang

Palo Alto Networks

WTMC 2019 (IEEE Security and Privacy workshop)

Abstract
Modern security crawlers and firewall solutions have to analyze millions of websites on a daily basis, and significantly more JavaScript samples. At the same time, fast static approaches, such as file signatures and hash matching, often are not enough to detect advanced malicious campaigns, i.e., obfuscated, packed, or randomized scripts. As such, low-overhead yet efficient dynamic analysis is required.
In this paper, we describe behavioral analysis performed after executing all the scripts on web pages, similarly to how real browsers do. Then, we apply light “behavioral signatures” to the collected dynamic indicators, such as global variables declared during runtime, popup messages shown to the user, and established WebSocket connections. Using this scalable method for a month, we enhanced the coverage of a commercial URL filtering product by finding an additional 8,712 URLs with intrusive coin miners. We evaluated the impact of increased coverage through telemetry data and discovered that customers attempted to visit these abusive sites more than a million times. Moreover, we captured 4,633 additional distinct URLs that lead to scam, clickjacking, phishing, and other kinds of malicious JavaScript.
Our findings present up-to-date trends in unauthorized cryptographic coin-mining, and show that various scam kits make up another big fraction of the modern threat landscape on the Web.
Why behavior analysis?
Obfuscation is a major obstacle to static analysis of malicious JavaScript, and many off-the-shelf and customized code transformers are available to attackers. Behavioral analysis sidesteps this: by instrumenting headless Google Chrome to collect global variables, popup messages, and WebSocket connections, we capture the invariants an attacker cannot hide. For example, in a crypto-jacking scenario, the attacker has to send the mined results back so that the attacker's crypto-wallet is credited; similarly, to make crypto-jacking libraries such as coinhive.js easy to embed, the library has to expose its "handler" as a global so that a short snippet can access it. These artifacts are the ones we look for in this detection approach.
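As a rough illustration of the idea, the following Python sketch matches light behavioral signatures against indicators that are assumed to have already been collected from an instrumented browser. It is not the implementation from the paper: the `PageIndicators` record, the `BehavioralSignature` rule format, the `pool.example` host, and the popup pattern are all hypothetical and chosen only for illustration. The `CoinHive` global is used as the kind of library "handler" mentioned above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageIndicators:
    """Dynamic indicators recorded while a page executed (hypothetical schema)."""
    url: str
    global_vars: set = field(default_factory=set)
    popup_messages: list = field(default_factory=list)
    websocket_urls: list = field(default_factory=list)


@dataclass
class BehavioralSignature:
    """A light rule; every condition that is set must match some indicator."""
    name: str
    global_vars: frozenset = frozenset()
    popup_pattern: Optional[str] = None
    websocket_pattern: Optional[str] = None

    def matches(self, page: PageIndicators) -> bool:
        # A rule with no conditions would match every page, so reject it.
        if not (self.global_vars or self.popup_pattern or self.websocket_pattern):
            return False
        if self.global_vars and not (self.global_vars & page.global_vars):
            return False
        if self.popup_pattern and not any(
            re.search(self.popup_pattern, msg, re.IGNORECASE)
            for msg in page.popup_messages
        ):
            return False
        if self.websocket_pattern and not any(
            re.search(self.websocket_pattern, ws) for ws in page.websocket_urls
        ):
            return False
        return True


# Illustrative rules only; real signatures would be curated by analysts.
SIGNATURES = [
    BehavioralSignature(name="coin-miner-global",
                        global_vars=frozenset({"CoinHive"})),
    BehavioralSignature(name="coin-miner-websocket",
                        websocket_pattern=r"^wss?://[^/]*pool\.example/"),
    BehavioralSignature(name="scam-popup",
                        popup_pattern=r"your computer (is|has been) (infected|locked)"),
]


def scan(page: PageIndicators) -> list:
    """Return the names of all signatures that match the page's indicators."""
    return [sig.name for sig in SIGNATURES if sig.matches(page)]


page = PageIndicators(
    url="http://site.example/",
    global_vars={"jQuery", "CoinHive"},
    websocket_urls=["wss://ws1.pool.example/proxy"],
)
print(scan(page))  # ['coin-miner-global', 'coin-miner-websocket']
```

Because the rules look only at runtime artifacts, the same signature applies no matter how the delivering script was packed or randomized, which is the property the paragraph above relies on.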
Paper
Our WTMC 2019 paper can be found here.