DeepSec 2021 Presentation: Don’t get Hacked, get AMiner! Smart Log Data Analytics for Incident Detection – Florian Skopik, Markus Wurzenberger, Max Landauer
“Prevention is ideal, but detection is a must.” Active monitoring and intrusion detection systems (IDS) are the backbone of every effective cyber security framework. Whenever carefully planned, implemented and executed preventive security measures fail, IDS form a vital part of the last line of defense. They detect the first steps of an attempted intrusion in a timely manner, which is a prerequisite for avoiding further harm. It is commonly agreed that actively monitoring networks and systems and applying IDS are part of the state of the art. Findings of IDS, as well as major events from monitoring, are usually forwarded to security information and event management (SIEM) solutions, where they are managed and analyzed. SIEM solutions provide a detailed view of the status of the infrastructure under observation.
However, a SIEM solution is only as good as the underlying monitoring and analytics pipeline. IDS are an indispensable part of this pipeline, which spans from gathering data from systems, including operating system logs, process call trees, and memory dumps, to feeding them into analysis engines and reporting findings to SIEMs. The verbosity and expressiveness of the data is a key criterion for selecting data sources. This selection is an art of its own and mainly depends on determining which of today’s common attack vectors (see the MITRE ATT&CK framework) are best reflected in which sources (e.g., DNS logs, NetFlow records, syscalls). There are hundreds of tools and agents for harnessing the different sources, and countless guidelines on configuring these tools to control the verbosity and quality of the resulting log data.
In terms of detection mechanisms, a wide variety of security solutions have been proposed in recent years to cope with increasing security challenges. While some of these solutions effectively address emerging cyber security problems, at least partially, intrusion detection remains one of the main research topics in the IT security community. Signature-based approaches are still the de facto standard today, for good reasons: they are simple to configure, can be managed centrally (i.e., they need little customization for specific networks), yield robust and reliable detection, and have low false positive rates. While these are significant benefits in today’s enterprise environments, there are nevertheless solid arguments for working on more sophisticated anomaly-based detection mechanisms:
Exploits of technical zero-day vulnerabilities cannot be detected by blocklisting approaches, because no signatures exist that describe the indicators of an unknown exploit.
Attackers can easily circumvent conventional intrusion detection once indicators are widely distributed. Recompiling malware with small modifications changes its hash sums, names, and the IP addresses of its command and control servers, rendering previous indicators useless.
Interconnecting previously isolated infrastructures creates new entry points into ICT infrastructures. Especially in infrastructures comprising legacy systems and devices with small market shares, signature-based approaches are mostly inapplicable, because such systems lack (long-term) vendor support and are often poorly documented.
Sophisticated attacks often use social engineering as the initial intrusion vector. Since no technical vulnerabilities are exploited, no concise protocol-level blocklist indicators can appropriately describe the erratic and malicious behavior.
The latter aspect in particular requires smart anomaly detection approaches that reliably discover deviations from a system’s desired behavior caused by an illegitimate user anywhere in an ICT network. This is typically the case when an adversary steals user credentials (or access cards, in the case of facility security) and uses these legitimate credentials to access a system illegitimately, or gains physical access to a network device. However, an attacker will eventually use the system differently from the legitimate user, for instance by running scans, searching shared directories, and trying to extend their presence to surrounding systems. These activities are carried out at unusual speed or at unusual times, take unusual routes through the network, issue actions with unusual frequency, or cause unusual data transfers at unusual bandwidth. The result is a series of events that anomaly-based detection approaches can identify.
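To make the idea of activity “at unusual times” concrete, here is a minimal Python sketch. It is not taken from the AMiner; the event format (user, ISO timestamp) and the function names learn_active_hours and detect_unusual_times are hypothetical. During a training period it learns, for each user, the set of hours in which that user was active, and afterwards it flags events that fall outside those hours, such as a login with stolen credentials in the middle of the night.

```python
from collections import defaultdict
from datetime import datetime


def learn_active_hours(training_events):
    """Build a baseline: for each user, the set of hours (0-23) seen in training."""
    baseline = defaultdict(set)
    for user, timestamp in training_events:
        baseline[user].add(datetime.fromisoformat(timestamp).hour)
    return baseline


def detect_unusual_times(baseline, events):
    """Return events whose hour was never observed for that user during training."""
    anomalies = []
    for user, timestamp in events:
        hour = datetime.fromisoformat(timestamp).hour
        # Unknown users have an empty baseline, so all of their events are flagged.
        if hour not in baseline.get(user, set()):
            anomalies.append((user, timestamp))
    return anomalies


# Hypothetical training data: alice only works office hours.
training = [
    ("alice", "2021-11-01T08:15:00"),
    ("alice", "2021-11-01T13:40:00"),
    ("alice", "2021-11-02T09:05:00"),
]
baseline = learn_active_hours(training)

# A login with alice's credentials at 3 a.m. stands out; the 9 a.m. login does not.
live = [("alice", "2021-11-03T09:30:00"), ("alice", "2021-11-04T03:12:00")]
print(detect_unusual_times(baseline, live))  # [('alice', '2021-11-04T03:12:00')]
```

A set of observed hours is the simplest possible baseline; it flags every hour not seen during training, so a practical detector would need a longer learning phase and some tolerance for rare but legitimate activity.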
Tools such as the AMiner provide a line of defense complementary to common signature-based approaches. They leverage anomaly detection techniques that use machine learning to automatically learn a baseline of normal behavior and report deviations from the generated models as suspicious activities that possibly relate to attacks. The log processing pipeline of the AMiner comprises several configurable modules. First, lightweight parser models extract relevant information, such as timestamps, IP addresses, and usernames, from all kinds of logs, including access logs, audit logs, application logs, and more. The AMiner subsequently applies analysis techniques to the parsed data to learn a baseline of normal system events and their properties. On top of that, configurable detectors discover deviations from this baseline, including new values and value combinations, unusual character distributions of values, changes in event frequencies such as spikes or missing events, violations of expected correlation and sequence rules, and deviations from statistical distributions of values and event occurrences, among many others. All detected anomalies are eventually reported to security analysts for review and remediation through a number of interfaces, including message queues that forward anomalies for storage in databases or visualization in SIEM dashboards. In our talk, we present a broad overview of the AMiner and explain its modules using several use cases and hands-on examples.
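The parse, learn, and detect steps described above can be illustrated with a small standalone Python sketch. It does not use the AMiner’s actual parser models or detector classes, which are configured differently; the simplified access-log format, the regular expression, and the names parse and NewComboDetector are assumptions made for illustration. The sketch extracts the timestamp, client IP address, and username from each line, learns which (IP address, username) combinations occur during a training phase, and then reports any unseen combination, roughly in the spirit of the “new value combinations” detection mentioned above.

```python
import re

# Hypothetical, simplified access-log format:
#   <ip> - <user> [<timestamp>] "<request>" <status>
LINE_PATTERN = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) - (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})$'
)


def parse(line):
    """Extract selected fields from a log line; return None if the line does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return {key: match.group(key) for key in ("timestamp", "ip", "user")}


class NewComboDetector:
    """Learns value combinations during training and flags unseen ones afterwards."""

    def __init__(self, fields):
        self.fields = fields
        self.known = set()
        self.training = True

    def process(self, record):
        combo = tuple(record[field] for field in self.fields)
        if self.training:
            self.known.add(combo)  # extend the baseline
            return None
        if combo not in self.known:
            return combo  # anomaly: combination never seen in the baseline
        return None


training_logs = [
    '10.0.0.5 - alice [01/Nov/2021:08:15:00 +0100] "GET /index.html HTTP/1.1" 200',
    '10.0.0.7 - bob [01/Nov/2021:09:02:11 +0100] "GET /reports HTTP/1.1" 200',
]
live_logs = [
    '10.0.0.5 - alice [02/Nov/2021:08:20:41 +0100] "GET /index.html HTTP/1.1" 200',
    '192.168.1.99 - alice [02/Nov/2021:03:12:09 +0100] "GET /admin HTTP/1.1" 403',
]

detector = NewComboDetector(fields=("ip", "user"))
for line in training_logs:
    record = parse(line)
    if record is not None:
        detector.process(record)

detector.training = False  # switch from learning the baseline to detection
anomalies = []
for line in live_logs:
    record = parse(line)
    if record is not None:
        result = detector.process(record)
        if result is not None:
            anomalies.append((record["timestamp"], result))

print(anomalies)  # [('02/Nov/2021:03:12:09 +0100', ('192.168.1.99', 'alice'))]
```

Separating the parser from the detector mirrors the modular pipeline described in the text: the detector only sees named fields, so the same detection logic could be reused for any log format that a parser can map onto those fields.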
The AMiner is free software, available at https://github.com/ait-aecid/logdata-anomaly-miner
Its underlying principles are discussed in a new book: https://www.springer.com/gp/book/9783030744496
Florian Skopik, Markus Wurzenberger and Max Landauer are with the Austrian Institute of Technology (AIT), where they develop new concepts, models and algorithms in the field of computer log data analysis and anomaly detection in national and international security research projects.