DeepSec 2021 Talk: Hunting for LoLs (a ML Living of the Land Classifier) – Tiberiu Boros, Andrei Cotaie
Living off the Land (LoL) is not a brand-new concept. The knowledge and resources have been out there for several years now. Still, LoL remains one of the preferred approaches among highly skilled attackers and security professionals. There are two main reasons for this:
- Experts tend not to reinvent the wheel
- Attackers like to keep a low profile/footprint (no random binaries/scripts on the disk)
This talk focuses on detecting attacker activity and Living off the Land commands using Machine Learning, on both Linux and Windows systems. Most AV vendors do not treat the command itself (its syntax and vocabulary) as an attack vector. Most log-based alerts are static, have a limited scope, and are hard to update. Furthermore, classic LoL detection mechanisms are noisy and somewhat unreliable:
- they depend on the experience of the SME (Subject Matter Expert) who creates them;
- they generate a high number of false positives (because, in terms of tools and syntax, the line between sysadmin operations and attacker operations is very thin);
- their rules grow organically, to the point where it is easier to retire and rewrite them than to maintain and update them.
So we built a robust, dynamic, high-confidence solution to fix this! We used open-source data, real incident data, a handful of Adobe's SMEs, and a lot of research and engineering. The presentation covers why LoLs are hard to detect, the feature engineering used in our approach, a comparison between different classifiers, and hands-on experience using our library and integrating it into one of our previous open-source projects, One-Stop-Anomaly Shop (OSAS – https://github.com/adobe/OSAS). We also discuss why OSAS and the LoL classifier are complementary solutions and how evading one will lead to being detected by the other.
* This project is scheduled to be open-sourced in August 2021.
We asked Tiberiu and Andrei a few more questions about their talk.
1) Please tell us the top 5 facts about your talk.
For us, this was one of the most exciting things we've worked on so far. There were a couple of challenges, chief among them that we had to start from scratch, building our own dataset and our own tools. We were unable to find anything similar, and it felt good putting it all together. We really hope to get other people involved by sharing our experience. This is also an open-source and open-development project.
2) How did you come up with it? Was there something like an initial spark that set your mind on creating this talk?
Our project was sparked by a recent research paper called "Survivalism: Systematic Analysis of Malware Living-Off-The-Land". The paper made some compelling arguments about why this type of malware is hard to detect without generating a lot of noise (high false-positive rates). It felt like a real challenge, and we had a good feeling that we could pull it off.
3) Why do you think this is an important topic?
The idea of misusing system binaries and tools has been around for quite some time. It lets you create fileless attacks or reduce the time needed to write your own code for certain operations. Fewer artifacts mean fewer chances of detection. Also, these types of tools can easily confuse AV software and analysts, since most of them are also used in legitimate scenarios by sysadmins and power users.
Still, if you look for datasets, tools, and research papers, you will find very little publicly available information on this topic.
4) Is there something you want everybody to know – some good advice for our readers maybe?
Well, this is just one of many types of intrusion detection mechanisms you need in your infrastructure. We only focus on the misuse of LoL binaries and tools. It might seem obvious to most security experts, but we are going to say it anyway: relying on just one type of detection, including LoL detection, is not healthy from a security standpoint. Only by combining signature-based detection, anomaly-based detection, network profiling, obfuscation detection, system auditing, and all the other well-established methods can one obtain a decent level of security and safety.
5) A prediction for the future – what do you think will be the next innovations or future downfalls when it comes to your field of expertise / the topic of your talk in particular?
As we mentioned, this field of research is somewhat underrepresented and, obviously, our research only fills a small gap. For things to advance, a lot more effort needs to go into it. As attackers get smarter, they will eventually find ways to avoid detection. But the more mechanisms you have to evade, the more likely you are to make a mistake and trigger something else. So the key is to cover as many gaps as possible, whether through classical or machine-learning-based approaches.
Tiberiu Boros holds a Ph.D. in computer science, specifically in the field of Text-to-Speech (TTS) synthesis. He currently works for Adobe Systems Romania and is a former associate of the Research Institute for Artificial Intelligence of the Romanian Academy. Additionally, he maintains three open-source Machine Learning projects (Stringlifier, OSAS, NLP-Cube). His research is focused on machine learning applied to security.
Andrei Cotaie is a Security Engineer specialized in Incident Response. He currently works for Adobe's Security Coordination Center, having made the transition from the public to the private sector almost 7 years ago. A big fan of automation and a machine learning enthusiast, Andrei spends most of his time on monitoring and threat hunting projects, always trying to identify the latest unconventional attacks.