In the light of the recent news about the collection of call detail records (CDR) the term metadata has come up. Unfortunately the words cyber, virtual, and meta are used quite often – even as a disguise to hide information when not being used in a technical context. We have heard about all things cyber at the last DeepSec conference. The word virtual is your steady companion when it comes to All Things Cloud™. Now we have a case for meta.
Actually, metadata is what forensic experts look for, and often. Metadata usually lives in transaction logs or is part of a data collection. It describes the data it accompanies. Frequently you cannot make sense of the data, or use it at all, without the corresponding metadata. A well-stocked library seems like a labyrinth if you have no access to the library catalogue. Likewise the content of your hard disk(s) makes no sense without file names and directories. Without metadata you have no instructions to figure out what you are looking at. Databases are structured by use of metadata. Once your metadata is gone, your database records become meaningless. That’s why everyone is so keen on acquiring, storing and analysing metadata.
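To make the catalogue analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `books`, its columns and the sample row are made up for illustration. The point is only that the same record is hard to interpret without the column names, which are metadata kept by the database.

```python
import sqlite3


def fetch_with_and_without_metadata():
    """Return one record as a bare tuple and as a labelled dict."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, invented for this example.
    conn.execute("CREATE TABLE books (title TEXT, shelf TEXT, year INTEGER)")
    conn.execute("INSERT INTO books VALUES ('Example Title', 'C-12', 1996)")

    cursor = conn.execute("SELECT * FROM books")
    row = cursor.fetchone()

    # cursor.description holds one entry per column; the first field
    # of each entry is the column name, i.e. metadata about the row.
    column_names = [entry[0] for entry in cursor.description]
    labelled = dict(zip(column_names, row))

    conn.close()
    return row, labelled


if __name__ == "__main__":
    bare, labelled = fetch_with_and_without_metadata()
    print("without metadata:", bare)
    print("with metadata:   ", labelled)
```

Without the column names, 'C-12' could be a shelf, a room or a product code, and 1996 could be a year or a quantity.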
What can you do with metadata? Well, you could extract groups of people or organisations from communication logs. If you take a look at the graph, you see nodes representing communication end-points (usually people) linked by communication (usually messages such as e-mails, tweets, phone calls, …). The graph says a lot more than the log itself. You can clearly see hubs attracting or sending lots of messages. Additionally you see links between hubs and can therefore identify groups. IT security staff regularly use these methods to gain insight into log files of security systems and applications. In this context they are used for defence. You can very easily use filters, transformations and visualisation for other purposes as well. This is the reason why metadata is no less valuable than the data itself. Often it’s the other way around. The conclusion: Protect your metadata!
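The idea of turning a communication log into a graph can be sketched in a few lines of Python. The log below is invented and holds only sender and recipient, no message content. The names `LOG`, `build_graph`, `message_counts` and `groups` are illustrative, and treating a group as a connected component is a simplifying assumption. Real analysis tools use far more refined methods.

```python
from collections import Counter, defaultdict

# Hypothetical log entries: (sender, recipient). No content is stored.
LOG = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
    ("erin", "frank"), ("frank", "erin"),
]


def build_graph(log):
    """Build an undirected graph: end-point -> set of contacts."""
    graph = defaultdict(set)
    for sender, recipient in log:
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return graph


def message_counts(log):
    """Count messages sent or received per end-point to spot hubs."""
    counts = Counter()
    for sender, recipient in log:
        counts[sender] += 1
        counts[recipient] += 1
    return counts


def groups(graph):
    """Return connected components as sorted lists of end-points."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    print("busiest end-point:", message_counts(LOG).most_common(1))
    print("groups:", groups(build_graph(LOG)))
```

In this toy log, alice stands out as a hub, and two separate groups appear, although not a single message body was read.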
For other interpretations of metadata see the blog post by Kurt Opsahl. He describes five examples where you can deduce the content of communication just by examining the metadata.
If your infrastructure is distributed and you do not own or control all the systems your communication flows through, the task of protecting metadata gets difficult or even impossible. This doesn’t mean that you shouldn’t care; you should. You need to know where and when your organisation leaks information and what the impact is. Some companies leak their entire business relations through their transaction logs. Make sure you know what this means!