DeepSec 2021 Talk: Real-Time Deep Packet Inspection Intrusion Detection System for Software Defined 5G Networks – Dr. Razvan Bocu
The world of the Internet of Things is becoming fundamental to the envisioned always-connected human society. 5G data networks are expected to substantially improve on the real-world capabilities of existing 4G networks, which makes them particularly important for the next generation of IoT device networks. This talk reports the authors' experience, acquired during the implementation of Vodafone Romania's 5G networked services. Building on that experience, this blogpost about our talk describes a machine learning-based, real-time intrusion detection system that has been tested on a 5G data network. The system is built on software defined networks, and it uses artificial intelligence-based models for the deep inspection of the transferred data packets. Its machine learning-based software components allow it to detect previously unknown intrusions. The system has been assessed using real-world data, and the experimental results show that it achieves better performance with lower overhead than similar approaches, which allows it to be deployed on real-time 5G networks.
1. Introduction
The global Internet is presently made up of several billion devices, and it is continuously expanding. This trend is driven by the increased use of consumer-grade electronic devices, which incorporate a wide array of optical systems and data collection sensors.
Because these devices have limited computational capabilities, the actual data processing is often offloaded to external third-party devices that possess the required computational capacity. Additionally, the devices are connected through communication links that carry synchronization and state data. 4G mobile data networks cannot support the growth that is required for the sustained long-term evolution of IoT networks. Consequently, 5G networks are expected to form the backbone of future high-speed data networks.
The authors of this blogpost contributed to the design and deployment of Vodafone Romania's 5G networked services. Drawing on the theoretical and practical experience gathered there, this abstract of our talk describes an intelligent intrusion detection system that is built on software defined networks and uses artificial intelligence-based models. It detects unknown intrusions by combining machine learning algorithms with deep packet inspection.
The rest of this blogpost is organized as follows. Section 2 discusses virtualized wireless networks, which allow the logical architecture of 5G Internet of Things (IoT) networks to be designed and implemented in practice. Section 3 describes the intrusion detection system. Section 4 presents the performance assessment process, and Section 5 concludes the post.
2. Remarks Regarding the Virtualized Wireless Networks
The virtualized wireless network function (VWNF) is a fundamental component of the design and implementation of 5G data networks, and this approach was used to deploy the core of the 5G data network mentioned above. It allows the physical network hardware to be reliably partitioned through the logical specification of a self-contained 5G network, based on network functions virtualization (NFV). Research in this field is interesting from both a theoretical and a practical perspective. Moreover, this approach supports the real-world deployment of specialized 5G networks on specific hardware and software infrastructures, such as cloud infrastructures or telecommunications service providers' networks [8]. We used this model to implement specialized networked services on the provider's 5G data network. Our experiments suggest that the virtualized network environment provides the logical flexibility and scalability needed to deploy the real-time intrusion detection system [9] efficiently. This networking virtualization model is presented in Figure 1.
We demonstrated that this virtualization model supports real-time, deep packet inspection-based processing of the data traffic that flows through the provider's 5G data network, in order to detect known or potential threat patterns. We also showed that the mechanism is appropriate for creating the required specialized virtual network infrastructure, which also supports the developments described in [10].
We have also demonstrated that the logical specification of 5G networks optimizes the allocation and usage of radio resources. This results from the creation of logical sub-networks, the 5G data traffic of each being analyzed by a separate instance of the real-time intrusion detection system. The findings reported here therefore extend and refine the work reported in [11]. Additionally, our experiments confirm that properly defined and sized virtual 5G networks can support even applications that process large amounts of real-time data, as is the case with the intrusion detection system.
3. The Intrusion Detection System
The architecture of the intelligent intrusion detection system is described in Figure 2. It consists of three logical layers: the data traffic forwarding layer, the data management and control layer, and the machine learning-based data analysis layer. The forwarding layer monitors and captures data traffic. It gathers suspect data packets and sends them to the control layer, and it blocks malicious traffic according to the controller's instructions. The data management and control layer detects suspicious data patterns through deep packet inspection and identifies anomalies in the analyzed data. It then takes protective measures according to the decisions made by the data analysis layer and instructs the forwarding layer accordingly.
The forwarding layer supplies the data management and control layer with real-time network status data by continuously collecting suspect data patterns. Intrusions are blocked immediately by dropping the malicious packets under the supervision of the other system layers.
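The interaction between the three layers can be illustrated with a minimal Python sketch. All class and method names (`ForwardingLayer`, `ControlLayer`, `AnalysisLayer`, `inspect`, `handle`) are hypothetical, the deep packet inspection step and the machine learning model are replaced by simple callables, and blocking by source IP address is an assumed, deliberately coarse policy. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Packet:
    # Hypothetical minimal packet representation.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    payload: bytes = b""


class AnalysisLayer:
    """Stand-in for the machine learning-based data analysis layer."""

    def __init__(self, score: Callable[[Packet], float], threshold: float = 0.5) -> None:
        self.score = score  # placeholder for a trained model
        self.threshold = threshold

    def is_malicious(self, packet: Packet) -> bool:
        return self.score(packet) >= self.threshold


class ControlLayer:
    """Flags suspect packets, consults the analysis layer, and issues block rules."""

    def __init__(self, analysis: AnalysisLayer, suspicious: Callable[[Packet], bool]) -> None:
        self.analysis = analysis
        self.suspicious = suspicious  # placeholder for deep packet inspection
        self.block_rules: Set[str] = set()

    def inspect(self, packet: Packet) -> None:
        if self.suspicious(packet) and self.analysis.is_malicious(packet):
            # Instruct the forwarding layer to drop further traffic from this source.
            self.block_rules.add(packet.src_ip)


class ForwardingLayer:
    """Captures packets, reports them to the controller, and drops blocked traffic."""

    def __init__(self, control: ControlLayer) -> None:
        self.control = control
        self.forwarded: List[Packet] = []
        self.dropped: List[Packet] = []

    def handle(self, packet: Packet) -> None:
        if packet.src_ip not in self.control.block_rules:
            self.control.inspect(packet)
        if packet.src_ip in self.control.block_rules:
            self.dropped.append(packet)
        else:
            self.forwarded.append(packet)


# Usage: a toy "model" that flags payloads containing b"exploit" sent to ports 22/23.
analysis = AnalysisLayer(score=lambda p: 1.0 if b"exploit" in p.payload else 0.0)
control = ControlLayer(analysis, suspicious=lambda p: p.dst_port in {22, 23})
forwarding = ForwardingLayer(control)
forwarding.handle(Packet("10.0.0.5", "10.0.0.1", 40000, 22, "TCP", b"exploit"))
forwarding.handle(Packet("10.0.0.5", "10.0.0.1", 40001, 80, "TCP", b"hello"))
forwarding.handle(Packet("10.0.0.9", "10.0.0.1", 40002, 80, "TCP", b"hello"))
print(len(forwarding.dropped), len(forwarding.forwarded))  # 2 1
```

Keeping the block rules in the control layer and letting the forwarding layer only consult them mirrors the division of responsibilities described above: the forwarding layer enforces decisions, while the decision logic stays in the upper layers.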
The packet collection and data flow partitioning layer provides a global view of the entire monitored 5G data network. Its status monitoring module processes the relevant network status data and continuously analyzes the data packets it receives. The data management and control layer processes and parses the received traffic. It then clusters related data packets and generates a digital fingerprint for each cluster, based on the following logical network parameters: the source IP address, the destination IP address, the source port, the destination port, the session duration, and the network protocol.
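As an illustration of the fingerprinting step, the following Python sketch groups packet metadata into flows by their 5-tuple and hashes the six listed parameters into a fingerprint. The SHA-256 hash, the 1-second rounding of the session duration, and the names `PacketMeta`, `group_into_flows` and `fingerprint` are assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

FlowKey = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol


@dataclass(frozen=True)
class PacketMeta:
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def group_into_flows(packets: Iterable[PacketMeta]) -> Dict[FlowKey, List[PacketMeta]]:
    """Cluster packets that share the same 5-tuple into one flow."""
    flows: Dict[FlowKey, List[PacketMeta]] = defaultdict(list)
    for p in packets:
        flows[(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)].append(p)
    return dict(flows)


def fingerprint(key: FlowKey, packets: List[PacketMeta]) -> str:
    """Hash the six parameters (5-tuple plus session duration) into a hex digest."""
    src_ip, dst_ip, src_port, dst_port, protocol = key
    times = [p.timestamp for p in packets]
    duration = round(max(times) - min(times))  # 1-second granularity (assumed)
    canonical = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{duration}|{protocol}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


packets = [
    PacketMeta(0.0, "10.0.0.5", "10.0.0.1", 40000, 443, "TCP"),
    PacketMeta(2.4, "10.0.0.5", "10.0.0.1", 40000, 443, "TCP"),
    PacketMeta(1.0, "10.0.0.7", "10.0.0.1", 53000, 53, "UDP"),
]
flows = group_into_flows(packets)
fingerprints = {key: fingerprint(key, pkts) for key, pkts in flows.items()}
for key, fp in fingerprints.items():
    print(key, fp[:16])
```

Because the session duration is part of the hashed input, a flow's fingerprint is only stable once its packets for the current collection interval have been gathered.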
These fingerprints are used to define and label the data records of the intercepted data streams, which represent specific network connections and activities. Deep packet collection and inspection run continuously, and the collection and inspection interval is tuned to avoid undesirable delays in the real-time data analysis process.
Anomaly detection relies on basic data flow statistics, which are used to detect abnormal behaviors and potential anomalies. This module of the intrusion detection system performs an entropy-based analysis grounded in Shannon's information theory, in order to detect variations in the distribution of the analyzed data packet samples. The entropy of a random variable X is calculated using the following formula:

$$H(X) = -\sum_{i} p(x_i) \log p(x_i)$$
Here, p(x_i) is the probability that X takes the value x_i, estimated relative to all the values detected so far. The entropy is computed for four main parameters: the source IP address, the source port, the destination IP address, and the destination port. The values of these parameters are acquired by the system's real-time traffic analysis module. At any given moment, the continuously updated value of H(X) helps to reveal possible malicious traffic patterns. Let E denote the mean entropy and S the corresponding standard deviation; a pattern is considered suspect when H(X) falls outside the interval [E - S, E + S]. The suspect data packets are then sent to the proactive data analysis layer, which performs additional analysis.
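A minimal sketch of the entropy test in Python follows. It assumes base-2 logarithms (the base only rescales H(X), E and S by the same factor, so it does not change which values fall outside the interval), estimates p(x_i) from the frequencies observed in one observation window rather than over all values seen so far, and uses the population standard deviation for S. The window contents and helper names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence

# The four parameters on which entropy is tracked, per the text.
FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port")


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """H(X) = sum_i p(x_i) * log2(1 / p(x_i)), with p estimated from counts."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def window_entropies(flows: List[Dict[str, Hashable]]) -> Dict[str, float]:
    """Entropy of each tracked field over one observation window."""
    return {f: shannon_entropy([flow[f] for flow in flows]) for f in FIELDS}


def is_suspect(current: float, history: Sequence[float]) -> bool:
    """True when H(X) lies outside [E - S, E + S]; history must be non-empty."""
    e = mean(history)
    s = pstdev(history)
    return not (e - s <= current <= e + s)


# Past destination-IP entropies, e.g. from four earlier windows.
history = [
    shannon_entropy(["a", "b", "c", "d"]),  # 2.0
    shannon_entropy(["a", "a", "b", "c"]),  # 1.5
    shannon_entropy(["a", "b", "b", "c"]),  # 1.5
    shannon_entropy(["a", "b", "c", "d"]),  # 2.0
]
# A window where every flow targets the same host (e.g. a flooding attack).
flood = shannon_entropy(["a", "a", "a", "a"])  # 0.0
print(is_suspect(flood, history))  # True
```

A sudden drop in destination-IP entropy, as in the example, indicates traffic concentrating on a single target, while a sudden rise in source-IP entropy could indicate many spoofed or distributed senders; both move H(X) out of the [E - S, E + S] band.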
The feature selection component builds and updates the feature set, which contains patterns that match the detected malicious data patterns. It is designed to process large amounts of data in real time, and it removes data features that are irrelevant to the machine learning core of the system's proactive data analysis layer. The data is then separated into relevant categories, which helps to clearly isolate benign traffic patterns from malicious ones.
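The source does not specify how irrelevant features are removed. As one plausible stand-in, the Python sketch below ranks features by the Fisher score (the squared difference of the class means divided by the sum of the class variances), computed on benign and malicious samples, and keeps the top-ranked ones. The feature names and the choice of the Fisher score itself are assumptions, not the system's documented method.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_score(benign: Sequence[float], malicious: Sequence[float]) -> float:
    """Squared gap between class means divided by the sum of class variances."""
    gap = (mean(benign) - mean(malicious)) ** 2
    spread = pvariance(benign) + pvariance(malicious)
    if spread == 0:
        # Constant within both classes: perfectly separating only if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def select_features(
    benign_rows: List[Dict[str, float]],
    malicious_rows: List[Dict[str, float]],
    keep: int,
) -> List[str]:
    """Return the names of the `keep` features with the highest Fisher score."""
    scores = {
        name: fisher_score([r[name] for r in benign_rows], [r[name] for r in malicious_rows])
        for name in benign_rows[0]
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:keep]


benign = [
    {"duration": 1.0, "bytes": 500.0, "ttl": 64.0},
    {"duration": 2.0, "bytes": 520.0, "ttl": 64.0},
    {"duration": 1.5, "bytes": 480.0, "ttl": 64.0},
]
malicious = [
    {"duration": 0.1, "bytes": 60.0, "ttl": 64.0},
    {"duration": 0.2, "bytes": 64.0, "ttl": 64.0},
    {"duration": 0.1, "bytes": 62.0, "ttl": 64.0},
]
print(select_features(benign, malicious, keep=2))  # ['bytes', 'duration']
```

Here `ttl` is identical in both classes, so it carries no discriminating information and is dropped, which is the kind of pruning the text attributes to the feature selection component.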
The performance assessment in Section 4 indicates that the system scales well with the size of the analyzed data set, and that it accurately detects malicious traffic patterns while keeping the incidence of false positives to a minimum. This real-world behaviour is especially important for commercial 5G data networks, which transport and process a large number of concurrent data transfer sessions that have to be analyzed proactively.
4. The Real World Performance Assessment
The system was deployed on the infrastructure of a major telecommunications services provider. The performance analysis uses data collected during real-time intrusion detection on the provider's 5G data network. The evaluation data set contains 32,000,000 processed network connections. Each connection is described by 39 features grouped into three categories: connection-based features, content-based features, and traffic-based features. Each traffic item is labeled either as normal or as suspicious, and suspicious items are further grouped into four categories: remote to local, probe, user to root, and denial of service.
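The labeled records described above could be represented as in the following Python sketch. The three feature groups and the five labels come from the text; the class names, the individual feature names, and the use of name-to-value dictionaries for each group are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable


class TrafficLabel(Enum):
    NORMAL = "normal"
    REMOTE_TO_LOCAL = "remote to local"
    PROBE = "probe"
    USER_TO_ROOT = "user to root"
    DENIAL_OF_SERVICE = "denial of service"


@dataclass
class ConnectionRecord:
    """One labeled connection; each feature group maps hypothetical names to values."""
    connection_features: Dict[str, float] = field(default_factory=dict)
    content_features: Dict[str, float] = field(default_factory=dict)
    traffic_features: Dict[str, float] = field(default_factory=dict)
    label: TrafficLabel = TrafficLabel.NORMAL

    @property
    def is_suspicious(self) -> bool:
        # Any of the four attack categories counts as suspicious.
        return self.label is not TrafficLabel.NORMAL

    @property
    def feature_count(self) -> int:
        return (len(self.connection_features) + len(self.content_features)
                + len(self.traffic_features))


def label_distribution(records: Iterable[ConnectionRecord]) -> Counter:
    """Count how many records carry each label."""
    return Counter(record.label for record in records)


records = [
    ConnectionRecord({"duration": 1.2}, {"failed_logins": 0.0}, {"same_host_rate": 0.1}),
    ConnectionRecord({"duration": 0.0}, {"failed_logins": 0.0}, {"same_host_rate": 0.9},
                     label=TrafficLabel.DENIAL_OF_SERVICE),
    ConnectionRecord({"duration": 0.3}, {"failed_logins": 5.0}, {"same_host_rate": 0.2},
                     label=TrafficLabel.REMOTE_TO_LOCAL),
]
print(label_distribution(records))
print(sum(r.is_suspicious for r in records))  # 2
```

In the real data set each record would hold 39 features across the three groups; the toy records above hold one feature per group.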
Performance is quantified through the following metrics: precision (P), reliability (R), tradeoff (T), accuracy (A), and false positive rate (FP). Precision is the percentage of correct malicious traffic predictions relative to the total number of malicious predictions made by the intrusion detection system. Reliability is the number of correctly detected intrusion attempts relative to the total number of intrusions. The tradeoff combines precision and reliability into a single value that summarizes classification quality, through the following formula:

$$T = \frac{2 \cdot P \cdot R}{P + R}$$
Accuracy is the ratio between the number of correctly classified legitimate and malicious packets and the total number of classified packets, that is, the correctly classified legitimate and malicious packets plus the incorrectly classified legitimate and malicious packets. The false positive rate is the number of incorrectly classified legitimate packets divided by the total number of legitimate packets (correctly plus incorrectly classified). The values obtained for these metrics are displayed in Table 1.
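The five metrics can be computed from confusion-matrix counts as shown in this Python sketch. It assumes, as is standard, that the tradeoff T is the harmonic mean of P and R (the F1 score); the helper names are hypothetical, and mapping zero denominators to 0.0 is a convention of this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # malicious items correctly flagged
    fp: int  # legitimate items wrongly flagged as malicious
    tn: int  # legitimate items correctly passed
    fn: int  # malicious items that were missed


def _ratio(num: float, den: float) -> float:
    # Convention for this sketch: an empty denominator yields 0.0.
    return num / den if den else 0.0


def ids_metrics(c: ConfusionCounts) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    reliability = _ratio(c.tp, c.tp + c.fn)  # also known as recall or detection rate
    tradeoff = _ratio(2 * precision * reliability, precision + reliability)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    fp_rate = _ratio(c.fp, c.fp + c.tn)
    return {"P": precision, "R": reliability, "T": tradeoff, "A": accuracy, "FP": fp_rate}


m = ids_metrics(ConfusionCounts(tp=90, fp=10, tn=880, fn=20))
print({k: round(v, 4) for k, v in m.items()})
```

With these illustrative counts, P = 90/100 and R = 90/110, so T reduces to 2TP / (2TP + FP + FN) = 180/210, while the false positive rate is 10/890.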
Table 1 reports these five metrics, one per column, computed for several fractions of the input data set, which are listed in the first column. As noted above, the full data set contains 32,000,000 network connections analyzed by the intrusion detection system, each described by 39 features that are processed by the system's machine learning core.
5. Conclusion
5G data networks already support significant real-world applications, and they have the potential to become the backbone of the future always-connected human society. Consequently, they raise difficult design, implementation and deployment problems that concern all aspects of 5G networks. Among them, the timely detection of any illegitimate access attempt is essential, especially in a commercial data network. Our talk therefore presents the state of the art in research on this important topic, and it describes a real-time intrusion detection system based on machine learning techniques. The system has been tested using real-world data obtained through real-time monitoring of 5G data traffic on the network of a major Romanian telecommunications services provider. This assessment demonstrates that it is possible to design a software system that blocks most of the illegitimate traffic on a high-traffic 5G data network in real time. Moreover, existing contributions relevant to the topic are analyzed constructively, the open problems are examined, and possible solutions are suggested.
Dr. Razvan Bocu, Transilvania University of Brasov, Department of Mathematics and Computer Science, 500091, Romania (razvan.bocu@unitbv.ro). Dr. Bocu is a Research and Teaching Staff Member in the Department of Mathematics and Computer Science, the Transilvania University of Brasov, Romania. He received a B.S. degree in Computer Science from the Transilvania University of Brasov in 2005, a B.S. degree in Sociology from the Transilvania University of Brasov in 2007, an M.S. degree in Computer Science from the Transilvania University of Brasov in 2006, and a Ph.D. degree from the National University of Ireland, Cork, in 2010. He is the author or coauthor of 33 technical papers, together with four books and book chapters. Dr. Bocu is an editorial reviewing board member of seven high-profile technical journals in the field of Information Technology and Biotechnology.
References
- Egham, UK: The development of connected things. https://www.gartner.com/newsroom/id/3598917. Cited 15 Jan 2018
- Meryem Simsek, Adnan Aijaz, Mischa Dohler, Joachim Sachs, Gerhard Fettweis: 5G-Enabled Tactile Internet. IEEE Journal on Selected Areas in Communications, Volume 34, Issue 3, 460–473 (2016)
- Godfrey A. Akpakwu, Bruno J. Silva, Gerhard P. Hancke, Adnan M. Abu-Mahfouz: A Survey on 5G Networks for the Internet of Things: Communication Technologies and Challenges. IEEE Access, Volume 6, 3619–3647 (2017)
- I. Parvez, A. Rahmati, I. Guvenc, A. I. Sarvat, H. Dai: A Survey on Low Latency Towards 5G: RAN, Core Network and Caching Solutions. IEEE Communications Surveys and Tutorials, Volume 20, Issue 4, 3098–3130 (2018)
- I.F. Akyildiz, P. Wang, S.C. Lin: SoftAir: a software defined networking architecture for 5G wireless systems. Comput. Netw. 85 (C), 1–18 (2015)
- X. Xia, K. Xu, Y. Wang, Y. Xu: A 5G-Enabling Technology: Benefits, Feasibility, and Limitations of In-Band Full-Duplex mMIMO. IEEE Vehicular Technology Magazine, Volume 13, Issue 3, 81–90 (2018)
- M. Chen, Y. Qian, Y. Hao, Y. Li, J. Song: Data-Driven Computing and Caching in 5G Networks: Architecture and Delay Analysis. IEEE Wireless Communications, Volume 25, Issue 1, 70–75 (2018)
- A.-A. A. Boulogeorgos, et al.: Terahertz Technologies to Deliver Optical Network Quality of Experience in Wireless Systems Beyond 5G. IEEE Communications Magazine, Volume 56, Issue 6, 144–151 (2018)
- Bassem Khalfi, Bechir Hamdaoui, Mohsen Guizani: Extracting and Exploiting Inherent Sparsity for Efficient IoT Support in 5G: Challenges and Potential Solutions. IEEE Wireless Communications, Volume 24, Issue 5, 68–73 (2017)
- Lina Xu, Rem Collier, Gregory M. P. O'Hare: A Survey of Clustering Techniques in WSNs and Consideration of the Challenges of Applying Such to 5G IoT Scenarios. IEEE Internet of Things Journal, Volume 4, Issue 5, 1229–1249 (2017)
- S. Sekander, H. Tabassum, E. Hossain: Multi-Tier Drone Architecture for 5G/B5G Cellular Networks: Challenges, Trends, and Prospects. IEEE Communications Magazine, Volume 56, Issue 3, 96–103 (2018)
- P. Duan, et al.: Space-Reserved Cooperative Caching in 5G Heterogeneous Networks for Industrial IoT. IEEE Transactions on Industrial Informatics, Volume 14, Issue 6, 2715–2724 (2018)
- Massimo Condoluci, Giuseppe Araniti, Toktam Mahmoodi, Mischa Dohler: Enabling the IoT Machine Age With 5G: Machine-Type Multicast Services for Innovative Real-Time Applications. IEEE Access, Volume 4, 5555–5569 (2016)
- Ricard Vilalta, Arturo Mayoral, Ramon Casellas, Ricardo Martinez, Christos Verikoukis, Raul Munoz: TelcoFog: A Unified Flexible Fog and Cloud Computing Architecture for 5G Networks. IEEE Communications Magazine, Volume 55, Issue 8, 36–43 (2017)
- Monowar Hasan, Ekram Hossain: Random Access for Machine-to-Machine Communication in LTE Advanced Networks: Issues and Approaches. IEEE Communications Magazine, Volume 51, 86–93 (2013)
- Kai Lei, Shangru Zhong, Fangxing Zhu, Kuai Xu, Haijun Zhang: An NDN IoT Content Distribution Model with Network Coding Enhanced Forwarding Strategy for 5G. IEEE Transactions on Industrial Informatics, Volume 14, Issue 6, 2725–2735 (2017)
- Antonio Morgado, Kazi Mohammed Saidul Huq, Shahid Mumtaz, Jonathan Rodriguez: A Survey of 5G Technologies: Regulatory, Standardization and Industrial Perspectives. Digital Communications and Networks (2017)
- M. Ndiaye, G. P. Hancke, and A. M. Abu-Mahfouz: Software Defined Networking for Improved Wireless Sensor Network Management: A Survey. Sensors, Volume 17, no. 5, 1–32 (2017)
- F. Gringoli, et al.: Performance Assessment of Open Software Platforms for 5G Prototyping. IEEE Wireless Communications, Volume 25, Issue 5, 10–15 (2018)
- M. Palattella, M. Dohler, A. Grieco, et al.: Internet of Things in the 5G Era: Enablers, Architecture and Business Models. IEEE Journal on Selected Areas in Communications, Volume 34, No. 3 (2016)
- N. Linge, R. Odum, S. Hill, S. Von-Hunerbein, P. Linnebank, A. Sutton, and D. Townend: The impact of atmospheric pressure on the performance of 60GHz point to point links within 5G networks. Loughborough Antennas and Propagation Conference (2018)
- U. Habiba, E. Hossain: Auction Mechanisms for Virtualization in 5G Cellular Networks: Basics, Trends, and Open Challenges. IEEE Communications Surveys and Tutorials, Volume 20, Issue 3, 2264–2293 (2018)
- Razvan Bocu, Cosmin Costache: A Homomorphic Encryption-Based System for Securely Managing Personal Health Metrics Data. IBM Journal of Research and Development, Volume 62, Issue 1, 1:1–1:10 (2018)
- Arun Narayanan, et al.: Key Advances in Pervasive Edge Computing for Industrial Internet of Things in 5G and Beyond. IEEE Access, Volume 8 (2020)
- Jani Suomalainen, Shahriar Shahabuddin, Aarne Mammela, Ijaz Ahmad: Machine Learning Threatens 5G Security. IEEE Access, Volume 8 (2020)