Reference Paper

  • Title: Contextual Counters and Multimodal Deep Learning for Activity-Level Traffic Classification of Mobile Communication Apps during COVID-19 Pandemic
  • Authors: Idio Guarino, Giuseppe Aceto, Domenico Ciuonzo, Antonio Montieri, Valerio Persico and Antonio Pescapè
  • Journal: Elsevier Computer Networks, Special issue on Machine Learning empowered Computer Networks
  • Year: 2022
  • Paper URL: ScienceDirect

Main Idea

Our idea is to address a Traffic Classification (TC) problem at activity level by exploiting both the (a) intrinsic characteristics of the bidirectional flow (viz. biflow) to be classified and (b) the information related to its context, namely taking into account the traffic related to the biflows that are contextually created by the same app when the user performs a specific activity. This insight comes from the fact that a single application, in order to perform its functions, may communicate at the same time with several network services, and a mix of services is likely to be more indicative of a specific activity of the same app.

Following these considerations, we keep the biflow as the TC object (i.e. the aggregated unit of traffic that will receive the classification label), and we use the information from simultaneous same-app communications as contextual information that helps discern the activities of a given application. Specifically, in what follows we refer to the TC object as the reference biflow or BFr. We consider the traffic generated by the same device (identified by its IP address, hereafter referred to as device IP) and by the same app that generates BFr: a filter on the device IP address and a pre-classification stage detecting the app make it possible to select all biflows potentially related to BFr.

To restrict the contextual information to communications simultaneous to BFr, and to impose causality (thus enabling online classification), we further restrict the considered packets to those of biflows that were open during the transmission of the first Np payload-carrying packets of BFr. In summary, denoting by t_start^r the arrival time of the SYN packet of BFr and by t_Np^r the arrival time of its Npth payload-carrying packet, the packets defining the contextual biflows satisfy all of the following four conditions: (i) same device IP as BFr; (ii) same app label as BFr; (iii) the biflow did not end before t_start^r; (iv) the arrival time of the packet precedes t_Np^r.
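The four conditions can be read as a simple packet filter. The following is a minimal sketch, not the authors' implementation: the `Biflow` and `Packet` classes, their field names, and the `contextual_packets` function are hypothetical, and excluding BFr from its own context is an assumption that the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Biflow:
    device_ip: str
    app: str                       # label from the app pre-classification stage
    t_start: float                 # arrival time of the first packet
    t_end: Optional[float] = None  # arrival time of the last packet (None: still open)


@dataclass
class Packet:
    biflow: Biflow
    t: float        # arrival time
    size: int       # bytes
    upstream: bool  # True for upstream, False for downstream


def contextual_packets(packets: List[Packet], ref: Biflow,
                       t_np_ref: float) -> List[Packet]:
    """Select the packets that satisfy conditions (i)-(iv) for reference biflow `ref`."""
    selected = []
    for p in packets:
        bf = p.biflow
        if bf is ref:  # assumption: BFr is not part of its own context
            continue
        same_device = bf.device_ip == ref.device_ip                # (i)
        same_app = bf.app == ref.app                               # (ii)
        not_ended = bf.t_end is None or bf.t_end >= ref.t_start    # (iii)
        causal = p.t < t_np_ref                                    # (iv)
        if same_device and same_app and not_ended and causal:
            selected.append(p)
    return selected


# Toy example with made-up values
ref = Biflow("10.0.0.2", "app_a", t_start=10.0)
ctx = Biflow("10.0.0.2", "app_a", t_start=9.0)
old = Biflow("10.0.0.2", "app_a", t_start=1.0, t_end=5.0)  # ended before BFr started
other = Biflow("10.0.0.2", "app_b", t_start=9.5)           # different app
packets = [
    Packet(ctx, 9.2, 100, True),
    Packet(ctx, 10.4, 200, False),
    Packet(ctx, 12.0, 300, False),  # arrives after the Npth packet of BFr
    Packet(old, 4.0, 50, True),
    Packet(other, 10.1, 80, True),
    Packet(ref, 10.0, 60, True),
]
selected = contextual_packets(packets, ref, t_np_ref=11.0)
```

In this toy trace only the first two packets of `ctx` are retained: the third violates (iv), `old` violates (iii), and `other` violates (ii).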

Overall Mechanism

For each biflow, the starting time and the current number of bytes/packets transmitted (in both the upstream and downstream directions) are saved. Then, for each reference biflow BFr, at time t_Np^r (the arrival of its Npth payload-carrying packet), the packets belonging to its contextual biflows are used to compute nine aggregate metrics that serve as Context Inputs (Context in short). Specifically, these metrics are: (a) the number of contextual biflows (#CF), (b) the amount of transmitted bytes/packets (V*/P*), and (c) the bit/packet rate (RB*/PR*), with (b) and (c) computed in both directions (we use * = u and * = d to denote the upstream and downstream directions, respectively). Regarding the practical feasibility of this definition, we highlight that the Context Inputs are time-sliced analogues of the flow counters kept by routing devices and traffic monitoring middleboxes (NetFlow and IPFIX standards), and can be derived directly from counter values sampled at two instants determined by each reference biflow (its start and the arrival of its Npth packet).
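To make the nine metrics concrete, here is a minimal sketch that aggregates per-biflow counters into a Context vector. The `BiflowCounters` type, the `context_inputs` function, and the dictionary keys are hypothetical names. The time span used as the rate denominator (from the earliest contextual biflow start to the arrival of the Npth packet of BFr) is an assumption, since the text does not spell it out.

```python
from typing import Dict, List, NamedTuple


class BiflowCounters(NamedTuple):
    """Counters of one contextual biflow, sampled at the Npth packet of BFr."""
    t_start: float
    bytes_up: int
    bytes_down: int
    pkts_up: int
    pkts_down: int


def context_inputs(counters: List[BiflowCounters],
                   t_np_ref: float) -> Dict[str, float]:
    """Aggregate the counters of the contextual biflows into nine metrics."""
    v_u = sum(c.bytes_up for c in counters)
    v_d = sum(c.bytes_down for c in counters)
    p_u = sum(c.pkts_up for c in counters)
    p_d = sum(c.pkts_down for c in counters)

    # Assumption: rates are measured over the span between the earliest
    # contextual biflow start and t_np_ref.
    span = t_np_ref - min(c.t_start for c in counters) if counters else 0.0

    def rate(amount: float) -> float:
        return amount / span if span > 0 else 0.0

    return {
        "#CF": float(len(counters)),
        "V_u": float(v_u), "V_d": float(v_d),          # bytes
        "P_u": float(p_u), "P_d": float(p_d),          # packets
        "RB_u": rate(8 * v_u), "RB_d": rate(8 * v_d),  # bit/s
        "PR_u": rate(p_u), "PR_d": rate(p_d),          # packets/s
    }


# Toy example with made-up counter values
example = context_inputs(
    [BiflowCounters(0.0, 100, 300, 2, 3), BiflowCounters(1.0, 50, 150, 1, 1)],
    t_np_ref=2.0,
)
```

With an empty context the sketch returns all zeros, which keeps the input well-defined for reference biflows that have no simultaneous same-app traffic.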

In the sketch above, the following are highlighted for each biflow: the arrival time of the first packet (△) (i.e. the SYN packet for TCP biflows), the arrival time of the last packet (▽), and the arrival time of the first packet with non-zero payload (▲). For the reference biflow BFr, the arrival time of the first packet (◊) and of the Npth packet with non-zero payload (◆) are highlighted as well. The remaining labels are:

  • PAY: the first Nb bytes of transport-level payload.
  • SEQ: header fields extracted from the sequence of the first Np packets.
  • CONTEXT: the Context Inputs computed at the arrival of the Npth packet of the reference biflow BFr (i.e., the biflow to be classified).
  • t_start^i: the starting time of the generic biflow Bi, identified by the quintuple qi.
  • Bu,i/Bd,i and Pu,i/Pd,i: the total amount of bytes and packets, respectively, transmitted/received by the biflow Bi up to time t_Np^r.

Capitalizing on the appeal of Context Inputs, PAY, SEQ, and CONTEXT are given as input to a Multimodal DL Architecture consisting of three per-modality branches, each fed with a different input: (i) a 1D Convolutional Neural Network (1D-CNN) fed with PAY, (ii) a Bidirectional Gated Recurrent Unit (BiGRU) fed with SEQ, and (iii) a Deep Multi-Layer Perceptron (MLP) fed with CONTEXT. Exploiting the multimodal nature of such an architecture, different traffic classification solutions can be designed. For instance, one could trade the PAY input for the new CONTEXT, paying a negligible performance cost to gain both a shorter training time and greater robustness to future, more opaque encryption sublayers that could reduce the informativeness of the payload.
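The three-branch layout can be sketched with the Keras functional API. This is only an illustration under stated assumptions: all sizes (`N_B`, `N_P`, `N_FIELDS`, `N_CLASSES`), the layer widths, the kernel size, and the concatenation-based fusion are hypothetical choices and not the configuration used in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration
N_B = 64        # payload bytes per biflow (PAY)
N_P = 12        # packets per biflow (SEQ)
N_FIELDS = 4    # header fields per packet (SEQ)
N_CONTEXT = 9   # Context Inputs
N_CLASSES = 10  # activity labels


def build_multimodal_model() -> keras.Model:
    # Branch (i): 1D-CNN over the payload bytes
    pay_in = keras.Input(shape=(N_B, 1), name="pay")
    x = layers.Conv1D(16, kernel_size=5, activation="relu")(pay_in)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(32, activation="relu")(x)

    # Branch (ii): bidirectional GRU over the per-packet header fields
    seq_in = keras.Input(shape=(N_P, N_FIELDS), name="seq")
    y = layers.Bidirectional(layers.GRU(32))(seq_in)
    y = layers.Dense(32, activation="relu")(y)

    # Branch (iii): MLP over the nine Context Inputs
    ctx_in = keras.Input(shape=(N_CONTEXT,), name="context")
    z = layers.Dense(32, activation="relu")(ctx_in)
    z = layers.Dense(32, activation="relu")(z)

    # Fusion by concatenation (an assumption), then the classification head
    merged = layers.Concatenate()([x, y, z])
    merged = layers.Dense(64, activation="relu")(merged)
    out = layers.Dense(N_CLASSES, activation="softmax", name="activity")(merged)

    model = keras.Model(inputs=[pay_in, seq_in, ctx_in], outputs=out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_multimodal_model()
```

In a layout like this, trading PAY for CONTEXT amounts to dropping the first branch and concatenating only the SEQ and CONTEXT branches.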

Implementation

To implement and test the Deep Learning architectures used for traffic classification, we rely on the Keras Python API running on top of TensorFlow 2. Input data are stored in the Parquet format and handled via Apache Arrow's Python library (PyArrow). Data pre- and post-processing are performed mainly with the NumPy and pandas libraries. For the evaluation metrics, we use the scikit-learn implementation. Finally, the plots are produced with Matplotlib and Seaborn.