Universita' degli Studi di Napoli ''Federico II'' (Italy)
Dipartimento di Ingegneria Elettrica e delle Tecnologie dell'Informazione (DIETI)

Network Monitoring and Measurements  

Matlab scripts for statistical analysis

This page contains some Matlab™ functions useful for statistical analysis. Most of these scripts have been written with the explicit purpose of analyzing time series extracted from network traffic traces.


Citing

When refering to our tools, please cite the following Reference:

A. Pescapè, A. Dainotti, A. Botta, "On Internet Measurement Research", 1st Newsletter of the IEEE Technical Committee on Communication Systems Integration and Modeling, November 2006

If you are interested in collaborating with us or in opportunities in Traffic, please send an e-mail to Antonio Pescapè




Marginal Distributions Estimation and Analysis

scott.m [w,bins]=scott(c,normal,factor) Calculate bins width and bins vector using the Scott's rule applied to vector 'c'.
normal = 1 : Use Normal rule
normal = 0 : Correct normal rule with a skewness factor
factor : Multiply calculated width by factor.
pdfplot.m [p,w,bins]=pdfplot(c,color,flag) Plot the PDF of distribution c using Scott's rule to choose the bin width, if 'flag' argument is given then the Matlab 'plot' command is used instead of 'bars'.
pdfplotbins.m p=pdfplotbins(c,w,bins,color,flag) Plot the PDF of distribution c with given bins (w,bins are respectively the bin width and the bins' vector), if 'flag' argument is given then the Matlab 'plot' command is used instead of 'bars'.
bivpdf.m bivpdf(seta,wa,setb,wb,bmin,bmax) Plot the bivariate PDF of variables 'seta' and 'setb' by using bins widths 'wa' and 'wb'.
ccdf.m ccdf(empirical,cutmin) Display log-log Complementary CDFs and compute the alfa paramter starting from 'cutmin'.

Entropy

entropy_base.m entropy_base(p,base) Calculate the entropy of a single random variable given its PDF in the vector p and the base of the logarithm.
joint_ent.m joint_ent(Pxy,base) Calculate the joint entropy of a joint PDF "Pxy".
mutual_info.m mutual_info(Px,Py,Pxy,base) Calculate mutual info between 2 probability distributions.
conditional_ent.m conditional_ent(Px,Pxy,base) Calculate conditional entropy of a random variable.
sim_uncert.m sim_uncert(data,bins) Calculate the simmetrical uncertainty of the bivariate dataset data by using the number of bins specified by 'bins'.
umargplot_off.m [newset,entr,um]=umargplot_off(empir,n,w) Perform a dataset reduction. Based on the Offline-Marginal Utility
umarg.m [U]=umarg(Pr1, Pr2) Compute the marginal utility between two distributions

Second Order Analysis

acf_plot.m acf_plot(empirical,directory,variable) Plot the autocorrelation function of the data specified by empirical and save the figure in eps format.

Statistical Fitting

lambdasquare.m [x2,lambda2,dev] = lambdasquare(empirical, analytical, param, bins, mmin, mmax) Estimate lambda-square discrepancy.
param = parameters of the analytical distribution.
bins = custom bins vector or bin width.
min,max:
Use 0,0 to not make any truncations.
Use 1,1 to truncate analytical to match min and max from empirical.
Use different values to truncate both distros at your pleasure.
Returns the Chi-square estimation (x2), the lambda-square estimation (lamdba2), and its standard deviation.
fitting.m fitting(emp_orig,empirical,w,b) Perform Statistical fitting by comparing the empirical distribution given against several analytical distributions by means of the Lambda-square discrepancy measure.
It displays comparisons of the PDFs of each analytical distribution against the empirical one.
It outputs also data formatted to be inserted into a latex table.
bestlambda.m bl = bestlambda(s) Simple script needed by "fitting.m". It compares several lambda-square discrepancy measures taking into account their standard deviations.
workfit.m [lam,dev] = workfit(emp_orig, empirical, w, b, an, param, string); Simple script needed by "fitting.m". It plots the PDFs of an analytical distribution and of the empirical one in one single diagram.
normmix2.m [p, m, s, x, pdf, an] = normmix2(data, iter, flag) Perform statistical fitting using the Expectation Maximization algorithm with a mixture of 2 Gaussian distributions.
INPUT
data: data to be fitted
iter: max number of iterations (0 = default)
flag: OPTIONAL:
0 do nothing
1 plot a comparison of the PDFs

OUTPUT
p: percentages
m: param1
s: param2
x: x for the pdf
pdf: y for the pdf
an: OPTIONAL: generate 100k random numbers following the obtained mixture
normmix3.m [p, m, s, x, pdf, an] = normmix3(data, iter, flag) Same as above, but with a mix of 3 Gaussian distributions.
normmix4.m [p, m, s, x, pdf, an] = normmix4(data, iter, flag) Same as above, but with a mix of 4 Gaussian distributions.
normmix5.m [p, m, s, x, pdf, an] = normmix5(data, iter, flag) Same as above, but with a mix of 5 Gaussian distributions.
tail.m tail(empirical,analytical) Estimate goodness of tail approximation.

Network Traffic Time Series

bitrate.m [x,y] = bitrate(ipt, ps, k, steps, unit, time_unit) Calculate bit or packet rate from the inter-packet time and packet size series.

INPUT
ipt: inter packet times
ps: vector of packet sizes (set to a single value to calculate the packet rate)
k: sampling period of the calculated rate
steps: how many samples to compute
unit:
set unit = 1 to have byte rate
set unit = 8 to have bit rate (unaffected if packet rate is requested)
time_unit:
if ipt are expressed in seconds set to 1
if ipt are exepressed in ms set to 1000
etc..
Note that ps() has one more element than ipt() !

OUTPUT:
x: time ticks
y: corresponding rate for each tick
bitrate2.m [x,y]=bitrate2(time,ps,scale,unit,time_type,time_unit); Calculate bit or packet rate from the inter-packet time (or timestamps of arrivals) and packet size series.

Much faster than bitrate.m
Can work with the timestamps of arrivals too.

Network Traffic Modeling

HMM.zip Hidden Markov Models library This set of scripts implements techniques based on Hidden Markov Models to model, replicate, and predict network traffic. They have been used in our works, e.g. "An HMM Approach to Internet Traffic Modeling"