
ISSN: 2231-2803 http://www.ijcttjournal.org Page 2452

A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation

Ms. Sharmi S., Research Scholar, MS University, Thirunelvelli

Dr. M. Punithavalli, Director, SREC, Coimbatore

Abstract: A new approach is proposed to predict the fractal behavior of distributed network traffic, in which a random scaling fractal model is used to simulate the self-affine characteristics of the traffic. The network traffic is studied by sniffing a portion of it using Wireshark. The sniffed traffic is inspected and dissected, using the filter option, for each protocol. The fractal behavior of the sniffed traffic is examined with this open-source network analyzer. Later, the sniffed packet records are exported to NeuroSolutions builder and SPSS and then examined. Further, the exported and dissected traffic data is fed as input to train the neural network to predict the resulting fractal behavior of the distributed network traffic, and an equation is proposed to derive a close prediction of the network traffic in SPSS.

Keywords: fractal behavior, sniffing, predict, SPSS, NeuroSolution builder, NeuroXL predictor.

I INTRODUCTION

For the examination of local problems in a small network, monitoring at a single observation point is sufficient to train the network builder. In such cases, a network analyzer may be used; this can be a machine running Wireshark that is directly connected to a network segment or to the monitoring port of a switch or a router. In larger networks, it is often necessary to monitor simultaneously at multiple observation points to train the constructed neural network more efficiently.

In this research, a Neural Network (Multi-layer Perceptron) is proposed to predict the dependent variable values over different distributions of independent variable values, using two specific modeling tools, viz., SPSS and NeuroSolutions. One objective is to find the effect of the distribution of the dependent variable values in the dataset on the Neural Network prediction performance when different modeling tools are used. A second objective is to compare the performance of the two modeling tools in predicting the dependent variable values.

Analyzing packet records with Wireshark

Wireshark [1], formerly known as Ethereal, is probably the most popular open-source network analyzer tool. For the experiments, we configured Wireshark on our machine to capture network packets. The collected data is exported in Comma Separated Value (.csv) format.
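As an illustration of how such an exported file might be prepared for modeling, the following Python sketch counts packets per protocol in fixed time slices. It assumes the columns of Wireshark's default CSV export (No., Time, Source, Destination, Protocol, Length, Info); the function name `packets_per_slice`, the one-second slice width, and the inline sample rows are hypothetical and are not taken from the experiments.

```python
import csv
import io
from collections import Counter, defaultdict


def packets_per_slice(csv_text, slice_seconds=1.0):
    """Count packets per protocol within consecutive time slices.

    Assumes the columns "Time" (seconds since capture start) and
    "Protocol", as in Wireshark's default CSV export.
    """
    counts = defaultdict(Counter)  # slice index -> {protocol: packet count}
    for row in csv.DictReader(io.StringIO(csv_text)):
        slice_index = int(float(row["Time"]) // slice_seconds)
        counts[slice_index][row["Protocol"]] += 1
    return {index: dict(per_protocol) for index, per_protocol in counts.items()}


# Hypothetical sample rows, for illustration only.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.1","10.0.0.2","TCP","66","sample"
"2","0.400000","10.0.0.2","10.0.0.1","TCP","66","sample"
"3","0.900000","10.0.0.1","10.0.0.3","UDP","82","sample"
"4","1.200000","10.0.0.3","10.0.0.1","HTTP","310","sample"
'''

slices = packets_per_slice(SAMPLE)
```

Each slice then yields one row of per-protocol counts, which is one possible way to obtain the dissected traffic variables used later in the paper.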

Wireshark can be divided into four main modules: Capture Core, WireTap, Protocol Interpreter and Dissector. Capture Core uses the common library WinPcap to capture data from different networks (Ethernet, Ring, etc.). Once the data is obtained, WireTap is used to save it as a binary file. Since the data is in binary, the user cannot understand it without the Protocol Interpreter and Dissector. A Dissector can be available in built-in or plug-in mode. The proposed approach profits from Wireshark's extensive packet inspection facility and protocol dissection capabilities for distributed network analysis.

NeuroSolutions

The NeuralBuilder helps to construct the neural network by selecting parameters. The four currently available problem types in the NeuralExpert are Classification, Prediction, Function Approximation, and Clustering. A parameter list is then selected to train the neural network, and the desired traffic is supplied as the output used to train the network.

International Journal of Computer Trends and Technology (IJCTT) – volume 4 Issue 8– August 2013


Figure 1. Flow diagram to deploy traffic prediction using ANN.

An ANN is a computational method motivated by biological models. ANNs attempt to mimic the fundamental operation of the human brain and can be used to solve a broad variety of problems [10]. One of the most important features of ANNs is that they can discover hidden patterns in data sets [11] and solve complex problems, especially when a mathematical model does not exist (or when the model is not suitable for the case at hand). Furthermore, ANNs are commonly immune to noise and irregularities present in the data [12, 13]. ANN learning is typically based on two data sets: the training set and the validation set. The training set, as its name indicates, is used to train a new artificial neural network. The validation set is used after the neural network has been trained, to assess its performance. In most cases the validation set is similar to the training set but not the same [14, 15].
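A minimal sketch of separating a training set from a validation set is shown below. It assumes the traffic samples are ordered in time and holds out the most recent part for validation; the function name `split_train_validation`, the 80/20 fraction, and the sample values are hypothetical and do not reflect the split used in the experiments.

```python
def split_train_validation(samples, train_fraction=0.8):
    """Split an ordered sequence into a training part and a validation part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical per-slice packet counts, for illustration only.
traffic = [3, 5, 4, 6, 8, 7, 9, 10, 12, 11]
train_set, validation_set = split_train_validation(traffic)
```

Keeping the validation samples out of training is what allows them to measure how well the trained network generalizes.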

Data mapping

In artificial intelligence, a desired output is commonly known as the target. For the specific case of ANNs, the target is used for network training [9]. ANNs can map a given input to a desired output; an ANN used for this purpose is typically called a mapping ANN. The network is trained by applying the desired input to the ANN and then monitoring the actual ANN output. The difference between the actual ANN output and the desired output is normally used to manage the learning process. During training, the learning algorithm attempts to reduce the error measured between the actual network output and the target in the training set [9, 11]. The training process may be time consuming, but once it has been successfully completed, an ANN can quickly calculate its output after the input data has been applied to the network input.

Figure 2: Flow diagram of ANN using NeuroSolutions. [Flowchart boxes between Start and Stop, labeled Step 1 to Step 5: dissect the network traffic dataset and enlist the ...; translate the network traffic data parameters; train the NN's architecture for N epochs; evaluate the performance (criteria satisfied? Y/N); extract a new dissected traffic dataset; perform prediction (original expected traffic).]
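The error-driven training described above can be sketched with a single linear neuron updated by the delta rule. This is only an illustration of the idea of reducing the difference between the actual output and the target; it is not the multi-layer perceptron or the learning algorithm used in NeuroSolutions, and the function names, learning rate, epoch count, and data are assumptions.

```python
def train_linear_neuron(inputs, targets, learning_rate=0.05, epochs=2000):
    """Train a single linear neuron (one weight, one bias) with the delta rule."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            output = weight * x + bias            # actual network output
            error = target - output               # difference from the target
            weight += learning_rate * error * x   # adjust weight to reduce the error
            bias += learning_rate * error         # adjust bias to reduce the error
    return weight, bias


def mean_squared_error(inputs, targets, weight, bias):
    """Average squared difference between the neuron output and the target."""
    return sum((t - (weight * x + bias)) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)


# Hypothetical data that follows target = 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_neuron(xs, ts)
mse = mean_squared_error(xs, ts, w, b)
```

Once training has finished, computing an output for a new input is a single multiplication and addition, which mirrors the observation that a trained network responds quickly.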

Data classification

Data classification, or just classification, is the process of identifying an object from a set of possible outcomes [9, 12]. An ANN can be trained to identify and classify any kind of object. These objects can be numbers, images, sounds, signals, etc. An ANN used for this purpose is also known as a classifier.

Figure 3. Training fractal-dataset graph


The network is initially trained with a network traffic dataset that was downloaded from the Wireshark sample captures as a pcap file, and the data is exported to the network builder for prediction. The predicted fractal behavior on the traffic data set is shown in Table 1.

II INVESTIGATION OF CORRELATION COEFFICIENT VALUE

This section investigates the effect of the dependent variable values and their distribution on the prediction accuracy rate. The results of the analyses let us find the effect of the distribution of the dependent variable values on prediction accuracy, which leads us to generate an equation that predicts the expected traffic from the distribution of the independent variable values, using the modeling tool SPSS.

The Correlation Coefficient, R, is a measure of the strength of the association between the independent (explanatory) variables and the dependent (prediction) variable. R is never a negative value, since it is defined as the positive square root in its formula [2,3].

[Formula for R with two independent variables, X1 and X2]
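The formula itself is not present in this text. As a point of reference only, the standard textbook expression for the multiple correlation coefficient with two independent variables is given below; the symbols $r_{y1}$, $r_{y2}$ (correlation of the dependent variable with X1 and with X2) and $r_{12}$ (correlation between X1 and X2) are notation introduced here and may differ from the authors' original.

```latex
R = \sqrt{\frac{r_{y1}^{2} + r_{y2}^{2} - 2\, r_{y1}\, r_{y2}\, r_{12}}{1 - r_{12}^{2}}}
```

Because R is taken as the positive root of this quantity, it lies between 0 and 1.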

The coefficient of multiple correlation estimates the combined influence of two or more variables on the observed (dependent) variable. To analyse the traffic data using multiple regression, the following assumptions must be verified [8]:

- The dependent variable is measured on a continuous scale.
- There are two or more independent variables, which are continuous or categorical.
- Observations should be recorded.
- A linear relationship exists between the dependent variable and each of the independent variables.
- The traffic data shows homoscedasticity, which is where the variances along the line of best fit remain similar as one moves along the line.
- The data does not show multicollinearity, which occurs when two or more independent variables are highly correlated.
- There are no significant outliers, high leverage points or highly influential points.
- Residuals (errors) are approximately normally distributed.

The assumptions listed above are not violated, and hence the Multiple Correlation Coefficient, R, is computed to measure the strength of the association between the independent (explanatory) variables and a single dependent (prediction) variable.
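One of the assumptions listed above, the absence of multicollinearity, can be screened with pairwise Pearson correlations between the independent variables. The sketch below is a simple illustration and not the diagnostic SPSS applies; the function names, the 0.8 threshold (a rule of thumb assumed here), and the sample values are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical per-slice traffic values, for illustration only.
predictors = {
    "n1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "n2": [2.1, 3.9, 6.2, 8.0, 9.8],  # nearly proportional to n1
    "n3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = highly_correlated_pairs(predictors)
```

A flagged pair suggests that the two independent variables carry largely the same information, in which case the regression coefficients become hard to interpret.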

Multiple Regression-booster prediction phases:

In MR-Booster, each feature of the association between the actual traffic and the dissected traffic is used explicitly to generate the prediction equation. The standard error, when probed further, offers a better way to refine the regression equation that predicts the network traffic. The correlation structure of the traffic is finally generated in a much easier way.

Phase 1:

a. Plot the sniffed traffic data as a scatter plot to visualize whether there is a possible linear relationship.

b. Calculate and interpret the linear correlation coefficient, using the data sets.


Phase 2:

c. Determine all possible regression equations for the data, refining them further by adjusting each constant for its standard error.

d. Select and apply the best generated regression equation and forecast.

Phase 3:

e. Identify outliers and note the observations.

f. Process and interpret the performance of the R-booster prediction.

Table 1. Descriptive Statistics (SPSS)

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| Actual-Traffic | .84 | 1.756 | 2581 |
| Traffic-n1 | .77 | 1.656 | 2581 |
| Traffic-n2 | .01 | .139 | 2581 |
| Traffic-n3 | .60 | 1.308 | 2581 |

Table 2. Correlation Coefficients (dependent: actual traffic-graph)

| Model 1 | Unstandardized Coefficients: R | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | T | Sig. |
|---|---|---|---|---|---|
| (Constant) | .013 | .005 | | 2.711 | .007 |
| Network1(n1) | .880 | .007 | .830 | 133.561 | .000 |
| Network2(n2) | 1.047 | .032 | .083 | 32.668 | .000 |
| Network3(n3) | .229 | .008 | .170 | 27.395 | .000 |

The following equation is generated to predict the actual traffic from the dissected protocol traffic:

$$\text{Predicted traffic (w.r.t. time slice)} = n_1\,(R_{n1} - SE_{n1}) + n_2\,(R_{n2} - SE_{n2}) + n_3\,(R_{n3} - SE_{n3}) + (R_{\text{constant}} - SE_{\text{constant}})$$

where R is the coefficient and SE is the standard error of each term in Table 2. Substituting the values gives

$$\text{Predicted-traffic} = 0.873\,\text{Traffic-n1} + 1.015\,\text{Traffic-n2} + 0.221\,\text{Traffic-n3} + 0.008$$
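The arithmetic behind the prediction equation can be checked with a short Python sketch that subtracts each standard error in Table 2 from its coefficient and then applies the result. The coefficient and standard error values are those reported in Table 2; the names `TABLE_2`, `ADJUSTED`, and `predict_traffic`, and the example inputs, are hypothetical.

```python
# Coefficients (R) and standard errors as reported in Table 2.
TABLE_2 = {
    "constant": (0.013, 0.005),
    "n1": (0.880, 0.007),
    "n2": (1.047, 0.032),
    "n3": (0.229, 0.008),
}

# Each term of the prediction equation is the coefficient minus its standard error.
ADJUSTED = {
    name: round(coefficient - std_error, 3)
    for name, (coefficient, std_error) in TABLE_2.items()
}


def predict_traffic(n1, n2, n3):
    """Predicted traffic for one time slice from the three dissected traffic values."""
    return (ADJUSTED["n1"] * n1
            + ADJUSTED["n2"] * n2
            + ADJUSTED["n3"] * n3
            + ADJUSTED["constant"])


# Hypothetical dissected traffic values for a single time slice.
example = predict_traffic(1.0, 0.0, 2.0)
```

The adjusted values come out as 0.873, 1.015, 0.221 and 0.008, which agrees with the numeric form of the equation.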

The R values of the traffic from n1 and n2 indicate a strong association with the actual traffic, whereas the traffic from n3 has a weak association, as interpreted in Table 3.

Table 3. R value strength

| R value | Interpretation |
|---|---|
| 0.9 | strong association |
| 0.5 | moderate association |
| 0.25 | weak association |


Figure 4. Actual traffic vs. predicted traffic (NeuroSolutions) and computed traffic (SPSS)

Figure 4 shows that the traffic computed using the generated equation is very close to the actual target traffic.

III PERFORMANCE EVALUATION

The overall performance of the analyzed prediction methods is stated here to estimate the prediction accuracy. The Coefficient of Efficiency (E) is one such estimation method; it measures the performance and reveals the efficiency rate. The efficiency coefficient can take values in the domain (−∞, 1]. If E = 1, we have a perfect fit between the observed and the forecasted data. A value of E = 0 occurs when the prediction corresponds to estimating the mean of the actual values. An efficiency less than zero, i.e. −∞ < E < 0, indicates that the average of the actual values is a better predictor than the analyzed forecasting method. The closer E is to 1, the more accurate the prediction is; for the forecasted traffic, the coefficient of efficiency stays at 0.9.
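The paper does not state the formula for E. The properties described above (range (−∞, 1], E = 0 for a mean-only prediction) match the Nash-Sutcliffe form E = 1 − Σ(observed − predicted)² / Σ(observed − mean(observed))², so the sketch below assumes that definition; the function name and the sample values are hypothetical.

```python
def coefficient_of_efficiency(observed, predicted):
    """Coefficient of efficiency, assuming the Nash-Sutcliffe definition."""
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_sum = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


# Hypothetical observed traffic values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]

perfect = coefficient_of_efficiency(observed, observed)              # perfect fit
mean_only = coefficient_of_efficiency(observed, [5.0] * 4)           # predicts the mean
poor = coefficient_of_efficiency(observed, [8.0, 6.0, 4.0, 2.0])     # worse than the mean
```

The three cases reproduce the interpretation in the text: a perfect fit gives 1, predicting the mean gives 0, and a forecast worse than the mean gives a negative value.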

IV CONCLUSION

The experimental results demonstrate that 1) the regression model is more effective for traffic prediction; and 2) both the proposed prediction equation and the standard-error-based R (correlation coefficient) update scheme are effective in predicting the traffic in an easier way. The goal of the experiments is to evaluate and compare the performance of the ANN prediction approaches presented earlier in this paper. Hence, the linear regression model is a powerful tool for analyzing the association between one or more independent variables and a single dependent variable. Some novice researchers wish to move quickly beyond this model and learn to use more sophisticated models because they are discouraged by its limitations and believe that other regression models are more appropriate for their analysis needs.

References

[1] Wireshark Homepage, http://www.wireshark.org, 2008.

[2] ClearSight Networks, Inc. Homepage, http://www.clearsightnet.com, 2008.

[3] https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php

[4] http://en.wikipedia.org/wiki/Multiple_correlation

[5] http://www.yeatts.us/6200-Multivariate%20Stats/Lectures-tests/Test%202/Week-12-assumptions.pdf

[6] WildPackets, Inc. Homepage, http://www.wildpackets.com, 2008.

[7] S. Waldbusser, “Remote Network Monitoring Management Information Base,” RFC 2819 (Standard), May 2000.

[8] T. Masters, Practical Neural Network Recipes in C++. Preparing Input Data (C-16), Academic Press, Inc., pp. 253-270, (1993).

[9] S. J. Russel and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice-Hall of India, Second Edition. Statistical Learning Methods (C-20), pp. 712-762, (2006).

[10] T. Masters, Neural, Novel & Hybrid Algorithms for Time Series Prediction. Neural Network Tools (C-10), John Wiley & Sons Inc., pp. 367-374, (1995).

[11] T. Masters, Signal and Image Processing With Neural Networks. Data Preparation for Neural Networks (C-3), John Wiley & Sons Inc., pp. 61-80, (1994).

[12] T. Masters, Advanced Algorithms for Neural Networks. Assessing Generalization Ability (C-9), John Wiley & Sons Inc., pp. 335-380, (1995).

[13] R. D. Reed and R. J. Marks II, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. Factors Influencing Generalization (C-14), The MIT Press, pp. 239-256, (1999).

[14] http://en.wikipedia.org/wiki/Neural_Lab
