
Name: Rushi Satani

Enrolment: 19BCE523
Subject: Computer Networks
Subject Code: 2CS502
Topic: End to End Routing Behaviour in the Internet
Introduction:
A great deal of research has been done on routing algorithms, but comparatively
little on how routing actually behaves in the Internet at large scale. The main
purpose of this report is to answer the following questions. What kinds of
routing failures and pathologies occur? Do routes change frequently, or do they
remain stable for long periods? Are routes symmetric, i.e. is the route from A
to B the same as the route from B to A? To answer these questions, we consider a
large sample of Internet routes spread over different locations. More than
40,000 routes were analysed, measured with the help of 37 Internet sites. We
assume that this sample is representative of routing in the Internet in general.
We also analyse how routes change over time, in order to assess how Internet
routing evolves.

Research related to routing:


Over the last 20 years a great deal of research has been conducted on the
problem of routing traffic in communication networks, and many books have been
published that focus on specific routing problems and their solutions.

Here we make a clear distinction between routing protocols and routing
behaviour. A routing protocol is a procedure for disseminating routing
information within a network; this information is also used to forward traffic.
Routing behaviour describes how routing algorithms perform in practice. The
distinction is important because routing protocols are studied frequently,
whereas routing behaviour is rarely taken up for research, and the literature
reflects this. Most of the available work on routing behaviour is based on
simulation. The few measurement studies that exist are almost always
qualitative, or were carried out on small networks rather than large ones. An
exception is Chinoy, who studied routing behaviour at large scale.
Chinoy found a wide range of dynamics in routing information. Some routers sent
routing updates from time to time even though no connectivity information had
changed. Most changes in routing information occurred at the edges of the
network, while the backbone hardly changed at all. During outages some networks
became unreachable, for periods ranging from a few minutes to a few hours. Most
networks were nearly dormant in terms of routing changes, but a few were not.

Chinoy's focus was on how routing information disseminates through the network.
However, Chinoy was not sure how these dynamics translate into the routing
dynamics seen by an end user, and suggested that “the end to end dynamics of
routing information” is an area that deserves further research.

We use the term virtual path to denote the network-level abstraction of a direct
link between two Internet hosts. For example, if host A wishes to establish a
network-level connection with host B, we denote the virtual path from A to B as
A=>B.

At any point in time, a virtual path is realised at the network layer by a
single route, which is simply the sequence of routers along which packets are
sent. The route of a virtual path may change from time to time, or it may remain
quite stable.

Thus, the question raised by Chinoy is: given two hosts, how does the virtual
path between them behave? This is the principal question we try to answer in
this study.

Routing in internet:
For the purpose of routing, the Internet is divided into a disjoint set of
autonomous systems. The original idea of an autonomous system was a collection
of routers and hosts unified by running a single IGP (Interior Gateway
Protocol). Over time the idea has evolved to coincide essentially with an
administrative domain, in which the hosts and routers are unified by a set of
IGPs and a single administrative authority. Routing between autonomous systems
provides the highest level of Internet interconnection. All the significant
autonomous systems use BGP, which is currently in its 4th version. BGP allows
arbitrary interconnection topologies between autonomous systems and provides a
mechanism for preventing routing loops between them. Whether BGP will scale to
a very large Internet depends on the stability of inter-autonomous-system
routing. The phenomenon in which routes between autonomous systems change
frequently is known as flapping. When routes flap, BGP spends a lot of time
updating its routing tables and propagating the changes in routing information.

Autonomous systems are large entities that can suffer significant internal
instabilities, so it is important to note that stable inter-autonomous-system
routing does not imply stable end-to-end routing.

Methodology:
In this section we explain the procedure used for our study.

Experimental apparatus:

A number of Internet sites were recruited to run NPD (Network Probe Daemon), a
program that provides several network measurement services. The list of sites
is given in a table in the next section. A control program, “npd_control”,
running on our local workstation, contacted these NPDs from time to time and
asked each of them to use traceroute to measure the route to another NPD site.

In the first set of measurements, termed D1, each virtual path between two NPD
sites was measured with a mean interval of 1-2 days. In the second set, termed
D2, 60% of the measurements were made with a mean inter-measurement interval of
2 hours and 40% with a mean interval of 2.75 days. The D1 interval was chosen so
that each NPD made a traceroute measurement every two hours on average. As more
sites were added to the experiment, the rate at which each NPD measured any
particular remote NPD site was reduced to keep the average load at one
measurement per two hours, which led to the mean interval of 1-2 days. After
examining the D1 data, we realised that such a large sampling interval prevented
us from answering many questions about routing stability. Learning from this,
for D2 we made measurements between pairs of NPD sites in bursts, with a mean
interval of two hours between the measurements in each burst. Since we also
needed data to assess routing stability over long periods, we continued to make
lower-frequency measurements between pairs of sites. We arranged the
measurements so that 50% came in bursts and 50% were widely spaced. We also had
traceroute measurements from a TCP study conducted using the NPD framework.
These too were made two hours apart on average, so including them shifted the
mix to 60% in bursts and 40% widely spaced.

We also paired the bulk of the D2 measurements: we first measured the virtual
path A=>B and then immediately measured the virtual path B=>A.

Exponential Sampling:

The measurements were arranged so that the time intervals between consecutive
measurements of the same virtual path were independent and exponentially
distributed. This gives us two related and important properties. First, the
measurements correspond to additive random sampling, which is unbiased because
it samples all instantaneous signal values with equal probability. Second, the
measurement times form a Poisson process. This means that Wolff's PASTA
principle, “Poisson Arrivals See Time Averages”, can be applied to our
measurements. A prerequisite of this theorem is that the process being observed
cannot anticipate observation arrivals. Our measurements do not fully meet this
requirement. When there is no network connectivity between the site that would
conduct a traceroute and the site running “npd_control”, the measurement cannot
be made, so in effect the network predicts that no measurement will occur. In
this way the network can anticipate arrivals even though the observations are
exponentially distributed. As a result, we underestimate the prevalence of
network connectivity problems.
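
The sampling scheme described above can be illustrated with a minimal sketch. It assumes nothing about how npd_control actually scheduled its measurements; the function name poisson_measurement_times and its parameters are hypothetical and chosen only for illustration. Drawing each gap from an exponential distribution makes the measurement times a Poisson process.

```python
import random

def poisson_measurement_times(mean_interval_hours, duration_hours, seed=None):
    """Return measurement times (in hours) whose gaps are independent and
    exponentially distributed with the given mean (a Poisson process)."""
    rng = random.Random(seed)
    rate = 1.0 / mean_interval_hours  # expovariate takes the rate, not the mean
    times = []
    t = rng.expovariate(rate)
    while t < duration_hours:
        times.append(t)
        t += rng.expovariate(rate)
    return times

# Hypothetical usage: a two-hour mean interval over one week.
schedule = poisson_measurement_times(2.0, 7 * 24, seed=1)
print(len(schedule), "measurements scheduled")
```

Because the gaps are memoryless, knowing when the last measurement happened says nothing about when the next one will occur, which is what lets observations made at these instants be treated as unbiased samples of the path's state.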

Which observations are representative?

As stated in the introduction, 37 Internet hosts participated in the routing
study. This is a very small fraction of the estimated 6.6 million Internet
hosts as of the latter half of 1995, so we cannot claim with certainty that the
behaviour we observe represents the Internet as a whole. These hosts also belong
to 34 different stub networks, again a very small fraction of the roughly
50,000 stub networks. However, we argue that the routes between the 37 hosts
are reasonably representative, since they include a non-negligible fraction of
the autonomous systems that together make up the Internet.

Confidence Interval:

Throughout our study we will often want to attach a confidence interval to a
probability estimated from our data.

Suppose that out of a representative sample of n observations we find that a
subset of size k exhibits some property P. We might then estimate the
unconditional probability p of observing P as p̂ = k/n. But the value of p̂ is
not of much use unless we also have an idea of its possible error. For example,
if, out of 2 observations, 1 of them exhibits P, we would not feel too confident
declaring that p ≈ 1/2.

To address this problem, we need to associate a confidence interval with p̂,
the interval being a range of values that, with high confidence, includes p.
Tight bounds can be derived on the interval in which p must lie to be
consistent, with confidence c, with observing k independent instances of P in n
measurements. The lower bound on p, pl, is expressed in terms of
QF(v1,v2)(1 − c), the 1 − c quantile of the well-known F variance-ratio
distribution with parameters v1 = 2(n − k + 1) and v2 = 2k. The upper bound, pu,
has a similar form.
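
As an illustration of the kind of bound discussed above, here is a minimal sketch of an exact binomial (Clopper-Pearson) interval, the standard construction whose bounds can be written with F quantiles using 2(n − k + 1) and 2k degrees of freedom. It is an assumption that this matches the report's bounds exactly; in particular, splitting 1 − c equally between the two tails is our choice. The function names are hypothetical, and the direct summation is only practical for small n.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation (small n only)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _root(f, tol=1e-10):
    """Bisection root of a function f that increases from negative to
    positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(k, n, c=0.95):
    """Exact two-sided interval (p_l, p_u) for p given k successes in n trials.
    Assumption: (1 - c) / 2 of the probability is placed in each tail."""
    tail = (1 - c) / 2
    # p_l solves P(X >= k | p) = tail; p_u solves P(X <= k | p) = tail.
    p_l = 0.0 if k == 0 else _root(lambda p: (1 - binom_cdf(k - 1, n, p)) - tail)
    p_u = 1.0 if k == n else _root(lambda p: tail - binom_cdf(k, n, p))
    return p_l, p_u

print(exact_interval(1, 2))
```

For the example in the text (1 observation out of 2 exhibiting P), this gives an interval from roughly 0.013 to 0.987, which makes concrete why we would not feel confident declaring that p ≈ 1/2.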

We also need to consider the problem of comparing confidence intervals. Suppose
that we have two datasets, D1 and D2, with k1 instances of P out of n1
independent measurements for D1, and k2 out of n2 for D2. Let c denote the
confidence we wish to associate with an observation that the two datasets differ
significantly (i.e., c is the probability that an apparent difference is not
just due to chance). It can then be shown that the confidence intervals for D1
and D2 should be computed using c’ = 1 − 2√(1 − c). The prevalence of P in D1
differs from that in D2 with confidence c only if these two intervals do not
overlap.

We use 95% confidence throughout, corresponding to c = 0.95 and c’ ≈ 0.553.
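
The comparison rule above can be checked with a short sketch. The function names and the example intervals are hypothetical; only the formula for c’ and the non-overlap criterion come from the text.

```python
from math import sqrt

def comparison_confidence(c):
    """Per-dataset confidence c' to use when comparing two intervals at
    overall confidence c, following c' = 1 - 2*sqrt(1 - c)."""
    return 1 - 2 * sqrt(1 - c)

def intervals_disjoint(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) do not overlap."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_hi < b_lo or b_hi < a_lo

c_prime = comparison_confidence(0.95)
print(round(c_prime, 3))

# Made-up intervals, each assumed to have been computed at confidence c':
print(intervals_disjoint((0.10, 0.15), (0.18, 0.25)))  # no overlap
print(intervals_disjoint((0.10, 0.20), (0.18, 0.25)))  # overlap
```

Only in the first of the two made-up cases would the difference between the datasets be called significant at confidence c.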

The raw routing data:


Sites participating in the study:
The first routing experiment, which we refer to as D1, ran from November 1994
to the end of that year. More than 7,000 traceroutes were attempted between 27
sites. The second experiment, referred to as D2, ran from November 1995 to the
end of that year, with more than 37,000 traceroutes attempted between 33 sites.
The sites that participated in our study are given in the above table.

Failures in measurement:

In both experiments the traceroute measurements failed about 5-8% of the time,
because npd_control was unable to contact the remote NPD. Each such failure
means losing a chance to observe a lack of connectivity, which biases our
results towards underestimating Internet connectivity failures.

In the D2 measurements this underestimation was partly corrected by pairing each
measurement of the virtual path A=>B with a measurement of the virtual path
B=>A, which increased the chance of observing such failures. Even so, the
traceroutes still failed about 5% of the time in D2.

Routing Pathologies:
We begin by classifying occurrences of routing pathologies, that is, routes
that showed either clearly inferior performance or out-and-out broken
behaviour.

Routing Loops:

We first discuss the pathology of routing loops, of which we distinguish three
types. In a forwarding loop, packets forwarded by a router eventually return to
that same router. In an information loop, a router acts on connectivity
information that is derived from information it provided itself. In a
traceroute loop, the traceroute measurement reports the same sequence of routers
multiple times.
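
To make the notion of a traceroute loop concrete, here is a minimal sketch that scans a list of hop addresses for a block of routers that repeats back to back. The function name, the min_repeats threshold and the example hop names are all hypothetical; the study's actual criteria for identifying loops are not given in this report.

```python
def find_traceroute_loop(hops, min_repeats=3):
    """Return the repeating router sequence if `hops` contains a block of at
    least two routers that repeats back to back `min_repeats` times;
    otherwise return None."""
    n = len(hops)
    for period in range(2, n // min_repeats + 1):
        for start in range(n - period * min_repeats + 1):
            block = hops[start:start + period]
            if all(
                hops[start + r * period:start + (r + 1) * period] == block
                for r in range(min_repeats)
            ):
                return block
    return None

# Hypothetical traceroute output in which two routers bounce packets back and forth.
trace = ["gw1", "core1", "rtrA", "rtrB", "rtrA", "rtrB", "rtrA", "rtrB"]
print(find_traceroute_loop(trace))
```

Requiring several repetitions is a design choice meant to avoid flagging a single route change that happens mid-measurement as a loop.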

Routing algorithms are normally designed to avoid forwarding loops. A loop can
still form when a change in network connectivity is not immediately reflected in
all routers. Since a forwarding loop represents a connectivity failure, it is
important that such loops resolve as quickly as possible.

For the purpose of our analysis, any traceroute showing a loop that had not
resolved by the end of the traceroute is considered a persistent loop. In D1,
10 traceroutes showed persistent loops.

Similarly, there were 50 persistent loops in D2. We can place an upper bound on
how long a loop persisted by looking for neighbouring measurements between the
same hosts that do not show the loop. Sometimes neighbouring measurements do
show the loop, which allows us to assign a lower bound as well. The below table
shows persistent routing loops in D2.

Erroneous Routing:
One example of erroneous routing was found in D1, in which packets clearly took
the wrong path. This was on the connix=>ucl route, where a trans-Atlantic route
went to Rehovot, Israel instead of London. No such erroneous routing was found
in D2.

This suggests that one cannot make assumptions about where packets travel in
the Internet.

Connectivity altered mid-stream:

In 10 of the D1 traces, routing connectivity reported earlier in the traceroute
was later lost or altered, which indicates a routing failure. A few of these
failures were due to outages, during which intermediate routers were updating
their view of the current topology and dropped packets, perhaps because they
did not know how to forward them. Recovery times varied widely: some were as
short as about 100 ms, while others took about a minute to resolve. The longer
recovery times are a problem for applications that require real-time support.

It is interesting to note that we found 155 such instances in D2, compared with
10 in D1.

Fluttering:

Fluttering is the term for rapidly-oscillating routing. The below figure shows
its possible effects.
Infrastructure failure:

Apart from the traceroute failures caused by persistent routing loops and
erroneous routing, 125 traceroutes in D1 and 617 traceroutes in D2 did not
reach the expected destination for other reasons. We term these infrastructure
failures, in which a route stops working in the middle of the network.

Summary of pathologies:
The above table gives a summary of pathologies.

End to end routing stability:


The Internet architecture does not itself demand large-scale routing stability,
but routing stability affects a number of networking properties. These include
the extent to which the properties of network paths are predictable; the extent
to which a connection can use past observations to learn about network
conditions; and the extent to which repeated measurements can be assumed to
observe the same path.

Definitions of stability:

Routing stability can be defined in two different ways. The first is: “given
that a route r is observed now, what is the probability that the same route r
will be observed again in the future?” This notion of stability is referred to
as prevalence.

The second is: “given that a route r is observed at time t, how long will that
route remain unchanged?” This notion of stability is known as persistence.
Persistence affects how effectively route state can be managed.
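
The prevalence definition above suggests a simple estimate: if a virtual path is sampled at exponentially distributed intervals, the fraction of observations showing a given route estimates the fraction of time that route is in use. The sketch below computes this for the most frequently observed route. The function name and the sample data are hypothetical.

```python
from collections import Counter

def dominant_route_prevalence(observed_routes):
    """Return (route, fraction) for the most frequently observed route.
    Each observation is a sequence of router names for one virtual path."""
    if not observed_routes:
        raise ValueError("need at least one observation")
    counts = Counter(tuple(route) for route in observed_routes)
    route, k = counts.most_common(1)[0]
    return route, k / len(observed_routes)

# Made-up observations of a single virtual path A=>B.
observations = [
    ["r1", "r2", "r4"],
    ["r1", "r2", "r4"],
    ["r1", "r3", "r4"],
    ["r1", "r2", "r4"],
]
print(dominant_route_prevalence(observations))
```

Persistence cannot be read off such counts, since it depends on the order and timing of the observations rather than only on how often each route appears.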

Reducing the data:

Here our analysis is confined to the D2 measurements, since they were made at a
wide range of intervals. This lets us resolve ambiguities in persistence and
assess stability over many time scales. From the 35,000 D2 measurements we
omitted those that exhibited pathologies and those for which traceroute hops
were missing. In the end we were left with 31,709 measurements.

To assess the differences in π̂dom between the sites in our study, we computed
a value for each site s (and for each granularity).

Routing persistence:

The more difficult task is to determine the persistence of routes: how long are
they likely to last before changing?

To analyse persistence accurately, we first need to determine whether routing
changes on short time scales. If routes do not change over short time spans,
then measurements made a short time apart that observe the same route can be
taken to mean that the route did not change in between. The same reasoning can
then be used to analyse whether routes change on medium time scales.

Medium scale route alternation:

We found that, except at a few sites, routes do not change in less than one
hour. We therefore now assess the measurements made an hour or less apart, to
determine whether the same holds for medium-scale routing persistence. For
measurements made less than an hour apart, let Phrsrc s and Phrdst s be the
analogs of P10src s and P10dst s.

For this analysis we have 7,287 pairs of measurements, after eliminating pairs
on virtual paths with rapidly oscillating routes. We also have 1,517 triple
observations spanning an hour or less. Of these, only 10 showed the pattern
R1, R2, R3, which indicates that two observations made an hour apart are
unlikely to span a routing change.

Summary:

This report is based on an analysis of more than 40,000 Internet routes
measured between various Internet sites. The study examines routing
pathologies, routing symmetry and routing stability.
A recurring theme in our findings is wide variation. Time and again we saw that
different sites, or groups of sites, exhibit distinct routing characteristics.
The differences among sites are significant enough that we did not find any
typical Internet site, nor any typical Internet path. Nevertheless, the range
of our findings gives good insight into the breadth of routing behaviour and
into how it appears from an end-point perspective.
