
GUIDE

The Complete Guide to Traceroutes: The Network Troubleshooting Tool For IT Pros

TABLE OF CONTENTS

1.1. What Are Traceroutes?

1.2. How Traceroutes Work

What if the router doesn’t respond?

1.3. Common network problems

2.1. How To Identify Network Issues with Traceroutes

Traceroute Performance Metrics
How to analyze packet loss
In figure D, is the latency good or bad?
Analyzing a bad traceroute

2.2. Why Do Some Routers Drop Packets or Have High Latencies?

ICMP TTL Exceeded Rate Limiting
How to detect rate limiting
Impact of the Router CPU
Impact of firewalls on traceroutes

2.3. Decode the Hidden Information from Traceroute DNS

Theoretical example
The 100km / 1ms rule of thumb
Why is it useful?

2.4. Internet Traffic is Asymmetrical - How to Catch Reverse Path Issues?

Congestion in the Reverse Path
Congestion at the Dallas Interconnection

2.5. How to Share a traceroute with an ISP NOC?

Where is the problem?
Who should I contact first?
How to replicate the issue?

2.6. Impact of Load Balancing and Multiple Paths on Traceroutes

Multiple Path Technologies
Link Aggregation
Equal Cost Multi Path (ECMP)
Load balancing across multiple interfaces
Per-packet Load Balancing
Per-flow Load Balancing
Multiple Paths, Load Balancing and Traceroutes
Types of packets used for traceroutes

2.7. MPLS Networks, TTL Propagation and ICMP Tunneling

ICMP Tunneling
TTL Propagation

3. Obkio's Live traceroute feature

How Live Traceroutes Work
Live Traceroute Sharing
Traceroute From Both Directions
Traceroute Settings
1.1 WHAT ARE TRACEROUTES?

Let’s start the complete guide to traceroutes with an introduction to traceroutes, including what they do, when they were created, and how they became the most popular network troubleshooting tool for network engineers.

Traceroutes are the most popular tool that network engineers use to troubleshoot network performance issues like choppy VoIP quality, jerky video calls, and network and application slowness.

Using traceroutes, users can troubleshoot network performance issues faster by pinpointing where they occur and by sharing the results with their team and with third parties such as IT consultants or service providers, which speeds up resolution.

In this complete guide to traceroutes, learn what traceroutes are, how they work, and how you can use them to troubleshoot network issues and better understand network performance data.

Traceroutes were invented in 1987 and are still highly relevant today. As the name suggests, the main purpose of a traceroute is to trace the IP route from a source to a destination inside an IP network. It shows the routers that packets cross along the way, as well as the round-trip latency from the source to each of those routers.

Traceroute commands are available on almost any host. On Windows, there is the tracert.exe command, and on Linux and macOS there is the traceroute command. Other free and commercial software can also run traceroutes, such as the Obkio Agent. Figure A below is an example from the Obkio Live Traceroute feature.

It’s important to understand that traceroutes only trace Layer 3 IP routers or hosts. If there is a switch or Wi-Fi access point between two routers, a traceroute will not show it, even if it has a management IP address. A switch with Layer 3 / IP routing features will appear only if it is routing the packets.

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1 | 0.0 | 20 | 4.3 | 1.5 | 0.4 | 4.3 |
| 2 | router1.ispA.com | 0.0 | 20 | 6.8 | 15.4 | 6.8 | 35.9 |
| 3 | router2.ispB.com | 0.0 | 20 | 12.3 | 13.7 | 8.4 | 28.1 |
| 4 | router3.ispC.com | 0.0 | 20 | 11.3 | 13.8 | 9.0 | 38.4 |
| 5 | website.com | 0.0 | 20 | 12.8 | 16.1 | 10.4 | 38.4 |
+---+-------------------+-------+-----+------+------+------+------+

Figure A

1.2 HOW TRACEROUTES WORK

Traceroutes are considered a more advanced network troubleshooting tool.

Let’s look in more detail at how they actually work, so network engineers and IT specialists can use them to troubleshoot networks.

In the IP header, there is an 8-bit field called Time-to-live (TTL), whose value ranges from 0 to 255. The TTL is decremented by 1 each time a router routes the packet. When the TTL reaches 0, the packet is discarded and an ICMP TTL Exceeded message might be sent back to the source of the packet.

The main objective of the TTL field is not to trace a route but to discard packets caught in a routing loop. Since each router in the loop decrements the TTL, the value eventually reaches 0 and the packet is discarded instead of circulating forever.

Traceroute software takes advantage of this behavior to discover the routers between a source and a destination.

Here’s how it works. You can follow along with Figure B below to get a better understanding.

1. First, the Source (Src) sends a packet with TTL=1.

2. The first Router (R1) decrements the TTL by 1, which changes the value to 0. The packet is dropped and the router sends an ICMP TTL Exceeded message. The destination IP address of the ICMP message is the source IP address of the discarded packet. The source IP address of the ICMP message is usually the IP address of the interface on which the discarded packet was received.

Figure B

3. The Source receives the “ICMP TTL Exceeded” message and adds the router IP to the traceroute hops table.

4. The process then starts over with TTL=2.

5. The packet is routed through the first Router (R1), which decrements the TTL to 1.

6. The second Router (R2) receives the packet, decrements the TTL to 0, discards the packet and sends the “ICMP TTL Exceeded” message.

7. The process continues, incrementing the TTL by 1 each time, until a packet reaches the destination, which answers the probe itself instead of sending a TTL Exceeded message.

What if the router doesn’t respond?
The latency measured for each router in the traceroute is the time difference between when the packet is sent and when the ICMP TTL Exceeded message is received. It’s important to note that a router has no obligation to send that ICMP TTL Exceeded message. If a router never sends the message, it will not be discovered in the traceroute, but since it still decrements the TTL value, it will count as an unknown hop in the trace.

Figure C below is an example with hop #3 not sending ICMP TTL Exceeded packets.

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1 | 0.0 | 20 | 4.3 | 1.5 | 0.4 | 4.3 |
| 2 | router1.ispA.com | 0.0 | 20 | 6.8 | 15.4 | 6.8 | 35.9 |
| 3 | ??? | 100.0 | 20 | - | - | - | - |
| 4 | router3.ispC.com | 0.0 | 20 | 11.3 | 13.8 | 9.0 | 38.4 |
| 5 | website.com | 0.0 | 20 | 12.8 | 16.1 | 10.4 | 38.4 |
+---+-------------------+-------+-----+------+------+------+------+

Figure C
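The TTL mechanism in steps 1 to 7, including a router that never answers (like hop #3 in Figure C), can be illustrated with a small offline simulation. This is a minimal sketch, not how a real traceroute sends packets: the `Router` class, the `simulate_traceroute` function and the IP addresses are hypothetical, and no packets leave the machine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    """A hypothetical Layer 3 hop on a simulated path."""
    ip: str
    responds: bool = True  # False models a router that never sends ICMP TTL Exceeded


def simulate_traceroute(path: List[Router], destination: str,
                        max_ttl: int = 30) -> List[Optional[str]]:
    """Return one entry per TTL: the IP that answered, or None for an unknown hop."""
    hops: List[Optional[str]] = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl  # the source sets the TTL of this probe
        for router in path:
            remaining -= 1  # each router decrements the TTL by 1
            if remaining == 0:
                # TTL expired: the router discards the probe and may answer
                hops.append(router.ip if router.responds else None)
                break
        else:
            # The probe crossed every router, so the destination answers it directly
            hops.append(destination)
            break
    return hops


if __name__ == "__main__":
    path = [Router("192.168.1.1"), Router("198.51.100.1"),
            Router("198.51.100.2", responds=False), Router("198.51.100.3")]
    for hop, ip in enumerate(simulate_traceroute(path, "203.0.113.10"), start=1):
        print(hop, ip or "???")
```

Running it prints hops 1 to 5, with `???` for the silent third router: that router still consumes one TTL value, which is why it shows up as an unknown hop rather than disappearing from the list.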

1.3 COMMON NETWORK PROBLEMS

Networks are complex systems that can be prone to a variety of problems. Let’s look at some of the most common network problems that users face.

Intermittent network problems frustrate users, affect business productivity, and are a nightmare for IT administrators because they are the most difficult to solve. Network problems like choppy VoIP, jerky video calls, and network and application slowness can affect your business in drastic ways, which is why it’s important to know how to identify network issues so you can solve them quickly.

Five common network problems are:

1. High CPU Usage

The CPU usage of network devices can increase drastically when your network becomes bogged down by enormous amounts of traffic, such as when a large number of packets are sent and received throughout your network.

2. High Bandwidth Usage

When something on your network monopolizes your bandwidth, for example by downloading gigabytes of data, it creates congestion in your network. When there’s congestion, not enough bandwidth is left for other parts of your network, which can cause problems like slow download speeds over the internet.

3. Poor Physical Connectivity

When a cable or connector is defective, the interface of the network equipment it is connected to will typically generate errors. A damaged copper or fiber-optic cable will likely reduce the amount of data that can go through it without packet loss.

4. Malfunctioning Devices

Whenever you install or reconfigure a device such as a switch or router, or upgrade the firmware of equipment on your network, you need to test that device to ensure it’s been configured correctly. Many performance issues are caused by misconfigurations that can turn into major problems down the line.

5. DNS Errors

DNS (Domain Name System) errors happen when a hostname cannot be resolved to an IP address, for example because the DNS server is unreachable. They can signal that you may have lost network or internet access.

2.1 HOW TO IDENTIFY NETWORK ISSUES WITH TRACEROUTES

When it comes to identifying network problems with traceroutes, always remember: “If the packet loss doesn’t continue, it’s not an issue!”

Traceroute Performance Metrics
When looking at a traceroute, we usually have two important values for each hop or router: latency and packet loss. Take a look at Figure D below, a traceroute from the Obkio Live Traceroute feature.

The latency is the round-trip latency calculated by the source: the time difference between when a packet was sent and when a response was received. In Figure D, each hop’s latency values summarize 10 measurements because 10 packets have been sent (Snt column). Last is the latency of the most recent packet, Avg is the average latency, and Best and Wrst are the lowest and highest latencies.

Packet loss is the percentage of sent packets that never received a response, out of the total number of sent packets.

In this example, a loss of 10% at hop 2 looks quite significant. However, the first thing to look at is the number of packets that have been sent (Snt column).

In this case, we lost 1 packet out of the 10 that were sent, resulting in a packet loss rate of 10%. A 10% packet loss rate is a lot, but measured over only 10 packets, it’s not very significant. Out of 1,000 or 10,000 packets, it would be another story. Traceroute tools often have a configuration option to change the number of packets that are sent and the interval at which they are sent.

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1 | 0.0 | 10 | 1.0 | 1.6 | 0.5 | 3.9 |
| 2 | router1.ispA.com | 10.0 | 10 | 5.0 | 5.6 | 4.5 | 7.9 |
| 3 | router2.ispB.com | 0.0 | 10 | 10.0 | 10.6 | 9.5 | 15.9 |
| 4 | router3.ispC.com | 0.0 | 10 | 12.0 | 12.6 | 11.5 | 22.9 |
| 5 | router4.ispC.com | 0.0 | 10 | 13.0 | 13.6 | 12.5 | 23.9 |
| 6 | router5.ispC.com | 0.0 | 10 | 14.0 | 14.6 | 13.5 | 21.9 |
| 7 | router6.ispC.com | 0.0 | 10 | 15.0 | 15.6 | 14.5 | 29.9 |
| 8 | website.com | 0.0 | 10 | 16.0 | 16.6 | 15.5 | 39.9 |
+---+-------------------+-------+-----+------+------+------+------+

Figure D. Traceroute from the Obkio Live Traceroute feature
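The Loss%, Snt, Last, Avg, Best and Wrst columns can be derived from raw per-probe round-trip times. The sketch below assumes a hypothetical list of samples (in ms, with `None` for a probe that got no reply); the function names are illustrative. It also adds a Wilson score interval, a standard statistical technique that the text above does not mention, to show why 1 lost probe out of 10 says much less than 100 lost out of 1,000.

```python
import math
from statistics import mean
from typing import Dict, List, Optional, Tuple


def hop_stats(samples: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one hop's probes; None marks a probe that never got a response."""
    sent = len(samples)
    if sent == 0:
        raise ValueError("at least one probe must have been sent")
    replies = [rtt for rtt in samples if rtt is not None]
    return {
        "loss_pct": 100.0 * (sent - len(replies)) / sent,
        "snt": sent,
        "last": replies[-1] if replies else None,  # most recent probe that got a reply
        "avg": mean(replies) if replies else None,
        "best": min(replies) if replies else None,
        "wrst": max(replies) if replies else None,
    }


def loss_interval(lost: int, sent: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the true loss rate (as fractions)."""
    p = lost / sent
    denom = 1 + z * z / sent
    center = (p + z * z / (2 * sent)) / denom
    margin = z * math.sqrt(p * (1 - p) / sent + z * z / (4 * sent * sent)) / denom
    return max(0.0, center - margin), min(1.0, center + margin)


if __name__ == "__main__":
    # Hypothetical RTTs (ms) for 10 probes to one hop; the third probe was lost
    hop2 = [4.5, 5.2, None, 7.9, 5.1, 5.6, 4.8, 6.0, 5.3, 5.0]
    print(hop_stats(hop2))
    print(loss_interval(1, 10))      # wide: roughly 2% to 40%
    print(loss_interval(100, 1000))  # narrow: roughly 8% to 12%
```

Both measurements show 10% loss, but with only 10 probes the true loss rate could plausibly be anywhere from about 2% to 40%, which is why sending more packets makes a traceroute far more conclusive.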

How to analyze packet loss
The rule of thumb when looking at a traceroute is very simple: if the packet loss doesn’t continue, don’t panic, it’s not an issue!

Let’s take a look at Figure E. We all know that 50% packet loss over a connection is terrible and makes it almost unusable. So are there any issues in this new traceroute example? Let’s apply the rule of thumb and figure it out.

Does the 50% packet loss continue in the traceroute? Does every hop after hop #2 report the same 50% loss? The answer is no; otherwise we would see packet loss on hops #3 through #8.

Should we panic and call our ISP to tell them we have packet loss on the path? No! Does it mean there is an issue with that router? No! It only tells us that hop #2 responds to 50% of the traceroute packets, or that only 50% of its “ICMP TTL Exceeded” messages make it back to the source.

In figure D, is the latency good or bad?
Looking at the latency values in this traceroute, is the latency good? Is it normal? With only this traceroute and no more information, we don’t know.

The latency between two hops can be affected by a number of things such as:

• the distance between them

• the medium connecting them (fiber optic, coax cable, copper lines, wireless, etc.)

• the technology used (cable DOCSIS, DSL, GPON, dedicated fiber, etc.)

• the configuration of the routers, such as traffic shaping

• network conditions such as congestion

So to qualify the latency in a traceroute as good or bad, we need more information about the path. That information can come from our experience or knowledge of the path and routers, but the best source is historical traceroutes.

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1 | 0.0 | 10 | 1.0 | 1.6 | 0.5 | 3.9 |
| 2 | router1.ispA.com | 50.0 | 10 | 5.0 | 5.6 | 4.5 | 7.9 |
| 3 | router2.ispB.com | 0.0 | 10 | 10.0 | 10.6 | 9.5 | 15.9 |
| 4 | router3.ispC.com | 0.0 | 10 | 12.0 | 12.6 | 11.5 | 22.9 |
| 5 | router4.ispC.com | 0.0 | 10 | 13.0 | 13.6 | 12.5 | 23.9 |
| 6 | router5.ispC.com | 0.0 | 10 | 14.0 | 14.6 | 13.5 | 21.9 |
| 7 | router6.ispC.com | 0.0 | 10 | 15.0 | 15.6 | 14.5 | 29.9 |
| 8 | website.com | 0.0 | 10 | 16.0 | 16.6 | 15.5 | 39.9 |
+---+-------------------+-------+-----+------+------+------+------+

Figure E. Example of insufficient information to qualify latency
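The rule of thumb can be written as a small check: walk back from the last hop and find where the run of lossy hops begins. If the last hop shows no loss, loss on intermediate hops is ignored. This is a sketch with a hypothetical function name and a configurable threshold; the loss values are copied from Figures E, F and G.

```python
from typing import List, Optional


def loss_start_hop(loss_pct: List[float], threshold: float = 0.0) -> Optional[int]:
    """Return the 1-based hop where packet loss starts and continues to the last hop.

    Returns None when the last hop shows no loss above the threshold, i.e. the
    loss seen on intermediate hops does not continue and is not a path issue.
    """
    if not loss_pct or loss_pct[-1] <= threshold:
        return None
    start = len(loss_pct)  # the last hop is lossy; extend the run backwards
    while start > 1 and loss_pct[start - 2] > threshold:
        start -= 1
    return start


if __name__ == "__main__":
    figure_e = [0, 50, 0, 0, 0, 0, 0, 0]
    figure_f = [0, 50, 50, 50, 50, 50, 50, 50]
    figure_g = [90, 0, 0, 50, 100, 20, 0, 0]
    print(loss_start_hop(figure_e))  # None: the hop #2 loss does not continue
    print(loss_start_hop(figure_f))  # 2: loss starts between hop #1 and hop #2
    print(loss_start_hop(figure_g))  # None
```

Treat the result as a hint only: a destination that never answers probes (for example because a firewall blocks them, as discussed in section 2.2) shows 100% loss at the last hop and would fool this check.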

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1 | 0.0 | 10 | 1.0 | 1.6 | 0.5 | 3.9 |
| 2 | router1.ispA.com | 50.0 | 10 | 50.0 | 55.6 | 33.5 | 77.9 |
| 3 | router2.ispB.com | 50.0 | 10 | 52.0 | 54.6 | 9.5 | 56.9 |
| 4 | router3.ispC.com | 50.0 | 10 | 54.0 | 53.6 | 32.5 | 66.9 |
| 5 | router4.ispC.com | 50.0 | 10 | 55.0 | 55.6 | 44.5 | 72.9 |
| 6 | router5.ispC.com | 50.0 | 10 | 53.0 | 52.6 | 21.5 | 58.9 |
| 7 | router6.ispC.com | 50.0 | 10 | 52.0 | 56.6 | 29.5 | 99.9 |
| 8 | website.com | 50.0 | 10 | 56.0 | 55.6 | 43.5 | 87.9 |
+---+-------------------+-------+-----+------+------+------+------+

Figure F. Example of bad traceroute analysis

By comparing the latency over time, it’s much easier to know if the latency we are looking at is normal or not. Of course, a network performance monitoring solution such as Obkio has historical traceroute features that can help with that.

Analyzing a bad traceroute
Figure F above is another example similar to Figure E. We have the same path from the source to the destination, but the packet loss and latency values are different.

Let’s start with packet loss and the rule of thumb: does the packet loss continue after it started? Oh yes! In this case, we see 50% packet loss appear between hop #1 and hop #2, and it continues all the way to the last hop. So in this case, there is likely some real packet loss between hop #1 and #2.

Be careful: internet traffic is asymmetrical, so the issue can be on the reverse path!

So if there is packet loss with routers at ISP A, ISP B and ISP C, maybe we should call all of them and tell them they have 50% packet loss on their routers… or maybe post that on social media... or maybe not… We should focus on where the packet loss starts, which is between hop #1 and hop #2.

Let’s take a look at the other network performance metric we have in this traceroute, the latency. By comparing Figure E and Figure F, it’s clear that the latency values have increased, and the increase starts between hop #1 and #2, just like the packet loss. With both an increase in packet loss and an increase in latency, this looks like congestion on the network.

Since Hop #1 (192.168.1.1) is the business’ firewall and Hop #2 (router1.ispA.com) is the ISP A router, the congestion is probably on the business Internet connection. By looking at the bandwidth usage on the firewall, the IT administrator of the business can easily validate whether there is congestion. A solution such as Obkio’s Device Performance Monitoring is able to get that information.

If there is no congestion, a trouble ticket can be opened with ISP A to troubleshoot the network issue, and the traceroute should be shared with them to accelerate the troubleshooting.

2.2 WHY DO SOME ROUTERS DROP PACKETS OR HAVE HIGH LATENCIES?

There are different reasons why a single router may drop traceroute packets or have higher latencies, but don’t worry, it’s normal.

Now, let’s look at the different reasons why a single router can drop traceroute packets or have higher latencies, and why it’s normal. First, take a look at Figure G below, where hops #1, #4, #5 and #6 are dropping packets. As explained earlier, the general rule of thumb when looking at packet loss is that if the packet loss doesn’t continue with the following hops, then it’s not a network issue.

So in this example, everything runs smoothly between the source and the destination. Let’s see why some hops have packet loss and why hop #4 has higher latencies.

ICMP TTL Exceeded Rate Limiting
A traceroute will report packet loss if:

A. The packet from the source never reached the router, so a response cannot be sent;

B. The packet from the source is received by the router but the response is lost on the reverse path;

C. The packet from the source is received by the router but the router decides not to respond with an “ICMP TTL Exceeded” message.

Reasons A and B correspond to standard packet loss between the source and the router, either on the forward path (source to router) or on the reverse path (router to source).

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1 | 90.0 | 10 | 1.0 | 1.6 | 0.5 | 3.9 |
| 2 | router1.ispA.com | 0.0 | 10 | 5.0 | 5.6 | 4.5 | 7.9 |
| 3 | router2.ispB.com | 0.0 | 10 | 10.0 | 10.6 | 9.5 | 15.9 |
| 4 | router3.ispC.com | 50.0 | 10 | 62.0 | 62.6 | 31.5 | 72.9 |
| 5 | ??? | 100.0 | 10 | - | - | - | - |
| 6 | router5.ispC.com | 20.0 | 10 | 14.0 | 14.6 | 13.5 | 21.9 |
| 7 | router6.ispC.com | 0.0 | 10 | 15.0 | 15.6 | 14.5 | 29.9 |
| 8 | website.com | 0.0 | 10 | 16.0 | 16.6 | 15.5 | 39.9 |
+---+-------------------+-------+-----+------+------+------+------+

Figure G. Example of hops dropping packets

Reason C is special because of something called rate limiting. Some routers, but not all of them, have rules that limit the number of ICMP TTL Exceeded messages they send per interval. Usually, the rule is there to protect the router’s CPU (Central Processing Unit). Sometimes it is configurable, while other times it is not. In some cases, routers never respond with the ICMP messages at all, as we see with hop #5 in Figure G.

Many small office and home routers/firewalls are configured with a rate limit of 1pps (packet per second), and there is often nothing you can do to change that.

How to detect rate limiting
One way to detect whether the drops are related to rate limiting is to change the rate at which the source sends traceroute packets.

If you don’t see drops when sending at 1pps (1 packet per second), but you see drops on some hops when you increase the rate to 5pps (5 packets per second, or 1 packet every 0.2 seconds), this usually indicates a rate limiting rule. For example, an 80% packet loss rate at 5pps can suggest a 1pps rate limiting configuration, since only 1 out of every 5 packets gets a response.

Impact of the Router CPU
ISP routers are complex systems with many components such as CPUs, NPUs (Network Processing Units), ASICs (Application Specific Integrated Circuits) and FPGAs (Field Programmable Gate Arrays). The main purpose of a router is to route packets and to maintain routing protocols so the routing table is always up to date.

Responding to traceroute packets with ICMP TTL Exceeded messages is not its top priority. This is why some responses may be dropped if the CPU is busy with something else, or delayed if other important processes are using the CPU when a response needs to be sent.

In that case, an increase in latency is possible, as shown on hop #4 of Figure G. Since the latency increase does not propagate to every hop after hop #4, there is no network issue.

Impact of firewalls on traceroutes
For various reasons, some networks are configured to block ICMP traffic. This is not something we recommend, but when it happens, the ICMP TTL Exceeded messages are dropped by a firewall and the traceroute will not work at all.

And a quick note to all network administrators: even if you block ICMP traffic for IPv4, never block all ICMP with IPv6. IPv6 uses ICMPv6 Neighbor Discovery Protocol (NDP) to replace ARP, so blocking all ICMPv6 also blocks NDP, which will completely break IPv6 connectivity.
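The arithmetic behind “How to detect rate limiting” can be sketched as below. It assumes a hypothetical router that answers at most `limit_pps` TTL Exceeded messages per second and probes spread evenly over time; real rate limiters (often token buckets with some burst allowance) will not match this steady-state model exactly. The function names and the 20-point threshold are illustrative assumptions, not vendor settings.

```python
def expected_loss_pct(probe_pps: float, limit_pps: float) -> float:
    """Steady-state loss a traceroute sees when a router rate-limits its replies."""
    if probe_pps <= 0:
        raise ValueError("probe rate must be positive")
    return max(0.0, 1.0 - limit_pps / probe_pps) * 100.0


def estimate_limit_pps(probe_pps: float, observed_loss_pct: float) -> float:
    """Invert the model: replies actually received per second."""
    return probe_pps * (1.0 - observed_loss_pct / 100.0)


def looks_rate_limited(loss_slow_pct: float, loss_fast_pct: float,
                       min_increase_pct: float = 20.0) -> bool:
    """Loss that appears only when probing faster suggests rate limiting, not congestion."""
    return loss_fast_pct - loss_slow_pct >= min_increase_pct


if __name__ == "__main__":
    print(expected_loss_pct(5, 1))        # ~80% loss: the 1pps example from the text
    print(expected_loss_pct(1, 1))        # no loss when probing at the limit
    print(estimate_limit_pps(5, 80))      # ~1 reply per second
    print(looks_rate_limited(0.0, 80.0))  # True
```

Comparing two probe rates is what separates the cases: congestion tends to drop a similar share of packets regardless of how fast you probe, while a rate limit produces loss that grows with the probe rate.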

2.3 DECODE THE HIDDEN INFORMATION FROM TRACEROUTE DNS

The information hidden inside traceroute DNS hostnames is equally useful for network engineers working at ISPs and for IT administrators within businesses.

In addition to the packet loss and latency metrics, the hostnames of the traceroute hops can give a lot of information about the real path from the source to the destination. There are four pieces of information that can be decoded from the hostnames:

• The ISP operating the router

• The city where the router is located

• The router name, number, or unique ID

• The ingress interface or port on which the traceroute packet arrived at the router

Theoretical example
Let’s see that with a theoretical example. The traceroute in Figure H below is between a desktop and a website. The desktop is connected to a switch that is connected to a router (Hop #1). The switch is not present in the traceroute because it’s not a Layer 3 device.

The fun part is hops #2, #3 and #4. The hostnames make it clear that all three are routers of ISP A. Hop #2 is in City A, and hops #3 and #4 are in City B. The ports and router numbers are clearly identified in the hostnames.

This one was easy. It’s not always the case with real traceroutes, but here are some examples and tricks on how to decode the information.

IATA City and Airport Codes
The International Air Transport Association (IATA) maintains a list of codes that identify many major cities and airports around the world. Some international ISPs use IATA codes to identify the city where their routers are located.

+---+-------------------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1 | 0.0 | 1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | port1.router1.cityA.ispA.com | 0.0 | 1 | 9.0 | 9.0 | 9.0 | 9.0 |
| 3 | port4.router2.cityB.ispA.com | 0.0 | 1 | 30.0 | 30.0 | 30.0 | 30.0 |
| 4 | port7.router3.cityB.ispA.com | 0.0 | 1 | 31.0 | 31.0 | 31.0 | 31.0 |
| 5 | website.com | 0.0 | 1 | 32.0 | 32.0 | 32.0 | 32.0 |
+---+-------------------------------+-------+-----+------+------+------+------+

Figure H. Theoretical example
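For a naming scheme as regular as the theoretical one in Figure H (port.router.city.isp.com), the four pieces of information can be pulled out with a regular expression. The pattern and the `decode_hostname` helper below are hypothetical and only fit this made-up scheme; real ISPs each use their own conventions, as the following examples show.

```python
import re
from typing import Dict, Optional

# Hypothetical scheme from Figure H: <port>.<router>.<city>.<isp>.com
HOSTNAME_RE = re.compile(
    r"^(?P<port>port\d+)\.(?P<router>router\d+)\.(?P<city>[^.]+)\.(?P<isp>[^.]+)\.com$"
)


def decode_hostname(hostname: str) -> Optional[Dict[str, str]]:
    """Return the port, router, city and ISP fields, or None if the name doesn't match."""
    match = HOSTNAME_RE.match(hostname)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ["port1.router1.cityA.ispA.com",
                 "port4.router2.cityB.ispA.com",
                 "192.168.1.1"]:
        print(name, decode_hostname(name))
```

Hops without a matching name, such as the private address of hop #1, simply return None, so the decoder can be run over every hop of a traceroute.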

IATA City Codes
The example below in Figure I is a traceroute from a Hurricane Electric customer in Calgary, Canada, to the he.net website. The hostnames are very clear: the ISP is always he.net. The port numbers and router names are also very clear but, to be honest, they’re not very useful except to an he.net network engineer.

The cities in that example are IATA City Codes:

• YYC: Calgary, AB, Canada

• YVR: Vancouver, BC, Canada

• SEA: Seattle, WA, USA

• PDX: Portland, OR, USA

• PAO: Palo Alto, CA, USA

• FMT: Unknown... But it stands for Fremont, CA, USA, where HE.net is hosted.

The rule is not perfect, as we can see with FMT, but it makes a traceroute so beautiful that it’s okay to cheat when there is no code.

You might have noticed that the latencies are quite weird in that traceroute. It doesn’t mean there is congestion or any network issue; it’s because of MPLS ICMP Tunneling, which we will cover later in the document.

+---+-------------------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------------------+-------+-----+------+------+------+------+
| 1 | 100ge11-1.core2.yyc1.he.net | 0.0 | 1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | 100ge14-2.core1.yvr1.he.net | 0.0 | 1 | 31.0 | 31.0 | 31.0 | 31.0 |
| 3 | 100ge10-2.core1.sea1.he.net | 0.0 | 1 | 30.0 | 30.0 | 30.0 | 30.0 |
| 4 | 100ge15-1.core1.pdx1.he.net | 0.0 | 1 | 31.0 | 31.0 | 31.0 | 31.0 |
| 5 | 100ge15-2.core1.pao1.he.net | 0.0 | 1 | 32.0 | 32.0 | 32.0 | 32.0 |
| 6 | 100ge14-1.core3.fmt1.he.net | 0.0 | 1 | 32.0 | 32.0 | 32.0 | 32.0 |
| 7 | he.net | 0.0 | 1 | 32.0 | 32.0 | 32.0 | 32.0 |
+---+-------------------------------+-------+-----+------+------+------+------+

Figure I. HE.net Example

+---+----------------------------------------------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------------------------------------------+-------+-----+------+------+------+------+
| 1 | gi0-4-1-19.99.agr21.ymq01.atlas.cogentco.com | 0.0 | 1 | 0.5 | 0.5 | 0.5 | 0.5 |
| 2 | te0-0-0-6.ccr22.ymq01.atlas.cogentco.com | 0.0 | 1 | 0.8 | 0.8 | 0.8 | 0.8 |
| 3 | be2104.ccr22.alb02.atlas.cogentco.com | 0.0 | 1 | 5.7 | 5.7 | 5.7 | 5.7 |
| 4 | be2916.ccr42.jfk02.atlas.cogentco.com | 0.0 | 1 | 8.8 | 8.8 | 8.8 | 8.8 |
| 5 | be2807.ccr42.dca01.atlas.cogentco.com | 0.0 | 1 | 14.5 | 14.5 | 14.5 | 14.5 |
| 6 | be3524.rcr22.iad03.atlas.cogentco.com | 0.0 | 1 | 15.3 | 15.3 | 15.3 | 15.3 |
| 7 | be2952.agr11.iad03.atlas.cogentco.com | 0.0 | 1 | 15.4 | 15.4 | 15.4 | 15.4 |
| 8 | cogentco.com | 0.0 | 1 | 14.9 | 14.9 | 14.9 | 14.9 |
+---+----------------------------------------------+-------+-----+------+------+------+------+

Figure J. cogentco.com Example

IATA Airport Codes
The example in Figure J above is a traceroute from a Cogent Communications customer in Montreal, Canada, to the cogentco.com website. Instead of city codes, Cogent mostly uses airport codes (YMQ is the IATA code covering all Montreal airports):

• YMQ: Montreal, QC, Canada

• ALB: Albany, NY, USA

• JFK: New York, NY, USA

• DCA: Washington, DC, USA

• IAD: Washington, DC, USA

In this traceroute, the last hop responded a bit faster than the previous one. This is not unusual, since the router CPUs that send ICMP TTL Exceeded messages are not very fast compared to a server CPU, so the server takes less time to generate its response than the router. But remember, what matters is the time it takes to route the packet, and routers are very good at this.

Japanese Example
In Figure K, the network engineers at NTT were very explicit in the hostnames. They added the country and a 6-letter string to identify the city and the state or country.

The routers are located in:

• Madrid, Spain

• London, England

• Newark, NJ, USA

• Seattle, WA, USA

• Tokyo, Japan

+---+-------------------------------------+-------+-----+-------+-------+-------+-------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+-------------------------------------+-------+-----+-------+-------+-------+-------+
| 1 | ae-7.r02.mdrdsp03.es.bb.gin.ntt.net | 0.0 | 1 | 0.5 | 0.5 | 0.5 | 0.5 |
| 2 | ae-6.r24.londen12.uk.bb.gin.ntt.net | 0.0 | 1 | 25.0 | 25.0 | 25.0 | 25.0 |
| 3 | ae-7.r20.nwrknj03.us.bb.gin.ntt.net | 0.0 | 1 | 95.0 | 95.0 | 95.0 | 95.0 |
| 4 | ae-5.r22.sttlwa01.us.bb.gin.ntt.net | 0.0 | 1 | 151.0 | 151.0 | 151.0 | 151.0 |
| 5 | ae-3.r30.tokyjp05.jp.bb.gin.ntt.net | 0.0 | 1 | 234.0 | 234.0 | 234.0 | 234.0 |
| 6 | ae-2.r03.tokyjp05.jp.bb.gin.ntt.net | 0.0 | 1 | 240.0 | 240.0 | 240.0 | 240.0 |
| 7 | ae-0.ocn.tokyjp05.jp.bb.gin.ntt.net | 0.0 | 1 | 241.0 | 241.0 | 241.0 | 241.0 |
| 8 | ??? | 0.0 | 1 | - | - | - | - |
+---+-------------------------------------+-------+-----+-------+-------+-------+-------+

Figure K. NTT Example
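Decoding location codes can be partly automated with a lookup table. The sketch below is not a complete database: its hypothetical `LOCATION_CODES` table only holds the codes from Figures I, J and K, and it assumes the code is a whole hostname label, optionally followed by digits (yyc1, jfk02, londen12).

```python
import re
from typing import Optional

# Only the codes discussed in this section; a real tool would need a much larger table
LOCATION_CODES = {
    "yyc": "Calgary, AB, Canada", "yvr": "Vancouver, BC, Canada",
    "sea": "Seattle, WA, USA", "pdx": "Portland, OR, USA",
    "pao": "Palo Alto, CA, USA", "fmt": "Fremont, CA, USA",
    "ymq": "Montreal, QC, Canada", "alb": "Albany, NY, USA",
    "jfk": "New York, NY, USA", "dca": "Washington, DC, USA",
    "iad": "Washington, DC, USA",
    "mdrdsp": "Madrid, Spain", "londen": "London, England",
    "nwrknj": "Newark, NJ, USA", "sttlwa": "Seattle, WA, USA",
    "tokyjp": "Tokyo, Japan",
}


def decode_location(hostname: str) -> Optional[str]:
    """Return the location of the first hostname label that matches a known code."""
    for label in hostname.lower().split("."):
        code = re.sub(r"\d+$", "", label)  # yyc1 -> yyc, londen12 -> londen
        if code in LOCATION_CODES:
            return LOCATION_CODES[code]
    return None


if __name__ == "__main__":
    for name in ["100ge14-2.core1.yvr1.he.net",
                 "be2916.ccr42.jfk02.atlas.cogentco.com",
                 "ae-6.r24.londen12.uk.bb.gin.ntt.net"]:
        print(name, "->", decode_location(name))
```

A label that happens to spell a known code for another reason would produce a false match, so treat the result as a hint to confirm with the latency values.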

The 100km / 1ms rule of thumb
To determine whether the latency between two cities is normal or optimal, you can use the 100km / 1ms rule of thumb: light travels through optical fiber at roughly 200,000 km/s, so every 100 km of fiber adds about 1 ms of round-trip latency. The rule is not perfect, but it’s a good approximation of the latency inside a fiber optic network. It can also help you guess which city a router is in if you don’t easily recognize it from the router hostname.
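The rule of thumb translates into two one-line helpers: the minimum round-trip time to expect for a given fiber distance, and the maximum extra distance that a latency increase between two hops can account for. Both are sketches with hypothetical names that assume fiber at about 200,000 km/s; real fiber routes are longer than a straight line and routers add processing time, so measured latencies are usually higher.

```python
FIBER_KM_PER_MS_RTT = 100.0  # ~200 km per ms one way in fiber, halved for the round trip


def min_rtt_ms(distance_km: float) -> float:
    """Lowest round-trip time to expect over distance_km of fiber."""
    return distance_km / FIBER_KM_PER_MS_RTT


def max_distance_km(rtt_increase_ms: float) -> float:
    """Upper bound on the extra fiber distance implied by an RTT increase between hops."""
    return rtt_increase_ms * FIBER_KM_PER_MS_RTT


if __name__ == "__main__":
    # Figure K: London (25 ms) to Newark (95 ms) adds 70 ms of round-trip time
    print(max_distance_km(95 - 25))  # 7000.0 km, enough for a transatlantic hop
    print(min_rtt_ms(1000))          # 10.0 ms for 1,000 km of fiber
```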

Why is it useful?
The information inside the DNS hostnames is of course very useful for network engineers working at ISPs. However, it can also help IT administrators inside enterprises understand why the latency between a source and a destination is changing. For example, a fiber cut between Montreal and New York City could force the traffic to go through Toronto and add around 10ms. With a traceroute history, it’s then easy to identify route changes and explain latency changes.

2.4 INTERNET TRAFFIC IS ASYMMETRICAL - HOW TO CATCH REVERSE PATH ISSUES?

Did you know that Internet traffic is asymmetrical most of the time? Let’s go
through why that matters when you’re looking at a traceroute.

When looking at a traceroute, people often forget that traffic on the Internet is asymmetrical most of the time. One common reason is hot potato routing: as soon as an ISP has a packet with a destination address outside its own network, it tries to pass the packet to the next ISP as soon as possible.

Figure L. Normal Traffic Flow

Figure L above is a good example of hot potato routing. In the figure, there are 2 ISPs (A and B), and each has 3 routers located in New York City (NYC), Dallas (DAL) and San Francisco (SFO). In all 3 cities, the ISPs have interconnections to exchange traffic from one network to another.

When the source SRC sends a packet to the destination DST, ISP A receives the packet on router A-NYC. As soon as it receives the packet, it searches for a way to send it to ISP B. In this case, the interconnection between the two routers in NYC is the fastest path to reach ISP B. The packet then continues its way inside ISP B up to the destination DST.

The traceroute from SRC to DST will look like Figure M, located below.

On the other side, when DST replies back to SRC, the packet goes from ISP B to ISP A in SFO because this is the fastest route between the two networks. The forward traffic from SRC to DST therefore uses a different path than the reverse traffic from DST to SRC.
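The hot potato behavior in Figure L can be reproduced with a small toy model. This is a sketch under simplifying assumptions. The three cities are modeled as a single west-to-east chain, both ISPs interconnect in every city, and the sending ISP always hands the packet over in the city where the packet entered. The `hot_potato_path` helper is hypothetical and exists only for this illustration.

```python
# West-to-east order of the cities in Figure L. Both ISPs have a router in
# each city and the two ISPs interconnect in every city.
CITIES = ["SFO", "DAL", "NYC"]


def hot_potato_path(src_isp: str, src_city: str, dst_isp: str, dst_city: str) -> list[str]:
    """Router hops for a packet entering src_isp at src_city, bound for dst_isp.

    Hot potato: src_isp hands the packet to dst_isp at the nearest
    interconnection (the one in the city where the packet entered), and
    dst_isp then carries it across its own network to dst_city.
    """
    hops = [f"{src_isp}-{src_city}"]
    i, j = CITIES.index(src_city), CITIES.index(dst_city)
    step = 1 if j >= i else -1
    for k in range(i, j + step, step):
        hops.append(f"{dst_isp}-{CITIES[k]}")
    return hops


forward = hot_potato_path("A", "NYC", "B", "SFO")  # SRC -> DST
reverse = hot_potato_path("B", "SFO", "A", "NYC")  # DST -> SRC
print(forward)  # ['A-NYC', 'B-NYC', 'B-DAL', 'B-SFO']
print(reverse)  # ['B-SFO', 'A-SFO', 'A-DAL', 'A-NYC']
```

The two lists match the router hops of Figures M and N. Each ISP makes a locally optimal choice, and that alone is enough to make the forward and reverse paths differ.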

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | A-NYC | 0.0 | 1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | B-NYC | 0.0 | 1 | 2.0 | 2.0 | 2.0 | 2.0 |
| 3 | B-DAL | 0.0 | 1 | 40.0 | 40.0 | 40.0 | 40.0 |
| 4 | B-SFO | 0.0 | 1 | 80.0 | 80.0 | 80.0 | 80.0 |
| 5 | DST | 0.0 | 1 | 81.0 | 81.0 | 81.0 | 81.0 |
+---+----------+-------+-----+------+------+------+------+

Figure M. SRC to DST

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | B-SFO | 0.0 | 1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | A-SFO | 0.0 | 1 | 2.0 | 2.0 | 2.0 | 2.0 |
| 3 | A-DAL | 0.0 | 1 | 40.0 | 40.0 | 40.0 | 40.0 |
| 4 | A-NYC | 0.0 | 1 | 80.0 | 80.0 | 80.0 | 80.0 |
| 5 | SRC | 0.0 | 1 | 81.0 | 81.0 | 81.0 | 81.0 |
+---+----------+-------+-----+------+------+------+------+

Figure N. DST to SRC

The traceroute from DST to SRC will look like Figure N, located above.

Congestion in the Reverse Path
Let's take a look at the traceroute if there is congestion (50% packet loss) on the reverse path between A-DAL and A-SFO, exactly at the red circle in Figure O below.

Figure O. Congestion in the reverse path

The forward path traceroute will look like Figure P. The traceroute clearly indicates a network issue because the packet loss starts at hop 4 and continues all the way to the destination.

If the forward and the reverse paths were the same, we could say that there is congestion inside ISP B between Dallas and San Francisco. But we know that is not the case: the congestion is in ISP A's network. The ICMP TTL Exceeded replies from B-SFO and the replies from DST return to SRC through the SFO interconnection, so they cross the congested link between A-SFO and A-DAL. The replies from the first three hops never use that link.

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | A-NYC | 0.0 | 10 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | B-NYC | 0.0 | 10 | 2.0 | 2.0 | 2.0 | 2.0 |
| 3 | B-DAL | 0.0 | 10 | 40.0 | 40.0 | 40.0 | 40.0 |
| 4 | B-SFO | 50.0 | 10 | 80.0 | 80.0 | 80.0 | 80.0 |
| 5 | DST | 50.0 | 10 | 81.0 | 81.0 | 81.0 | 81.0 |
+---+----------+-------+-----+------+------+------+------+

Figure P. SRC to DST during congestion

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | B-SFO | 0.0 | 10 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | A-SFO | 0.0 | 10 | 2.0 | 2.0 | 2.0 | 2.0 |
| 3 | A-DAL | 50.0 | 10 | 40.0 | 40.0 | 40.0 | 40.0 |
| 4 | A-NYC | 50.0 | 10 | 80.0 | 80.0 | 80.0 | 80.0 |
| 5 | SRC | 50.0 | 10 | 81.0 | 81.0 | 81.0 | 81.0 |
+---+----------+-------+-----+------+------+------+------+

Figure Q. DST to SRC during congestion

The reverse path traceroute shows us the other side of the coin, as seen in Figure Q above.

So in that example, the reverse traceroute from DST to SRC (Figure Q) gave us the correct answer about where the problem was. Unfortunately, there is no way for us to know which traceroute (forward or reverse) is accurate. However, with that information in hand, the network engineers at ISPs A and B can help troubleshoot the network issue that is affecting the traffic between SRC and DST.

To help troubleshoot the issue further, traceroutes between sources and destinations that are in the same ISP can help locate the exact issue.

Congestion at the Dallas Interconnection
Networks are complex. There are millions of connections on the Internet where a network issue can happen. Let's see what happens if there is 50% packet loss on the Dallas interconnection between ISPs A and B, as shown in Figure R. The two traceroutes look like Figure S and Figure T below.

Figure R. Congestion at the Dallas interconnection

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | A-NYC | 0.0 | 10 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | B-NYC | 0.0 | 10 | 2.0 | 2.0 | 2.0 | 2.0 |
| 3 | B-DAL | 50.0 | 10 | 40.0 | 40.0 | 40.0 | 40.0 |
| 4 | B-SFO | 0.0 | 10 | 80.0 | 80.0 | 80.0 | 80.0 |
| 5 | DST | 0.0 | 10 | 81.0 | 81.0 | 81.0 | 81.0 |
+---+----------+-------+-----+------+------+------+------+

Figure S. SRC to DST during congestion in Dallas

As we learned earlier, if the packet loss doesn't continue, don't panic: there is no network issue. That is a correct assumption for the traffic from SRC to DST. In this special case, however, the ICMP TTL Exceeded responses from B-DAL to SRC cross the congested interconnection, and half of them are lost there. This happens because B-DAL uses the shortest path (i.e. the Dallas interconnection) to send the reply back. On the other side, the responses from A-DAL to DST are dropped in the same way.

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | B-SFO | 0.0 | 10 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | A-SFO | 0.0 | 10 | 2.0 | 2.0 | 2.0 | 2.0 |
| 3 | A-DAL | 50.0 | 10 | 40.0 | 40.0 | 40.0 | 40.0 |
| 4 | A-NYC | 0.0 | 10 | 80.0 | 80.0 | 80.0 | 80.0 |
| 5 | SRC | 0.0 | 10 | 81.0 | 81.0 | 81.0 | 81.0 |
+---+----------+-------+-----+------+------+------+------+

Figure T. DST to SRC during congestion in Dallas

Now you understand why it's important to have a reverse traceroute to compare the data. It's also clear that a single traceroute can be misleading, so we must be careful when we think that we've pinpointed a network issue.

As one can imagine, a network performance monitoring solution like Obkio offers reverse traceroute to help troubleshoot network issues.
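The rule "if the packet loss doesn't continue, it's not an issue" can be written as a small check that finds the hop where loss starts and then persists to the last hop. This is a sketch, and `persistent_loss_start` is a hypothetical helper. Applied to the Loss% columns of Figures P, Q, S and T, it shows why both directions matter: the two directions point at different hops for the same congestion.

```python
from typing import Optional


def persistent_loss_start(loss_pct: list[float]) -> Optional[int]:
    """Return the 1-based hop where packet loss starts and continues to the
    last hop, or None if the loss does not continue to the destination."""
    start = None
    for hop, loss in enumerate(loss_pct, start=1):
        if loss > 0.0:
            if start is None:
                start = hop
        else:
            start = None  # loss stopped: the earlier loss is not a real issue
    return start


# Loss% columns copied from Figures P, Q, S and T.
traces = {
    "P (SRC->DST, reverse-path congestion)": [0.0, 0.0, 0.0, 50.0, 50.0],
    "Q (DST->SRC, reverse-path congestion)": [0.0, 0.0, 50.0, 50.0, 50.0],
    "S (SRC->DST, Dallas interconnection)": [0.0, 0.0, 50.0, 0.0, 0.0],
    "T (DST->SRC, Dallas interconnection)": [0.0, 0.0, 50.0, 0.0, 0.0],
}
for name, loss in traces.items():
    print(name, "->", persistent_loss_start(loss))
```

The check flags hop 4 in Figure P and hop 3 in Figure Q. It flags nothing in Figures S and T. A single direction therefore cannot tell you which ISP owns the congested link.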

2.5 HOW TO SHARE A TRACEROUTE WITH AN ISP NOC?

Whether a network problem occurs in your ISP's network or somewhere else on the Internet, we always recommend reaching out to your ISP. A traceroute can give your ISP the information they need to start troubleshooting.

The most important thing to remember when looking at a traceroute is: if the packet loss doesn't continue, don't panic, it's not an issue!

The second most important thing to remember is that Internet traffic is asymmetrical. A network issue seen on a traceroute might look completely different on the reverse traceroute.

That being said, if a traceroute shows a network issue affecting the network path from a source to a destination, it's important to work on it.

Where is the problem?
First, figure out where the problem is. If it is near your firewall or router, make sure you have a good device performance monitoring solution, because the problem is probably right under your nose. Exhausted firewall resources (e.g. high CPU usage) and bandwidth congestion (e.g. a maxed-out connection) are typical network issues.

Otherwise, if the problem does not seem to be near your network or is affecting multiple offices or customers, it might be time to work with someone else.

Who should I contact first?
Whether the problem is in your ISP's network or somewhere else on the Internet, we always recommend reaching out to your ISP's NOC (Network Operations Center) to help troubleshoot.

Explain the issue to them with the following information on hand:

• IP addresses of the Source and the Destination
• A traceroute from Source to Destination
• A traceroute from Destination to Source
• Historical traceroutes from when everything was running fine (if you have them)
• A way to replicate the issue (more on that later!)

At this point, with large ISPs, you should be able to skip level 1 support and its standard "reboot your modem" solutions. You can then move to level 2 support, where network technicians or engineers are more familiar with traceroute analysis and how the Internet works. They will be able to confirm whether your analysis is right. If you are lucky enough to have a good ISP, they will also explain why your analysis is wrong if that is the case.

Something else that is very important: keep a good attitude.

You may be new to traceroute analysis, and your only goal is to find out where the issue is and to make sure it is fixed quickly so your end-users (and your boss!) are happy.

Now, what happens if the issue is not in your ISP's network but somewhere else on the Internet? Your best bet is to work with your ISP's engineers and ask them to contact their peers at other ISPs.

A lot of ISPs have direct relationships. Some are clients of others, and in a lot of cases they exchange traffic (called peering in the industry), so they know who to contact. They also have access to databases with technical contacts that are not available to everyone. And since you had a good attitude from the start, they will probably be happy to work with you.

How to replicate the issue?
For those of you who are programmers, you know that when you can replicate a bug, half of the job is done and it will be much easier to fix. It's the same with network issues. When you open a trouble ticket with your ISP's NOC, giving them a way to replicate the issue makes it easier for them to troubleshoot and fix it.

This is why Obkio offers 48-hour sharing links with the Live Traceroute feature. With this feature, Obkio users can share a link that anyone can open without having to log into the App. This way, your ISP's network engineers can see the problem, change the traceroute options and validate that their changes are fixing the issue without having to get back to you. In the end, this means a faster resolution and less time wasted.

2.6 IMPACT OF LOAD BALANCING AND MULTIPLE PATHS ON TRACEROUTES

Now that we've covered how a traceroute works and how to analyze it, let's go through two more advanced, but very important, topics about traceroutes and modern networks.

Multiple Path Technologies
In order to increase capacity and resiliency between routers, it is common to have more than one connection between them. If a router does not support higher-speed interfaces, the only way to increase capacity is to aggregate two or more ports together.

There are usually two possible configurations that allow you to set up multiple connections between routers: Link Aggregation and Equal Cost Multi Path (ECMP).

Link Aggregation
The first configuration is Link Aggregation at the Ethernet layer. Cisco users often refer to it as a Port Channel, and equipment vendors also use terms like Bonding, Bundling, Teaming and Trunking. Whatever the number of interfaces (ports) in the LAG (Link Aggregation Group), there will be a single IP address associated with the LAG.

Equal Cost Multi Path (ECMP)
The second configuration is Equal Cost Multi Path (ECMP). Unlike Link Aggregation, it operates at the IP layer. There is one IP address for each interface, and the router's routing engine load balances the traffic between the different paths.

Load balancing across multiple interfaces
Before we go back to traceroutes, we need to understand how routers perform load balancing (or load sharing) when there are multiple interfaces (i.e. links or ports). Routers can use two algorithms: per-packet and per-flow. This applies to both Link Aggregation and ECMP.

Per-packet Load Balancing
Per-packet load balancing selects a different interface for each packet. This is an excellent method to achieve an even load across the interfaces and get the best bandwidth utilization. However, it can lead to packet reordering, which is terrible for real-time traffic such as VoIP. Per-packet load balancing is not recommended unless you really know what you are doing.

Per-flow Load Balancing
Per-flow load balancing uses a hashing algorithm on the packet headers to select which interface to use. The goal is to always send all the packets from a single flow through the same interface.
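The difference between the two algorithms can be sketched in a few lines. This is an illustration under stated assumptions, not how any specific router works. Real equipment uses vendor-specific hardware hash functions and configurable header fields, while here SHA-256 over the usual five-tuple stands in for the per-flow hash. Likewise, simple round-robin stands in for per-packet selection. The helper names, IP addresses and ports are hypothetical.

```python
import hashlib
from itertools import cycle


def flow_link(src_ip: str, dst_ip: str, proto: str, sport: int, dport: int,
              n_links: int) -> int:
    """Per-flow choice: hash the five-tuple and map it onto one of n_links."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def per_packet_links(n_packets: int, n_links: int) -> list[int]:
    """Per-packet choice (round-robin stand-in): consecutive packets alternate."""
    links = cycle(range(n_links))
    return [next(links) for _ in range(n_packets)]


# Ten packets of one hypothetical UDP flow (e.g. a VoIP call) over 2 links.
flow = ("10.0.0.10", "198.51.100.20", "UDP", 40000, 16384)
per_flow = [flow_link(*flow, n_links=2) for _ in range(10)]
print("per-flow:  ", per_flow)                  # the same link ten times
print("per-packet:", per_packet_links(10, 2))   # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

Because the hash input never changes within a flow, every packet of the call takes the same link and arrives in order. The round-robin packets alternate links, which is where reordering can come from when the links have different delays.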

Generally, two packets are considered part of the same flow if they match the same five attributes:

• Source IP Address
• Destination IP Address
• IP Protocol (ex: ICMP, TCP, UDP)
• Source Port (if TCP or UDP)
• Destination Port (if TCP or UDP)

Network engineers can change these attributes on a per-router basis, but generally, these are the 5 attributes used on the Internet. With this technique, a TCP session (ex: an HTTP request) between a source and a destination (ex: a desktop and a web server) will always use the same path. The same goes for a VoIP call using UDP packets.

Multiple Paths, Load Balancing and Traceroutes
Let's get back to traceroutes. We learned that there are two technologies to increase the number of links between routers: Link Aggregation and ECMP.

With Link Aggregation, there is a single IP address configured on the whole LAG. With this configuration, a traceroute cannot know if there are multiple links unless the DNS hostname of the IP makes it clear. For example, ISP A can choose to use the hostname lag1.router1.ispA.com.

With ECMP, there are multiple IPs, and the source IP of the ICMP TTL Exceeded message is usually the IP of the interface that received the packet. As a result, you will be able to know exactly which interface received the packet.

Figure U below shows a traceroute from a Fibrenoire business Internet connection to the obkio.com website hosted inside AWS. Take a look at hop #3. We see two hostnames because there are at least two paths between the router at hop #2 and the router at hop #3.

Hop #4 is even more interesting: the two hostnames are not from the same router. Since the two paths inside the network are equal and the router is using ECMP, it routes each packet on either path based on the packet header information and the hashing algorithm.
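Tools that display several hostnames per hop, as in Figure U, essentially group the replies by TTL and keep every distinct responder. The sketch below shows that grouping. The IP addresses come from hops 6 to 8 of Figure U, but the individual replies are made up for illustration. Several responders at one hop hint at ECMP, although a route change during the test can produce the same pattern. A LAG, as explained above, answers with a single IP.

```python
from collections import defaultdict


def responders_by_hop(replies: list[tuple[int, str]]) -> dict[int, list[str]]:
    """Group (ttl, responder_ip) pairs into the distinct IPs seen per hop."""
    hops: dict[int, list[str]] = defaultdict(list)
    for ttl, ip in replies:
        if ip not in hops[ttl]:
            hops[ttl].append(ip)
    return dict(sorted(hops.items()))


def multipath_hops(replies: list[tuple[int, str]]) -> list[int]:
    """Hops answered by more than one IP: a hint of ECMP (a LAG would
    answer with a single IP)."""
    return [ttl for ttl, ips in responders_by_hop(replies).items() if len(ips) > 1]


# Illustrative replies using the IPs shown at hops 6 to 8 of Figure U.
replies = [
    (6, "52.95.219.62"), (6, "52.95.219.62"),
    (7, "52.94.82.32"), (7, "52.94.81.134"),
    (8, "52.94.81.193"), (8, "52.94.81.191"), (8, "52.94.81.193"),
]
print(responders_by_hop(replies))
print(multipath_hops(replies))  # [7, 8]
```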
+----+---------------------------------------+-------+------+------+------+------+-------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+----+---------------------------------------+-------+------+------+------+------+-------+
| 1 | 192.168.1.1 | 72.8 | 2220 | 3.2 | 2.0 | 0.6 | 34.4 |
| 2 | CUST01-C1.asr02.mtl1080.fibrenoire.ca | 0.0 | 2220 | 2.0 | 3.6 | 1.6 | 51.0 |
| 3 | et-0-0-2.mpr01.mtlcx03.fibrenoire.ca | 0.0 | 2220 | 1.4 | 4.5 | 1.0 | 66.7 |
| | et-0-0-3.mpr01.mtlcx03.fibrenoire.ca | | | | | | |
| 4 | et-0-0-4.cre01.mtl1981.fibrenoire.ca | 0.0 | 2220 | 1.2 | 5.6 | 1.1 | 84.9 |
| | et-0-0-4.cre01.mtlsunl.fibrenoire.ca | | | | | | |
| 5 | et-0-0-2.mpr02.mtlcx03.fibrenoire.ca | 0.0 | 2220 | 1.4 | 5.6 | 1.2 | 135.0 |
| | et-0-0-3.mpr02.mtlcx03.fibrenoire.ca | | | | | | |
| 6 | 52.95.219.62 | 0.0 | 2220 | 3.3 | 4.3 | 1.1 | 78.2 |
| 7 | 52.94.82.32 | 0.0 | 2220 | 2.7 | 4.4 | 1.8 | 67.4 |
| | 52.94.81.134 | | | | | | |
| 8 | 52.94.81.193 | 0.0 | 2220 | 7.8 | 3.3 | 1.4 | 56.3 |
| | 52.94.81.191 | | | | | | |
| 9 | 52.94.82.69 | 0.0 | 2220 | 28.2 | 26.4 | 22.4 | 60.5 |
| | 54.239.44.18 | | | | | | |
| 10 | 150.222.242.150 | 48.6 | 2220 | 28.9 | 26.4 | 22.3 | 50.5 |
| | 150.222.242.118 | | | | | | |
| 11 | 150.222.242.150 | 89.1 | 2220 | 28.9 | 26.1 | 22.4 | 37.7 |
| | 150.222.242.152 | | | | | | |
| 12 | ??? | 100.0 | 2220 | - | - | - | - |
| 13 | ??? | 100.0 | 2220 | - | - | - | - |
| 14 | ??? | 100.0 | 2220 | - | - | - | - |
| 15 | 150.222.243.215 | 23.6 | 2220 | 24.0 | 26.1 | 22.3 | 54.7 |
| | 150.222.241.183 | | | | | | |
| 16 | 150.222.241.193 | 74.9 | 2220 | 25.5 | 25.8 | 22.3 | 41.9 |
| | 150.222.243.223 | | | | | | |
| 17 | ??? | 100.0 | 2220 | - | - | - | - |
| 18 | ??? | 100.0 | 2220 | - | - | - | - |
| 19 | ??? | 100.0 | 2220 | - | - | - | - |
| 20 | ??? | 100.0 | 2220 | - | - | - | - |
| 21 | ??? | 100.0 | 2220 | - | - | - | - |
| 22 | ??? | 100.0 | 2220 | - | - | - | - |
| 23 | ??? | 100.0 | 2220 | - | - | - | - |
| 24 | ??? | 100.0 | 2208 | - | - | - | - |
| 25 | ??? | 100.0 | 2146 | - | - | - | - |
| 26 | 52.93.28.204 | 23.3 | 1977 | 22.7 | 26.3 | 22.3 | 76.3 |
| | 52.93.28.202 | | | | | | |
| 27 | 52.93.28.204 | 73.7 | 1274 | 26.9 | 26.2 | 22.4 | 48.7 |
| | 52.93.28.202 | | | | | | |
| 28 | ??? | 100.0 | 1226 | - | - | - | - |
+----+---------------------------------------+-------+------+------+------+------+-------+

Figure U. Traceroute to obkio.com

Types of packets used for traceroutes
As explained earlier, load balancing is done using a hashing algorithm on the packet headers. Depending on the software used to run the traceroute, the probes could be ICMP, UDP or TCP SYN packets.

On Unix, the default traceroute command uses UDP probes with destination ports starting at 33434, incremented for each probe. Since the port changes, the packet header is not always the same, so the load balancing algorithms may send the probes over multiple paths.

On Windows, the tracert command uses ICMP. This way, the traceroute usually displays a single path, making it easier to understand.

With the Obkio Live Traceroute feature, you can select the protocol and the ports to use. You can use ICMP to get an easy-to-read traceroute, or use TCP (or UDP) with random ports to see the full paths between the source and the destination. You can also use fixed source and destination ports in the TCP and UDP traceroutes.
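The effect of the probe type can be sketched by feeding each probe's header fields into a per-flow hash. The following model assumes a 2-link ECMP group and uses SHA-256 over the five-tuple as a stand-in for a router's hash. It also assumes a fixed UDP source port and Unix-style destination ports that start at 33434 and increase by one per probe. The addresses are documentation addresses, and the `ecmp_link` helper is hypothetical. Real routers may also hash other fields, including some ICMP fields, so ICMP probes are not guaranteed to stay on one path everywhere.

```python
import hashlib

CLASSIC_BASE_PORT = 33434  # default base destination port of Unix traceroute


def ecmp_link(src_ip: str, dst_ip: str, proto: str, sport: int, dport: int,
              n_links: int = 2) -> int:
    """Stand-in for a router's per-flow hash over the five-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_links


SRC, DST = "192.0.2.10", "203.0.113.80"  # hypothetical documentation addresses

# Unix-style UDP probes: the destination port changes with every probe.
udp_links = {ecmp_link(SRC, DST, "UDP", 54321, CLASSIC_BASE_PORT + i)
             for i in range(30)}

# ICMP Echo probes carry no ports, so the hashed fields never change.
icmp_links = {ecmp_link(SRC, DST, "ICMP", 0, 0) for _ in range(30)}

print("links used by UDP probes: ", sorted(udp_links))
print("links used by ICMP probes:", sorted(icmp_links))
```

Fixing the source and destination ports in a TCP or UDP traceroute has the same effect as ICMP in this model: the hashed fields stay constant, so the probes follow a single path.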

2.7 MPLS NETWORKS, TTL PROPAGATION AND ICMP TUNNELING

There are two aspects of MPLS networks that affect traditional IP traceroutes and can change the way we look at a traceroute without giving us the exact picture of what's really going on.

Service providers (SPs) and large enterprises use MPLS (Multiprotocol Label Switching) networks to better segment and manage their networks. Although MPLS was initially designed to allow faster switching than IP routing, its main use today is to offer multiple services within the same network, such as MPLS VPNs or MPLS Ethernet Pseudowires.

This section is not about MPLS networks as such, but it covers two aspects of MPLS networks that affect traditional IP traceroutes. They apply to all IP traffic going through an MPLS network, whether it is for a private network (i.e. an MPLS VPN service) or standard Internet connectivity.

ICMP Tunneling
MPLS is a technology that encapsulates every packet with one or more MPLS Labels that are then switched inside the MPLS network. To keep it simple, let's say that the MPLS Label contains the information about the destination of the packet. Inside an MPLS network, only the first and the last routers of an LSP (Label Switched Path) have the routing table for a specific service.

Note: This is always true for MPLS VPN services (what SPs call MPLS private networks). For the Internet, it depends on the configuration of the routing table.

When an MPLS router in the middle of the LSP needs to send an ICMP TTL Exceeded packet back to the source, it doesn't know where to send it. When this happens, the router adds the same MPLS Label as the original packet to the ICMP TTL Exceeded packet and forwards it to the MPLS destination router (based on the MPLS Label information). The destination router removes the MPLS Label and then forwards the packet back to the source. This is called ICMP Tunneling.

In Figure V, the routers R2, R3 and R4 are in the middle of the LSP, and they don't know how to reach the source. The only thing they know is that the destination for the MPLS packet is R5. When they need to send the ICMP TTL Exceeded packet, they add the MPLS label to reach R5. R5 then routes the packet back to R1, with another MPLS Label that tells the other routers to forward it to R1.

Figure V. ICMP Tunneling

In the two following examples, there is exactly 10ms between each router and there is 50% packet loss between
R4 and R5.

This is the traceroute without the MPLS network (i.e. a traditional IP network):

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | R1 | 0.0 | 10 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | R2 | 0.0 | 10 | 11.0 | 11.0 | 11.0 | 11.0 |
| 3 | R3 | 0.0 | 10 | 21.0 | 21.0 | 21.0 | 21.0 |
| 4 | R4 | 0.0 | 10 | 31.0 | 31.0 | 31.0 | 31.0 |
| 5 | R5 | 50.0 | 10 | 41.0 | 41.0 | 41.0 | 41.0 |
| 6 | DST | 50.0 | 10 | 42.0 | 42.0 | 42.0 | 42.0 |
+---+----------+-------+-----+------+------+------+------+

Figure W. Traceroute without MPLS Network

However, when it’s going through an MPLS network with ICMP Tunneling, the traceroute will look like this:

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | R1 | 0.0 | 10 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | R2 | 50.0 | 10 | 41.0 | 41.0 | 41.0 | 41.0 |
| 3 | R3 | 50.0 | 10 | 41.0 | 41.0 | 41.0 | 41.0 |
| 4 | R4 | 50.0 | 10 | 41.0 | 41.0 | 41.0 | 41.0 |
| 5 | R5 | 50.0 | 10 | 41.0 | 41.0 | 41.0 | 41.0 |
| 6 | DST | 50.0 | 10 | 42.0 | 42.0 | 42.0 | 42.0 |
+---+----------+-------+-----+------+------+------+------+

Figure X. Traceroute with MPLS Network

Both the latency and the packet loss are different even though the network path is the same. So if the latency makes a big jump and then stays the same for hops that are far away from each other, keep in mind that ICMP Tunneling might be the cause.
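The numbers in Figures W and X can be reproduced with a toy latency model. This is a sketch with assumed values. The source is 0.5ms one way from R1. Each router-to-router link adds 5ms one way, which gives the 10ms round-trip step between routers. The R4-R5 loss is counted once for any round trip that uses that link, so that the model matches the 50% shown in the figures. With ICMP tunneling, a reply from R2, R3 or R4 first travels forward to R5 before it comes back. Its round trip therefore equals the round trip to R5.

```python
ACCESS_MS = 0.5   # one-way delay SRC <-> R1 and R5 <-> DST (assumed)
LINK_MS = 5.0     # one-way delay between neighbouring routers: 10 ms round trip
LOSS_PCT = 50.0   # loss on the R4-R5 link, counted once per round trip (assumed)
EGRESS = 5        # R5 terminates the LSP that starts at R1


def one_way(a: int, b: int) -> float:
    """One-way delay between routers Ra and Rb on the R1..R5 chain."""
    return abs(b - a) * LINK_MS


def uses_r4_r5(a: int, b: int) -> bool:
    """True if a path between Ra and Rb crosses the R4-R5 link."""
    return min(a, b) <= 4 and max(a, b) >= 5


def hop_stats(k: int, tunneling: bool) -> tuple[float, float]:
    """(rtt_ms, loss_pct) of the ICMP TTL Exceeded reply from router Rk."""
    probe = ACCESS_MS + one_way(1, k)              # SRC -> R1 -> ... -> Rk
    if tunneling and 1 < k < EGRESS:
        # Mid-LSP router: the reply is label-switched on to R5, which then
        # routes it back towards R1 and the source.
        reply = one_way(k, EGRESS) + one_way(EGRESS, 1) + ACCESS_MS
        lossy = True                               # reply crosses R4 -> R5
    else:
        reply = one_way(k, 1) + ACCESS_MS          # straight back to SRC
        lossy = uses_r4_r5(1, k)
    return probe + reply, (LOSS_PCT if lossy else 0.0)


for k in range(1, 6):
    print(f"R{k}: plain IP {hop_stats(k, False)}  ICMP tunneling {hop_stats(k, True)}")
```

The plain IP results climb by 10ms per hop (1, 11, 21, 31, 41), while the tunneled results jump straight to 41ms with 50% loss from R2 onward. This is the same pattern as Figures W and X.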

TTL Propagation
Earlier we briefly discussed the MPLS label that is added to the packet when it enters an MPLS network. Similar to the Ethernet or IP packet header, the label contains multiple fields: the label value, the traffic class (QoS) and the time-to-live (TTL).

The TTL field inside the MPLS label is used exactly like the IP TTL field: each time the packet reaches a router, it is decremented by one.

But the question is, what is the initial TTL value? There are two choices:

1. It is copied from the IP TTL field. This is called TTL propagation.

2. A fixed value of 255 is used instead. This is what happens when TTL propagation is disabled.

When TTL propagation is disabled, some routers are not visible in the traceroute. Let's go back to the previous example, but with TTL propagation disabled.

Figure Y. TTL Propagation Disabled

If there is exactly 10ms between each router and there is 50% packet loss between R4 and R5, the traceroute
without TTL propagation will look like this:

+---+----------+-------+-----+------+------+------+------+
| # | Hostname | Loss% | Snt | Last | Avg | Best | Wrst |
+---+----------+-------+-----+------+------+------+------+
| 1 | R1 | 0.0 | 10 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | R5 | 50.0 | 10 | 41.0 | 41.0 | 41.0 | 41.0 |
| 3 | DST | 50.0 | 10 | 42.0 | 42.0 | 42.0 | 42.0 |
+---+----------+-------+-----+------+------+------+------+

Figure Z. Traceroute with MPLS Network (TTL propagation disabled)
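The hop lists of Figures X and Z can be reproduced with a simplified TTL model. This sketch assumes that R1 (the LSP ingress) and R5 (the LSP egress) decrement the IP TTL, and that R2 to R4 only decrement the MPLS TTL. It also assumes that, with propagation enabled, the MPLS TTL copied from the IP TTL makes every transit router consume one unit of the probe's TTL. Real equipment offers more variants than this model covers. The `responder` and `traceroute` functions are hypothetical helpers.

```python
PATH = ["R1", "R2", "R3", "R4", "R5"]   # R1 = LSP ingress, R5 = LSP egress
LSP_TRANSIT = {"R2", "R3", "R4"}        # label-switching routers mid-LSP


def responder(ip_ttl: int, propagate: bool) -> str:
    """Node that answers a probe sent with the given IP TTL."""
    for router in PATH:
        if router in LSP_TRANSIT and not propagate:
            # Only the MPLS TTL (started at 255) is decremented here; the IP
            # TTL is untouched, so this router never sends TTL Exceeded.
            continue
        # With propagation, the MPLS TTL is copied from the IP TTL, so each
        # transit router effectively consumes one unit of the probe's TTL.
        ip_ttl -= 1
        if ip_ttl == 0:
            return router   # this router sends ICMP TTL Exceeded
    return "DST"            # the TTL survived the whole path


def traceroute(propagate: bool, max_ttl: int = 30) -> list[str]:
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        hops.append(responder(ttl, propagate))
        if hops[-1] == "DST":
            break
    return hops


print(traceroute(propagate=True))   # ['R1', 'R2', 'R3', 'R4', 'R5', 'DST']
print(traceroute(propagate=False))  # ['R1', 'R5', 'DST']
```

With propagation disabled, the whole LSP behaves like a single IP hop. That is why R2, R3 and R4 disappear from Figure Z.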

As you can see, MPLS networks change the way a traceroute looks without giving us the exact picture of what is going on. That being said, what we learned earlier is still valid: if the packet loss doesn't continue, it's not a problem.

3 OBKIO'S LIVE TRACEROUTE FEATURE

With all this talk about traceroutes and how they can help you troubleshoot net-
work problems, you may be wondering: “What’s the best traceroute tool for me?”

To help users troubleshoot network performance issues faster than ever before, Obkio offers Live Traceroute as a core feature of its Network Performance Monitoring solution.

Live Traceroutes show the forward and the reverse traceroutes with latencies and packet loss. It's the perfect tool to pinpoint the location of network performance issues. It also allows users to share the results within their team, as well as with third parties such as IT consultants or service providers.

How Live Traceroutes Work
The live traceroute feature is used in combination with the network monitoring sessions. This means that a monitoring session must be configured between two agents to be able to use that feature.

Inside the App (Web or Mobile), users can go to any network monitoring session page and click the Live Traceroute button to launch the live traceroute feature. Note that the live traceroute opens in a separate window (popup).

The screen capture at the end of this section shows how to do it.

Live Traceroute Sharing
In the Live Traceroute page, a Share Link button can be used to share the page with someone else. People who use the link do not need to have an Obkio account or be logged into the App. The link is valid for 48 hours.

With this sharing feature, it is easy for our users to share the traceroute results with their colleagues, IT consultants and their service providers' support team. With access to the live traceroute feature, everyone will be able to troubleshoot as fast as possible. Faster troubleshooting means a faster fix and less pain for the end-users.

Traceroute From Both Directions
Traffic in IP networks is asymmetrical, meaning that the path used from a source to a destination is probably not the same as the one used from the destination to the source. Network engineers will tell you that, when looking at a traceroute, you must have traceroutes from both directions.

This is exactly what the live traceroute gives you: you can see, in real time, the traceroute from both directions.

Traceroute Settings
When the traceroute is not running, it is possible to change its settings.

You can change:

• The traceroute packet interval, to enforce a more aggressive rate than the default of 1 packet per second.
• The protocol, from the default ICMP to TCP or UDP, and select random source and destination ports to detect multiple paths.
• The ToS (IP Type of Service), which includes the DSCP code, to troubleshoot a specific QoS class of service, similar to our QoS Monitoring feature.

Obkio's Live Traceroute Feature

As you see in the screen capture above, Live Traceroute is an advanced feature. Traceroute is a powerful tool that is easy to misinterpret and draw false conclusions from, which is why it's extremely important to understand what it does and how it works.
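The ToS setting mentioned in the list above is a single byte in the IPv4 header: the DSCP value occupies the upper 6 bits and ECN the lower 2. This short sketch shows the arithmetic for a few common DSCP code points. The `dscp_to_tos` helper is hypothetical and does not describe how Obkio configures its probes.

```python
def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Build the IPv4 ToS byte: DSCP in the upper 6 bits, ECN in the lower 2."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must be between 0 and 63")
    if not 0 <= ecn < 4:
        raise ValueError("ECN must be between 0 and 3")
    return (dscp << 2) | ecn


# Common DSCP code points: best effort, AF41 and EF (often used for voice).
for name, dscp in [("BE", 0), ("AF41", 34), ("EF", 46)]:
    tos = dscp_to_tos(dscp)
    print(f"{name:4} DSCP {dscp:2} -> ToS {tos:3} (0x{tos:02X})")
```

On Linux, a probing tool would typically apply such a value to its socket with `setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)`. That way, the traceroute packets are classified like the traffic class being investigated.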

Network Performance Monitoring

Every IT pro wishes that everything could run smoothly so they can work on new projects and deploy new infrastructure. But that is part of IT: things can break, capacity can be insufficient or a configuration change can affect performance. At Obkio, we think that you should not rely on your users to raise performance issues.

Our solutions monitor the performance 24/7, establish a baseline for each network monitor-
ing session and let you know if anything goes wrong.

Try Obkio for free or contact our experts to learn more!


Montréal
5605 avenue de Gaspé, suite 204
Montréal, QC, H2T 2A4
Canada
514-832-0101
obkio.com
hello@obkio.com
