INTERPROVIDER OPTION C, ON JUNIPER JUNOS ROUTERS – PART 1: CONFIGURATION WITH LDP (INCLUDES FULL TOPOLOGY CONFIG!) (JNCIP-SP, JNCIE-SP)

July 10, 2019

When I was young, my granddad used to say to me “Son, you’ll never extend an
MPLS VPN between two autonomous systems. It can’t be done.” Well, guess what
Granddad: you were wrong, plus you were an idiot. Maybe they couldn’t do it in the
1940s, but today we can do it easily. Shut up, Granddad. Shut up.

In previous posts we’ve seen two ways to extend MPLS VPNs between two
ISPs. In our post on Interprovider Option A we saw how we can treat the other ISP as if
they were just another customer of ours. Then, in our Option B post we saw that we
can actually exchange VPN labels with the other ISP, over a dedicated link.

Today we’re going to look at Interprovider Option C, so-called because it’s the third
suggestion, option (c), in RFC 4364, the actual RFC for BGP/MPLS IP VPNs.

Option C isn’t too hard to understand, in principle. However, there’s a fair number of
pieces needed to actually get it working. In addition, depending on exactly what
protocols you’re using, and depending on where you’re storing your prefixes, there’s
a looooooot of tweaks you need to know about.

That’s why this blog post is the first of three. Why three posts? There’s plenty of
other guides out there that give you a quick run-down on the high points about
Option C. These posts are going to do the opposite. By the end of this three-parter,
we’ll know Option C inside-out:

• In this first post we’ll see a “basic” Option C config, with both ISPs running LDP.
• In our second post, we’ll take a look at the labels involved, because there’s something
unique about the label stack in Option C. We’ll also talk about the use case. Why
would we ever use Option C, over Option A or B? In Part 2, we’ll find out.
• Finally, in our third post, we’ll reconfigure one of our ISPs to run RSVP, and we’ll
make a few other changes too. We’ll see how these changes break things – and then, we’ll
see how to fix them.

Fun fact: Option C is the most complicated solution to set up, but once it’s actually
up and running it’s also the most scalable. Why? In just a moment, you’ll find out.
But first:

REQUIRED KNOWLEDGE FOR THIS POST

Juniper
Business
Use Only
If you’ve just arrived here after a Google search for something like “Juniper inter-AS
option C”, you might like to know that I’ve also done posts before on Interprovider
Option A and Interprovider Option B. In fact, this post uses the same topology and
configuration. With that in mind, you might want to read those posts first.
It’s not essential to read them – but I would still highly recommend it, because in
those posts we set up the two-ISP lab that we’re going to use today. We also
introduced a few concepts that we’ll be referring to again in this post, like the
difference between service and transport labels, and the concept of BGP AFI/SAFI.
In addition, you’ll definitely want to be familiar with the concept of BGP-Labeled
Unicast. If you’re not, don’t worry: I wrote a post all about it, especially so that you can
understand it! Go give it a read if you don’t know about it already.
If you’re already comfortable with LDP, RSVP, OSPF, IS-IS, BGP, BGP-LU, and address
families, and you just want to learn some sweet sweet Option C, then let’s gamble:
jump right in! Just know that if you find yourself feeling confused at any point, you
can go back and read the previous posts to get up to speed.

And if at any stage you find yourself feeling aroused, then don’t worry: it’s a natural
reaction to reading my posts.

HOW OPTION C WORKS: THE SHORT VERSION


Take a look at our topology below. Notice that there’s two route reflectors, one in
each ISP. There’s also one dedicated link connecting the two ISPs, between routers 4
and 5. Take half a minute to properly understand this topology. Notice that each
router has a loopback which is just the number of the router, so Router 8 is
loopback 8.8.8.8. The one exception is the route reflectors – notice that Reflector 2 is
22.22.22.22.

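My advice to you: open this pic in a new tab, because we’ll be referring back to it
throughout the course of this post.

[Figure: lab topology. ISP 1 (AS 64512) contains Routers 1 to 4 and Reflector 1 (11.11.11.11). ISP 2 (AS 64513) contains Routers 5 to 8 and Reflector 2 (22.22.22.22). Routers 4 and 5 are the ASBRs, joined by one dedicated link.]
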
Here’s how Option C works: the route reflectors in each ISP are actually going to talk
directly to each other to exchange VPN prefixes, over a multihop eBGP session. This is
rare: usually route reflectors only reflect routes within their own autonomous system.
In this scenario though, we’re reflecting from one ISP to the other. Dare to dream,
friends! Dare to dream.

Immediately here, we see one of the reasons why Option C is more scalable. In
Option A, our ASBRs (Autonomous System Border Routers – Routers 4 and 5 in our
topology) needed to hold a ton of state in their memory, like all the prefixes for all
the VRFs, the VRFs themselves, plus BGP sessions. In Option B our ASBRs still
needed to hold a lot of state, remembering a label per-prefix per-VRF. However, in
Option C all of this state is moved to the PEs and the route reflectors, where the
state belongs. All our ASBRs need to be aware of is the fact that they’re part of a
transit label-switched path.

The reason this is possible is thanks to the unique way our route reflectors
advertise the next-hop for these VPN prefixes. In the pic above our PEs are Router 1
and Router 8. When Router 1 advertises a VPN prefix to its route reflector, the prefix
has a next-hop of 1.1.1.1. Reflector 1 then advertises this to Reflector 2, at which
point Reflector 2 reflects it to Router 8 – still with a next-hop of 1.1.1.1!

In other words, Router 8 sees Router 1 – a router in a totally different autonomous
system – as the next-hop! The next-hop doesn’t change at any step of the way, even
though we’re crossing an autonomous system boundary. This is in deep contrast to
options A and B, where the border routers told the other ISP that the border router
itself was the next-hop.

Here we see another example of the scalability of Option C. Once again, we’re
relieving our border routers of keeping track of state, and keeping that
responsibility where it belongs, because the ASBRs don’t need to know they’re the
next-hop for every single individual VPN prefix. As long as they know the LSP
(label-switched path) to put the traffic on, that’s all that’s needed.
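
If you like to see ideas as code, here’s a minimal Python sketch of the next-hop story so far. It’s a toy model, not Junos behaviour: the `propagate` function and the `rewriters` set are made up purely for illustration, and the only facts borrowed from this post are the loopback numbering and the rule that Option B’s border routers set themselves as next-hop while Option C leaves the next-hop alone.

```python
# Toy model: follow one VPN route from ISP 1 towards ISP 2 and track its BGP next-hop.
# Loopbacks follow the lab's numbering; the function itself is illustrative only.

LOOPBACKS = {
    "R1": "1.1.1.1",
    "R4": "4.4.4.4",
    "R5": "5.5.5.5",
    "RR1": "11.11.11.11",
    "RR2": "22.22.22.22",
}


def propagate(origin, path, rewriters):
    """Return the next-hop seen by whoever receives the route at the end of `path`.

    `rewriters` is the set of routers that set the next-hop to themselves
    when they re-advertise the route.
    """
    next_hop = LOOPBACKS[origin]
    for router in path:
        if router in rewriters:
            next_hop = LOOPBACKS[router]
    return next_hop


# Option B: the route crosses the ASBRs, and each one sets the next-hop to itself.
option_b = propagate("R1", ["RR1", "R4", "R5", "RR2"], rewriters={"R4", "R5"})

# Option C: the route goes reflector to reflector, and nobody touches the next-hop.
option_c = propagate("R1", ["RR1", "RR2"], rewriters=set())

print(option_b)  # 5.5.5.5
print(option_c)  # 1.1.1.1
```

In the Option C case Router 8 ends up with 1.1.1.1 as the next-hop, which is exactly why it then needs some way to resolve an address that lives in another autonomous system.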

Now, the sharp thinkers among you might be thinking: how does Router 8 resolve
the IP address of Router 1? Even if Router 1’s loopback (1.1.1.1) happened to be in
Router 8’s routing table, that’s not enough: for an MPLS VPN, running private IPs
over a provider network, we need a label-switched path from R8 all the way to R1.
How on earth does this work? How?!?
To understand that, we need to know about SAFI 4, otherwise known as BGP Labeled
Unicast. And it’s important that we understand it in detail. That’s why I’ve written an
entire post all about it! Click here to read my blog post on BGP Labeled-Unicast, on
Juniper routers. Once you understand what it is, how it works, the default behaviour,
and how to manipulate it, come back here. Don’t worry, I’ll wait. Take your time: I’ve
got lots of Super Mario to catch up on.
Done? Perfect. So: we have route reflectors talking directly to each other,
exchanging VPN prefixes, with the next-hop unchanged. Meanwhile, our PE routers
in each ISP have a full label-switched path to the PE in the other ISP, thanks to our
border routers talking BGP-LU with each other. Router 8 knows a label for Router 1.
If you still can’t quite picture how that works, don’t worry: later on we’ll be talking
about it in great detail.

RESETTING OUR LAB


In previous posts we set up our IGP, our iBGP, and our VRFs. This post picks up
where we left off, so we won’t go over all that again.

You’ll remember that we’re running IS-IS in ISP 1, and OSPF in ISP 2. I’m going to
change my lab a little bit, and just run LDP everywhere in both ISPs. This will help us
focus on what makes Option C unique. (As I mentioned earlier, in Part 3 we’ll be
bringing RSVP back, to see how it changes things.)
 

TURNING ON OPTION C: OUR PLAN OF ACTION
One of the things we’re going to configure is an eBGP multi-hop peering directly
between Reflector 1 in ISP 1, and Reflector 2 in ISP 2. This peering will only talk the
VPN unicast family.

We’re also going to run BGP-Labeled Unicast between our ASBRs, Routers 4 and 5,
so that each ISP receives labeled routes to the PEs and route reflectors of the other
ISP. We’ll add a policy onto it, to make sure we’re only advertising what we need to
advertise.

How about the VPN route-targets for each VRF? It’s common for an ISP to use their
autonomous system in the target, so if we’ve got VPN prefixes coming from a
different AS, we need to give this some thought.

In our Option B post we learned how to re-write communities as routes pass from
one AS to another, using a policy. It’s not hard, but it involves a lot of lines of config.
Because we’ve already seen what that looks like, in this post we’ll just do a simple
solution, and import both ISP 1 and ISP 2’s target communities into each VRF.

Soon enough we’ll be ready to start configuring. But before we jump into it, let’s take
a look at what should happen when it’s all working.
 

THE EXPECTED RESULT: OPTION C’s THREE-LABEL STACK
You’ll remember that the aim of all this is to give Router 1 a labeled path to 8.8.8.8,
and Router 8 a labeled path to 1.1.1.1.

Here’s what’s going to happen: ASBR Router 5 will generate a label for 8.8.8.8 (the
loopback of Router 8), and pass this label to Router 4. R4 will then generate a new
label to get to Router 8 (when a router advertises itself as the next-hop, it always
generates a new label), and R4 will advertise this labeled route throughout ISP 1. As
such, Router 1 will receive it.

So, after all that, if PE Router 1 wants to send traffic to a VPN prefix with a next-hop
of 8.8.8.8, what will Router 1 actually do? Here’s where things get really interesting:
Router 1 will actually push THREE labels onto the packet:

• First, R1 pushes an inner VPN label, useful only to R8, to identify the VPN itself.
• Next, R1 pushes a middle label that won’t be processed until we get to R4. This is the
label that tells R4 how to get to R8, via R5. We’ll talk about this label more in a
moment.
• Finally, a top outer transport label, which identifies the label-switched path from R1 to
R4.
Crikey! Cor blimey gov’nor! Apples and pears! That’s a lot of labels. Again, don’t
worry if you can’t visualise that, because once we’ve configured it all and got it
working we’ll move over to Part 2 in this series, where we’ll look at a diagram that
shows exactly how the label stack works, end-to-end.
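
In the meantime, here’s a rough Python sketch of the three pushes described above, plus the first two things that happen to the stack on its way to Router 5. The label values are borrowed from outputs later in this series, which were captured at different times, so treat them as example numbers only. The helper functions are hypothetical, and the label swap that Router 2 does on the transport label is left out to keep things short.

```python
# Toy model of the Option C label stack built by Router 1.
# The top of the stack is the LAST element of the list.

VPN_LABEL = 299872         # inner service label, only meaningful to R8
BGP_LU_LABEL = 299984      # middle label, only meaningful to R4
LDP_LABEL = 299808         # outer transport label for the LSP from R1 to R4
R5_LABEL_FOR_R8 = 300016   # example label advertised by R5 to R4 for 8.8.8.8


def push(stack, label):
    """Add a label on top of the stack."""
    return stack + [label]


def pop(stack):
    """Remove the top label."""
    return stack[:-1]


def swap(stack, new_label):
    """Replace the top label with a new one."""
    return stack[:-1] + [new_label]


# R1 pushes inner, then middle, then outer.
leaving_r1 = []
for label in (VPN_LABEL, BGP_LU_LABEL, LDP_LABEL):
    leaving_r1 = push(leaving_r1, label)

# The penultimate router before R4 pops the transport label.
arriving_at_r4 = pop(leaving_r1)

# R4 swaps the middle label for the one R5 advertised.
leaving_r4 = swap(arriving_at_r4, R5_LABEL_FOR_R8)

print(leaving_r1)      # [299872, 299984, 299808]
print(arriving_at_r4)  # [299872, 299984]
print(leaving_r4)      # [299872, 300016]
```

Notice that the inner VPN label never changes in this sketch: nobody between R1 and R8 needs to understand it.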
 

ON WHICH ROUTER SHOULD WE REDISTRIBUTE OUR LOOPBACKS INTO BGP?
Now, which step should we configure first? The BGP-LU peering between the two
border routers, or the BGP inet-vpn unicast peering between the route reflectors?

Well, we can’t set up the reflector peering just yet, because our route reflectors don’t
have routes to each other. They’ll only know about each other’s loopback when the
BGP-LU peering comes up.

As such, the very first thing we’ll do is turn on BGP between our ASBRs – Routers 4 and 5.
Remember, in this example we’re specifically going to turn on only BGP-Labeled Unicast,
because this link is dedicated to VPN transit traffic. We have no need for regular public
BGP IPv4 unicast (AFI 1/SAFI 1) here.

To make this nice and clean, we’re going to configure a policy on R4 which takes the
relevant loopbacks in ISP 1, and sends them to R5 via eBGP. Labeled, naturally! Without
this policy, there’s a danger that Router 4 would advertise everything via BGP to Router
5. And of course, we’ll do the same on R5, making a policy to advertise ISP 2’s important
loopbacks as labeled prefixes.
Now, it may not be immediately clear why we’re choosing to redistribute these
loopbacks into BGP at our ASBRs in particular. For example, if Router 1 is talking
BGP to the rest of ISP 1, why don’t we redistribute 1.1.1.1 into BGP on Router 1
itself? Why not redistribute it at the source of the prefix?

The answer comes in the behaviour of route reflectors. Let’s imagine that we turned
BGP labeled-unicast on everywhere in ISP 1, and that we added a policy on Router 1
itself to redistribute its loopback into iBGP. Here’s the result: R1 sends its loopback
address, 1.1.1.1, via iBGP to its route reflector. As such, our plucky route reflector
will indeed receive the prefix, with a label:

root@Reflector1> show route receive-protocol bgp 1.1.1.1 1.1.1.1/32 extensive

inet.0: 13 destinations, 18 routes (13 active, 0 holddown, 0 hidden)


1.1.1.1/32 (3 entries, 2 announced)
Accepted
Route Label: 3
Nexthop: 1.1.1.1
Localpref: 100
AS path: I

(The command above can look confusing with the two 1.1.1.1 entries. The first one is
the address of Reflector 1’s BGP neighbour. The second one, with the /32 mask, is the
actual route we’re looking up. By coincidence, in this example the two addresses are
the same!)

But look o’er yonder – Reflector 1 isn’t actually reflecting this route on to Router 4:

root@Reflector1> show route advertising-protocol bgp 4.4.4.4 1.1.1.1/32 extensive

root@Reflector1>

Why? For our answer, let’s take a look at 1.1.1.1/32 in Reflector 1’s inet.0 table.

root@Reflector1> show route table inet.0 1.1.1.1/32

inet.0: 13 destinations, 14 routes (13 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

1.1.1.1/32 *[IS-IS/18] 00:00:04, metric 20


> to 10.10.112.2 via ge-0/0/2.0
[BGP/170] 00:05:27, localpref 100, from 1.1.1.1
AS path: I
> to 10.10.112.2 via ge-0/0/2.0, Push 299776

It’s in there twice – once as an IS-IS route, and once as a BGP route. Reflector 1 has
rightly chosen the IS-IS route as the route that it thinks is best… and because the
BGP route isn’t the most preferred route, it isn’t reflected to Router 4.

Remember, reflectors don’t just take every prefix they get, and reflect the whole lot.
They actually go through the BGP path selection process themselves, and only select
the winners. And as an extensive output shows, this route was not the winner:

root@Reflector1> show route table inet.0 1.1.1.1/32 protocol bgp extensive | match Inactive
           Inactive reason: Route Preference

There’s ways around it, of course – but in our lab, we’re going to put the loopback
into BGP at the edge of our network. That way, we can be very specific about which
labeled unicast prefixes we choose to pass to our friends in the other ISP.
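
To make the “only the winner gets reflected” idea concrete, here’s a small Python sketch. It’s a deliberately simplified model and not how Junos is implemented: the `Route` class and both functions are hypothetical, and the only real-world values in it are the two preference numbers from the output above (18 for the IS-IS route, 170 for BGP), where the lower number wins.

```python
# Toy model of why Reflector 1 does not reflect the BGP copy of 1.1.1.1/32.
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str
    preference: int


def active_route(candidates):
    """Pick the route with the lowest preference value."""
    return min(candidates, key=lambda route: route.preference)


def bgp_advertises(candidates):
    """In this simplified model, BGP only passes a prefix on if the active route is a BGP route."""
    return active_route(candidates).protocol == "BGP"


routes = [
    Route("1.1.1.1/32", "IS-IS", 18),
    Route("1.1.1.1/32", "BGP", 170),
]

print(active_route(routes).protocol)  # IS-IS
print(bgp_advertises(routes))         # False
```

If the BGP route were the only candidate for the prefix, `bgp_advertises` would return True, which is the situation we get by injecting the loopbacks at the ASBRs instead.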

CONFIGURING OPTION C – ADVERTISING LABELED-UNICAST PREFIXES TO THE OTHER ISP
Let’s start configuring! Remember to have the topology handy in a different tab, so
you can refer to it as we talk about all the different routers.

First we’re going to turn on BGP-LU between our two ISPs, at ASBRs Router 4 and
Router 5. I’m also going to apply a policy to export only the loopbacks of our PE
routers, and the loopbacks of our route reflectors. That’s quite a mouthful, so let’s
look at some config to see what’s going on. Here’s the BGP config on Router 4:

set protocols bgp group TO_AS64513 type external


set protocols bgp group TO_AS64513 family inet labeled-unicast rib inet.3
set protocols bgp group TO_AS64513 export R1_LOOPBACK_IN_LABELED_UNICAST
set protocols bgp group TO_AS64513 peer-as 64513
set protocols bgp group TO_AS64513 neighbor 10.10.45.5

You’ll remember from my post on BGP-LU that we often choose to put the prefixes
into inet.3. As for the policy, here it is in hierarchy format so it’s a little easier to
read:

root@Router4> show configuration policy-options policy-statement R1_LOOPBACK_IN_LABELED_UNICAST

term ACCEPT_R1_LOOPBACK {
    from {
        route-filter 1.1.1.1/32 exact;
        route-filter 11.11.11.11/32 exact;
    }
    then accept;
}
term ELSE_REJECT {
    then reject;
}

We’ve also added the equivalent config on Router 5.
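
If the two-term policy on Router 4 feels abstract, here’s roughly what it does, sketched in Python. This is only a model of the matching logic: the `export_policy` function is hypothetical, and the “exact” match type is represented as a plain equality check on the whole prefix, including the mask.

```python
# Toy model of Router 4's export policy: accept two exact /32s, reject everything else.
from ipaddress import ip_network

# The two route-filter entries from the policy above.
ALLOWED = [ip_network("1.1.1.1/32"), ip_network("11.11.11.11/32")]


def export_policy(prefix):
    """Return 'accept' for an exact match on an allowed prefix, otherwise 'reject'."""
    if ip_network(prefix) in ALLOWED:
        return "accept"
    return "reject"


print(export_policy("1.1.1.1/32"))      # accept
print(export_policy("11.11.11.11/32"))  # accept
print(export_policy("4.4.4.4/32"))      # reject
print(export_policy("1.1.1.0/24"))      # reject
```

The last line is the point of “exact”: a shorter prefix that merely covers 1.1.1.1 doesn’t match.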

Did it work? Let’s see what ISP 1’s ASBR (R4) is learning from ISP 2’s ASBR (R5):

root@Router4> show route receive-protocol bgp 10.10.45.5 extensive

inet.0: 15 destinations, 15 routes (15 active, 0 holddown, 0 hidden)

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)


* 8.8.8.8/32 (1 entry, 1 announced)
     Accepted
     Route Label: 300016
     Nexthop: 10.10.45.5
     MED: 1
     AS path: 64513 I

* 22.22.22.22/32 (1 entry, 1 announced)


     Accepted
     Route Label: 300032
     Nexthop: 10.10.45.5
     MED: 1
     AS path: 64513 I

Great! R5 is saying to R4 “If you want to get to R8, come to me, and put label 300016
on the packet.” And notice that these labeled prefixes are in inet.3, right where we
want them.

We also see that Router 4 has learned 22.22.22.22, which is the address of Reflector
2, in ISP 2. Router 4 will now advertise this to the rest of ISP 1, which means that the
peering between our two route reflectors will have no problems – right? We’ll find
out in a moment.

CONFIGURING OPTION C – ADVERTISING LABELED-UNICAST PREFIXES THROUGHOUT OUR ISPs
So, Routers 4 and 5 have successfully swapped labeled prefixes. It’s time for Routers
4 and 5 to now re-advertise these prefixes through their respective ISPs. To achieve
this, we need to enable “family inet labeled-unicast” on our iBGP peerings within
each ISP.
We were already running unicast BGP on Router 1. Let’s add this line on too:

set protocols bgp group AS64512 family inet labeled-unicast rib inet.3

We add the equivalent lines on RR1, R4, R5, RR2 and R8. Our iBGP sessions go down,
and come back up. And after a while, do we see 8.8.8.8 in Router 1’s table?

root@Router1> show route 8.8.8.8

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

8.8.8.8/32         *[BGP/170] 02:00:00, MED 1, localpref 100, from 11.11.11.11


                      AS path: 64513 I
                    > to 10.10.12.2 via ge-0/0/0.0, Push 299952, Push 299808(top)

Indeed we do! Can we ping it?

root@Router1> ping 8.8.8.8 source 1.1.1.1


PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

Now that’s interesting. We’re learning it, but we can’t ping it. Can you see why?

Ten points if you spotted it: it’s because the prefix is in the inet.3 table, not inet.0.
Router 1 will be using the entry in inet.3 to resolve its BGP next-hops. But if we try to
ping 8.8.8.8 directly, the lookup happens in inet.0 – and because the prefix isn’t in
inet.0, the ping fails.

In this situation, it’s not a big deal. Router 1 has no need for 8.8.8.8 to be in inet.0,
other than for testing general reachability, but that’s just a nice-to-have, not an
essential. But as we’re about to see, this lack of an entry in inet.0 is going to cause
some big problems for our route reflectors…
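
Here’s the same idea as a tiny Python sketch. It’s a simplification: the table contents are an illustrative subset that I’ve assumed rather than copied from the router, and the two function names are made up. The behaviour it models is the one described above: traffic sourced by the router itself is looked up in inet.0, while the next-hops of VPN routes are resolved through inet.3.

```python
# Toy model of Router 1's two IPv4 tables once BGP-LU is up.
TABLES = {
    # IGP-learned loopbacks (assumed subset, for illustration).
    "inet.0": {"4.4.4.4/32", "11.11.11.11/32"},
    # Labeled routes: LDP for ISP 1, BGP-LU for the other ISP (assumed subset).
    "inet.3": {"4.4.4.4/32", "8.8.8.8/32", "22.22.22.22/32"},
}


def ping_has_route(prefix):
    """A ping from the router itself is looked up in inet.0 only."""
    return prefix in TABLES["inet.0"]


def vpn_next_hop_resolves(prefix):
    """The next-hop of a VPN route is resolved through inet.3."""
    return prefix in TABLES["inet.3"]


print(ping_has_route("8.8.8.8/32"))         # False
print(vpn_next_hop_resolves("8.8.8.8/32"))  # True
```

So the failed ping and the working VPN next-hop aren’t a contradiction: they’re two different lookups in two different tables.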

CONFIGURING OPTION C – ADVERTISING THE VPN PREFIXES, VIA ROUTE REFLECTORS
A lot of the Option C examples you’ll find on the internet seem to peer directly
between PE routers in each ISP. Of course, in the real world we’re much more likely
to use route reflectors. So, let’s do that here!

First of all, I’d like you to see the BGP config that Reflector 1 is using to talk with the
other routers in its own autonomous system, Routers 1 and 4. It’s nothing special:
we’re just talking unicast, VPN unicast, and labeled-unicast (in inet.3). We’ve also got
a cluster ID, which tells our Juniper router that it’s to act as a route reflector to its
peers:

set protocols bgp group AS64512 type internal


set protocols bgp group AS64512 local-address 11.11.11.11
set protocols bgp group AS64512 family inet unicast
set protocols bgp group AS64512 family inet labeled-unicast rib inet.3
set protocols bgp group AS64512 family inet-vpn unicast
set protocols bgp group AS64512 cluster 11.11.11.11
set protocols bgp group AS64512 neighbor 1.1.1.1
set protocols bgp group AS64512 neighbor 4.4.4.4

In a moment we’re going to add an important line to that config.

For now, let’s build a brand new BGP peering, so Reflector 1 can peer with
Reflector 2 in ISP 2. Notice in the config below that it’s an external peering, which
you don’t often see on route reflectors! Notice as well that we’re only talking inet-vpn
unicast, because that’s the only family we need!

Here’s the config on Reflector 1:

set protocols bgp group TO_AS64513_ROUTER22 type external


set protocols bgp group TO_AS64513_ROUTER22 multihop ttl 5
set protocols bgp group TO_AS64513_ROUTER22 local-address 11.11.11.11
set protocols bgp group TO_AS64513_ROUTER22 family inet-vpn unicast
set protocols bgp group TO_AS64513_ROUTER22 peer-as 64513
set protocols bgp group TO_AS64513_ROUTER22 neighbor 22.22.22.22

We’re turning on BGP multihop, because this is an eBGP peering between two loopbacks
that aren’t directly connected. Why “multihop ttl 5” in particular though? Because it’s
precisely the number of hops between the two reflectors!

Shall we see if it worked?

root@Reflector1> show bgp summary | match 22


22.22.22.22 64513 27 28 0 5 20 Active

…oh. That’s a shame: our BGP didn’t come up. How come?

Do you know: now I think about it, we never actually checked if Reflector 1 even has
a route to Reflector 2! Let’s find out:

root@Reflector1> show route 22.22.22.22

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

22.22.22.22/32 *[BGP/170] 02:23:04, MED 1, localpref 100, from 4.4.4.4
AS path: 64513 I
> to 10.10.113.3 via ge-0/0/3.0, Push 299968, Push 299792(top)

Well, it does… but do you see the problem?

Twenty points if you spotted it: this prefix is in inet.3. In fact, it’s exactly the same
problem as when we tried to do that ping earlier. For this BGP peering to come up,
22.22.22.22 needs to be in Reflector 1’s inet.0 table – but we configured our router
to put BGP-LU prefixes in inet.3.

Luckily, I know how to fix it. And guess what: I’ll tell you the answer for free! I’ll show
you the config first, then I’ll explain what it does. But do you promise you’ll keep it a
secret? Even if the FBI asks you really nicely? Yes? Good.

In short, Reflector 1 is going to take the labeled prefix for Reflector 2, and copy it
into its inet.0 table. First, we add in this line to Reflector 1’s iBGP config with the rest
of its own AS (because this is the peering that it’s actually learning 22.22.22.22 from):

set protocols bgp group AS64512 family inet labeled-unicast rib-group RR2_INTO_INET0

Then, we also add these lines:

set routing-options rib-groups RR2_INTO_INET0 import-rib inet.3


set routing-options rib-groups RR2_INTO_INET0 import-rib inet.0
set routing-options rib-groups RR2_INTO_INET0 import-policy RR2_LOOPBACK

set policy-options policy-statement RR2_LOOPBACK term RR from route-filter 22.22.22.22/32 exact


set policy-options policy-statement RR2_LOOPBACK term RR then accept
set policy-options policy-statement RR2_LOOPBACK term ELSE_REJECT then reject

So, what exactly did we just do?

The main moving part here is the rib-group. Rib-groups are possibly one of the most
misunderstood elements in all of Junos. Essentially, the import-rib command allows
us to take prefixes that would usually be put into one routing table, and additionally
import them into another routing table. The first table is where you’d usually find
the prefixes, and then any table listed after that is where we’re importing them to.

So we made a rib-group, but we also added a policy on the group so that
only 22.22.22.22/32 actually gets imported into inet.0. Here’s what the same config looks
like in hierarchy format, so it’s a little easier to read. When it’s like this, it’s easier
to see that inet.3 is the primary table, and inet.0 is the secondary table.

root@Reflector1> show configuration routing-options

rib-groups {
    RR2_INTO_INET0 {
        import-rib [ inet.3 inet.0 ];
        import-policy RR2_LOOPBACK;
    }
}
autonomous-system 64512;
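
For anyone who finds rib-groups slippery, here’s a short Python sketch of the logic as I’ve described it. It’s a mental model and nothing more: the `install` function is hypothetical, and it assumes that the first table in the list always receives the route while the import policy only decides whether the other tables get a copy.

```python
# Toy model of a rib-group: first table is the primary, the rest only get
# a copy of the route when the import policy accepts it.

def install(prefix, import_rib, import_policy):
    """Return the list of tables that end up holding `prefix`."""
    primary, *secondary = import_rib
    tables = [primary]
    if import_policy(prefix):
        tables.extend(secondary)
    return tables


def rr2_loopback(prefix):
    """Stand-in for the RR2_LOOPBACK policy: accept only Reflector 2's loopback."""
    return prefix == "22.22.22.22/32"


RR2_INTO_INET0 = ["inet.3", "inet.0"]

print(install("22.22.22.22/32", RR2_INTO_INET0, rr2_loopback))  # ['inet.3', 'inet.0']
print(install("8.8.8.8/32", RR2_INTO_INET0, rr2_loopback))      # ['inet.3']
```

In other words, 8.8.8.8 stays in inet.3 only, while 22.22.22.22 lands in both tables.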

With this config, we now see 22.22.22.22 in both tables:

root@Reflector1> show route 22.22.22.22

inet.0: 13 destinations, 17 routes (13 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

22.22.22.22/32 *[BGP/170] 00:15:34, MED 1, localpref 100, from 4.4.4.4


AS path: 64513 I
> to 10.10.113.3 via ge-0/0/3.0, Push 299968, Push 299792(top)

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

22.22.22.22/32 *[BGP/170] 02:41:16, MED 1, localpref 100, from 4.4.4.4


AS path: 64513 I
> to 10.10.113.3 via ge-0/0/3.0, Push 299968, Push 299792(top)

And as such, our BGP comes up:

root@Reflector1> show bgp summary | find 22


22.22.22.22 64513 66 68 0 5 15:59 Establ
bgp.l3vpn.0: 4/4/4/0

Hooray! And would you look at that: Reflector 1 has learned four prefixes! Could
they be the BGP prefixes from ISP 2? Indeed they could:

root@Reflector1> show route receive-protocol bgp 22.22.22.22 table bgp.l3vpn.0

bgp.l3vpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)


Prefix Nexthop MED Lclpref AS path
8.8.8.8:1:172.16.20.0/30
* 8.8.8.8 64513 I
8.8.8.8:1:192.168.20.0/24
* 8.8.8.8 64513 I
8.8.8.8:2:172.16.20.4/30
* 8.8.8.8 64513 I
8.8.8.8:2:192.168.20.0/24
* 8.8.8.8 64513 I

We did it! We did it together, you and me! Take that, granddad!!

CONFIGURING OPTION C – THE VRF TARGET POLICIES
Each ISP has their own unique route-target that represents a customer. This is a
problem when we’ve got one customer split over two ISPs. So, the final step is to get
Router 1 and Router 8 to know what to do with each other’s VPN prefixes.

You’ll remember that we said we weren’t going to do anything complicated like re-
writing route-targets as they cross our AS boundary. Instead, we’re just going to tell
each PE to import not only the ISP’s own route-target for that customer, but the
other ISP’s target for that customer too. Let’s see what this config looks like on
Router 1, for one of our two customers, Barry’s Ice Creams (a company that I
desperately wish existed):

root@Router1> show configuration policy-options

[...]
policy-statement TARGET_IMPORT_BARRYS_ICE_CREAM {
    term ACCEPT_CORRECT_COMMUNITY {
        from community [ TARGET_AS64512_BARRYS_ICE_CREAM TARGET_AS64513_BARRYS_ICE_CREAM ];
        then accept;
    }
    term REJECT {
        then reject;
    }
}

community TARGET_AS64512_BARRYS_ICE_CREAM members target:64512:1;


community TARGET_AS64513_BARRYS_ICE_CREAM members target:64513:1;

Gosh, it’s a thing of beauty! Would you like to marry it? Well, you can’t. Don’t even
think about it.
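
The import logic itself is simple enough to sketch in a few lines of Python. This is just a model: the `vrf_import` function is hypothetical, and it assumes that listing two communities in the square brackets means “match either one”. The two target values are the ones from the config above, and `target:64513:2` is an invented example of some other customer’s target.

```python
# Toy model of the VRF import policy for Barry's Ice Creams on Router 1.
IMPORT_TARGETS = {"target:64512:1", "target:64513:1"}


def vrf_import(route_communities):
    """Accept the route if it carries at least one of the import targets."""
    if IMPORT_TARGETS & set(route_communities):
        return "accept"
    return "reject"


print(vrf_import(["target:64512:1"]))  # accept (our own ISP's target)
print(vrf_import(["target:64513:1"]))  # accept (the other ISP's target)
print(vrf_import(["target:64513:2"]))  # reject (hypothetical other customer)
```

It isn’t as tidy as rewriting the targets at the AS boundary, but it’s a lot less config.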

DOES IT WORK YET?


We’ve now added three main elements:

• First of all, we’ve turned on labeled-unicast around the place, which has
ultimately given us one conceptual full label-switched path between two PEs in
different ISPs, made up of three separately-signalled label-switched paths: R1 to R4,
R4 to R5, and R5 to R8.
• Second, now that Reflectors 1 and 2 know how to get to each other’s loopbacks,
we set up a BGP peering between the two, with the inet-vpn unicast family.
• Finally, we’ve set up our PEs to be aware of the target communities used by the
other ISP, and to import them into the relevant VRFs.

So now, everything should work, right? Let’s take a look at the VRF on Router 1, and see
if it knows the 192.168.20.0/24 prefix from ISP 2:

root@Router1> show route table BARRYS_ICE_CREAM.inet.0

BARRYS_ICE_CREAM.inet.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

172.16.10.0/30     *[Direct/0] 04:11:38


                    > via ge-0/0/1.0
172.16.10.1/32     *[Local/0] 04:12:02
                      Local via ge-0/0/1.0
172.16.20.0/30     *[BGP/170] 00:10:15, localpref 100, from 11.11.11.11
                      AS path: 64513 I
                    > to 10.10.12.2 via ge-0/0/0.0, Push 299872, Push 299984, Push 299808(top)
192.168.10.0/24    *[Static/5] 04:11:38
                    > to 172.16.10.2 via ge-0/0/1.0
192.168.20.0/24    *[BGP/170] 00:10:15, localpref 100, from 11.11.11.11
                      AS path: 64513 I
                    > to 10.10.12.2 via ge-0/0/0.0, Push 299872, Push 299984, Push 299808(top)

Success! And just as a final test, to be sure – let’s get CPE 1 to ping CPE 2:

CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 116/132/188 ms

Perfect! Time to go to the pub? You betcha!

DOWNLOAD THE FULL TOPOLOGY CONFIGS!


As always, you can click here to get the full configurations of all ten Juniper routers in this
lab, so you can try it out for yourself. Happy birthday! Pop it on routers in your own
lab, take stuff out, add stuff in, and see what happens!
 

THAT’S IT!
If you’ve read this far, you’re an absolute hero. But I’m sure you’re itching for that
explanation I promised you of the labels involved in Option C? If so, click here for
Part 2 of this mighty deep-dive into Option C!
(Now you’ve come this far, I bet you can see why I split this post up into chunks. I
feel like I’ve written an entire book!)

As always, if you enjoyed this post, I’d love you to share it on your favourite social
media of choice. The more people read my blog, the more it inspires me to make
even more posts. So, if you want more posts like this, share it far and wide!

And hey: if you’re on Twitter, give me a follow! I’ll tell you when I make new posts, plus
I’ll occasionally share what some people (me) have described as “the very best
opinions in the entire networking industry”. Wow, high praise indeed (from me)!

INTERPROVIDER OPTION C, ON JUNIPER JUNOS ROUTERS – PART 2: THE THREE LABEL STACK, AND THE USE CASE VS OPTION B

July 10, 2019

In Part 1 of our journey into Interprovider Option C, we learned how to configure
Juniper routers to extend an MPLS VPN between two autonomous systems. Along
the way we made a few friends, broke a few hearts, and quite drastically increased
the rate at which carbon is warming up the planet. Whoopsie-daisy!!
Anyway, in Part 1 we talked in detail about how Option C works, and how to
configure it. We even pinged from one end to the other, to verify that it works. But
there’s three things we didn’t do.

First, we didn’t talk about why we’d ever choose to use Option C, over Options A or
B. Second, we only skimmed over this idea of the three-label stack. And third, we
didn’t phone the people who mean the most to us, and tell them that we love them.

So, in this post we’re going to do all those things, and more. Along the way, we’ll be
asking ourselves such questions as “What do labels taste like?”, “What is the
meaning of life?”, and “If you put a plane on a massive conveyor belt, would it ever
take off?”

…Sorry, I just re-read my notes, and actually we won’t be answering any of those
questions. Apologies for the confusion there. This blog post is purely about MPLS,
not the meaning of life. My mistake. Whoopsie-daisy!!

FOLLOWING THE LABELS, END TO END


Let’s remind ourselves of our topology again. As before, I recommend you give this
picture a nice firm “click”, and open it in a new tab, because we’ll be referring back to
it a lot.

Okay, time to jump into the forwarding plane. If a host behind CPE_BARRY_1 (let’s
say 192.168.10.5) sent a ping to a host behind CPE_BARRY_2 (let’s say 192.168.20.5),
what would happen?

The packet leaves the source, and gets sent to its gateway, which is CPE_BARRY_1.
This CPE router passes the packet to Router 1, at which point it enters the MPLS
VPN. The packet gets label-switched to Router 4, crosses an AS boundary to Router
5, transits an MPLS network in ISP 2, reaches Router 8, at which point it gets passed
to CPE_BARRY_2 – and then, finally, the packet arrives at its destination.

The full path is like this: CPE 1 > Router 1 > Router 2 > Router 3 > Router 4 > Router 5
> Router 6 > Router 7 > Router 8 > CPE 2.

What does Router 1 actually do when it receives the packet? First it notices that the
packet came in on an interface that’s in a VRF, so it knows to look up the prefix in
the VRF’s dedicated routing table. In doing so, Router 1 sees that 192.168.20.0/24
has a next-hop of 8.8.8.8 – and that it needs to add three labels before sending the
packet on its merry way. In just a moment we’re going to look at a picture of all
these labels. For now, follow us on the journey from end to end.

root@Router1> show route table BARRYS_ICE_CREAM.inet.0 192.168.20.0/24

BARRYS_ICE_CREAM.inet.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

192.168.20.0/24    *[BGP/170] 00:02:10, localpref 100, from 11.11.11.11


                      AS path: 64513 I
                    > to 10.10.12.2 via ge-0/0/0.0, Push 299936, Push 299984, Push 299808(top)

The inner label (299936) is the VPN label that Router 8 is advertising for this VRF.
Nothing unusual there! This label only has meaning to Router 8. If any other router
tries interpreting this label, it will be meaningless. This label is also known as
the service label.

The outer label (299808) is the transport label that ISP 1’s LDP instance generated for
getting to Router 4. There’s also nothing unusual here: this is the usual outer
transport label we’d find when going between two routers within a single ISP.

The middle label (299984) is where things get interesting.

This middle label has a meaning only to Router 4. When the packet is sent by Router
3 to Router 4, Router 3 pops the outer transport label, as per the usual penultimate-
hop-popping process. As a result, the label stack, which previously had three labels,
now only has two.

This means that the label which was previously the “middle” label has been
promoted to being the outer transport label! And, because it’s the outer label, it’s
precisely this label that Router 4 processes. Router 4 knows that if it receives a
packet with an outer label of 299984, it should swap it for label 300016 and pass it
to Router 5:

root@Router4> show route table mpls.0 label 299984

mpls.0: 13 destinations, 13 routes (13 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

299984             *[VPN/170] 00:18:42


                    > to 10.10.45.5 via ge-0/0/3.0, Swap 300016

And when Router 5 receives this packet, with an outer label of 300016, it knows to
put the packet into the LSP towards Router 8:

root@Router5> show route table mpls.0 label 300016

mpls.0: 13 destinations, 13 routes (13 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

300016             *[VPN/170] 03:07:23


                    > to 10.10.56.6 via ge-0/0/2.0, Swap 299872

From here, it’s business as usual. The packet goes down Router 5’s LSP to Router 8,
and arrives at R8 with just one label (because of penultimate-hop popping). Router 8
looks up this label, sees it belongs to a VRF, pops the label, and passes the packet
straight down the relevant AC (attachment circuit – basically a fancy word for the WAN
link). This is the default behaviour in Junos. No need to inspect the destination IP –
the label itself tells our PE router which WAN link to pass the packet down.

root@Router8> show route table mpls.0 label 299936

mpls.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

299936             *[VPN/170] 02:32:59


                    > to 172.16.20.2 via ge-0/0/1.0, Pop

As a result of all this magic, CPE 1 can ping CPE 2!

CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!

THE LABELS, VISUALISED


Here’s what these labels look like visually. Notice the bottom VPN label stays the
same, end-to-end. Also, notice how the label that starts as the “middle” label
becomes the “outer” transport label as it enters ISP 2!

Finally, notice that between R5>R6 and R6>R7, the label is actually the same. This is
a complete coincidence. This happened because my lab was extremely simple, so
there’s not many prefixes that need labels. Make no mistake though: in effect, R6 is
“swapping” a label for the same label!

By visualising this, we again get to understand why Option C is so scalable: Router 1
has one single transport label to get to Router 8, which it uses for every single VPN
prefix it knows. As far as Router 1 is concerned, the middle label it pushed to show
that the packet is ultimately destined for Router 8, label 299984, is used even if
there’s a hundred different MPLS VPNs being shared between the ISPs.

Let’s see this in practice. Router 1 has learned four VPN prefixes from Router 8.
Notice how the top two labels (the middle one, and the one on the right) are the
same for every single prefix. Only the left label changes, per-CPE. If that bit is
confusing to you, hold fire, because in the next section we’re going to talk about
exactly that concept.

root@Router1> show route table bgp.l3vpn.0 extensive | match Label
                Label operation: Push 299936, Push 300320, Push 299808(top)
                Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
                VPN Label: 299936
                Label operation: Push 299936, Push 300320, Push 299808(top)
                Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
                VPN Label: 299936
                Label operation: Push 300032, Push 300320, Push 299808(top)
                Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
                VPN Label: 300032
                Label operation: Push 300032, Push 300320, Push 299808(top)
                Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
                VPN Label: 300032
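
If you'd rather not eyeball that output, a few lines of Python can do the comparison for us. This is just a rough sketch: the text is the "Label operation" lines copied from the output above, and the function name is my own invention.

```python
import re

# The "Label operation" lines from the Router 1 output above.
OUTPUT = """\
Label operation: Push 299936, Push 300320, Push 299808(top)
Label operation: Push 299936, Push 300320, Push 299808(top)
Label operation: Push 300032, Push 300320, Push 299808(top)
Label operation: Push 300032, Push 300320, Push 299808(top)
"""


def label_columns(text):
    """Return the sets of (VPN, middle, top) labels seen across all routes."""
    vpn, middle, top = set(), set(), set()
    for line in text.splitlines():
        labels = [int(n) for n in re.findall(r"Push (\d+)", line)]
        if len(labels) == 3:
            vpn.add(labels[0])
            middle.add(labels[1])
            top.add(labels[2])
    return vpn, middle, top


VPN, MIDDLE, TOP = label_columns(OUTPUT)
print("VPN labels:   ", sorted(VPN))
print("Middle labels:", sorted(MIDDLE))
print("Top labels:   ", sorted(TOP))
```

Four routes, but only one middle label and one top label between them. Only the VPN label varies.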

In Option A we needed a VRF/sub-interface/BGP peering for each VPN. In Option B
we only needed the one BGP session, but our ASBRs (Routers 4 and 5) still needed a
label for every prefix in every VPN. By contrast, our ASBR in Option C just advertises
the one label, representing the loopback of the PE. Any information about the state
of the VPN is kept at the edges – right where it should be!

WHEN WOULD WE USE OPTION C?


In this lab we’ve been pretending that each AS is a different ISP. However, due to the
levels of trust involved in this setup, in practice Option C is mostly used within one
single ISP that has multiple autonomous systems. Perhaps they have different AS
numbers for different parts of the world, or perhaps they’ve just acquired or
merged with another ISP. In either case, there really has to be an excellent trust
relationship between the two “organisations” to extend MPLS in this way.

If ever you have a situation where your organisation controls both autonomous
systems, choose either Option B or Option C. Option B is simpler, but puts more
load on your AS border routers. Option C is more complicated to set up, but keeps
the state where it belongs: on the PE routers and route reflectors.

Perhaps it still isn’t clear to you why it’s important to have control over both
autonomous systems. After all, it’s only labels. It’s not like we’re extending OSPF or
IS-IS between the two. So, let’s take a look at the kind of action that the admins of one
autonomous system could take which might severely impact the other AS.

THE DANGERS OF EXTENDING SOMEONE ELSE’S MPLS INTO YOUR OWN NETWORK

Picture, in your beautiful mind, three prefixes on two CPE routers. By default, a
Juniper PE router generates only two VPN labels for these three prefixes: one label
for each CPE router. Or, to be precise, one label for each next-hop from the
perspective of the PE: in other words, one per link.

Let’s imagine that the two labels our PE router generates are 69 (nice), and 420.

 10.10.10.0/24 on CPE 1 gets label 69 (nice).


 10.10.20.0/24 on CPE 2 gets label 420.
 And 10.10.30.0/24 on CPE 2 also gets label 420 – because it’s on the same CPE.
Thanks to this behaviour, our Juniper router can receive a packet with label 420, and
know immediately that the packet is destined to CPE 2, regardless of which of the
two subnets the packet is ultimately destined for. This is a nice and efficient way of
passing the traffic to the right place.

What if we turn on vrf-table-label? With this command, our Juniper router instead
generates one label for all three of these prefixes – in other words, one label for the
entire VRF.

Let’s imagine our router generates label 666 for all three prefixes. This means that
when the PE router receives a packet with label 666, the PE has to pop the label, and
do a second lookup on the IP header to see where the packet is ultimately destined
to. It’s more work – but also allows you to implement firewall rules on the PE,
because you’re forcing the PE to look at the IP header. There may also be QoS
settings in the IP packet, which the PE might previously have not looked at.
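
The two allocation modes can be sketched in a few lines of Python. To be clear, this is a toy model of the idea, not how Junos implements it: the function name, the `first_label` parameter and the label numbers it hands out are all invented for illustration. The prefixes and CPEs are the ones from the example above.

```python
from itertools import count

# Prefix -> CPE (next hop) mapping from the example above.
VRF_ROUTES = {
    "10.10.10.0/24": "CPE 1",
    "10.10.20.0/24": "CPE 2",
    "10.10.30.0/24": "CPE 2",
}


def allocate_labels(routes, vrf_table_label=False, first_label=1000):
    """Return {prefix: label}. The label numbers are arbitrary illustrative values."""
    next_label = count(first_label)
    if vrf_table_label:
        # One label for the whole VRF: the PE must then look at the IP header.
        vrf_label = next(next_label)
        return {prefix: vrf_label for prefix in routes}
    # Default behaviour as described in the post: one label per next hop (per CPE).
    per_next_hop = {}
    labels = {}
    for prefix, next_hop in routes.items():
        if next_hop not in per_next_hop:
            per_next_hop[next_hop] = next(next_label)
        labels[prefix] = per_next_hop[next_hop]
    return labels


DEFAULT = allocate_labels(VRF_ROUTES)
PER_VRF = allocate_labels(VRF_ROUTES, vrf_table_label=True)
print(len(set(DEFAULT.values())), "labels by default")
print(len(set(PER_VRF.values())), "label with vrf-table-label")
```

Three prefixes on two CPEs gives two labels by default, and a single label once vrf-table-label is on.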

So, imagine that ISP 2 is running vrf-table-label. Imagine as well that there were 100
MPLS VPNs shared between ISPs 1 and 2, and that in total ISP 2 hosted around
30,000 VPN prefixes. Thanks to vrf-table-label, let’s imagine we have something like
400 labels to represent all of these prefixes. PE1 can easily handle this many.

Now imagine that for some reason, ISP 2 turns off vrf-table-label. Suddenly, the PEs
in ISP 1 are going to receive a massive increase in the number of labels that ISP 2
sends, because instead of one label per-VRF per-PE, it’s one label per-VRF per-CPE!
That’s a BIG increase in state. Suddenly we could go from 400 labels to many
thousands of labels. I say thousands: that’s a complete guess. I’ve not done the
maths, and I’m not going to. You get my point though: you better cross your fingers
that the routers in ISP 1 have enough memory for the sudden increase in labels.

If the other autonomous system really is owned by someone else, you have no
control over whether they choose to do this. Actions by the other autonomous
system could send a lot of new state to your PE router – and depending on what
you’re getting that box to do, this could well be a problem!

This is why Option C is suited to situations where you’re the admin of both domains.
Otherwise, the admins of ISP 1 might retaliate by doing something really nasty. You
know: like swearing through ISP 2’s letterbox, or taking them off their Christmas
Card list. Noooo, it’s too sad to even think about!!

THAT’S IT!

Except not really, because of course, this post is part 2 of three. And if you’ve read
this far, I’m absolutely positive that you’re going to want the complete story.

Therefore, it is your mission, nay your DUTY, to click here and read Part 3, where we
first introduce RSVP into our network, and then we configure BGP-LU in a slightly
different way. Will it take our MPLS VPNs down? Yep! Will we fix it? Click to find out…

In the meantime, you’ll do me the truest honour if you share this post on your
social media of choice. And of course, if you enjoy my nonsense then I’d
absolutely love it if you followed me on Twitter or LinkedIn, where I’ll share any
more posts I make in the future. That’s right, Twitter and LinkedIn: the two literal
worst websites on the entire internet!

INTERPROVIDER OPTION C, ON JUNIPER JUNOS ROUTERS – PART 3: USING RSVP, AND PUTTING BGP-LU INTO INET.0
 July 10, 2019 3642 Views  2 Comments
There’s been a lot of incredible Part 3’s in history. To name just a few: Back To The
Future, Part 3. Alvin and the Chipmunks: Chipwrecked. And of course, the greatest
movie in all of cinema history, Shawshank Redemption Part 3: Tokyo Drift.

With that in mind, it’s a true honour for this blog post to be listed (by me) alongside
the greatest trilogies of all time, as we embark on the third and final part of this
series on using Interprovider Option C to extend MPLS VPNs between two ISPs.

In Part 1 we learned how to do a basic config, and in Part 2 we looked at the actual
use case for Option C, as well as taking a detailed look at how the labels work end-
to-end. To keep things simple, we ran LDP everywhere, and we also put all our BGP-
Labeled Unicast routes into inet.3.
In this third and final post, we’re going to learn some things to be aware of in
different configurations. In particular, we’re going to turn on RSVP in ISP 2, and we’re
going to put our BGP-LU prefixes in inet.0. We’ll see how it breaks our setup, and
then we’ll look at the configuration we need to get it working again.

Then afterwards, when we’re all done… we could go for a meal maybe? Or just a
drink? Or we could go catch a movie? …what? You’re busy? Oh okay, no worries,
never mind. No it’s cool, it was only an idea anyway, I don’t mind. No worries! No
worries.

RECONFIGURING ISP 2 – TURNING ON RSVP

Let’s remind ourselves of our topology. As always, I recommend opening this pic up
in a new tab, as we’re going to be referring back to it a lot:

First, let’s remove all LDP config from ISP 2. We add this line to all routers in ISP 2:

delete protocols ldp

Next, I’m going to turn on RSVP, and create two label-switched paths. On Router 5,
I’m making a path to Router 8. And on Router 8, I’m making a path back to Router 5.
Here’s some new config on Router 5:

set protocols rsvp interface ge-0/0/2.0


set protocols mpls label-switched-path PE5_to_PE8 to 8.8.8.8
set protocols mpls label-switched-path PE5-to-RR2 to 22.22.22.22
set protocols ospf traffic-engineering

The last line is important: in IS-IS, the traffic-engineering extensions for RSVP are
turned on by default in Junos, so RSVP’s Constrained Shortest Path First (CSPF)
algorithm can run with no problems. In OSPF, we need to add that command to
every router in ISP 2.
We add the equivalent config on Router 8. We also turn on RSVP and MPLS on the
relevant infrastructure interfaces on Routers 6 and 7. Notice that I haven’t turned on
RSVP on ISP 2’s route reflector, Reflector 2.

Now, in previous posts on this blog we’ve talked about how it’s common to not turn
on MPLS on your route reflectors, when your route reflectors aren’t also acting as
transit routers. After all, why do they need label-switched paths to routers that
they’re not sending ingress or transit label-switched traffic to? All the route reflector
is doing is reflecting prefixes.

With that in mind, in the past we’ve used some commands to make the router
*think* it can successfully resolve the next-hop of VPN prefixes, just to trick it into
actually reflecting the prefixes. In this story though, things are a little bit different.

Imagine we turn on RSVP everywhere apart from our route reflector. Now,
remember that ISP 2 has learned about 11.11.11.11, ISP 1’s route reflector, via BGP
Labeled-Unicast. With that in mind, does the BGP peering between the two
reflectors come up?

root@Reflector2> show bgp summary | match 11.11.11.11


11.11.11.11           64512         24         37       0       2        4:24 Active

Nope! How come? We still have a route to 11.11.11.11, don’t we?

root@Reflector2> show route 11.11.11.11


root@Reflector2>

Aah! That’s interesting. So we had a route when we were talking LDP everywhere,
but not when we’re only talking RSVP between our PEs, and any router in between.
Let’s see if the route is being learned and ignored.

root@Reflector2> show route receive-protocol bgp 5.5.5.5 11.11.11.11 hidden extensive

inet.0: 14 destinations, 14 routes (13 active, 0 holddown, 1 hidden)


  11.11.11.11/32 (1 entry, 0 announced)
     Accepted
     Route Label: 299872
     Nexthop: 5.5.5.5
     MED: 1
     Localpref: 100
     AS path: 64512 I
{snip}

(A note about this output: you’ll remember that we added a policy on our route reflector to copy
our 11.11.11.11 BGP-LU route from inet.3 into inet.0. When typing the command above, and the
ones below, we also get identical output for the inet.3 table. I’ve removed it to keep the post
short.)
Now that IS odd. It’s hidden – and yet, it’s saying that it’s “accepted”! Not particularly
helpful. Perhaps our routing table will tell us more?

root@Reflector2> show route 11.11.11.11 hidden extensive

inet.0: 14 destinations, 14 routes (13 active, 0 holddown, 1 hidden)

11.11.11.11/32 (1 entry, 0 announced)


         BGP    Preference: 170/-101
                Next hop type: Unusable

                {snip}
                Primary Routing Table inet.3
                Indirect next hops: 1
                        Protocol next hop: 5.5.5.5
                        Push 299872
                        Indirect next hop: 0 -

Hmm. Still not very explicit. But that label is a clue.

Router 5 is telling Reflector 2 “to get to 11.11.11.11, come to me and add label
299872 to the packet”. Can you see the problem? Reflector 2 needs to be able to add
at least one more transport label to this packet. Otherwise, if Reflector 2 just added
this one label to the packet, and passed it to the next hop (Router 6), then Router 6
would have no idea what to do with it! Label 299872 only has meaning to Router 5.

There’s another command that can help us see the problem:

root@Reflector2> show route resolution unresolved


Tree Index 1
8.8.8.8:1:192.168.20.0/88
Protocol Nexthop: 8.8.8.8 Push 299840
Indirect nexthop: 2 no-forward
1.1.1.1/32
Protocol Nexthop: 5.5.5.5 Push 299856
Indirect nexthop: 0 -
11.11.11.11/32
Protocol Nexthop: 5.5.5.5 Push 299872
Indirect nexthop: 0 -
{snip}

The fact that we don’t have an indirect next-hop means that we can’t resolve 5.5.5.5
in a labeled way. All we have is an unlabeled path:

root@Reflector2> show route 5.5.5.5

inet.0: 14 destinations, 14 routes (13 active, 0 holddown, 1 hidden)


+ = Active Route, - = Last Active, * = Both

5.5.5.5/32         *[OSPF/10] 00:15:04, metric 2


                    > to 10.10.226.6 via ge-0/0/3.0

All of this is a very long way of saying: if you’re doing Option C, and you’re using
RSVP, it’s essential that you have label-switched paths on your route reflectors, even
if the reflectors are outside of the MPLS transit path. Otherwise, they won’t be able
to get to the route reflector in the other ISP.
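
Here's the same logic as a small Python sketch. It's a deliberately simplified model, assuming that a labeled route is only usable when its protocol next hop can be found in inet.3. The function name and the LSP name are hypothetical, and inet.3 is just a dictionary here.

```python
def resolve_labeled_route(protocol_next_hop, inet3):
    """Return the LSP used to reach the next hop, or None if there isn't one.

    Simplified assumption: a labeled route is usable only if its protocol
    next hop has an entry in inet.3 (modelled as destination -> LSP name).
    """
    return inet3.get(protocol_next_hop)


# The BGP-LU route that Reflector 2 receives from Router 5.
BGP_LU_ROUTE = {"prefix": "11.11.11.11/32", "next_hop": "5.5.5.5", "label": 299872}

# Reflector 2 with no RSVP: 5.5.5.5 is known only via OSPF in inet.0.
inet3_before = {}
# Reflector 2 with an LSP to Router 5 (the LSP name is hypothetical).
inet3_after = {"5.5.5.5": "lsp-to-router5"}

BEFORE = resolve_labeled_route(BGP_LU_ROUTE["next_hop"], inet3_before)
AFTER = resolve_labeled_route(BGP_LU_ROUTE["next_hop"], inet3_after)
print("before:", "hidden (next hop unusable)" if BEFORE is None else BEFORE)
print("after: ", "hidden (next hop unusable)" if AFTER is None else AFTER)
```

With an empty inet.3 the lookup comes back with nothing, which is the toy equivalent of "Next hop type: Unusable".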

There’s a few ways we could fix this, but because we’re in a simple lab, let’s just
make RSVP LSPs from Reflector 2 to Routers 5 and 8. We add this config to Reflector 2:

set protocols rsvp interface ge-0/0/2.0
set protocols rsvp interface ge-0/0/3.0
set protocols mpls label-switched-path RR2-to-PE5 to 5.5.5.5
set protocols mpls label-switched-path RR2-to-PE8 to 8.8.8.8
set protocols mpls interface ge-0/0/2.0
set protocols mpls interface ge-0/0/3.0
set protocols ospf traffic-engineering

And then:

root@Reflector2> show bgp summary | match 11.11.11.11


11.11.11.11           64512          5          5       0       2           5 Establ

----------------------

root@Reflector2> show route 11.11.11.11

inet.0: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

11.11.11.11/32     *[BGP/170] 00:25:47, MED 1, localpref 100, from 5.5.5.5


                      AS path: 64512 I
                    > to 10.10.226.6 via ge-0/0/3.0, label-switched-path RR2-to-PE5

----------------------

CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 116/125/148 ms
CPE_BARRY_1>

Hooray!

Now everything is working again, let’s make the second big change: instead of
putting my BGP-Labeled Unicast prefixes in inet.3, I’m actually going to put them in
inet.0.

RECONFIGURING ISP 2 – PUTTING BGP-LU PREFIXES INTO INET.0

Now, usually we’d want BGP-LU prefixes in inet.3, because then we can use them to
resolve BGP next hops. We usually leak a certain number of prefixes into inet.0
when they’re needed, just like we’ve done in this lab: on Reflector 2, we leaked
11.11.11.11 into inet.0, purely because it needs to be in inet.0 for the BGP peering to
come up.

In fact, although I’ve personally only worked at a small number of ISPs, I can still tell
you these two things: 1) Everywhere I’ve ever worked has decided to put BGP-LU
prefixes in inet.3. 2) Everyone I’ve talked to about it has only ever put BGP-LU
prefixes in inet.3. In the real world, in my experience, not a single person puts their
BGP-LU prefixes in inet.0.

So then, why are we bothering to put them into inet.0 now? Two reasons.

First of all, the vast majority of other posts on the internet about Option C put them
in inet.0. That includes the official Juniper knowledge base articles on Option C!! This
tells me that there must at least be some people in the world who do this in
production.

Secondly, there may be times when it’s actually preferable to have them in inet.0,
such as if you need to use most/all the prefixes for general reachability. If you also
need to use them to resolve BGP next-hops then you can then leak them the other
way, into inet.3. The problem with this setup though is that you can’t run BGP
unicast and BGP-labeled unicast at the same time. Well, you can, it’s just that you
get a lot more labels. It’s complicated, and we talked about the different variations
in my post all about BGP-LU, so give that post a read if this paragraph is news to you.

Anyway, let’s put this config on Router 5, and apply similar config throughout ISP 2,
including on our route reflector:

delete protocols bgp group AS64513 family inet unicast


delete protocols bgp group AS64513 family inet labeled-unicast rib inet.3
set protocols bgp group AS64513 family inet labeled-unicast

delete protocols bgp group TO_AS64512 family inet labeled-unicast rib inet.3
set protocols bgp group TO_AS64512 family inet labeled-unicast

And now we’ve done that, let’s take a look at all the many fun ways that our lab is
broken.

FINDING THE PROBLEMS


First, let’s check that our BGP peerings are back up:

root@Reflector2> show bgp summary | match Est


5.5.5.5               64513          5          5       0       3          16 Establ
8.8.8.8               64513         33         19       0       3          16 Establ
11.11.11.11           64512          5          5       0       4          13 Establ

----------------------

root@Router5> show bgp summary | match Est


10.10.45.4            64512         13         15       0       0        4:44 Establ
22.22.22.22           64513         24         19       0       2          43 Establ

Good stuff!

Now, we know that when VPN traffic goes from R1 to R8, R1 pushes three labels
onto the packet. We’ve made some changes to the MPLS in ISP 2, so Router 1 is
probably using new labels now. Let’s take a look:

root@Router1> show route table BARRYS_ICE_CREAM.inet.0 172.16.20.0/30

BARRYS_ICE_CREAM.inet.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

172.16.20.0/30     *[BGP/170] 00:02:55, localpref 100, from 11.11.11.11


                      AS path: 64513 I
                    > to 10.10.12.2 via ge-0/0/0.0, Push 300016, Push 300064, Push 299808(top)

As the traffic goes on its merry way, it will at some stage arrive at R4, who will then
of course pass it onto R5. By the time it arrives at Router 4 it only has two labels –
the previous top label was used just to get from R1 to R4, so that label is gone now.

With that in mind, Router 4 is going to take the current top label in the stack
(300064), and swap it for whatever label R5 told R4 to use. Again, we’ve made some
changes to our network, so let’s see what that label is:

root@Router4> show route table mpls.0 label 300064

mpls.0: 12 destinations, 12 routes (12 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

300064             *[VPN/170] 00:06:52


                    > to 10.10.45.5 via ge-0/0/3.0, Swap 300000

Crikey! 300000? A nice round number!

On the surface, everything seems like it’s working so far. Except… let’s head over to
R5, and see what it actually does when it receives a packet with this label:

root@Router5> show route table mpls.0 label 300000

mpls.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

300000             *[VPN/170] 00:06:05


                    > to 10.10.56.6 via ge-0/0/2.0, Pop
300000(S=0)        *[VPN/170] 00:06:05
                    > to 10.10.56.6 via ge-0/0/2.0, Pop

It pops it!! What??? This means that when the packet gets passed to the physical
next-hop (Router 6, in this case), it will be sent with only one label – the VPN label.
This label only has a meaning to Router 8, the PE that hosts this VPN prefix. Router 6
will look up this label, find no mapping for it, and discard it. I don’t need to show you
the results of a ping on our CPE router to show you that the ping is going to fail!

Why is this happening? This didn’t happen until we made all these new-fangled
changes. What’s up?

The logic behind this took me a LONG time to get my head around, but this evening
I had a eureka moment. What we’re going to chat about now is what’s known in the
industry as “bloody complicated”. So, strap in:

INET.0 vs INET.3
Do you remember what the inet.3 table is used for? It’s used to resolve next-hops
for prefixes our router learned by BGP.

In fact, Router 5 does indeed have a labeled path to Router 8, in inet.3:

root@Router5> show route 8.8.8.8

inet.0: 16 destinations, 16 routes (16 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

8.8.8.8/32         *[OSPF/10] 01:10:31, metric 3


                    > to 10.10.56.6 via ge-0/0/2.0

inet.3: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

8.8.8.8/32         *[RSVP/7/1] 01:02:21, metric 3


                    > to 10.10.56.6 via ge-0/0/2.0, label-switched-path PE5_to_PE8

So, if Router 5 has an LSP to Router 8, why is it popping the traffic?

Here’s the gotcha: when our BGP-LU prefixes were in inet.3, it meant that BGP-LU
would take prefixes from inet.3 and advertise them on.

However, now we’re putting our BGP-LU prefixes in inet.0. And as we can see,
8.8.8.8 is being learned by OSPF in the inet.0 table. This means that when Router 5
takes the prefix 8.8.8.8 from inet.0, it has no labelled path to it in this routing table –
but nevertheless, it generates a label for it, and sends this label to Router 4.

And for that reason, the solution to our sticky-tricky problem is to get the label-
switched path into inet.0. How do we do it? With one simple, beautiful command:

set protocols mpls traffic-engineering mpls-forwarding

We’ve talked about this command in other posts in the past, but to save you time,
let’s quickly explain it again: with this one command, we tell our Junos router
to copy the contents of inet.3 into inet.0 – but to do it in a “safe” way.
You see, RSVP has a numerically lower, and therefore better, route preference than
OSPF. If RSVP wins the fiercely-fought battle for Best Prefix 2019, it can mess up
your network in hilarious ways, because the actual best path suddenly isn’t being
advertised as the best path (a bit like in Part 1, when we tried redistributing 1.1.1.1
into BGP at Router 1).

To fix this problem, this one command adds the RSVP labeled path into
the forwarding table on Router 5, but tricks the routing engine into thinking that the
OSPF route is still the best. Here’s what the result looks like:

root@Router5> show route 8.8.8.8 table inet.0

inet.0: 16 destinations, 18 routes (16 active, 2 holddown, 0 hidden)


@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

8.8.8.8/32         @[OSPF/10] 00:00:29, metric 3


                    > to 10.10.56.6 via ge-0/0/2.0
                   #[RSVP/7/1] 00:00:29, metric 3
                    > to 10.10.56.6 via ge-0/0/2.0, label-switched-path PE5_to_PE8

Now that Router 5 can send traffic destined to 8.8.8.8 via a label-switched path, it
can tell Router 4 that if it wants to get to 8.8.8.8, send the packet to R5 with a label
of 299776:

root@Router5> show route advertising-protocol bgp 10.10.45.4 8.8.8.8/32 detail

inet.0: 16 destinations, 18 routes (16 active, 0 holddown, 0 hidden)


@ 8.8.8.8/32 (2 entries, 2 announced)
BGP group TO_AS64512 type External
     Route Label: 299776
    {snip}

And what does R5 do when it receives a packet with label 299872? It swaps it for
whatever label is advertised on the R5-to-R8 RSVP label-switched path:

root@Router5> show route table mpls.0 label 299872

mpls.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

299872             *[VPN/170] 02:11:00


                    > to 10.10.56.6 via ge-0/0/2.0, Swap 299808
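
The before-and-after behaviour on Router 5 can be sketched in Python like this. It's a toy model under one big assumption: that the ASBR swaps if the route it is forwarding on carries an outgoing label, and pops if it doesn't. The function and dictionary names are made up, and the outgoing label is the one from the output above.

```python
def asbr_action(route):
    """What Router 5 installs in mpls.0 for the label it advertised to Router 4.

    Simplified assumption: if the route used for forwarding carries an outgoing
    label, the ASBR swaps to it. If not, all it can do is pop the label.
    """
    out_label = route.get("out_label")
    return ("Swap", out_label) if out_label is not None else ("Pop", None)


# inet.0 on Router 5 before the fix: only the unlabeled OSPF route to 8.8.8.8.
OSPF_ONLY = {"protocol": "OSPF", "next_hop": "10.10.56.6"}
# After mpls-forwarding: the RSVP route is used for forwarding in inet.0.
WITH_LSP = {"protocol": "RSVP", "next_hop": "10.10.56.6", "out_label": 299808}

print(asbr_action(OSPF_ONLY))
print(asbr_action(WITH_LSP))
```

Same next hop both times, but only the second route gives Router 5 a label to swap to.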

Hooray! So, now that’s fixed, can CPE 1 ping CPE 2?

CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
CPE_BARRY_1>

D’oh! We’ve missed something! What is it? For this one, we need to go back to the
route reflector.

CONFIGURING OPTION C – ADVERTISING AND LEARNING THE VPN PREFIXES, VIA ROUTE REFLECTORS

We’re running BGP-LU, and we’re putting prefixes in inet.0. Our route reflectors can
now route to each other, and so the BGP peering comes up. But what happens
when they try to exchange VPN prefixes? Well…

root@Reflector2> show route table bgp.l3vpn.0 hidden

bgp.l3vpn.0: 8 destinations, 8 routes (4 active, 0 holddown, 4 hidden)


+ = Active Route, - = Last Active, * = Both

1.1.1.1:1:172.16.10.0/30
                    [BGP/170] 01:23:41, localpref 100, from 11.11.11.11
                      AS path: 64512 I
                      Unusable
                      { snip }

It looks like Reflector 2 is receiving the VPN prefixes from Reflector 1 – but it can’t
use them. Why? For the answer, let’s remind ourselves about the bgp.l3vpn.0 table –
the table that stores all the VPN routes from everywhere, before they’re sorted into
the relevant VRFs.

Prefixes in this table require a next-hop that can be resolved in the inet.3 table.
There’s the problem: these VPN prefixes have a next-hop of 1.1.1.1, which our
router has indeed learned by BGP-LU – but, because of our new configuration
changes, this route is placed in the inet.0 table, not the inet.3 table:

root@Reflector2> show route 1.1.1.1

inet.0: 15 destinations, 15 routes (15 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[BGP/170] 00:20:45, MED 1, localpref 100, from 5.5.5.5


                      AS path: 64512 I

                    > to 10.10.226.6 via ge-0/0/3.0, label-switched-path RR2-to-PE5

Now, in our topology our route reflectors are outside of the path of transit traffic. As
such, we can use the same command that we used in our Option B blog post to get
around this problem. Let’s add this command to both route reflectors:

set routing-options resolution rib bgp.l3vpn.0 resolution-ribs inet.0

Thanks to this command, our route reflectors will resolve the VPN prefixes in inet.0
– which is exactly where they’ll find all the loopback IP addresses.

And look – when we add it in, Reflector 2 has routes from Reflector 1!

root@Reflector2> show route receive-protocol bgp 11.11.11.11 table bgp.l3vpn.0 detail

bgp.l3vpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)


* 1.1.1.1:1:172.16.10.0/30 (1 entry, 1 announced)
     Accepted
     Route Distinguisher: 1.1.1.1:1
     VPN Label: 299840
     Nexthop: 1.1.1.1
     AS path: 64512 I
     Communities: target:64512:1
     {snip}

Hooray!

How about on our PE routers? Does the fact that our BGP-LU prefixes are in inet.0
cause any problems? Yep! Once again, they need to be in inet.3 for the MPLS VPN to
work properly. So, let’s see how we can fix it.

CONFIGURING OPTION C – TEACHING OUR PEs TO RESOLVE VPN PREFIXES
At the moment, Router 8 is going to see VPN prefixes from ISP 1 as having a
next-hop of 1.1.1.1. But thanks to the way we’ve set up BGP-LU, 1.1.1.1 lives
only in the inet.0 table. And you and I both know by now that by default, VPN
prefixes will only be successfully installed in a VRF if the next-hop can be resolved in
inet.3.

As a result of all this, Router 8 knows about the prefixes – but is hiding them:

root@Router8> show route table BARRYS_ICE_CREAM.inet.0 hidden extensive

BARRYS_ICE_CREAM.inet.0: 5 destinations, 5 routes (3 active, 0 holddown, 2 hidden)

172.16.10.0/30 (1 entry, 0 announced)


         BGP    Preference: 170/-101
                Route Distinguisher: 1.1.1.1:1
                Next hop type: Unusable

So, if Router 8 is going to use 1.1.1.1 as a next-hop, 1.1.1.1 needs to be in inet.3.

We fixed this on our route reflectors by telling them to resolve VPN prefixes in inet.0.
Now, on our PEs we’ve got all kinds of options available to us for
moving/copying/resolving between inet.0 and inet.3. But when we’re using Option C,
we don’t need to do anything quite so complicated – instead, there’s a handy
command available to us:

set protocols bgp group AS64513 family inet labeled-unicast resolve-vpn
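
In hierarchy form, that statement would look roughly like the sketch below. The group name AS64513 comes from the command above. Everything else in the group (type, neighbors and so on) is left out, so treat this as a fragment rather than a complete BGP group:

```
protocols {
    bgp {
        group AS64513 {
            family inet {
                labeled-unicast {
                    /* make this group's BGP-LU routes available in inet.3 */
                    resolve-vpn;
                }
            }
        }
    }
}
```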

This one command tells the router to copy BGP-LU prefixes (which go into inet.0 by
default) into inet.3, as a candidate for resolving VPN routes – but only if the BGP-LU
route is being used as a BGP next-hop. Want proof? Sure thing:

root@Router1> show route 8.8.8.8

inet.0: 13 destinations, 13 routes (13 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

8.8.8.8/32         *[BGP/170] 01:16:20, MED 3, localpref 100, from 11.11.11.11


                      AS path: 64513 I
                    > to 10.10.12.2 via ge-0/0/0.0, Push 299904, Push 299824(top)

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)


+ = Active Route, - = Last Active, * = Both

8.8.8.8/32         *[BGP/170] 00:24:11, MED 3, localpref 100, from 11.11.11.11


                      AS path: 64513 I
                    > to 10.10.12.2 via ge-0/0/0.0, Push 299904, Push 299824(top)

Aah, would you look at that. A happy ending! So, the big question: does everything
finally work again?

CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 116/128/156 ms
CPE_BARRY_1>

At last!! Everything is working. Now, shall we all agree to never do this in production?
Yes? Good.


DOWNLOAD THE FULL CONFIG FILES FOR EACH ROUTER
The config has changed a fair bit on the ISP 2 side of things, so give this a click to
download the complete new configurations. Take a look, try it at home, and play
until your heart is “content”.
 

THAT’S IT!
Wow, that was a long post! Are you really still here? Good work: you’ve officially
passed the exam, and earned your NFTCGJ certification (Network Fun-Times
Certificate Great Job).

I hope you’ve seen a few new scenarios in this post that you might not have even
seen in the official documentation. I hope that the mix of protocols and
philosophies in each network have shown you some of the gotchas you might face,
and how to overcome them. Now, let’s see how many of these I can remember
when I do the JNCIE exam in November! Probably… none? Yeah, I reckon probably
none.

If you enjoyed this post then you’d make my world if you shared it on your
favourite social media of choice, or emailed it to friends and colleagues who you
think might be interested in it.

And of course, I’d love you to follow me on Twitter, so you can see any future blog
posts I make. Alternatively, follow me on LinkedIn if you hate joy, and would prefer
your feeds were controlled by an algorithm that shows you terrible inspirational
memes from recruitment consultants rather than any of my posts.
 

One final question: Have you ever deployed Option C yourself? Did anything
interesting happen that you fancy sharing? Leave a comment, friend! I’d love to hear
your stories!

