Interprovider Option C
In previous posts we’ve seen two ways to extend MPLS VPNs between two
ISPs. In our post on Interprovider Option A we saw how we can treat the other ISP as if
they were just another customer of ours. Then, in our Option B post we saw that we
can actually exchange VPN labels with the other ISP, over a dedicated link.
Today we’re going to look at Interprovider Option C, so-called because it’s the third
suggestion on the actual RFC for MPLS VPNs.
Option C isn’t too hard to understand, in principle. However, there’s a fair number of
pieces needed to actually get it working. In addition, depending on exactly what
protocols you’re using, and depending on where you’re storing your prefixes, there’s
a looooooot of tweaks you need to know about.
That’s why this blog post is the first of three. Why three posts? There’s plenty of
other guides out there that give you a quick run-down on the high points about
Option C. These posts are going to do the opposite. By the end of this three-parter,
we’ll know Option C inside-out:
In this first post we’ll see a “basic” Option C config, with both ISPs running LDP.
In our second post, we’ll take a look at the labels involved, because there’s something
very unique about the label stack in Option C. We’ll also talk about the use case. Why
would we ever use Option C, over Option A or B? In Part 2, we’ll find out.
Finally, in our third post, we’ll reconfigure one of our ISPs to run RSVP, and we’ll also
make a few other changes too. We’ll see how these changes break things – and then, we’ll
see how to fix them.
Fun fact: Option C is the most complicated solution to set up, but once it’s actually
up and running it’s also the most scalable. Why? In just a moment, you’ll find out.
But first:
Juniper
Business
Use Only
If you’ve just arrived here after a Google search for something like “Juniper inter-AS
option C”, you might like to know that I’ve also done posts before on Interprovider
Option A and Interprovider Option B. In fact, this post uses the same topology and
configuration. With that in mind, you might want to read those posts first.
It’s not essential to read them – but I would still highly recommend it, because in
those posts we set up the two-ISP lab that we’re going to use today. We also
introduced a few concepts that we’ll be referring to again in this post, like the
difference between service and transport labels, and the concept of BGP AFI/SAFI.
In addition, you’ll definitely want to be familiar with the concept of BGP-Labeled
Unicast. If you’re not, don’t worry: I wrote a post all about it, especially so that you can
understand it! Go give it a read if you don’t know about it already.
If you’re already comfortable with LDP, RSVP, OSPF, IS-IS, BGP, BGP-LU, and address
families, and you just want to learn some sweet sweet Option C, then let’s gamble:
jump right in! Just know that if you find yourself feeling confused at any point, you
can go back and read the previous posts to get up to speed.
And if at any stage you find yourself feeling aroused, then don’t worry: it’s a natural
reaction to reading my posts.
My advice to you: open this pic in a new tab, because we’ll be referring back to it
throughout the course of this post. Click the pic to make it big:
Here’s how Option C works: the route reflectors in each ISP are actually going to talk
directly to each other to exchange VPN prefixes, each route reflector acting as a
client of the other. This is rare: usually route reflectors only reflect routes within
their own autonomous system. In this scenario though, we’re reflecting from one
ISP to the other. Dare to dream, friends! Dare to dream.
Immediately here, we see one of the reasons why Option C is more scalable. In
Option A, our ASBRs (Autonomous System Border Routers – Routers 4 and 5 in our
topology) needed to hold a ton of state in their memory, like all the prefixes for all
the VRFs, the VRFs themselves, plus BGP sessions. In Option B our ASBRs still
needed to hold a lot of state, remembering a label per-prefix per-VRF. However, in
Option C all of this state is moved to the PEs and the route reflectors, where the
state belongs. All our ASBRs need to be aware of is the fact that they’re part of a
transit label-switched path.
The reason this is possible is thanks to the unique way our route-reflectors
advertise the next-hop for these VPN prefixes. In the pic above our PEs are Router 1
and Router 8. When Router 1 advertises a VPN prefix to its route reflector, the prefix
has a next-hop of 1.1.1.1. Reflector 1 then advertises this to Reflector 2, at which
point Reflector 2 reflects it to Router 8 – still with a next-hop of 1.1.1.1!
Here we see another example of the scalability of Option C. Once again, we’re
relieving our border routers of keeping track of state, and keeping that
responsibility where it belongs, because the ASBRs don’t need to know they’re the
next-hop for every single individual VPN prefix. As long as they know the LSP (label-
switched path) to put the traffic on, that’s all that’s needed.
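If it helps to see the "next-hop unchanged" idea in miniature, here is a small Python sketch. It is only a toy model of the behaviour described above, not anything a router runs: the VpnRoute class, the readvertise function and the example prefix are all made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    next_hop: str


def readvertise(route: VpnRoute, new_next_hop: Optional[str] = None) -> VpnRoute:
    """Pass a VPN route to the next BGP speaker, optionally rewriting the next-hop."""
    if new_next_hop is None:
        return route  # Option C reflectors: next-hop left untouched
    return replace(route, next_hop=new_next_hop)


# Router 1 originates the route with its own loopback as the next-hop.
# The prefix is an assumed example, not taken from the lab output.
from_r1 = VpnRoute(prefix="192.168.10.0/24", next_hop="1.1.1.1")

# Reflector 1 -> Reflector 2 -> Router 8, with no rewrite at either reflector.
at_r8 = readvertise(readvertise(from_r1))
print(at_r8.next_hop)  # 1.1.1.1
```

The point of the model: nothing between Router 1 and Router 8 ever puts its own address into the route, so no router in the middle has to hold per-prefix VPN state.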
Now, the sharp thinkers among you might be thinking: how does Router 8 resolve
the IP address of Router 1? Even if Router 1’s loopback (1.1.1.1) happened to be in
Router 8’s routing table, that’s not enough: for an MPLS VPN, running private IPs
over a provider network, we need a label-switched path from R8 all the way to R1.
How on earth does this work? How?!?
To understand that, we need to know about SAFI 4, otherwise known as BGP Labeled
Unicast. And it’s important that we understand it in detail. That’s why I’ve written an
entire post all about it! Click here to read my blog post on BGP Labeled-Unicast, on
Juniper routers. Once you understand what it is, how it works, the default behaviour,
and how to manipulate it, come back here. Don’t worry, I’ll wait. Take your time: I’ve
got lots of Super Mario to catch up on.
Done? Perfect. So: we have route reflectors talking directly to each other,
exchanging VPN prefixes, with the next-hop unchanged. Meanwhile, our PE routers
in each ISP have a full label-switched path to the PE in the other ISP, thanks to our
border routers talking BGP-LU with each other. Router 8 knows a label for Router 1.
If you still can’t quite picture how that works, don’t worry: later on we’ll be talking
about it in great detail.
You’ll remember that we’re running IS-IS in ISP 1, and OSPF in ISP 2. I’m going to
change my lab a little bit, and just run LDP everywhere in both ISPs. This will help us
focus on what makes Option C unique. (As I mentioned earlier, in Part 3 we’ll be
bringing RSVP back, to see how it changes things.)
TURNING ON OPTION C: OUR PLAN OF ACTION
One of the things we’re going to configure is an eBGP multi-hop peering directly
between Reflector 1 in ISP 1, and Reflector 2 in ISP 2. This peering will only talk the
VPN unicast family.
We’re also going to run BGP-Labeled Unicast between our ASBRs, Routers 4 and 5,
so that each ISP receives labeled routes to the PEs and route reflectors of the other
ISP. We’ll add a policy onto it, to make sure we’re only advertising what we need to
advertise.
How about the VPN route-targets for each VRF? It’s common for an ISP to use their
autonomous system in the target, so if we’ve got VPN prefixes coming from a
different AS, we need to give this some thought.
In our Option B post we learned how to re-write communities as routes pass from
one AS to another, using a policy. It’s not hard, but it involves a lot of lines of config.
Because we’ve already seen what that looks like, in this post we’ll just do a simple
solution, and import both ISP 1 and ISP 2’s target communities into each VRF.
Soon enough we’ll be ready to start configuring. But before we jump into it, let’s take
a look at what should happen when it’s all working.
Here’s what’s going to happen: ASBR Router 5 will generate a label for 8.8.8.8 (the
loopback of Router 8), and pass this label to Router 4. R4 will then generate a new
label to get to Router 8 (when a router advertises itself as the next-hop, it always
generates a new label), and R4 will advertise this labeled route throughout ISP 1. As
such, Router 1 will receive it.
So, after all that, if PE Router 1 wants to send traffic to a VPN prefix with a next-hop
of 8.8.8.8, what will Router 1 actually do? Here’s where things get really interesting:
Router 1 will actually push THREE labels onto the packet:
• First, R1 pushes an inner VPN label, useful only to R8, to identify the VPN itself.
• Next, R1 pushes a middle label that won’t be processed until we get to R4. This is the
label that tells R4 how to get to R8, via R5. We’ll talk about this label more in a
moment.
• Finally, a top outer transport label, which identifies the label-switched path from R1 to
R4.
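The three bullets above can be sketched as a tiny Python model, where a label stack is just a list and the last element is the top of the stack. The label names are descriptive placeholders rather than real label values, and build_option_c_stack is a hypothetical helper invented for this illustration.

```python
def build_option_c_stack(vpn_label, bgp_lu_label, transport_label):
    """Return the stack R1 pushes; the LAST list element is the top (outer) label."""
    stack = []
    stack.append(vpn_label)        # inner: only meaningful to R8
    stack.append(bgp_lu_label)     # middle: only processed once the packet reaches R4
    stack.append(transport_label)  # top: carries the packet from R1 to R4
    return stack


stack = build_option_c_stack("VPN(R8)", "BGP-LU(R4->R8)", "LDP(R1->R4)")
print(stack[-1])   # LDP(R1->R4)
print(len(stack))  # 3
```

Routers between R1 and R4 only ever look at the last element of that list, which is why the middle and inner labels can stay a private matter between R1, R4 and R8.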
Crikey! Cor blimey gov’nor! Apples and pears! That’s a lot of labels. Again, don’t
worry if you can’t visualise that, because once we’ve configured it all and got it
working we’ll move over to Part 2 in this series, where we’ll look at a diagram that
shows exactly how the label stack works, end-to-end.
Well, we can’t set up the reflector peering just yet, because our route reflectors don’t
have routes to each other. They’ll only know about each other’s loopback when the
BGP-LU peering comes up.
To make this nice and clean, we’re going to configure a policy on R4 which takes the
relevant loopbacks in ISP 1, and sends them to R5 via eBGP. Labeled, naturally! Without
this policy, there’s a danger that Router 4 would advertise everything via BGP to Router
5. And of course, we’ll do the same on R5, making a policy to advertise ISP 2’s important
loopbacks as labeled prefixes.
Now, it may not be immediately clear why we’re choosing to redistribute these
loopbacks into BGP at our ASBRs in particular. For example, if Router 1 is talking
BGP to the rest of ISP 1, why don’t we redistribute 1.1.1.1 into BGP on Router 1
itself? Why not redistribute it at the source of the prefix?
The answer comes in the behaviour of route reflectors. Let’s imagine that we turned
BGP labeled-unicast on everywhere in ISP 1, and that we added a policy on Router 1
itself to redistribute its loopback into iBGP. Here’s the result: R1 sends its loopback
address, 1.1.1.1, via iBGP to its route reflector. As such, our plucky route reflector
will indeed receive the prefix, with a label:
(The command above can look confusing with the two 1.1.1.1 entries. The red one is
the address of Reflector 1’s BGP neighbour. The blue one is the actual route we’re looking up.
By coincidence, in this example the two addresses are the same!)
But look o’er yonder – Reflector 1 isn’t actually reflecting this route on to Router 4:
root@Reflector1>
Why? For our answer, let’s take a look at 1.1.1.1/32 in Reflector 1’s inet.0 table.
It’s in there twice – once as an IS-IS route, and once as a BGP route. Reflector 1 has
rightly chosen the IS-IS route as the route that it thinks is best…. and because the
BGP route isn’t the most preferred route, it isn’t reflected to Router 4.
Remember, reflectors don’t just take every prefix they get, and reflect the whole lot.
They actually go through the BGP path selection process themselves, and only select
the winners. And as an extensive output shows, this route was not the winner:
root@Reflector1> show route table inet.0 1.1.1.1/32 protocol bgp extensive | match Inactive
Inactive reason: Route Preference
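Here is a rough Python sketch of why the BGP copy loses. It assumes the default Junos route preferences (IS-IS Level 1 internal 15, IS-IS Level 2 internal 18, BGP 170, lower wins) and models the behaviour described above, where only the active route is a candidate for reflection. The function names are made up, and since the post doesn't say which IS-IS level the lab uses, Level 2 is an assumption here.

```python
# Default Junos route preferences; the lowest value becomes the active route.
DEFAULT_PREFERENCE = {"isis-l1": 15, "isis-l2": 18, "bgp": 170}


def active_route(candidates):
    """Pick the protocol whose copy of the prefix becomes active."""
    return min(candidates, key=lambda proto: DEFAULT_PREFERENCE[proto])


def reflector_advertises_bgp_copy(candidates):
    """The BGP copy is only reflected if it is the active route."""
    return active_route(candidates) == "bgp"


# 1.1.1.1/32 on Reflector 1: known via IS-IS (level assumed) and via BGP.
print(active_route(["isis-l2", "bgp"]))                   # isis-l2
print(reflector_advertises_bgp_copy(["isis-l2", "bgp"]))  # False
```

Either IS-IS level beats 170, so the conclusion doesn't depend on the assumption: the IGP route is active and the BGP copy stays home.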
There’s ways around it, of course – but in our lab, we’re going to put the loopback
into BGP at the edge of our network. That way, we can be very specific about which
labeled unicast prefixes we choose to pass to our friends in the other ISP.
First we’re going to turn on BGP-LU between our two ISPs, at ASBRs Router 4 and
Router 5. I’m also going to apply a policy to export only the loopbacks of our PE
routers, and the loopbacks of our route reflectors. That’s quite a mouthful, so let’s
look at some config to see what’s going on. Here’s the BGP config on Router 4:
term ACCEPT_R1_LOOPBACK {
    from {
        route-filter 1.1.1.1/32 exact;
        route-filter 11.11.11.11/32 exact;
    }
    then accept;
}
term ELSE_REJECT {
    then reject;
}
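For anyone who thinks better in code than in curly braces, the logic of those two terms can be sketched in Python. This is only a model of the first-match, exact-prefix behaviour of the policy above; export_policy is a hypothetical name, and the rejected test prefixes are assumed examples.

```python
import ipaddress

# The two loopbacks matched by "route-filter ... exact" in the first term.
ALLOWED = [
    ipaddress.ip_network("1.1.1.1/32"),      # Router 1 (PE)
    ipaddress.ip_network("11.11.11.11/32"),  # Reflector 1
]


def export_policy(prefix: str) -> str:
    """Evaluate the terms in order; the first matching term decides."""
    net = ipaddress.ip_network(prefix)
    if net in ALLOWED:   # term ACCEPT_R1_LOOPBACK: exact match only
        return "accept"
    return "reject"      # term ELSE_REJECT: everything else


print(export_policy("1.1.1.1/32"))      # accept
print(export_policy("1.1.1.0/24"))      # reject (not an exact match)
print(export_policy("10.10.113.0/24"))  # reject
```

The explicit reject at the end is what stops Router 4 from handing everything else it knows to Router 5.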
Did it work? Let’s see what ISP 1’s ASBR (R4) is learning from ISP 2’s ASBR (R5):
Great! R5 is saying to R4 “If you want to get to R8, come to me, and put label 300016
on the packet.” And notice that these labeled prefixes are in inet.3, right where we
want them.
We also see that Router 4 has learned 22.22.22.22, which is the address of Reflector
2, in ISP 2. Router 4 will now advertise this to the rest of ISP 1, which means that the
peering between our two route reflectors will have no problems – right? We’ll find
out in a moment.
set protocols bgp group AS64512 family inet labeled-unicast rib inet.3
We add the equivalent lines on RR1, R4, R5, RR2 and R8. Our iBGP sessions go down,
and come back up. And after a while, do we see 8.8.8.8 in Router 1’s table?
Now that’s interesting. We’re learning it, but we can’t ping it. Can you see why?
Ten points if you spotted it: it’s because the prefix is in the inet.3 table, not inet.0.
Router 1 will be using the entry in inet.3 to resolve its BGP next-hops. But if we try to
ping 8.8.8.8 directly, the lookup happens in inet.0 – and because the prefix isn’t in
inet.0, the ping fails.
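A toy Python model makes the split between the two tables a little more concrete. It assumes, as described above, that a ping is looked up in inet.0 while VPN next-hops are resolved in inet.3. The table contents are a simplified guess at Router 1's state, and the function names are invented for this sketch.

```python
# Simplified, assumed view of Router 1's tables after BGP-LU comes up.
tables = {
    "inet.0": {"1.1.1.1/32", "11.11.11.11/32"},  # ordinary IGP routes
    "inet.3": {"8.8.8.8/32"},                    # labeled route learned via BGP-LU
}


def ping_has_route(destination: str) -> bool:
    """A ping from the router is looked up in inet.0."""
    return destination in tables["inet.0"]


def vpn_next_hop_resolves(next_hop: str) -> bool:
    """VPN next-hops are resolved against inet.3."""
    return next_hop in tables["inet.3"]


print(ping_has_route("8.8.8.8/32"))         # False
print(vpn_next_hop_resolves("8.8.8.8/32"))  # True
```

Same prefix, two different questions, two different answers. That's all the failed ping is telling us.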
In this situation, it’s not a big deal. Router 1 has no need for 8.8.8.8 to be in inet.0,
other than for testing general reachability, but that’s just a nice-to-have, not an
essential. But as we’re about to see, this lack of an entry in inet.0 is going to cause
some big problems for our route reflectors…
First of all, I’d like you to see the BGP config that Reflector 1 is using to talk with the
other routers in its own autonomous system, Routers 1 and 4. It’s nothing special:
we’re just talking unicast, VPN unicast, and labeled-unicast (in inet.3). We’ve also got
a cluster ID, which tells our Juniper router that it’s to act as a route reflector to its
peers:
For now, let’s build a brand new BGP peering, so Reflector 1 can peer with
Reflector 2 in ISP 2. Notice in the config below that it’s an external peering, which
you don’t often see on route reflectors! Notice as well that we’re only talking inet-vpn
unicast, because that’s the only family we need!
Here’s the config on Reflector 1:
We’re turning on BGP multihop, because we’re now doing eBGP. Why “multihop 5”
in particular though? Because it’s precisely the number of hops between the two
reflectors!
…oh. That’s a shame: our BGP didn’t come up. How come?
Do you know: now I think about it, we never actually checked if Reflector 1 even has
a route to Reflector 2! Let’s find out:
AS path: 64513 I
> to 10.10.113.3 via ge-0/0/3.0, Push 299968, Push 299792(top)
Twenty points if you spotted it: this prefix is in inet.3. In fact, it’s exactly the same
problem as when we tried to do that ping earlier. For this BGP peering to come up,
22.22.22.22 needs to be in Reflector 1’s inet.0 table – but we configured our router
to put BGP-LU prefixes in inet.3.
Luckily, I know how to fix it. And guess what: I’ll tell you the answer for free! I’ll show
you the config first, then I’ll explain what it does. But do you promise you’ll keep it a
secret? Even if the FBI asks you really nicely? Yes? Good.
In short, Reflector 1 is going to take the labeled prefix for Reflector 2, and copy it
into its inet.0 table. First, we add in this line to Reflector 1’s iBGP config with the rest
of its own AS (because this is the peering that it’s actually learning 22.22.22.22 from):
The main moving part here is the rib-group, which is possibly one of the most
misunderstood elements in all of Junos. Essentially, the import-rib command allows
us to take prefixes that would usually be put into one routing table, and additionally
import them into another routing table. The first table is where you’d usually find
the prefixes, and then any table listed after that is where we’re importing them to.
Essentially we made a rib-group, but we also added a policy on the group so that
only 22.22.22.22/32 actually gets imported. Here’s what the same config looks like in
hierarchy format, so it’s a little easier to read. When it’s like this, it’s easier to see
that inet.3 is the primary table, and inet.0 is the secondary table.
RR2_INTO_INET0 {
import-rib [ inet.3 inet.0 ];
import-policy RR2_LOOPBACK;
}
}
autonomous-system 64512;
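If rib-groups still feel slippery, here is a minimal Python sketch of what this one does. It models the behaviour as this post describes it: every route lands in the first table listed, and the import policy decides which routes are also copied into the tables after it. That the policy only gates the secondary copy is an assumption of the sketch, and apply_rib_group is a made-up helper, not a Junos API.

```python
def apply_rib_group(prefix, import_rib, import_policy, tables):
    """Install a route in the primary table, and copy it to secondaries if the policy allows."""
    primary, *secondaries = import_rib
    tables[primary].add(prefix)        # where the route would normally live
    for table in secondaries:
        if import_policy(prefix):      # assumed: the policy only gates the extra copies
            tables[table].add(prefix)


def rr2_loopback(prefix):
    """Model of the RR2_LOOPBACK policy: only Reflector 2's loopback matches."""
    return prefix == "22.22.22.22/32"


tables = {"inet.3": set(), "inet.0": set()}
for learned in ["22.22.22.22/32", "8.8.8.8/32"]:
    apply_rib_group(learned, ["inet.3", "inet.0"], rr2_loopback, tables)

print(sorted(tables["inet.3"]))  # ['22.22.22.22/32', '8.8.8.8/32']
print(sorted(tables["inet.0"]))  # ['22.22.22.22/32']
```

Only the one loopback the reflector peering needs ends up in inet.0; everything else stays in inet.3 where we originally put it.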
Hooray! And would you look at that: Reflector 1 has learned four prefixes! Could
they be the BGP prefixes from ISP 2? Indeed they could:
We did it! We did it together, you and me! Take that, granddad!!
CONFIGURING OPTION C – THE VRF TARGET POLICIES
Each ISP has their own unique route-target that represents a customer. This is a
problem when we’ve got one customer split over two ISPs. So, the final step is to get
Router 1 and Router 8 to know what to do with each other’s VPN prefixes.
You’ll remember that we said we weren’t going to do anything complicated like re-
writing route-targets as they cross our AS boundary. Instead, we’re just going to tell
each PE to import not only the ISP’s own route-target for that customer, but the
other ISP’s target for that customer too. Let’s see what this config looks like on
Router 1, for one of our two customers, Barry’s Ice Creams (a company that I
desperately wish existed):
[...]
policy-statement TARGET_IMPORT_BARRYS_ICE_CREAM {
term ACCEPT_CORRECT_COMMUNITY {
from community [ TARGET_AS64512_BARRYS_ICE_CREAM
TARGET_AS64513_BARRYS_ICE_CREAM ];
then accept;
}
term REJECT {
then reject;
}
}
Gosh, it’s a thing of beauty! Would you like to marry it? Well, you can’t. Don’t even
think about it.
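The import policy boils down to "accept the route if it carries either ISP's target for this customer". A minimal Python sketch of that logic, assuming a route's communities can be modelled as a set of names, might look like this. The community names are the ones from the policy on Router 1; the third name in the example is hypothetical.

```python
# Either ISP's route-target for Barry's Ice Creams is good enough.
ACCEPTED_TARGETS = {
    "TARGET_AS64512_BARRYS_ICE_CREAM",
    "TARGET_AS64513_BARRYS_ICE_CREAM",
}


def vrf_import(route_communities: set) -> str:
    """Accept if the route carries at least one accepted target, else reject."""
    if route_communities & ACCEPTED_TARGETS:
        return "accept"
    return "reject"


print(vrf_import({"TARGET_AS64513_BARRYS_ICE_CREAM"}))  # accept
print(vrf_import({"TARGET_AS64513_SOMEONE_ELSE"}))      # reject (hypothetical community)
```

It's cruder than rewriting targets at the AS boundary, but it is a lot less config.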
• First of all, we’ve turned on labeled-unicast around the place, which has
ultimately given us one conceptual full label-switched path between two PEs in
different ISPs, made up of three separately-signalled label-switched paths: R1 to R4,
R4 to R5, and R5 to R8.
• Second, now that Reflectors 1 and 2 know how to get to each other’s loopbacks,
we set up a BGP peering between the two, with the inet-vpn unicast family.
• Finally, we’ve set up our PEs to be aware of the target communities used by the
other ISP, and to import them into the relevant VRFs.
So now, everything should work, right? Let’s take a look at the VRF on Router 1, and see
if it knows the 192.168.20.0/24 prefix from ISP 2:
Success! And just as a final test, to be sure – let’s get CPE 1 to ping CPE 2:
CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 116/132/188 ms
THAT’S IT!
If you’ve read this far, you’re an absolute hero. But I’m sure you’re itching for that
explanation I promised you of the labels involved in Option C? If so, click here for
Part 2 of this mighty deep-dive into Option C!
(Now you’ve come this far, I bet you can see why I split this post up into chunks. I
feel like I’ve written an entire book!)
As always, if you enjoyed this post, I’d love you to share it on your favourite social
media of choice. The more people read my blog, the more it inspires me to make
even more posts. So, if you want more posts like this, share it far and wide!
And hey: if you’re on Twitter, give me a follow! I’ll tell you when I make new posts, plus
I’ll occasionally share what some people (me) have described as “the very best
opinions in the entire networking industry”. Wow, high praise indeed (from me)!
First, we didn’t talk about why we’d ever choose to use Option C, over Options A or
B. Second, we only skimmed over this idea of the three-label stack. And third, we
didn’t phone the people who mean the most to us, and tell them that we love them.
So, in this post we’re going to do all those things, and more. Along the way, we’ll be
asking ourselves such questions as “What do labels taste like?”, “What is the
meaning of life?”, and “If you put a plane on a massive conveyor belt, would it ever
take off?”
…Sorry, I just re-read my notes, and actually we won’t be answering any of those
questions. Apologies for the confusion there. This blog post is purely about MPLS,
not the meaning of life. My mistake. Whoopsie-daisy!!
Okay, time to jump into the forwarding plane. If a host behind CPE_BARRY_1 (let’s
say 192.168.10.5) sent a ping to a host behind CPE_BARRY_2 (let’s say 192.168.20.5),
what would happen?
The packet leaves the source, and gets sent to its gateway, which is CPE_BARRY_1.
This CPE router passes the packet to Router 1, at which point it enters the MPLS
VPN. The packet gets label-switched to Router 4, crosses an AS boundary to Router
5, transits an MPLS network in ISP 2, reaches Router 8, at which point it gets passed
to CPE_BARRY_2 – and then, finally, the packet arrives at its destination.
The full path is like this: CPE 1 > Router 1 > Router 2 > Router 3 > Router 4 > Router 5
> Router 6 > Router 7 > Router 8 > CPE 2.
What does Router 1 actually do when it receives the packet? First it notices that the
packet came in on an interface that’s in a VRF, so it knows to look up the prefix in
the VRF’s dedicated routing table. In doing so, Router 1 sees that 192.168.20.0/24
has a next-hop of 8.8.8.8 – and that it needs to add three labels before sending the
packet on its merry way. In just a moment we’re going to look at a picture of all
these labels. For now, follow us on the journey from end to end.
The inner label (299936) is the VPN label that Router 8 is advertising for this VRF.
Nothing unusual there! This label only has meaning to Router 8. If any other router
tries interpreting this label, it will be meaningless. This label is also known as
the service label.
The outer label (299808) is the transport label that ISP 1’s LDP instance generated for
getting to Router 4. There’s also nothing unusual here: this is the usual outer
transport label we’d find when going between two routers within a single ISP.
The middle label (299984) is where things get interesting.
This middle label has a meaning only to Router 4. When the packet is sent by Router
3 to Router 4, Router 3 pops the outer transport label, as per the usual penultimate-
hop-popping process. As a result, the label stack, which previously had three labels,
now only has two.
This means that the label which was previously the “middle” label has been
promoted to being the outer transport label! And, because it’s the outer label, it’s
precisely this label that Router 4 processes. Router 4 knows that if it receives a
packet with an outer label of 299984, it should swap it for label 300016 and pass it
to Router 5:
And when Router 5 receives this packet, with an outer label of 300016, it knows to
put the packet into the LSP towards Router 8:
From here, it’s business as usual. The packet goes down Router 5’s LSP to Router 8,
and arrives at R8 with just one label (because of penultimate-hop popping). Router 8
looks up this label, sees it belongs to a VRF, pops the label, and passes the packet
straight down the relevant AC (attachment circuit – basically a fancy word for the WAN
link). This is the default behaviour in Junos. No need to inspect the destination IP –
the label itself tells our PE router which WAN link to pass the packet down.
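The whole journey can be replayed as a short Python simulation, with the stack as a list whose last element is the top label. The labels 299936, 299984, 299808 and 300016 are the ones quoted above. The labels used between R2 and R3, and inside ISP 2, aren't given in the text, so the two values marked hypothetical are stand-ins, and the helper functions are invented for this sketch.

```python
def push(stack, *labels):
    return stack + list(labels)


def swap(stack, new_top):
    return stack[:-1] + [new_top]


def pop(stack):
    return stack[:-1]


HYP_R2_TO_R3 = 100001  # hypothetical: label R2 uses towards R3
HYP_ISP2_LDP = 100002  # hypothetical: label used inside ISP 2 towards R8

# Each entry: (router, operation it applies before sending the packet on).
hops = [
    ("R1", lambda s: push(s, 299936, 299984, 299808)),  # VPN, middle, transport(top)
    ("R2", lambda s: swap(s, HYP_R2_TO_R3)),
    ("R3", pop),                                        # penultimate-hop pop
    ("R4", lambda s: swap(s, 300016)),                  # 299984 -> 300016
    ("R5", lambda s: swap(s, HYP_ISP2_LDP)),            # into the LSP towards R8
    ("R6", lambda s: swap(s, HYP_ISP2_LDP)),            # "swaps" for the same value
    ("R7", pop),                                        # penultimate-hop pop
    ("R8", pop),                                        # VPN label popped, plain IP out
]

stack, trace = [], {}
for router, operation in hops:
    stack = operation(stack)
    trace[router] = stack
    print(f"{router} sends: {stack}")
```

Look at what R3 sends: two labels, with 299984 on top. That is the moment the middle label gets its promotion.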
mpls.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!
Finally, notice that between R5>R6 and R6>R7, the label is actually the same. This is
a complete coincidence. This happened because my lab was extremely simple, so
there’s not many prefixes that need labels. Make no mistake though: in effect, R6 is
“swapping” a label for the same label!
root@Router1> show route table bgp.l3vpn.0 extensive | match Label
Label operation: Push 299936, Push 300320, Push 299808(top)
Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
VPN Label: 299936
Label operation: Push 299936, Push 300320, Push 299808(top)
Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
VPN Label: 299936
Label operation: Push 300032, Push 300320, Push 299808(top)
Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
VPN Label: 300032
Label operation: Push 300032, Push 300320, Push 299808(top)
Label TTL action: prop-ttl, prop-ttl, prop-ttl(top)
VPN Label: 300032
Perhaps it still isn’t clear to you why it’s important to have control over both
autonomous systems. After all, it’s only labels. It’s not like we’re extending OSPF or IS-IS
between the two. So, let’s take a look at the kind of action that the admins of one
autonomous system could take which might severely impact the other AS.
Let’s imagine that the two labels our PE router generates are 69 (nice), and 420.
So, imagine that ISP 2 is running vrf-table-label. Imagine as well that there were 100
MPLS VPNs shared between ISPs 1 and 2, and that in total ISP 2 hosted around
30,000 VPN prefixes. Thanks to vrf-table-label, let’s imagine we have something like
400 labels to represent all of these prefixes. PE1 can easily handle this many.
Now imagine that for some reason, ISP 2 turns off vrf-table-label. Suddenly, the PEs
in ISP 1 are going to receive a massive increase in the number of labels that ISP 2
sends, because instead of one label per-VRF per-PE, it’s one label per-VRF per-CPE!
That’s a BIG increase in state. Suddenly we could go from 400 labels to many
thousands of labels. I say thousands: that’s a complete guess. I’ve not done the
maths, and I’m not going to. You get my point though: you better cross your fingers
that the routers in ISP 1 have enough memory for the sudden increase in labels.
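To put some rough shape on that guess, here is a back-of-the-envelope Python sketch. It is purely illustrative. The 100 VRFs come from the example above, 4 PEs per VRF is an assumption picked only because 100 × 4 gives the 400 labels mentioned, and 10 CPEs per VRF per PE is plucked out of thin air. Neither function reflects a measured network.

```python
def labels_with_vrf_table_label(vrfs: int, pes_per_vrf: int) -> int:
    """One label per VRF per PE."""
    return vrfs * pes_per_vrf


def labels_without_vrf_table_label(vrfs: int, pes_per_vrf: int, cpes_per_vrf_per_pe: int) -> int:
    """One label per VRF per CPE, as described above."""
    return vrfs * pes_per_vrf * cpes_per_vrf_per_pe


VRFS = 100                # from the example above
PES_PER_VRF = 4           # assumption: chosen so that 100 * 4 = 400
CPES_PER_VRF_PER_PE = 10  # assumption: made up for illustration

print(labels_with_vrf_table_label(VRFS, PES_PER_VRF))                          # 400
print(labels_without_vrf_table_label(VRFS, PES_PER_VRF, CPES_PER_VRF_PER_PE))  # 4000
```

Change the made-up numbers and the totals change with them, but the shape stays the same: the label count is multiplied by however many CPEs hang off each VRF on each PE.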
If the other autonomous system really is owned by someone else, you have no
control over whether they choose to do this. Actions by the other autonomous
system could send a lot of new state to your PE router – and depending on what
you’re getting that box to do, this could well be a problem!
This is why Option C is suited to situations where you’re the admin of both domains.
Otherwise, the admins of ISP 1 might retaliate by doing something really nasty. You
know: like swearing through ISP 2’s letterbox, or taking them off their Christmas
Card list. Noooo, it’s too sad to even think about!!
THAT’S IT!
Except not really, because of course, this post is part 2 of three. And if you’ve read
this far, I’m absolutely positive that you’re going to want the complete story.
Therefore, it is your mission, nay your DUTY, to click here and read Part 3, where we
first introduce RSVP into our network, and then we configure BGP-LU in a slightly
different way. Will it take our MPLS VPNs down? Yep. Will we fix it? Click to find out…
In the meantime, you’ll do me the truest honour if you share this post on your
favourite social media of choice. And of course, if you enjoy my nonsense then I’d
absolutely love it if you followed me on Twitter or LinkedIn, where I’ll share any
more posts I make in the future. That’s right, Twitter and LinkedIn: the two literal
worst websites on the entire internet!
With that in mind, it’s a true honour for this blog post to be listed (by me) alongside
the greatest trilogies of all time, as we embark on the third and final part of this
series on using Interprovider Option C to extend MPLS VPNs between two ISPs.
In Part 1 we learned how to do a basic config, and in Part 2 we looked at the actual
use case for Option C, as well as taking a detailed look at how the labels work end-
to-end. To keep things simple, we ran LDP everywhere, and we also put all our BGP-
Labeled Unicast routes into inet.3.
In this third and final post, we’re going to learn some things to be aware of in
different configurations. In particular, we’re going to turn on RSVP in ISP 2, and we’re
going to put our BGP-LU prefixes in inet.0. We’ll see how it breaks our setup, and
then we’ll look at the configuration we need to get it working again.
Then afterwards, when we’re all done… we could go for a meal maybe? Or just a
drink? Or we could go catch a movie? …what? You’re busy? Oh okay, no worries,
never mind. No it’s cool, it was only an idea anyway, I don’t mind. No worries! No
worries.
Let’s remind ourselves of our topology. As always, I recommend opening this pic up
in a new tab, as we’re going to be referring back to it a lot:
First, let’s remove all LDP config from ISP 2. We add this line to all routers in ISP 2:
Next, I’m going to turn on RSVP, and create two label-switched paths. On Router 5,
I’m making a path to Router 8. And on Router 8, I’m making a path back to Router 5.
Here’s some new config on Router 5:
The last line is important: in IS-IS, the traffic-engineering extensions for RSVP are
turned on by default in Junos, so RSVP’s Constrained Shortest Path First (CSPF)
algorithm can run with no problems. In OSPF, we need to add that command to
every router in ISP 2.
We add the equivalent config on Router 8. We also turn on RSVP and MPLS on the
relevant infrastructure interfaces on Routers 6 and 7. Notice that I haven’t turned on
RSVP on ISP 2’s route reflector, Reflector 2.
Now, in previous posts on this blog we’ve talked about how it’s common to not turn
on MPLS on your route reflectors, when your route reflectors aren’t also acting as
transit routers. After all, why do they need label-switched paths to routers that
they’re not sending ingress or transit label-switched traffic to? All the route reflector
is doing is reflecting prefixes.
With that in mind, in the past we’ve used some commands to make the router
*think* it can successfully resolve the next-hop of VPN prefixes, just to trick it into
actually reflecting the prefixes. In this story though, things are a little bit different.
Imagine we turn on RSVP everywhere apart from our route reflector. Now,
remember that ISP 2 has learned about 11.11.11.11, ISP 1’s route reflector, via BGP
Labeled-Unicast. With that in mind, does the BGP peering between the two
reflectors come up?
Aah! That’s interesting. So we had a route when we were talking LDP everywhere,
but not when we’re only talking RSVP between our PEs, and any router in between.
Let’s see if the route is being learned and ignored.
(A note about this output: you’ll remember that we added a policy on our route reflector to copy
our 11.11.11.11 BGP-LU route from inet.3 into inet.0. When typing the command above, and the
ones below, we also get identical output for the inet.3 table. I’ve removed it to keep the post
short.)
Now that IS odd. It’s hidden – and yet, it’s saying that it’s “accepted”! Not particularly
helpful. Perhaps our routing table will tell us more?
{snip}
Primary Routing Table inet.3
Indirect next hops: 1
Protocol next hop: 5.5.5.5
Push 299872
Indirect next hop: 0 -
Router 5 is telling Reflector 2 “to get to 11.11.11.11, come to me and add label
299872 to the packet” Can you see the problem? Reflector 2 needs to be able to add
at least one more transport label to this packet. Otherwise, if Reflector 2 just added
this one label to the packet, and passed it to the next hop (Router 6), then Router 6
would have no idea what to do with it! Label 299872 only has meaning to Router 5.
The fact that we don’t have an indirect next-hop means that we can’t resolve 5.5.5.5
in a labeled way. All we have is an unlabeled path:
All of this is a very long way of saying: if you’re doing Option C, and you’re using
RSVP, it’s essential that you have label-switched paths on your route reflectors, even
if the reflectors are outside of the MPLS transit path. Otherwise, they won’t be able
to get to the route reflector in the other ISP.
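If you want to check for this on your own reflector, a couple of standard operational commands will show it. This is only a sketch, using the addresses from this lab:

```
# Is the BGP-LU route to the far reflector hidden, and what is its protocol next-hop?
show route 11.11.11.11 hidden extensive
# Is there a labeled route to that next-hop? Nothing returned means no LSP to it
show route table inet.3 5.5.5.5
```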
There’s a few ways we could fix this, but because we’re in a simple lab, let’s just
make RSVP LSPs from Reflector 2 to Routers 5 and 8. We add this config to Reflector 2:
set protocols rsvp interface ge-0/0/2.0
set protocols rsvp interface ge-0/0/3.0
set protocols mpls label-switched-path RR2-to-PE5 to 5.5.5.5
set protocols mpls label-switched-path RR2-to-PE8 to 8.8.8.8
set protocols mpls interface ge-0/0/2.0
set protocols mpls interface ge-0/0/3.0
set protocols ospf traffic-engineering
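Once that’s committed, it’s worth a quick sanity check before testing. A sketch using standard operational commands on Reflector 2, with the addresses from this lab:

```
# Both LSPs should show as Up
show mpls lsp ingress
# The peering to ISP 1's reflector should reach the Established state
show bgp neighbor 11.11.11.11
```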
And then:
CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 116/125/148 ms
CPE_BARRY_1>
Hooray!
Now everything is working again, let’s make the second big change: instead of
putting my BGP-Labeled Unicast prefixes in inet.3, I’m actually going to put them in
inet.0.
In fact, although I’ve personally only worked at a small number of ISPs, I can still tell
you these two things: 1) Everywhere I’ve ever worked has decided to put BGP-LU
prefixes in inet.3. 2) Everyone I’ve talked to about it has only ever put BGP-LU
prefixes in inet.3. In the real world, in my experience, not a single person puts their
BGP-LU prefixes in inet.0.
So then, why are we bothering to put them into inet.0 now? Two reasons.
First of all, the vast majority of other posts on the internet about Option C put them
in inet.0. That includes the official Juniper knowledge base articles on Option C!! This
tells me that there must at least be some people in the world who do this in
production.
Secondly, there may be times when it’s actually preferable to have them in inet.0,
such as if you need to use most/all the prefixes for general reachability. If you also
need to use them to resolve BGP next-hops, you can then leak them the other
way, into inet.3. The problem with this setup though is that you can’t run BGP
unicast and BGP-labeled unicast at the same time. Well, you can, it’s just that you
get a lot more labels. It’s complicated, and we talked about the different variations
in my post all about BGP-LU, so give that post a read if this paragraph is news to you.
Anyway, let’s put this config on Router 5, and apply similar config throughout ISP 2,
including on our route reflector:
delete protocols bgp group TO_AS64512 family inet labeled-unicast rib inet.3
set protocols bgp group TO_AS64512 family inet labeled-unicast
And now we’ve done that, let’s take a look at all the many fun ways that our lab is
broken.
Good stuff!
Now, we know that when VPN traffic goes from R1 to R8, R1 pushes three labels
onto the packet. We’ve made some changes to the MPLS in ISP 2, so Router 1 is
probably using new labels now. Let’s take a look:
As the traffic goes on its merry way, it will at some stage arrive at R4, who will then
of course pass it on to R5. By the time it arrives at Router 4 it only has two labels –
the previous top label was used just to get from R1 to R4, so that label is gone now.
With that in mind, Router 4 is going to take the current top label in the stack
(300064), and swap it for whatever label R5 told R4 to use. Again, we’ve made some
changes to our network, so let’s see what that label is:
On the surface, everything seems like it’s working so far. Except… let’s head over to
R5, and see what it actually does when it receives a packet with this label:
It pops it!! What??? This means that when the packet gets passed to the physical
next-hop (Router 6, in this case), it will be sent with only one label – the VPN label.
This label only has a meaning to Router 8, the PE that hosts this VPN prefix. Router 6
will look up this label, find no mapping for it, and discard it. I don’t need to show you
the results of a ping on our CPE router to show you that the ping is going to fail!
Why is this happening? This didn’t happen until we made all these new-fangled
changes. What’s up?
The logic behind this took me a LONG time to get my head around, but this evening
I had a eureka moment. What we’re going to chat about now is what’s known in the
industry as “bloody complicated”. So, strap in:
INET.0 vs INET.3
Do you remember what the inet.3 table is used for? It’s used to resolve next-hops
for prefixes our router learned by BGP.
Here’s the gotcha: when our BGP-LU prefixes were in inet.3, it meant that BGP-LU
would take prefixes from inet.3 and advertise them on.
However, now we’re putting our BGP-LU prefixes in inet.0. And as we can see,
8.8.8.8 is being learned by OSPF in the inet.0 table. This means that when Router 5
takes the prefix 8.8.8.8 from inet.0, it has no labeled path to it in this routing table –
but nevertheless, it generates a label for it, and sends this label to Router 4. With no
outgoing label to swap to, all Router 5 can do with that label is pop it.
And for that reason, the solution to our sticky-tricky problem is to get the
label-switched path into inet.0. How do we do it? With one simple, beautiful command:
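Going by the description in the next few paragraphs, the command in question is the `mpls-forwarding` option under `traffic-engineering`. A minimal sketch, assuming that’s the one and that it goes on Router 5:

```
# Assumed command: LSP routes are used for forwarding in inet.0,
# while the IGP route stays active for routing decisions
set protocols mpls traffic-engineering mpls-forwarding
```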
We’ve talked about this command in other posts in the past, but to save you time,
let’s quickly explain it again: with this one command, we tell our Junos router
to copy the contents of inet.3 into inet.0 – but to do it in a “safe” way.
You see, RSVP has a numerically lower, and therefore better, route preference than
OSPF. If RSVP wins the fiercely-fought battle for Best Prefix 2019, it can mess up
your network in hilarious ways, because the actual best path suddenly isn’t being
advertised as the best path (a bit like in Part 1, when we tried redistributing 1.1.1.1
into BGP at Router 1).
To fix this problem, this one command adds the RSVP labeled path into
the forwarding table on Router 5, but tricks the routing engine into thinking that the
OSPF route is still the best. Here’s what the result looks like:
Now that Router 5 can send traffic destined to 8.8.8.8 via a label-switched path, it
can tell Router 4 that, if it wants to get to 8.8.8.8, it should send the packet to R5
with a label of 299776:
And what does R5 do when it receives a packet with label 299872? It swaps it for
whatever label is advertised on the R5-to-R8 RSVP label-switched path:
CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
CPE_BARRY_1>
D’oh! We’ve missed something! What is it? For this one, we need to go back to the
route reflector.
1.1.1.1:1:172.16.10.0/30
[BGP/170] 01:23:41, localpref 100, from 11.11.11.11
AS path: 64512 I
Unusable
{ snip }
It looks like Reflector 2 is receiving the VPN prefixes from Reflector 1 – but it can’t
use them. Why? For the answer, let’s remind ourselves about the bgp.l3vpn.0 table –
the table that stores all the VPN routes from everywhere, before they’re sorted into
the relevant VRFs.
Prefixes in this table require a next-hop that can be resolved in the inet.3 table.
There’s the problem: these VPN prefixes have a next-hop of 1.1.1.1, which our
router has indeed learned by BGP-LU – but, because of our new configuration
changes, this route is placed in the inet.0 table, not the inet.3 table:
> to 10.10.226.6 via ge-0/0/3.0, label-switched-path RR2-to-PE5
Now, in our topology our route reflectors are outside of the path of transit traffic. As
such, we can use the same command that we used in our Option B blog post to get
around this problem. Let’s add this command to both route reflectors:
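From the way it’s described, this looks like the `resolution rib` knob under `routing-options`. A sketch, assuming that’s the command, applied to each route reflector:

```
# Assumed command: resolve next-hops of routes in bgp.l3vpn.0 using inet.0
# instead of the default inet.3
set routing-options resolution rib bgp.l3vpn.0 resolution-ribs inet.0
```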
Thanks to this command, our route reflectors will resolve the VPN prefixes in inet.0
– which is exactly where they’ll find all the loopback IP addresses.
And look – when we add it in, Reflector 2 has routes from Reflector 1!
Hooray!
How about on our PE routers? Does the fact that our BGP-LU prefixes are in inet.0
cause any problems? Yep! Once again, they need to be in inet.3 for the MPLS VPN to
work properly. So, let’s see how we can fix it.
As a result of all this, Router 8 knows about the prefixes – but is hiding them:
BARRYS_ICE_CREAM.inet.0: 5 destinations, 5 routes (3 active, 0 holddown, 2 hidden)
We fixed this on our route reflectors by telling them to resolve VPN prefixes in inet.0.
Now, on our PEs we’ve got all kinds of options available to us for
moving/copying/resolving between inet.0 and inet.3. But when we’re using Option C,
we don’t need to do anything quite so complicated – instead, there’s a handy
command available to us:
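Judging by the description, this is the `resolve-vpn` option under `family inet labeled-unicast`. A sketch, assuming that’s the one. The group name `IBGP_LU` is made up for illustration, so swap in whichever group carries BGP-LU on your PE:

```
# Assumed command; IBGP_LU is a hypothetical group name
set protocols bgp group IBGP_LU family inet labeled-unicast resolve-vpn
```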
This one command tells the router to copy BGP-LU prefixes (which go into inet.0 by
default) into inet.3, as a candidate for resolving VPN routes – but only if the BGP-LU
route is being used as a BGP next-hop. Want proof? Sure thing:
Aah, would you look at that. A happy ending! So, the big question: does everything
finally work again?
CPE_BARRY_1>ping 192.168.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 116/128/156 ms
CPE_BARRY_1>
At last!! Everything is working. Now, shall we all agree to never do this in production?
Yes? Good.
THAT’S IT!
Wow, that was a long post! Are you really still here? Good work: you’ve officially
passed the exam, and earned your NFTCGJ certification (Network Fun-Times
Certificate Great Job).
I hope you’ve seen a few new scenarios in this post that you might not have even
seen in the official documentation. I hope that the mix of protocols and
philosophies in each network have shown you some of the gotchas you might face,
and how to overcome them. Now, let’s see how many of these I can remember
when I do the JNCIE exam in November! Probably… none? Yeah, I reckon probably
none.
If you enjoyed this post then you’d make my world if you shared it on your
favourite social media, or emailed it to friends and colleagues who you
think might be interested in it.
And of course, I’d love you to follow me on Twitter, so you can see any future blog
posts I make. Alternatively, follow me on LinkedIn if you hate joy, and would prefer
your feeds were controlled by an algorithm that shows you terrible inspirational
memes from recruitment consultants rather than any of my posts.
One final question: Have you ever deployed Option C yourself? Did anything
interesting happen that you fancy sharing? Leave a comment, friend! I’d love to hear
your stories!