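Do not trust promises lightly, and others will not let you down.
Do not make promises lightly, and you will not let others down.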
Wednesday, December 15, 2010
Tuesday, November 23, 2010
Thanksgiving Quotes
Best of all is it to preserve everything in a pure, still heart, and let there be for every pulse a thanksgiving, and for every breath a song.
- Konrad von Gesner
How wonderful it would be if we could help our children and grandchildren to learn thanksgiving at an early age. Thanksgiving opens the doors. It changes a child's personality. A child is resentful, negative-or thankful. Thankful children want to give, they radiate happiness, they draw people.
- Sir John Templeton
In the past I always thought of gratitude as a spontaneous response to the awareness of gifts received, but now I realize that gratitude can also be lived as a discipline. The discipline of gratitude is the explicit effort to acknowledge that all I am and have is given to me as a gift of love, a gift to be celebrated with joy.
- Henri J. M. Nouwen
As we express our gratitude, we must never forget that the highest appreciation is not to utter words, but to live by them.
- John Fitzgerald Kennedy
Monday, August 23, 2010
Affirmation:
8/25:
"Qualities you see in others, reveal something about you."
You may not realize it, but the characteristics you like or dislike in others deliver a special message.
If you respond to a person whom you see as outgoing, positive and energetic, these are most likely qualities you possess but have not fully embraced or developed. Likewise, if you react to the overbearing nature of another, do a self-check; you may have a tendency to be overbearing yourself.
Either way, tune into what you do and don't admire in others. Take note of your emotional response. With this newfound awareness you can't help but discover something you didn't know about yourself.
Today's affirmation: The qualities I respond to in others reflect who I am.
8/26:
I will waste not even a precious second today in anger or hate or jealousy or selfishness. I know that the seeds I sow I will harvest, because every action, good or bad, is always followed by an equal reaction. I will plant only good seeds this day.
-Og Mandino 1923-1996, Author and Speaker
9/17:
"Happiness is not the destination."
Many people slog for a lifetime looking for happiness. They feel guilty doing things they enjoy. So they take care of their jobs, their families and their children thinking that the time for happiness will come later. In the end, they discover that it doesn't.
Believe that you deserve happiness right now!
You know those early hours in the morning when the house is quiet and the mist is floating outside the window? That is happiness. Remember how it felt to hold your baby for the first time? That is happiness. Remember how you felt when you achieved what seemed like an impossible goal? That is happiness.
Do not defer your happiness to a later date. Find joy in what you do every day.
Today's affirmation: I deserve to be happy right now.
10/5
"Admit your mistake! It's your most empowering choice."
We all make mistakes. It's a fact of life. What matters is what you do afterward.
You have a choice. You can let your mistake torment you, embarrass you, and hold you back. Or, you can admit you made a mistake, learn from it, and move on.
Mistakes are lessons in disguise. You can learn something from each and every one. Next time you make a mistake, act quickly and decisively. Take responsibility and look for the lesson. Use the lesson to reach greater success and good fortune in your life.
Learn from your mistakes. Only then can they empower you to greater heights.
Today's affirmation: I admit my mistakes and learn from them.
HA: NSB NSR ISSU
Modern high-performance routers architecturally separate the forwarding plane and the control plane into separate physical components, each with its own memory and processors. The control plane runs the routing protocols, maintains the necessary databases for route processing, and derives a forwarding table (FIB). The FIB is given to the forwarding plane, which is responsible for packet forwarding.
In fact the control plane could stop functioning altogether and because the forwarding plane is a separate entity with its own processors it can continue forwarding packets based on its copy of the FIB. This is Non-Stop Forwarding (NSF): The ability of the forwarding plane to continue running “headless” if the control plane stops.
Of course this is dangerous; if the network topology changes while the control plane is down there is no way to process new route information and the forwarding plane’s FIB can become invalid, resulting in incorrectly forwarded packets. So why would you even want NSF?
The answer is redundant control planes (Cisco calls their control planes Route Processors; Juniper calls them Routing Engines). NSF allows you to switch from a primary to a backup control plane without disrupting forwarding. The FIB could still become invalid during the period between when the primary control plane goes down and the backup control plane takes over, but the risk in this period is usually an acceptable compromise.
So if the backup control plane maintains a copy of the active configuration and current state on system components such as interfaces, it can become active much faster than if it had to learn all this information first. Cisco calls this Stateful Switchover (SSO) and Juniper calls it Graceful Routing Engine Switchover (GRES).
The problem with control plane switchovers as described so far, even when stateful procedures shorten the switchover time, is that the switchover breaks routing protocol adjacencies. When a primary control plane goes down, any neighboring router that had a peering session with it sees the session fail. When the backup control plane becomes active it re-establishes the adjacency, but in the interim the neighbor has advertised to its own neighbors that router X is no longer a valid next hop to any destinations beyond it, and that they should find another path. And of course when the backup control plane comes online and re-establishes adjacencies, its neighbors advertise that router X is again available as a next hop, and everyone should again recalculate best paths. All of this can be highly disruptive to the network.
The objective of NSR is to prevent, or at least minimize, the effect of broken peering sessions.
A first attempt at controlling broken adjacencies during control plane switchovers is Graceful Restart (GR) protocol extensions. Each routing protocol has its own specific GR extensions, but they all work pretty much the same way. When a router's control plane goes down, its neighbors, rather than immediately reporting to their own neighbors that the router has become unavailable, wait a certain amount of time (the grace period). If the router's control plane comes back up and re-establishes its peering sessions before the grace period expires, as would be the case during a control plane switchover, the temporarily broken peering sessions do not affect the network beyond the neighbors.
There are, however, a couple of problems with GR:
- Neighbors are required to support the GR protocol extensions, yet small CE routers are less likely to support GR.
- If there is a complete control plane or router failure rather than just a switchover, the GR grace period can slow network reconvergence.
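The neighbor's side of the grace period can be sketched in a few lines of Python. This is only an illustration of the waiting logic described above, not any routing protocol's actual GR state machine; the class name, the method name, and the 120-second value are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class GracefulRestartHelper:
    """A neighbor's view of one peering session (hypothetical model)."""

    grace_period: float  # seconds the neighbor waits before giving up

    def should_withdraw(self, down_at: float, now: float, restored: bool) -> bool:
        """Return True once the neighbor should report the router as unavailable."""
        if restored:
            # The session came back, so there is nothing to report upstream.
            return False
        # Stay silent until the grace period has fully elapsed.
        return now - down_at >= self.grace_period


helper = GracefulRestartHelper(grace_period=120.0)  # assumed value
print(helper.should_withdraw(down_at=0.0, now=30.0, restored=False))
print(helper.should_withdraw(down_at=0.0, now=120.0, restored=False))
```

The second call also illustrates the second problem in the list: after a real failure the neighbor says nothing for the whole grace period, so reconvergence starts that much later.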
A newer generation of NSR uses internal processes to keep the backup control plane aware of routing protocol state and adjacency maintenance activities, so that after a switchover the backup control plane can take charge of the existing peering sessions rather than having to establish new ones. The switchover is then transparent to the neighbors, and because the NSR process is internal (and vendor specific) there is no need for the neighbors to support any kind of protocol extension.
Here’s where the confusion comes in: Different vendors use these terms differently. Juniper, for example, calls its graceful restart implementation Graceful Restart, whereas Cisco calls its graceful restart implementation Non-Stop Forwarding Awareness (even though GR applies to routing, not forwarding). Juniper users often confuse GRES and GR: Although the “G” in both acronyms stands for “Graceful,” GRES and GR are two different things. And both Cisco and Juniper have internal NSR capabilities, but the circumstances in which each can be used are quite different.
So enjoy the circus, but be aware that different vendors sometimes use different names for essentially the same act. When a vendor talks about NSF, GR, and NSR, be sure you know what that vendor means by each term.
RPR
RPR enables a quicker switchover between an active and standby RSP if the active RSP experiences a fatal error. When you configure RPR, the standby RSP loads a Cisco IOS image on bootup and initializes itself in standby mode. In the event of a fatal error on the active RSP, the system switches to the standby RSP, which reinitializes itself as the active RSP, reloads all of the line cards, and restarts the system.
RPR+
The RPR+ feature is an enhancement of the RPR feature. RPR+ keeps the VIPs from being reset and reloaded when a switchover occurs between the active and standby RSPs. Because VIPs are not reset and microcode is not reloaded on the VIPs, and the time needed to parse the configuration is eliminated, switchover time is reduced to 30 seconds.
SSO
SSO establishes one of the supervisor engines as active while the other supervisor engine is designated as standby, and then SSO synchronizes information between them. A switchover from the active to the redundant supervisor engine occurs when the active supervisor engine fails, or is removed from the router, or is manually shut down for maintenance. This type of switchover ensures that Layer 2 traffic is not interrupted.
SSO switchover preserves FIB and adjacency entries and can forward Layer 3 traffic after a switchover. Configuration information and data structures are synchronized from the active to the redundant supervisor engine at startup and whenever changes to the active supervisor engine configuration occur.
ISSU: In-Service Software Upgrade (Cisco)
Requires Dual RE
1. Primary and Standby Supervisors Running Current Image
2. Load New Image on Standby Supervisor
3. Make Standby Supervisor “Active” (<150ms)—Switch Now Running New Image
4. Rapid Rollback Option (<150ms) if Necessary
5. Load New Image on Primary Supervisor and Commit Change
Friday, August 13, 2010
IP Header
Defined in RFC 791.
A summary of the contents of the internet header follows:
Version: 4 bits
The Version field indicates the format of the internet header.
4 - IPv4
6 - IPv6
IHL: 4 bits
Internet Header Length is the length of the internet header in 32-bit
words, and thus points to the beginning of the data.
5 - minimum value (a header without options)
Type of Service: 8 bits
Bits 0-2: Precedence.
Bit 3: 0 = Normal Delay, 1 = Low Delay.
Bit 4: 0 = Normal Throughput, 1 = High Throughput.
Bit 5: 0 = Normal Reliability, 1 = High Reliability.
Bits 6-7: Reserved for Future Use.
   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 |     |     |     |     |     |
|   PRECEDENCE    |  D  |  T  |  R  |  0  |  0  |
|                 |     |     |     |     |     |
+-----+-----+-----+-----+-----+-----+-----+-----+
Precedence
111 - Network Control
110 - Internetwork Control
101 - CRITIC/ECP
100 - Flash Override
011 - Flash
010 - Immediate
001 - Priority
000 - Routine
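As a rough illustration of the bit layout above, the following Python sketch splits a Type of Service byte into its precedence and D/T/R flags. Bit 0 is the most significant bit, as in the diagram; the function name `decode_tos` and the dictionary keys are made up for this example.

```python
PRECEDENCE_NAMES = {
    0b111: "Network Control",
    0b110: "Internetwork Control",
    0b101: "CRITIC/ECP",
    0b100: "Flash Override",
    0b011: "Flash",
    0b010: "Immediate",
    0b001: "Priority",
    0b000: "Routine",
}


def decode_tos(tos: int) -> dict:
    """Split an 8-bit Type of Service value into its RFC 791 fields."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("Type of Service is a single octet")
    return {
        "precedence": PRECEDENCE_NAMES[tos >> 5],  # bits 0-2 (top three bits)
        "low_delay": bool(tos & 0x10),             # bit 3 (D)
        "high_throughput": bool(tos & 0x08),       # bit 4 (T)
        "high_reliability": bool(tos & 0x04),      # bit 5 (R)
    }


print(decode_tos(0b10110000))
```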
Total Length: 16 bits
Total Length is the length of the datagram, measured in octets, including internet header and data.
Identification: 16 bits
An identifying value assigned by the sender to aid in assembling the
fragments of a datagram.
Flags: 3 bits
Various Control Flags.
Bit 0: reserved, must be zero
Bit 1: (DF) 0 = May Fragment, 1 = Don't Fragment.
Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments.
  0   1   2
+---+---+---+
|   | D | M |
| 0 | F | F |
+---+---+---+
Fragment Offset: 13 bits
This field indicates where in the datagram this fragment belongs.
The fragment offset is measured in units of 8 octets (64 bits). The
first fragment has offset zero.
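To make the 8-octet unit concrete, here is a small Python sketch that splits a payload into fragments and reports the Fragment Offset field of each. It assumes a 20-byte header without options; the function name and the 4000-byte payload with a 1500-byte MTU are chosen only for illustration.

```python
def fragment_offsets(payload_len: int, mtu: int, header_len: int = 20) -> list:
    """Return (offset_field, data_bytes, more_fragments) for each fragment."""
    # Every fragment except the last must carry a multiple of 8 data octets,
    # because the offset field counts in units of 8 octets.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < payload_len:
        size = min(max_data, payload_len - sent)
        more = sent + size < payload_len  # MF flag: 1 on all but the last
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments


for fragment in fragment_offsets(payload_len=4000, mtu=1500):
    print(fragment)
```

With these numbers each full fragment carries 1480 data octets, so the offsets advance in steps of 185 (1480 / 8).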
Time to Live: 8 bits
This field indicates the maximum time the datagram is allowed to
remain in the internet system.
Protocol: 8 bits
This field indicates the next level protocol used in the data
portion of the internet datagram. The values for various protocols
are specified in "Assigned Numbers" (RFC 1700).
1 - ICMP
2 - IGMP
6 - TCP
17 - UDP
47 - GRE
88 - IGRP
Header Checksum: 16 bits
A checksum on the header only. Since some header fields change
(e.g., time to live), this is recomputed and verified at each point
that the internet header is processed.
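RFC 791 defines the checksum as the 16-bit one's complement of the one's complement sum of all 16-bit words in the header, with the checksum field itself taken as zero during computation. A minimal Python sketch follows; the function name and the sample header bytes are illustrative assumptions.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Compute the checksum over a header whose checksum field is zeroed."""
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 octets")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF


# Sample 20-byte header with the checksum field (bytes 10-11) set to zero.
sample = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
print(hex(ipv4_header_checksum(sample)))

# Decrementing the TTL (byte 8) changes the header, so the checksum changes too.
forwarded = sample[:8] + bytes([sample[8] - 1]) + sample[9:]
print(hex(ipv4_header_checksum(forwarded)))
```

Running the same function over a header that already contains its correct checksum returns zero, which is how a receiver can verify it.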
Source Address: 32 bits
The source address.
Destination Address: 32 bits
The destination address.
Options: variable
The options may appear or not in datagrams.
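The field layout summarized above can be unpacked with Python's standard `struct` module. This is a sketch rather than a complete parser: the function name, the dictionary keys, and the sample header bytes are assumptions for illustration, and option contents are returned as raw bytes.

```python
import ipaddress
import struct


def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header plus raw options."""
    if len(data) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    ihl = ver_ihl & 0x0F  # header length in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "ihl": ihl,
        "header_bytes": ihl * 4,
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),   # flag bit 1 (DF)
        "more_fragments": bool(flags_frag & 0x2000),  # flag bit 2 (MF)
        "fragment_offset": flags_frag & 0x1FFF,       # in units of 8 octets
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
        "options": data[20:ihl * 4],
    }


sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
for name, value in parse_ipv4_header(sample).items():
    print(f"{name}: {value}")
```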
IGMP
There are three versions of IGMP: IGMPv1 is defined by RFC 1112, IGMPv2 by RFC 2236, and IGMPv3 by RFC 3376.
IGMPv3 improves over IGMPv2 mainly by adding the ability to listen to multicast originating from a set of IP addresses only.
IGMP V1
IGMP V2
IGMP V3
Thursday, August 12, 2010
IP routing Q&A
1.
Q: What are the administrative distances for routing protocols?
A:
Directly connected route ------------ 0
Static route out an interface -------- 0
Static route to next-hop address ----- 1
EIGRP summary route ------------------ 5
External BGP ------------------------- 20
Internal EIGRP ----------------------- 90
IGRP --------------------------------- 100
OSPF --------------------------------- 110
IS-IS -------------------------------- 115
RIP ---------------------------------- 120
EGP ---------------------------------- 140
ODR ---------------------------------- 160
External EIGRP ----------------------- 170
Internal BGP ------------------------- 200
DHCP-learned ------------------------- 254
Unknown ------------------------------ 255
Notes:
An administrative distance of 255 will cause the router to disbelieve the route entirely and not use it.
Since IOS 12.2, the administrative distance of a static route with an exit interface is 1. Prior to the release of 12.2 it was in fact 0.
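The way a router uses these values can be sketched as follows: among routes to the same prefix learned from different sources, the one with the lowest administrative distance wins, and anything rated 255 is ignored. The dictionary keys, the function name, and the next-hop addresses are hypothetical labels for this example, not IOS keywords.

```python
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "eigrp-summary": 5,
    "ebgp": 20,
    "eigrp-internal": 90,
    "igrp": 100,
    "ospf": 110,
    "isis": 115,
    "rip": 120,
    "egp": 140,
    "odr": 160,
    "eigrp-external": 170,
    "ibgp": 200,
}
UNUSABLE = 255  # routes with this distance are never installed


def best_route(candidates):
    """Pick the (source, next_hop) pair with the lowest administrative distance."""
    rated = [(ADMIN_DISTANCE.get(source, UNUSABLE), source, next_hop)
             for source, next_hop in candidates]
    usable = [entry for entry in rated if entry[0] < UNUSABLE]
    if not usable:
        return None
    _, source, next_hop = min(usable, key=lambda entry: entry[0])
    return source, next_hop


routes = [("rip", "10.0.0.1"), ("ospf", "10.0.0.2"), ("ibgp", "10.0.0.3")]
print(best_route(routes))
```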
2.
Q: Can the administrative distance be changed?
A: On Cisco routers, you can modify the administrative distance of a protocol with the distance command.
3.
Q. What are private IP addresses?
A. The Internet Assigned Numbers Authority (IANA) has reserved the following three blocks of the IP address space for private internets (RFC 1918):
10.0.0.0 - 10.255.255.255 (10/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168/16 prefix)
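A quick way to test an address against these three blocks is Python's standard `ipaddress` module. The helper name `is_rfc1918` is made up for this sketch. Note that the module's own `is_private` attribute covers more ranges than the three RFC 1918 blocks, which is why the blocks are listed explicitly here.

```python
import ipaddress

RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the IPv4 address falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in block for block in RFC1918_BLOCKS)


for addr in ("10.1.2.3", "172.31.255.255", "172.32.0.0", "192.169.0.1"):
    print(addr, is_rfc1918(addr))
```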
4.
Q. What are the well-known reserved IPv4 multicast addresses?
A. The following addresses are reserved for IP multicasting and registered with the Internet Assigned Numbers Authority (IANA):
224.0.0.0 Base address (reserved)
224.0.0.1 All systems on the same network segment
224.0.0.2 All routers on the same network segment
224.0.0.5 OSPF AllSPFRouters address. Used to send Hello packets to all OSPF routers on a network segment
224.0.0.6 The OSPF AllDRouters address. Used to send OSPF routing information to OSPF designated routers on a network segment
224.0.0.9 The RIP version 2 group address. Used to send routing information using the RIP protocol to all RIP v2-aware routers on a network segment
224.0.0.10 EIGRP group address. Used to send EIGRP routing information to all EIGRP routers on a network segment
224.0.0.13 PIM Version 2 (Protocol Independent Multicast)
224.0.0.18 VRRP
224.0.0.19 - 21 IS-IS over IP
224.0.0.22 IGMP Version 3 (Internet Group Management Protocol)
224.0.0.102 Hot Standby Router Protocol Version 2
224.0.0.251 Multicast DNS address
224.0.0.252 Link-local Multicast Name Resolution address
224.0.1.1 Network Time Protocol address
224.0.1.39 Cisco Auto-RP-Announce address
224.0.1.40 Cisco Auto-RP-Discovery address
224.0.1.41 H.323 Gatekeeper discovery address
5.
Q. What is an unregistered multicast packet?
A. RFC 4541 defines an unregistered packet as an IPv4 multicast packet with a destination address that does not match any of the groups announced in earlier IGMP Membership Reports.
If a switch receives an unregistered packet, it must forward that packet on all ports to which an IGMP router is attached. A switch may default to forwarding unregistered packets on all ports.
Switches that do not forward unregistered packets to all ports must include a configuration option to force the flooding of unregistered packets on specified ports.
6.
Q: What are the Ethernet header formats?
A.
Ethernet II: the Type/Length field holds a type.
IEEE 802.3 Frame:
IEEE 802.3 with SNAP header
802.3 raw: the Type/Length field holds a length. This is Novell's non-standard variant: the same as 802.3, but without the IEEE 802.2 LLC header. Novell's IPX is the only protocol that uses the 802.3 raw frame type.
7.
Q: How do you identify the Ethernet frame type?
A:
1. If the Type/Length field is greater than 0x05DC (1500), the frame is Ethernet II: the field is a type, and the data follows immediately.
2. Otherwise the field is a length. If the DSAP is 0xAA, the frame has a SNAP header.
3. For 802.3 raw: Novell decided to use the first two bytes in the data portion of the packet, the IPX checksum field, to identify an 802.3 raw frame using the IPX/SPX protocol. All LAN drivers would use the value 0xFFFF in these two bytes to designate the packet as 802.3 raw.
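The identification steps above can be put together in a short Python sketch. It follows the thresholds given in this answer (IEEE assigns EtherType values from 0x0600 upward, so values between 0x05DD and 0x05FF are not expected in practice). The function name and the returned labels are illustrative, and the frame is assumed to start at the destination MAC address with no VLAN tag.

```python
def classify_ethernet_frame(frame: bytes) -> str:
    """Guess the framing from the Type/Length field and the first payload bytes."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is 14 bytes")
    type_or_length = int.from_bytes(frame[12:14], "big")
    if type_or_length > 0x05DC:
        return "Ethernet II"             # field is a type, data follows directly
    # From here on the field is a length; look at the start of the payload.
    if frame[14:16] == b"\xff\xff":
        return "802.3 raw"               # IPX checksum field set to 0xFFFF
    if frame[14:15] == b"\xaa":
        return "802.3 with SNAP"         # DSAP 0xAA announces a SNAP header
    return "802.3 with 802.2 LLC"


addresses = bytes(12)  # placeholder destination and source MAC addresses
print(classify_ethernet_frame(addresses + b"\x08\x00" + bytes(46)))
print(classify_ethernet_frame(addresses + b"\x00\x2e" + b"\xaa\xaa\x03" + bytes(43)))
print(classify_ethernet_frame(addresses + b"\x00\x2e" + b"\xff\xff" + bytes(44)))
```

The raw check comes before the SNAP check only because the two tests look at different byte values; the order of the last two branches does not change the result.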
8.
Q. What are the well-known EtherType values?
A: For details, see RFC 5342.
0x0000 - 0x05DC: IEEE 802.3 length
0x0800: IP
0x0806: ARP
0x86DD: IPv6
9.
Q. What is L2TP?
A: The Layer 2 Tunneling Protocol (L2TP) is an IETF standard that combines two existing tunneling protocols: Cisco's Layer 2 Forwarding (L2F) and Microsoft's Point-to-Point Tunneling Protocol (PPTP). L2TP is an extension to the Point-to-Point Protocol (PPP), which is an important component for VPNs.
Although L2TP acts like a Data Link Layer protocol in the OSI model, L2TP is in fact a Session Layer protocol, and uses the registered UDP port 1701.
10.
Q. What is L2PT?
A. Layer 2 protocol tunneling (L2PT) allows Layer 2 protocol data units (PDUs) (CDP, STP, and VTP) to be tunneled through a network.
Juniper EX switches support L2PT from release 10.0.
Q: what are the administrative distances for routing protocols ?
A:
Directly connected route ------------ 0
Static route out an interface -------- 0
Static route to next-hop address ----- 1
EIGRP summary route ------------------ 5
External BGP ------------------------- 20
Internal EIGRP ----------------------- 90
IGRP --------------------------------- 100
OSPF --------------------------------- 110
IS-IS -------------------------------- 115
RIP ---------------------------------- 120
EGP ---------------------------------- 140
ODR ---------------------------------- 160
External EIGRP ----------------------- 170
Internal BGP ------------------------- 200
DHCP-learned ------------------------- 254
Unknown ------------------------------ 255
Notes:
An administrative distance of 255 will cause the router to disbelieve the route entirely and not use it.
Since IOS 12.2, the administrative distance of a static route with an exit interface is 1. Prior to the release of 12.2 it was in fact 0.
2.
Q: Can administrative distance be changed ?
A: You can modify the administrative distance of a protocol through the distance command. (CISCO)
3.
Q. What are private IP addresses?
A. The Internet Assigned Numbers Authority (IANA) has reserved the following three blocks of the IP address space for private internets (RFC1918)
10.0.0.0 - 10.255.255.255 (10/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168/16 prefix)
4.
Q. well known reserved ipv4 multicast address ?
A. reserved for IP multicasting and registered with the Internet Assigned Numbers Authority (IANA)
224.0.0.0 Base address (reserved)
224.0.0.1 All systems on the same network segment
224.0.0.2 All routers on the same network segment
224.0.0.5 OSPF AllSPFRouters address. Used to send Hello packets to all OSPF routers on a network segment
224.0.0.6 The OSPF AllDRouters address. Used to send OSPF routing information to OSPF designated routers on a network segment
224.0.0.9 The RIP version 2 group address. Used to send routing information using the RIP protocol to all RIP v2-aware routers on a network segment
224.0.0.10 EIGRP group address. Used to send EIGRP routing information to all EIGRP routers on a network segment
224.0.0.13 PIM Version 2 (Protocol Independent Multicast)
224.0.0.18 VRRP
224.0.0.19 - 21 IS-IS over IP
224.0.0.22 IGMP Version 3 (Internet Group Management Protocol)
224.0.0.102 Hot Standby Router Protocol Version 2
224.0.0.251 Multicast DNS address
224.0.0.252 Link-local Multicast Name Resolution address
224.0.1.1 Network Time Protocol address
224.0.1.39 Cisco Auto-RP-Announce address
224.0.1.40 Cisco Auto-RP-Discovery address
224.0.1.41 H.323 Gatekeeper discovery address
5.
Q. What is an unregistered multicast packet?
A. RFC4541: An unregistered packet is defined as an IPv4 multicast packet with a destination address which does not match any of the groups announced in earlier IGMP Membership Reports.
If a switch receives an unregistered packet, it must forward that packet on all ports to which an IGMP router is attached. A switch may default to forwarding unregistered packets on all ports.
Switches that do not forward unregistered packets to all ports must include a configuration option to force the flooding of unregistered packets on specified ports.
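The forwarding rule can be sketched roughly as below. This is a simplified model under my own assumptions: `memberships` maps a group address to the ports that reported it, `router_ports` are the ports with an IGMP router attached, and `flood_unregistered` stands in for the configuration option mentioned above. None of these names come from RFC 4541 or any switch CLI.

```python
def forwarding_ports(group, ingress_port, memberships, router_ports, all_ports,
                     flood_unregistered=False):
    """Return the set of ports a snooping switch would send the packet to."""
    if group in memberships:
        # Registered group: member ports plus router ports.
        out = set(memberships[group]) | set(router_ports)
    elif flood_unregistered:
        # Unregistered packet, switch configured to flood it everywhere.
        out = set(all_ports)
    else:
        # Unregistered packet: router ports only (the required minimum).
        out = set(router_ports)
    out.discard(ingress_port)  # never send a frame back out its ingress port
    return out


memberships = {"239.1.1.1": {2, 3}}
print(forwarding_ports("239.1.1.1", 4, memberships, {1}, {1, 2, 3, 4}))
print(forwarding_ports("239.9.9.9", 4, memberships, {1}, {1, 2, 3, 4}))
```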
6.
Q: What are the Ethernet header formats?
A.
Ethernet II: the Type/Length field carries the type (EtherType).
IEEE 802.3 Frame:
IEEE 802.3 with SNAP header
802.3 raw: the Type/Length field carries the length. This is Novell's non-standard variant: the same as 802.3, but without the IEEE 802.2 LLC header. Novell's IPX is the only protocol that uses the 802.3 raw frame type.
7.
Q: How do you identify the Ethernet frame type?
A:
1. If the Type/Length field is higher than 0x05DC (1500), the frame is Ethernet II: the field is the type, and the data follows immediately.
2. Otherwise the field is a length. If the DSAP is 0xAA, the frame has a SNAP header.
3. For 802.3 raw: Novell decided to use the first two bytes in the data portion of the packet, the IPX checksum field, to identify an 802.3 raw frame using the IPX/SPX protocol. All LAN drivers would use the value 0xFFFF in these two bytes to designate the packet as 802.3 raw.
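The checks above can be sketched in a few lines. This assumes an untagged frame (no 802.1Q VLAN tag), so the Type/Length field sits at bytes 12-13 and the payload starts at byte 14. The function name and the returned labels are mine.

```python
import struct


def classify_frame(frame: bytes) -> str:
    """Classify an untagged Ethernet frame by its Type/Length field and payload."""
    if len(frame) < 16:
        raise ValueError("frame too short to classify")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: Type/Length.
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length > 0x05DC:
        return "Ethernet II"
    # The field is a length; look at the first payload bytes.
    if frame[14:16] == b"\xff\xff":
        return "802.3 raw"              # IPX checksum field set to 0xFFFF
    if frame[14] == 0xAA:
        return "802.3 with SNAP"        # DSAP 0xAA announces a SNAP header
    return "802.3 with 802.2 LLC"


macs = bytes(12)  # dummy destination and source addresses
print(classify_frame(macs + b"\x08\x00" + bytes(46)))
print(classify_frame(macs + b"\x00\x2e" + b"\xaa\xaa\x03" + bytes(43)))
```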
8.
Q. What are the well-known EtherTypes?
A: For details, see RFC 5342.
0x0000 - 0x05DC: IEEE 802.3 length
0x0800: IPv4
0x0806: ARP
0x86DD: IPv6
9.
Q. What is L2TP?
A: The Layer 2 Tunneling Protocol (L2TP) is an IETF standard that combines two existing tunneling protocols: Cisco's Layer 2 Forwarding (L2F) and Microsoft's Point-to-Point Tunneling Protocol (PPTP). L2TP is an extension to the Point-to-Point Protocol (PPP), which is an important component for VPNs.
Although L2TP acts like a Data Link Layer protocol in the OSI model, it is in fact a Session Layer protocol and uses the registered UDP port 1701.
10.
Q. What is L2PT ?
A. Layer 2 protocol tunneling allows Layer 2 protocol data units (PDUs) (CDP, STP, and VTP) to be tunneled through a network.
Juniper EX switches have supported L2PT since release 10.0.
Thursday, August 5, 2010
BGP
BGP, defined in RFC 1771 (since obsoleted by RFC 4271), allows you to create loop-free interdomain routing between autonomous systems (ASs). An AS is a set of routers under a single technical administration. Routers in an AS can use an IGP to exchange routing information.
BGP uses TCP port 179. Two BGP routers form a TCP connection between one another.
When BGP runs between routers that belong to two different ASs, this is called exterior BGP (eBGP). When BGP runs between routers in the same AS, this is called iBGP.
The use of a loopback interface to define neighbors is common with iBGP, but is not common with eBGP.
If you use the IP address of a loopback interface in the neighbor command, you need some extra configuration on the neighbor router.
neighbor ip-address update-source interface
For eBGP, if the neighbor address is not on a directly connected interface, ebgp-multihop is needed. Multihop applies only to eBGP, not to iBGP. You still need to make sure the neighbor is reachable from both sides, for example through an IGP or static routing.
There is heavy use of route maps with BGP. In the BGP context, the route map is a method to control and modify routing information. The control and modification of routing information occurs through the definition of conditions for route redistribution from one routing protocol to another. Or the control of routing information can occur at injection in and out of BGP.
There are two instances of the route map defined below, with the name MYMAP. The first instance has a sequence number of 10, and the second has a sequence number of 20.
route-map MYMAP permit 10 (The first set of conditions goes here.)
route-map MYMAP permit 20 (The second set of conditions goes here.)
When you apply route map MYMAP to incoming or outgoing routes, the first set of conditions is applied via instance 10. If the first set of conditions is not met, you proceed to a higher instance of the route map.
Each route map consists of a list of match and set configuration commands. The match command specifies the match criteria, and the set command specifies the action to take if those criteria are met.
If the match criteria are met and you have a permit, there is a redistribution or control of the routes, as the set action specifies. You break out of the list.
If the match criteria are met and you have a deny, there is no redistribution or control of the route. You break out of the list.
If the match criteria are not met and you have a permit or deny, the next instance of the route map is checked. This next-instance check continues until you either break out or finish all the instances of the route map. If you finish the list without a match, the route is neither accepted nor forwarded.
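The permit/deny walk described above can be modelled in a few lines. This is only a sketch of the evaluation order, not of IOS itself: routes are plain dictionaries, and each instance is a hypothetical `(sequence, action, match_fn, set_fn)` tuple of my own design.

```python
def apply_route_map(entries, route):
    """Walk route-map instances in sequence order.

    Returns the (possibly modified) route on a permit match,
    or None when the route is denied or nothing matches.
    """
    for seq, action, match_fn, set_fn in sorted(entries, key=lambda e: e[0]):
        if not match_fn(route):
            continue                 # criteria not met: check the next instance
        if action == "deny":
            return None              # matched a deny: break out, route dropped
        updated = dict(route)
        set_fn(updated)              # matched a permit: apply the set actions
        return updated               # and break out of the list
    return None                      # end of list without a match


MYMAP = [
    (10, "permit", lambda r: r["prefix"].startswith("10."),
     lambda r: r.update(metric=50)),
    (20, "permit", lambda r: r["metric"] > 100,
     lambda r: r.update(local_pref=200)),
]

print(apply_route_map(MYMAP, {"prefix": "10.1.0.0/16", "metric": 0}))
print(apply_route_map(MYMAP, {"prefix": "192.168.0.0/16", "metric": 150}))
print(apply_route_map(MYMAP, {"prefix": "192.168.0.0/16", "metric": 10}))
```

The third route matches neither instance, so it falls off the end of the list and is dropped.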
The related commands for match are:
match as-path
match community
match clns
match interface
match ip address
match ip next-hop
match ip route-source
match metric
match route-type
match tag
The related commands for set are:
set as-path
set clns
set automatic-tag
set community
set interface
set default interface
set ip default next-hop
set level
set local-preference
set metric
set metric-type
set next-hop
set origin
set tag
set weight
There are multiple ways to send network information with BGP: the network command, redistribution, and static routes with redistribution.
network network-number [mask network-mask]
The network command controls the networks that originate from this box. The command uses a mask portion because BGP version 4 (BGP4) can handle subnetting and supernetting. A maximum of 200 entries of the network command are acceptable. The network command works if the router knows the network that you attempt to advertise, whether connected, static, or learned dynamically.
Another way is to redistribute your IGP into BGP. Apply careful filtering to make sure that you send to the Internet only the routes that you want to advertise, not all the routes that you have.
You can always use static routes to originate a network or a subnet. The only difference is that BGP considers these routes to have an origin that is incomplete, or unknown.
Redistribution is always the method for injection of BGP into IGP.
Remember that when a BGP speaker receives an update from other BGP speakers in its own AS (iBGP), the BGP speaker that receives the update does not redistribute that information to other BGP speakers in its own AS. The BGP speaker that receives the update redistributes the information to other BGP speakers outside of its AS. Therefore, sustain a full mesh between the iBGP speakers within an AS.
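Because iBGP speakers do not pass iBGP-learned routes on to each other, every pair of speakers needs its own session. A quick way to see how that grows, with n(n-1)/2 sessions for n routers (the router names are placeholders):

```python
from itertools import combinations


def ibgp_full_mesh(routers):
    """Return every iBGP session needed for a full mesh, one pair per session."""
    return list(combinations(sorted(routers), 2))


sessions = ibgp_full_mesh(["RTA", "RTB", "RTC", "RTD"])
for a, b in sessions:
    print(a, "<->", b)
print(len(sessions), "sessions")  # 4 routers need 4 * 3 / 2 = 6 sessions
```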
BGP Decision Algorithm
After BGP receives updates about different destinations from different autonomous systems, the protocol must choose paths to reach a specific destination. BGP chooses only a single path to reach a specific destination.
BGP bases the decision on different attributes, such as next hop, administrative weights, local preference, route origin, path length, origin code, metric, and other attributes.
BGP always propagates the best path to the neighbors.
BGP assigns the first valid path as the current best path. BGP then compares the best path with the next path in the list, until BGP reaches the end of the list of valid paths. This list provides the rules that are used to determine the best path:
1. Prefer the path with the highest WEIGHT.
Note: WEIGHT is a Cisco-specific parameter. It is local to the router on which it is configured.
2. Prefer the path with the highest LOCAL_PREF.
Note: A path without LOCAL_PREF is considered to have had the value set with the bgp default local-preference command, or to have a value of 100 by default.
3. Prefer the path that was locally originated via a network or aggregate BGP subcommand or through redistribution from an IGP.
Local paths that are sourced by the network or redistribute commands are preferred over local aggregates that are sourced by the aggregate-address command.
4. Prefer the path with the shortest AS_PATH.
• This step is skipped if you have configured the bgp bestpath as-path ignore command.
• An AS_SET counts as 1, no matter how many ASs are in the set.
• The AS_CONFED_SEQUENCE and AS_CONFED_SET are not included in the AS_PATH length.
5. Prefer the path with the lowest origin type.
Note: IGP is lower than Exterior Gateway Protocol (EGP), and EGP is lower than INCOMPLETE.
6. Prefer the path with the lowest multi-exit discriminator (MED).
Note: Be aware of these items:
• This comparison only occurs if the first (the neighboring) AS is the same in the two paths. Any confederation sub-ASs are ignored.
In other words, MEDs are compared only if the first AS in the AS_SEQUENCE is the same for multiple paths. Any preceding AS_CONFED_SEQUENCE is ignored.
• If bgp always-compare-med is enabled, MEDs are compared for all paths.
Enable or disable this option consistently across the entire AS. Otherwise, routing loops can occur.
• If bgp bestpath med-confed is enabled, MEDs are compared for all paths that consist only of AS_CONFED_SEQUENCE.
These paths originated within the local confederation.
• The MED of paths that are received from a neighbor with a MED of 4,294,967,295 is changed to 4,294,967,294 before insertion into the BGP table.
• Paths received with no MED are assigned a MED of 0, unless you have enabled bgp bestpath med missing-as-worst.
If you have enabled bgp bestpath med missing-as-worst, the paths are assigned a MED of 4,294,967,294.
• The bgp deterministic-med command can also influence this step.
Refer to "How BGP Routers Use the Multi-Exit Discriminator for Best Path Selection".
7. Prefer eBGP over iBGP paths.
If bestpath is selected, go to Step 9 (multipath).
Note: Paths that contain AS_CONFED_SEQUENCE and AS_CONFED_SET are local to the confederation. Therefore, these paths are treated as internal paths. There is no distinction between Confederation External and Confederation Internal.
8. Prefer the path with the lowest IGP metric to the BGP next hop.
Continue, even if bestpath is already selected.
9. Determine if multiple paths require installation in the routing table for BGP Multipath.
Continue, if bestpath is not yet selected.
10. When both paths are external, prefer the path that was received first (the oldest one).
Skip this step if any of these items is true:
• You have enabled the bgp bestpath compare-routerid command.
• The router ID is the same for multiple paths because the routes were received from the same router.
• There is no current best path.
The current best path can be lost when, for example, the neighbor that offers the path goes down.
11. Prefer the route that comes from the BGP router with the lowest router ID.
The router ID is the highest IP address on the router, with preference given to loopback addresses. Also, you can use the bgp router-id command to manually set the router ID.
Note: If a path contains route reflector (RR) attributes, the originator ID is substituted for the router ID in the path selection process.
12. If the originator or router ID is the same for multiple paths, prefer the path with the minimum cluster list length.
This is only present in BGP RR environments. It allows clients to peer with RRs or clients in other clusters. In this scenario, the client must be aware of the RR-specific BGP attribute.
13. Prefer the path that comes from the lowest neighbor address.
This address is the IP address that is used in the BGP neighbor configuration. The address corresponds to the remote peer that is used in the TCP connection with the local router.
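A rough model of the pairwise comparison, covering only steps 1-8, 11 and 13. It leaves out multipath, the oldest-path rule, route reflector attributes, confederations and every bestpath knob, so treat it as a study aid rather than what a router really runs. The `Path` class and its field names are my own.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    neighbor: str                    # peer address used in the neighbor command
    router_id: str                   # router ID of the advertising peer
    as_path: list = field(default_factory=list)
    weight: int = 0
    local_pref: int = 100            # default when LOCAL_PREF is absent
    locally_originated: bool = False
    origin: str = "igp"
    med: int = 0                     # missing MED is treated as 0
    ebgp: bool = True
    igp_metric: int = 0


def prefer(a, b):
    """Return the better of two paths; in every pair the lower value wins."""
    checks = [
        (-a.weight, -b.weight),                                 # 1. highest WEIGHT
        (-a.local_pref, -b.local_pref),                         # 2. highest LOCAL_PREF
        (not a.locally_originated, not b.locally_originated),   # 3. locally originated
        (len(a.as_path), len(b.as_path)),                       # 4. shortest AS_PATH
        (ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin]),         # 5. lowest origin type
    ]
    if a.as_path[:1] == b.as_path[:1]:
        checks.append((a.med, b.med))                           # 6. lowest MED, same neighbor AS only
    checks += [
        (not a.ebgp, not b.ebgp),                               # 7. eBGP over iBGP
        (a.igp_metric, b.igp_metric),                           # 8. lowest IGP metric to next hop
        (ip_address(a.router_id), ip_address(b.router_id)),     # 11. lowest router ID
        (ip_address(a.neighbor), ip_address(b.neighbor)),       # 13. lowest neighbor address
    ]
    for va, vb in checks:
        if va != vb:
            return a if va < vb else b
    return a  # complete tie: keep the current best


def best_path(paths):
    """Start with the first valid path and compare it with each later one."""
    best = paths[0]
    for candidate in paths[1:]:
        best = prefer(best, candidate)
    return best


p1 = Path("10.0.0.1", "1.1.1.1", as_path=[300, 200])
p2 = Path("10.0.0.2", "2.2.2.2", as_path=[400, 500, 200], local_pref=200)
print(best_path([p1, p2]).neighbor)  # 10.0.0.2: LOCAL_PREF is checked before AS_PATH length
```

The comparison is written pairwise rather than as a single sort key because the MED step only applies when both paths come from the same neighboring AS.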
BGP Attributes
AS_PATH Attribute
Whenever a route update passes through an AS, the AS number is prepended to that update. The AS_PATH attribute is the list of AS numbers that a route has traversed in order to reach a destination. An AS_SET is an unordered mathematical set {} of all the ASs that have been traversed.
In this example, network 190.10.0.0 is advertised by RTB in AS200. When that route traverses AS300, RTC prepends its own AS number to it. So when 190.10.0.0 reaches RTA it has two AS numbers attached to it: first 200, then 300. As far as RTA is concerned, the path to reach 190.10.0.0 is (300, 200).
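The 190.10.0.0 example can be replayed with a list, where each AS puts its own number at the front as the route leaves it. The helper names are mine. The loop check reflects standard BGP behaviour, where a router rejects an update whose AS_PATH already contains its own AS.

```python
def advertise(as_path, local_as):
    """Return the AS_PATH as sent to an eBGP neighbor: own AS goes in front."""
    return [local_as] + list(as_path)


def has_loop(as_path, local_as):
    """A router drops an update whose AS_PATH already contains its own AS."""
    return local_as in as_path


path = advertise([], 200)     # RTB originates 190.10.0.0 in AS200
path = advertise(path, 300)   # RTC in AS300 passes it on towards RTA
print(path)                   # [300, 200]
print(has_loop(path, 200))    # True: AS200 would reject this update
```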
Origin Attribute
The origin is a mandatory attribute that defines the origin of the path information. The origin attribute can assume three values:
IGP: Network Layer Reachability Information (NLRI) is interior to the originating AS. This normally happens when we use the bgp network command or when IGP is redistributed into BGP; the origin of the path info will then be IGP. This is indicated with an "i" in the BGP table.
EGP: NLRI is learned via EGP (Exterior Gateway Protocol). This is indicated with an "e" in the BGP table.
INCOMPLETE: NLRI is unknown or learned via some other means. This usually occurs when we redistribute a static route into BGP; the origin of the route will then be incomplete. This is indicated with a "?" in the BGP table.
Nexthop Attribute
The BGP nexthop attribute is the next hop IP address that is going to be used to reach a certain destination. For EBGP, the next hop is always the IP address of the neighbor specified in the neighbor command.
Special care should be taken when dealing with multiaccess and NBMA networks.
BGP Nexthop (Multiaccess Networks)
Assume that RTC and RTD in AS300 are running OSPF. RTC is running BGP with RTA. RTC can reach network 180.20.0.0 via 170.10.20.3. When RTC sends a BGP update to RTA regarding 180.20.0.0 it will use as next hop 170.10.20.3 and not its own IP address (170.10.20.2). This is because the network between RTA, RTC and RTD is a multiaccess network and it makes more sense for RTA to use RTD as a next hop to reach 180.20.0.0 rather than making an extra hop via RTC.
*RTC will advertise 180.20.0.0 to RTA with a NextHop 170.10.20.3.
BGP Nexthop (NBMA)
BGP backdoor
Usually when a route is learned via EBGP, it is installed in the IP routing table because of its distance (20). Sometimes, however, two ASs have an IGP-learned backdoor route and an EBGP-learned route. Their policy might be to use the IGP-learned path as the preferred path and to use the EBGP-learned path when the IGP path is down.
The default distances of all IGPs are higher than the default distance of EBGP (which is 20). Usually, the route with the lowest distance is preferred.
If you want the IGP routes to be chosen, you could use one of the following techniques:
•Change the external distance of EBGP. (Not recommended because the distance will affect all updates, which might lead to undesirable behavior when multiple routing protocols interact with one another.)
•Change the distance of the IGP. (Not recommended because the distance will affect all updates, which might lead to undesirable behavior when multiple routing protocols interact with one another.)
•Establish a BGP back door. (Recommended)
To establish a BGP back door, use the network backdoor router configuration command.
router bgp 100
network 160.10.0.0 backdoor
With the network backdoor command, Router A treats the EBGP-learned route as local and gives it a distance of 200. The network is also learned via the IGP, whose distance is lower, so the IGP-learned route is installed in the IP routing table and is used to forward traffic. If the IGP-learned route goes down, the EBGP-learned route will be installed in the IP routing table and used to forward traffic.
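The effect of the back door comes down to a distance comparison. In the sketch below the IGP is assumed to be OSPF with its default distance of 110; that choice is only for illustration.

```python
def installed_source(candidates):
    """candidates maps a route source to its distance; the lowest one is installed."""
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Without the back door, the EBGP route (distance 20) beats the IGP route.
print(installed_source({"ebgp": 20, "ospf": 110}))   # ebgp

# With network ... backdoor, the EBGP route is treated as local (distance 200).
print(installed_source({"ebgp": 200, "ospf": 110}))  # ospf

# If the IGP route disappears, the EBGP route takes over.
print(installed_source({"ebgp": 200}))               # ebgp
```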
Configuration sample:
links:
bgp case study
bgp tutorial
Tuesday, August 3, 2010
OSPF
OSPF is a link-state protocol. A link is an interface on the router. The state of the link is a description of that interface and of its relationship to its neighboring routers. A description of the interface would include, for example, the IP address of the interface, the mask, the type of network it is connected to, the routers connected to that network and so on. The collection of all these link-states would form a link-state database.
OSPF uses a shortest path first algorithm in order to build and calculate the shortest path to all known destinations. The shortest path is calculated with the use of the Dijkstra algorithm.
The cost of an interface is inversely proportional to the bandwidth of that interface. A higher bandwidth indicates a lower cost. The formula used to calculate the cost is:
cost = 100,000,000 / bandwidth in bps
By default, the cost of an interface is calculated based on the bandwidth; you can also set the cost of an interface explicitly.
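A few worked values for the cost formula. The integer division and the floor of 1 are my assumptions about how the result is rounded (they match the commonly quoted Cisco figures such as 64 for a T1), so check them against your platform.

```python
REFERENCE_BANDWIDTH_BPS = 100_000_000  # 10^8, the numerator in the formula


def ospf_cost(bandwidth_bps):
    """Return the interface cost: 10^8 / bandwidth, truncated, never below 1."""
    return max(1, REFERENCE_BANDWIDTH_BPS // bandwidth_bps)


for name, bps in [("56k serial", 56_000), ("T1", 1_544_000),
                  ("Ethernet", 10_000_000), ("Fast Ethernet", 100_000_000),
                  ("Gigabit Ethernet", 1_000_000_000)]:
    print(name, ospf_cost(bps))
```

With this reference value, Fast Ethernet and anything faster all end up with a cost of 1.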
OSPF uses flooding to exchange link-state updates between routers. Any change in routing information is flooded to all routers in the network. Areas are introduced to put a boundary on the explosion of link-state updates. Flooding and calculation of the Dijkstra algorithm on a router is limited to changes within an area. All routers within an area have an identical link-state database.
A router that has all of its interfaces within the same area is called an internal router (IR). A router that has interfaces in multiple areas is called an area border router (ABR). Routers that act as gateways (redistribution) between OSPF and other routing protocols (IGRP, EIGRP, IS-IS, RIP, BGP, Static) or other instances of the OSPF routing process are called autonomous system boundary routers (ASBRs). Any router can be an ABR or an ASBR.
There are different types of Link State Packets. The different types are illustrated in the following diagram:
Router links are an indication of the state of the interfaces on a router belonging to a certain area. Each router will generate a router link for all of its interfaces.
Summary links are generated by ABRs; this is how network reachability information is disseminated between areas. Normally, all information is injected into the backbone (area 0) and in turn the backbone will pass it on to other areas. ABRs also have the task of propagating the reachability of the ASBR. This is how routers know how to get to external routes in other ASs.
Network Links are generated by a Designated Router (DR) on a segment (DRs will be discussed later). This information is an indication of all routers connected to a particular multi-access segment such as Ethernet, Token Ring and FDDI (NBMA also).
External Links are an indication of networks outside of the AS. These networks are injected into OSPF via redistribution. The ASBR has the task of injecting these routes into an autonomous system.
It is possible to authenticate the OSPF packets such that routers can participate in routing domains based on predefined passwords. By default, a router uses a Null authentication which means that routing exchanges over a network are not authenticated. Two other authentication methods exist: Simple password authentication and Message Digest authentication (MD-5).
OSPF has special restrictions when multiple areas are involved. If more than one area is configured, one of these areas has to be area 0. This is called the backbone.
All areas have to be directly connected to the backbone. In the rare situations where a new area is introduced that cannot have direct physical access to the backbone, a virtual link will have to be configured.
Routes that are generated from within an area (the destination belongs to the area) are called intra-area routes. These routes are normally represented by the letter O in the IP routing table. Routes that originate from other areas are called inter-area or summary routes. The notation for these routes is O IA in the IP routing table. Routes that originate from other routing protocols (or different OSPF processes) and that are injected into OSPF via redistribution are called external routes. These routes are represented by O E2 or O E1 in the IP routing table. Multiple routes to the same destination are preferred in the following order: intra-area, inter-area, external E1, external E2.
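The preference order can be sketched in Python as a lookup on the routing table code. The route dictionaries and the cost tie-break inside the same route type are assumptions made for the example.

```python
# Lower rank wins: intra-area, inter-area, external E1, external E2.
ROUTE_TYPE_RANK = {"O": 0, "O IA": 1, "O E1": 2, "O E2": 3}

def best_route(candidates):
    """Pick the preferred route among candidates for the same destination."""
    # Route type is compared first; cost only breaks ties within one type.
    return min(candidates, key=lambda r: (ROUTE_TYPE_RANK[r["code"]], r["cost"]))

# Hypothetical candidates for one destination
routes = [
    {"code": "O E2", "cost": 20, "via": "R5"},
    {"code": "O IA", "cost": 74, "via": "R2"},
    {"code": "O E1", "cost": 30, "via": "R4"},
]

print(best_route(routes)["via"])
```

Note that the inter-area route is chosen here even though its cost is the highest of the three, because route type is compared before cost.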
The OSPF router-id is usually the highest IP address on the box, or the highest loopback address if one exists.
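A minimal Python sketch of that selection rule follows. The function name and the sample addresses are hypothetical, and a manually configured router-id is not considered here.

```python
import ipaddress

def select_router_id(interface_addrs, loopback_addrs=()):
    """Highest loopback address if any exist, else highest interface address."""
    pool = list(loopback_addrs) or list(interface_addrs)
    if not pool:
        raise ValueError("no IP addresses configured")
    # Compare as addresses, not strings, so 10.0.0.9 < 10.0.0.10.
    return str(max(ipaddress.IPv4Address(a) for a in pool))

print(select_router_id(["10.0.0.1", "192.168.1.1"]))
print(select_router_id(["10.0.0.1", "192.168.1.1"], ["1.1.1.1", "2.2.2.2"]))
```

The second call returns a loopback address even though a physical interface has a numerically higher address.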
Virtual links are used for two purposes:
.Linking an area that does not have a physical connection to the backbone.
.Patching the backbone in case discontinuity of area 0 occurs.
Neighbors
Routers that share a common segment become neighbors on that segment. Neighbors are elected via the Hello protocol. Hello packets are sent periodically out of each interface using IP multicast (Appendix B). Routers become neighbors as soon as they see themselves listed in the neighbor's Hello packet. This way, a two way communication is guaranteed. Neighbor negotiation applies to the primary address only. Secondary addresses can be configured on an interface with a restriction that they have to belong to the same area as the primary address.
Two routers will not become neighbors unless they agree on the following:
Area-id: Two routers having a common segment; their interfaces have to belong to the same area on that segment. Of course, the interfaces should belong to the same subnet and have a similar mask.
Authentication: OSPF allows for the configuration of a password for a specific area. Routers that want to become neighbors have to exchange the same password on a particular segment.
Hello and Dead Intervals: OSPF exchanges Hello packets on each segment. This is a form of keepalive used by routers in order to acknowledge their existence on a segment and in order to elect a designated router (DR) on multiaccess segments. The Hello interval specifies the length of time, in seconds, between the Hello packets that a router sends on an OSPF interface. The dead interval is the number of seconds that a router's Hello packets can go unseen before its neighbors declare the OSPF router down.
Stub area flag: Two routers have to also agree on the stub area flag in the Hello packets in order to become neighbors. Stub areas will be discussed in a later section. Keep in mind for now that defining stub areas will affect the neighbor election process.
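The checks above can be sketched as a simple field-by-field comparison in Python. The HelloParams class and its field names are hypothetical and do not mirror the real Hello packet layout; authentication is reduced to comparing a shared password.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class HelloParams:
    area_id: str
    subnet: str          # network and mask, e.g. "10.0.0.0/24"
    password: str
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    stub_flag: bool

def neighbor_mismatches(local, remote):
    """Return the names of parameters that prevent the neighbor relationship."""
    return [f.name for f in fields(HelloParams)
            if getattr(local, f.name) != getattr(remote, f.name)]

r1 = HelloParams("0", "10.0.0.0/24", "secret", 10, 40, False)
r2 = HelloParams("0", "10.0.0.0/24", "secret", 30, 40, False)

print(neighbor_mismatches(r1, r2))
```

An empty list means the two routers agree on everything listed above and can become neighbors.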
Adjacencies
Adjacency is the next step after the neighboring process. Adjacent routers are routers that go beyond the simple Hello exchange and proceed into the database exchange process. In order to minimize the amount of information exchange on a particular segment, OSPF elects one router to be a designated router (DR), and one router to be a backup designated router (BDR), on each multi-access segment. The BDR is elected as a backup mechanism in case the DR goes down. The idea behind this is that routers have a central point of contact for information exchange. Instead of each router exchanging updates with every other router on the segment, every router exchanges information with the DR and BDR. The DR and BDR relay the information to everybody else. In mathematical terms, this cuts the information exchange from O(n*n) to O(n) where n is the number of routers on a multi-access segment.
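To put numbers on the O(n*n) versus O(n) claim, here is a small Python sketch that counts adjacencies on one segment. Counting adjacencies is used as a stand-in for the amount of information exchange, and the function names are made up for the example.

```python
def full_mesh_adjacencies(n):
    """Every router pairs with every other router: n(n-1)/2."""
    return n * (n - 1) // 2

def dr_bdr_adjacencies(n):
    """Each other router pairs with the DR and the BDR, plus the DR-BDR pair."""
    if n < 2:
        return 0
    return 2 * (n - 2) + 1

for n in (2, 5, 10, 50):
    print(n, full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
```

For 10 routers this is 45 adjacencies without a DR and 17 with a DR and BDR; for 50 routers it is 1225 versus 97.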
DR and BDR election is done via the Hello protocol. Hello packets are exchanged via IP multicast packets on each segment. The router with the highest OSPF priority on a segment will become the DR for that segment. The same process is repeated for the BDR. In case of a tie, the router with the highest RID will win. The default for the interface OSPF priority is one. Remember that the DR and BDR concepts are per multiaccess segment. A priority value of zero indicates an interface which is not to be elected as DR or BDR. The state of the interface with priority zero will be DROTHER.
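The election rules as described above can be sketched in Python. This is a simplified model that assumes all routers come up at the same time; it does not model the fact that an already elected DR is not preempted by a router that joins later. The router IDs and priorities are made up.

```python
import ipaddress

def elect_dr_bdr(routers):
    """routers: list of (router_id, priority). Returns (dr, bdr) router IDs."""
    # Priority 0 means the interface is never elected DR or BDR.
    eligible = [r for r in routers if r[1] > 0]
    # Highest priority first; highest router ID breaks ties.
    ranked = sorted(eligible,
                    key=lambda r: (r[1], ipaddress.IPv4Address(r[0])),
                    reverse=True)
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr

segment = [("1.1.1.1", 1), ("2.2.2.2", 1), ("3.3.3.3", 0), ("4.4.4.4", 5)]
print(elect_dr_bdr(segment))
```

Here 4.4.4.4 wins on priority, 2.2.2.2 beats 1.1.1.1 on router ID for BDR, and 3.3.3.3 stays DROTHER because of its priority of zero.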
Building the Adjacency
The adjacency-building process takes effect after multiple stages have been completed. Routers that become adjacent will have an identical link-state database.
Down: No information has been received from anybody on the segment.
Attempt: On non-broadcast multi-access clouds such as Frame Relay and X.25, this state indicates that no recent information has been received from the neighbor. An effort should be made to contact the neighbor by sending Hello packets at the reduced rate PollInterval.
Init: The interface has detected a Hello packet coming from a neighbor but bi-directional communication has not yet been established.
Two-way: There is bi-directional communication with a neighbor. The router has seen itself in the Hello packets coming from a neighbor. At the end of this stage the DR and BDR election would have been done. At the end of the 2way stage, routers will decide whether to proceed in building an adjacency or not. The decision is based on whether one of the routers is a DR or BDR or the link is a point-to-point or a virtual link.
Exstart: Routers are trying to establish the initial sequence number that is going to be used in the information exchange packets. The sequence number ensures that routers always get the most recent information. One router will become the primary and the other will become the secondary (master/slave). The primary router will poll the secondary for information.
Exchange: Routers will describe their entire link-state database by sending database description packets. At this state, packets could be flooded to other interfaces on the router.
Loading: At this state, routers are finalizing the information exchange. Routers have built a link-state request list and a link-state retransmission list. Any information that looks incomplete or outdated will be put on the request list. Any update that is sent will be put on the retransmission list until it gets acknowledged.
Full: At this state, the adjacency is complete. The neighboring routers are fully adjacent. Adjacent routers will have a similar link-state database.
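The stages and the decision taken at the end of the two-way stage can be sketched in Python. The role and network-type strings are hypothetical labels chosen for the example, not values from any router output.

```python
# Order of the stages listed above.
NEIGHBOR_STATES = ["Down", "Attempt", "Init", "Two-way",
                   "Exstart", "Exchange", "Loading", "Full"]

def should_form_adjacency(local_role, neighbor_role, network_type):
    """Decide at the end of Two-way whether to continue toward Full."""
    if network_type in ("point-to-point", "virtual-link"):
        return True
    # On multi-access segments, only pairs that involve the DR or BDR proceed.
    return bool({local_role, neighbor_role} & {"DR", "BDR"})

print(should_form_adjacency("DROTHER", "DROTHER", "broadcast"))
print(should_form_adjacency("DROTHER", "DR", "broadcast"))
print(should_form_adjacency("DROTHER", "DROTHER", "point-to-point"))
```

Under this sketch, two DROTHER routers on an Ethernet segment stay in the two-way state with each other and only go to Full with the DR and BDR.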
OSPF will always form an adjacency with the neighbor on the other side of a point-to-point interface such as point-to-point serial lines. There is no concept of DR or BDR. The state of the serial interfaces is point to point.
Special care should be taken when configuring OSPF over multi-access non-broadcast media such as Frame Relay, X.25, and ATM. The protocol treats these media like any other broadcast medium such as Ethernet. NBMA clouds are usually built in a hub-and-spoke topology.
OSPF and Route Summarization
Summarizing is the consolidation of multiple routes into one single advertisement. This is normally done at the boundaries of Area Border Routers (ABRs). Although summarization could be configured between any two areas, it is better to summarize in the direction of the backbone. This way the backbone receives all the aggregate addresses and in turn will inject them, already summarized, into other areas. There are two types of summarization:
.Inter-area route summarization
.External route summarization
Inter-area route summarization is done on ABRs and it applies to routes from within the AS. It does not apply to external routes injected into OSPF via redistribution. In order to take advantage of summarization, network numbers in areas should be assigned in a contiguous way to be able to lump these addresses into one range.
External route summarization is specific to external routes that are injected into OSPF via redistribution. Also, make sure that external ranges that are being summarized are contiguous.
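Why contiguous numbering matters can be seen with Python's standard ipaddress module. The prefixes below are made up, and this only illustrates the address arithmetic, not the router configuration.

```python
import ipaddress

def summarize(prefixes):
    """Collapse a list of prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

# Four contiguous /24s fit exactly into one /22.
print(summarize(["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]))

# With a gap in the numbering, no single exact range exists.
print(summarize(["10.1.0.0/24", "10.1.2.0/24"]))
```

The first list collapses to 10.1.0.0/22, so one advertisement covers the area. The second list stays as two separate prefixes.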
OSPF allows certain areas to be configured as stub areas. External networks, such as those redistributed from other protocols into OSPF, are not allowed to be flooded into a stub area. Routing from these areas to the outside world is based on a default route.
An area could be qualified as a stub when there is a single exit point from that area or if routing to outside of the area does not have to take an optimal path.
Other stub area restrictions are that a stub area cannot be used as a transit area for virtual links. Also, an ASBR cannot be internal to a stub area.
All OSPF routers inside a stub area have to be configured as stub routers.
An extension to stub areas is what is called "totally stubby areas". Cisco indicates this by adding a "no-summary" keyword to the stub area configuration. A totally stubby area is one that blocks external routes and summary routes (inter-area routes) from going into the area. This way, intra-area routes and the default of 0.0.0.0 are the only routes injected into that area.
External routes fall under two categories, external type 1 and external type 2. The difference between the two is in the way the cost (metric) of the route is being calculated. The cost of a type 2 route is always the external cost, irrespective of the interior cost to reach that route. A type 1 cost is the addition of the external cost and the internal cost used to reach that route. A type 1 route is always preferred over a type 2 route for the same destination.
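The difference between the two metric types can be sketched as a small Python function. The function name and the sample costs are made up for the example.

```python
def external_route_cost(metric_type, external_cost, internal_cost):
    """Cost of an external route as seen by a router inside the OSPF domain."""
    if metric_type == 1:
        # Type 1: external cost plus the internal cost to reach the route.
        return external_cost + internal_cost
    if metric_type == 2:
        # Type 2: always the external cost, whatever the internal cost is.
        return external_cost
    raise ValueError("metric_type must be 1 or 2")

# Same redistributed route seen from a router 74 away from the ASBR
print(external_route_cost(1, 20, 74))
print(external_route_cost(2, 20, 74))
```

A type 2 route shows the same cost everywhere in the domain, while a type 1 route grows as you move away from the ASBR.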
Injecting Defaults into OSPF
An autonomous system boundary router (ASBR) can be forced to generate a default route into the OSPF domain. As discussed earlier, a router becomes an ASBR whenever routes are redistributed into an OSPF domain. However, an ASBR does not, by default, generate a default route into the OSPF routing domain.
To have OSPF generate a default route use the following:
default-information originate [always] [metric metric-value] [metric-type type-value] [route-map map-name]
There are two ways to generate a default. The first is to advertise 0.0.0.0 inside the domain, but only if the ASBR itself already has a default route. The second is to advertise 0.0.0.0 regardless whether the ASBR has a default route. The latter can be set by adding the keyword always. You should be careful when using the always keyword. If your router advertises a default (0.0.0.0) inside the domain and does not have a default itself or a path to reach the destinations, routing will be broken.
LSA types:
1 Router Link advertisements. Generated by each router for each area it belongs to. They describe the states of the router's link to the area. These are only flooded within a particular area.
2 Network Link advertisements. Generated by Designated Routers. They describe the set of routers attached to a particular network. Flooded in the area that contains the network.
3 or 4 Summary Link advertisements. Generated by Area Border routers. They describe inter-area (between areas) routes. Type 3 describes routes to networks, also used for aggregating routes. Type 4 describes routes to ASBR.
5 AS external link advertisements. Originated by ASBR. They describe routes to destinations external to the AS. Flooded all over except stub areas.
The OSPF not-so-stubby area (NSSA) feature is described by RFC 1587.
Redistribution into an NSSA area creates a special type of link-state advertisement (LSA) known as type 7, which can only exist in an NSSA area. An NSSA autonomous system boundary router (ASBR) generates this LSA and an NSSA area border router (ABR) translates it into a type 5 LSA, which gets propagated into the OSPF domain.
In order to make a stub area into an NSSA, issue this command under the OSPF configuration. This command must be configured on every single router in Area 1.
router ospf 1
 area 1 nssa
In order to configure an NSSA totally stubby area, issue this command under the OSPF configuration. Configure this command on NSSA ABRs only.
router ospf 1
 area 1 nssa no-summary
Original link:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094e9e.shtml
Tuesday, July 27, 2010
BPDU protection
EX-series switches provide Layer 2 loop prevention through Spanning Tree Protocol (STP), Rapid Spanning Tree Protocol (RSTP), and Multiple Spanning Tree Protocol (MSTP). Configure BPDU protection on interfaces to prevent them from receiving BPDUs that could result in STP misconfigurations, which could lead to network outages.
Enable BPDU protection on switch interfaces connected to user devices or on interfaces on which no BPDUs are expected, such as edge ports. If a BPDU is received on a BPDU-protected interface, the interface is disabled and stops forwarding frames.
To configure BPDU protection, use the following CLI commands:
set protocols rstp interface ge-0/0/5 edge
set protocols rstp interface ge-0/0/6 edge
set protocols rstp bpdu-block-on-edge
Use the following CLI command to check that edge BPDU protection is configured correctly:
user@switch> show spanning-tree interface
Spanning tree interface parameters for instance 0
Interface    Port ID    Designated    Designated            Port     State  Role
                        port ID       bridge ID             Cost
ge-0/0/2.0   128:515    128:515       32768.0019e2503f00    20000    BLK    DIS
ge-0/0/3.0   128:516    128:516       32768.0019e2503f00    20000    FWD    DESG
ge-0/0/4.0   128:517    128:517       32768.0019e2503f00    20000    FWD    DESG
ge-0/0/5.0   128:518    128:518       32768.0019e2503f00    20000    BLK    DIS (Bpdu-Incon) <<<<<<<
ge-0/0/6.0   128:519    128:519       32768.0019e2503f00    20000    BLK    DIS (Bpdu-Incon) <<<<<<<
ge-0/0/7.0   128:520    128:1         16384.00aabbcc0348    20000    FWD    ROOT
ge-0/0/8.0   128:521    128:521       32768.0019e2503f00    20000    FWD    DESG
When BPDUs are received on interface ge-0/0/5.0 and interface ge-0/0/6.0, the output of the operational mode command show spanning-tree interface shows that the interfaces have transitioned to a BPDU inconsistent state. The BPDU inconsistent state makes the interfaces block and prevents them from forwarding traffic.
Disabling the BPDU protection configuration on an interface does not unblock the interface. If the disable-timeout statement has been included in the BPDU configuration, the interface automatically returns to service after the timer expires. Otherwise, use the operational mode command clear ethernet-switching bpdu-error to unblock the interface.
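The blocking and recovery behaviour described above can be modelled with a small Python sketch. The EdgePort class and its method names are hypothetical and only mirror the description in this post; they are not a Junos API.

```python
class EdgePort:
    """Toy model of a BPDU-protected edge interface."""

    def __init__(self, name, disable_timeout=None):
        self.name = name
        self.protected = True
        self.disable_timeout = disable_timeout  # seconds, or None if not set
        self.blocked = False
        self._blocked_for = 0

    def receive_bpdu(self):
        # A BPDU on a protected interface puts it in the blocked state.
        if self.protected:
            self.blocked = True
            self._blocked_for = 0

    def disable_protection(self):
        # Removing the protection config does not unblock the interface.
        self.protected = False

    def tick(self, seconds):
        # With disable-timeout set, the interface returns to service by itself.
        if self.blocked and self.disable_timeout is not None:
            self._blocked_for += seconds
            if self._blocked_for >= self.disable_timeout:
                self.blocked = False

    def clear_bpdu_error(self):
        # Stands in for "clear ethernet-switching bpdu-error".
        self.blocked = False

port = EdgePort("ge-0/0/5")
port.receive_bpdu()
port.disable_protection()
print(port.blocked)  # still blocked until the error is cleared
port.clear_bpdu_error()
print(port.blocked)
```

A port created with a disable_timeout value would instead recover on its own once enough time has passed in tick().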
Monday, July 12, 2010
pvst/pvst+
In Ethernet switched environments where multiple virtual LANs exist, spanning tree can be deployed per VLAN. Cisco's name for this is Per-VLAN Spanning Tree (PVST and PVST+; PVST+ is the default protocol on Cisco switches). Both PVST and PVST+ are Cisco proprietary protocols and generally cannot be used on third-party switches, although Force10 Networks and Extreme Networks support PVST+. Extreme Networks does so with two limitations: no support on ports where the VLAN is untagged/native, and no support on the VLAN with ID 1. PVST works only with ISL (Cisco's proprietary protocol for VLAN encapsulation) because of its embedded spanning tree ID. Given the high penetration of the IEEE 802.1Q VLAN trunking standard and PVST's dependence on ISL, Cisco defined a separate PVST+ protocol for 802.1Q encapsulation. PVST+ can tunnel across an MSTP region.
PVST works only on ISL trunks. That is because an ISL trunk natively
supports multiple spanning trees, one per VLAN. An ISL header has a bit
dedicated to indicating whether the packet is a BPDU, so it is
very easy to separate the BPDUs of different VLANs:
                          BPDU Flag    VLAN Tag
BPDU, VLAN 10             1            10
Normal packet, VLAN 10    0            10
BPDU, VLAN 15             1            15
Normal packet, VLAN 15    0            15
etc.
PVST+ is a modification of PVST which allows per-VLAN spanning trees
over standard 802.1q links.
802.1q does NOT natively support multiple spanning tree instances, only
one instance. BPDUs on an 802.1q link are not tagged and are
transported on the native VLAN, so only one spanning tree can be
supported.
Also, Cisco could not change the header of an 802.1q frame,
since it was a standard. So how did they manage to transport different
spanning trees over a standard 802.1q trunk?
A standard BPDU is sent untagged to the MAC address 01-80-C2-00-00-00.
In PVST+, the BPDUs of the native VLAN are transported like the
standard ones, untagged. The BPDUs of the other VLANs are transported TAGGED
to the Cisco shared spanning tree MAC address: 01-00-0C-CC-CC-CD.
The other end, if it is also a Cisco switch, understands this MAC address and picks out
the BPDUs of the other VLANs. If the other end is NOT a Cisco switch, frames addressed to this MAC address are not interpreted but simply flooded. This allows Cisco switches to maintain a per-VLAN spanning tree across non-Cisco switches.
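As a rough illustration of how a receiver could tell these frames apart, the Python sketch below classifies a frame by its destination MAC address and 802.1q tag. The function names and return labels are made up for this example; only the two MAC addresses come from the text above.

```python
from typing import Optional, Tuple

IEEE_STP_MAC = "01:80:c2:00:00:00"   # standard BPDU destination
SSTP_MAC = "01:00:0c:cc:cc:cd"       # Cisco shared spanning tree destination


def normalize_mac(mac: str) -> str:
    """Accept 01-80-C2-00-00-00, 0180.c200.0000 or 01:80:c2:00:00:00."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def classify_frame(dst_mac: str, vlan_tag: Optional[int]) -> Tuple[str, Optional[int]]:
    """Return (kind, vlan); vlan is None for untagged frames (native VLAN)."""
    mac = normalize_mac(dst_mac)
    if mac == IEEE_STP_MAC:
        return ("ieee-bpdu", None)          # standard BPDUs travel untagged
    if mac == SSTP_MAC:
        return ("pvst+-bpdu", vlan_tag)     # per-VLAN BPDU, tag identifies the VLAN
    return ("data", vlan_tag)


print(classify_frame("01-80-C2-00-00-00", None))
print(classify_frame("01-00-0C-CC-CC-CD", 15))
```

A non-Cisco switch effectively only knows the first branch; anything sent to the second address looks like ordinary multicast to it, which is why it gets flooded.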
----------
Cisco switches run different variants of STP, depending on whether the connected port is an access port, an ISL trunk or an 802.1q trunk. Natively, a Cisco switch runs a separate STP instance for each configured and active VLAN (this is called Per-VLAN Spanning Tree or PVST), whereas standard IEEE-compliant switches run just one STP instance shared by all VLANs.
Access Ports
Cisco switches run the classic version of IEEE STP on access ports. The IEEE STP BPDUs are sent to the IEEE reserved multicast MAC address “0180.C200.0000” using IEEE 802.2 LLC SAP encapsulation with both SSAP and DSAP fields equal to “0x42”. Note that you can plug any standard IEEE-compliant switch into a Cisco switch access port and they will interoperate perfectly, joining the respective access VLAN STP instance with the IEEE STP instance (MST).
ISL Trunks
Across ISL trunks, Cisco switches run PVST (Per-VLAN Spanning Tree). (Note that the PVST feature is limited to ISL trunks only.) The same IEEE STP BPDUs are sent for each VLAN, encapsulated in an additional ISL header (which also carries the VLAN number). Since PVST BPDUs have the same format as IEEE BPDUs (that is, IEEE 802.2 LLC SAP), they can be matched using the same SSAP/DSAP values of “0x42” for the purpose of Layer 2 filtering. A group of Cisco switches connected using ISL trunks only is called a PVST region.
802.1q Trunks
Across 802.1q trunks, Cisco switches run PVST+ (Per VLAN Spanning Tree Plus). The goal of PVST+ is to interoperate with standard IEEE STP (MST) and allow transparent tunneling of PVST instance BPDUs across MST region (to potentially connect to other Cisco switches across the MST region).
Interoperability
A group of Cisco switches connected using ISL trunks only is called a PVST region.
A group of Cisco switches connected using 802.1q trunks is called a PVST+ region.
A PVST+ region may connect to a PVST region using an ISL trunk and to an MST region using an 802.1q trunk. The STP instances in PVST and PVST+ regions map directly to each other, so no special interoperability solution is required. However, on the MST side only one STP instance exists, as opposed to the many STP instances of the PVST+ region. The first question is: if we want to interoperate with MST, which PVST VLAN’s STP instance should be joined with MST? Cisco chooses VLAN 1 for this purpose. The joined instances of Cisco VLAN 1 STP and MST are called the “Common Spanning Tree” or CST (naturally, the CST spans PVST, PVST+ and MST regions).
Case 1: Cisco switch connects to MST switch across an 802.1q trunk with default native VLAN (VLAN 1)
MST (standard IEEE switch) side sends IEEE STP BPDUs to IEEE multicast MAC address. Those BPDUs are consumed and processed by VLAN 1 STP instance on Cisco switch (PVST+ region).
The PVST+ side (Cisco switch) sends IEEE STP BPDUs corresponding to the local VLAN 1 STP to the IEEE MAC address as untagged frames across the link. At the same time, special new SSTP (shared spanning tree, a synonym for PVST+) BPDUs are sent to the SSTP multicast MAC address “0100.0ccc.cccd”, also untagged. Those SSTP BPDUs are encapsulated using an IEEE 802.2 LLC SNAP header (SSAP=DSAP=“0xAA” and SNAP PID=“0x010B”). The BPDUs contain the same information as the parallel IEEE STP BPDUs for VLAN 1, but have some additional fields, notably a special TLV with the source VLAN number. Note that IEEE switches do not interpret the SSTP BPDUs, but simply flood them through the respective VLAN topology, in case there are other Cisco switches connected to the MST cloud.
As for non-native VLANs (VLANs 2-4095), the Cisco switch sends only SSTP BPDUs, tagged with the respective VLAN number and destined to the SSTP MAC address. (Remember that all SSTP BPDUs carry the number of the VLAN they belong to.) The respective VLAN STP instances are “transparently expanded” across the MST region, treating it as a “virtual hub”. (Note that this may have some traffic engineering implications, since for non-CST VLANs the cost of traversing the MSTP region equals the cost of the link used to connect to the first MSTP switch.)
Now the question is: why would a Cisco switch send the same VLAN 1 BPDU twice, towards both the IEEE and the SSTP multicast MAC addresses? Isn’t the Cisco switch supposed to join its VLAN 1 STP instance with the MST? The reason for sending additional SSTP BPDUs across VLAN 1 is purely informational: they are used for consistency checking. The idea is to inform all other potential Cisco switches attached to the MST cloud about our native VLAN. The receiving switch will only use IEEE BPDUs for VLAN 1 (CST) computations and will ignore SSTP BPDUs sent on VLAN 1.
Lastly, for the purpose of Layer 2 filtering, remember that you can match SSTP BPDUs using an ethertype value of “0x010B”. This works with multilayer switches even though SSTP BPDUs are SNAP encapsulated, and the actual field is not “ethertype” but rather a SNAP Protocol ID.
Case 2: Cisco switch connects to MST switch across an 802.1q trunk with non-default native VLAN (e.g. VLAN 100).
MST (standard switch) side sends IEEE STP BPDUs to IEEE multicast MAC address and those BPDUs are processed by VLAN 1 (CST) STP instance in the Cisco switch.
The PVST+ side (Cisco switch) sends untagged IEEE STP BPDUs corresponding to the VLAN 1 (CST) STP to the IEEE MAC address across the link. This is done for the purpose of joining the local VLAN 1 instance and the MSTP instance into the CST. At the same time, VLAN 1 BPDUs are replicated to the SSTP multicast address, tagged with the VLAN 1 number (to inform other Cisco switches that VLAN 1 is non-native on our switch). Finally, BPDUs of the native VLAN instance (VLAN 100 in our case) are sent untagged using SSTP encapsulation and the SSTP destination address. Of course, native VLAN 100 BPDUs, even though they are untagged, carry the VLAN number inside a special TLV in the SSTP header.
As in Case 1, for the remaining non-native VLANs (VLANs 2-4095) the Cisco switch sends SSTP BPDUs only, tagged with the respective VLAN tag and destined to the SSTP MAC address. The other Cisco switches connected to the MSTP cloud receive the SSTP BPDUs and process them using the respective VLAN STP instances.
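The two cases can be summarized in a few lines of Python. This is a sketch of the rules as described above, not of any Cisco code; the function name and the tuple layout (vlan, destination, tagged) are invented for the illustration.

```python
from typing import Iterable, List, Tuple


def pvst_plus_bpdus(vlans: Iterable[int], native_vlan: int = 1) -> List[Tuple[int, str, bool]]:
    """List the BPDUs a PVST+ switch sends on an 802.1q trunk.

    Each entry is (vlan, destination, tagged), where destination is
    "IEEE" (0180.c200.0000) or "SSTP" (0100.0ccc.cccd).
    """
    sent = []
    for vlan in sorted(set(vlans)):
        if vlan == 1:
            # VLAN 1 joins the CST: always an untagged IEEE BPDU,
            # regardless of which VLAN is native.
            sent.append((1, "IEEE", False))
        # Every VLAN, including VLAN 1, also gets an SSTP BPDU.
        # Only the native VLAN's copy is untagged.
        sent.append((vlan, "SSTP", vlan != native_vlan))
    return sent


# Case 1: native VLAN 1
print(pvst_plus_bpdus([1, 10], native_vlan=1))
# Case 2: native VLAN 100
print(pvst_plus_bpdus([1, 10, 100], native_vlan=100))
```

In Case 2 the only difference for VLAN 1 is that its SSTP copy becomes tagged, while VLAN 100 takes over the untagged SSTP slot.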
VSTP
VLAN Spanning Tree Protocol (VSTP) allows switches to run one or more STP or RSTP instances for each VLAN on which VSTP is enabled. For networks with multiple VLANs, this enables more intelligent tree spanning, because each VLAN can have interfaces enabled or disabled depending on the paths available to that specific VLAN.
Prior to Junos 10.2, only one spanning tree protocol could be enabled at a time: an IEEE protocol or VSTP.
Starting with Junos 10.2, RSTP and VSTP can run concurrently, and the switch can interoperate with either PVST+ or R-PVST+.
Scalability numbers:
. 253 VSTP instances plus 1 RSTP instance
=========================
Example 1:
show protocols
rstp;
vstp {
    vlan 2;
    vlan 3;
    vlan 4;
}
Any VLANs that are not configured under vstp are part of the RSTP instance; VLANs 2-4 each run their own VSTP instance.
Example 2:
show protocols
rstp;
vstp {
    vlan-group {
        group TRY {
            vlan 2-100;
            bridge-priority 4k;
        }
        group BACK {
            vlan 101-200;
            bridge-priority 16k;
        }
    }
}
Any VLANs that are not configured under vstp are part of the RSTP instance; VLANs 2-200 each run their own VSTP instance. VLANs that are grouped together inherit the same STP parameters.
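To make the resulting VLAN-to-instance mapping explicit, here is a small Python sketch of Example 2. The helper names and the "vstp:N" labels are made up for the illustration, and the sketch assumes that 4k and 16k stand for priorities 4096 and 16384; the limit of 253 instances is the scalability number quoted above.

```python
from typing import Dict, Iterable, Optional, Tuple

MAX_VSTP_INSTANCES = 253   # scalability number quoted above


def build_vstp_map(groups: Dict[str, Tuple[Iterable[int], int]]) -> Dict[int, int]:
    """Map each VLAN listed under vstp to the bridge priority of its group."""
    mapping: Dict[int, int] = {}
    for _name, (vlans, priority) in groups.items():
        for vlan in vlans:
            mapping[vlan] = priority   # grouped VLANs inherit the same parameters
    if len(mapping) > MAX_VSTP_INSTANCES:
        raise ValueError("too many VSTP instances")
    return mapping


def instance_for(vlan: int, mapping: Dict[int, int]) -> Tuple[str, Optional[int]]:
    """Return (instance, bridge priority); VLANs outside vstp fall back to RSTP."""
    if vlan in mapping:
        return (f"vstp:{vlan}", mapping[vlan])   # one instance per VLAN
    return ("rstp", None)


# Example 2 above, assuming 4k = 4096 and 16k = 16384
groups = {
    "TRY": (range(2, 101), 4096),
    "BACK": (range(101, 201), 16384),
}
vstp_map = build_vstp_map(groups)
print(instance_for(50, vstp_map))
print(instance_for(150, vstp_map))
print(instance_for(1, vstp_map))
```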
Wednesday, July 7, 2010
MSTP II: Outside a Region
The concept of CIST
Every MSTP region runs a special instance of spanning tree known as the IST or Internal Spanning Tree (=MSTI0). This instance is active on all links inside a region and serves the purpose of disseminating STP topology information for the other STP instances. As usual, the IST has a root bridge, elected based on the lowest bridge ID (priority/MAC address). However, the situation changes when you have different regions in the network (e.g. switches with different region names, different revisions, etc). When a switch detects BPDU messages sourced from another region on any link, it marks the link as an MSTP boundary.
Now the two regions should build a common spanning tree known as the CIST, the Common and Internal Spanning Tree. This tree is the result of joining the ISTs of each region in a special manner. Here is a detailed description of the process.
1) In addition to sending IST configuration BPDUs, every switch initially declares itself as the root of CIST. The switches pass CIST configuration information along with IST information in additional BPDU fields. Switches inside a region never change the path cost to the CIST Root, known as CIST External Root Path Cost. Instead of that, the external path cost only changes on the boundary ports. Thus, this external cost only accounts for the cost of boundary links, not the cost of the paths inside a region. Essentially, CIST External Path Cost information “tunnels” across a region.
2) On boundary ports, switches exchange their CIST BPDU information ONLY. That is, switches hide IST information between regions, but pass CIST metrics. The usual RSTP synchronization process takes place between switches on a border link, and eventually ONE switch with the lowest BID (Bridge ID = Priority + MAC Address) among all regions is elected as the CIST Root. Note that each region still elects a local IST root, known as the CIST Regional Root, as described further.
3) The region that contains the CIST Root, declares this switch as the root of local IST as well. However, things start to differ for regions that don’t contain the CIST root. All of them elect one of the border switches (switches connected to other regions) as IST root (aka CIST Regional Root). This procedure elects the IST Root (CIST Regional Root) based on the lowest CIST External Root Path Cost. Note that this procedure differs from elections based purely on BID, which take place inside a single region. In this case, the procedure uses BIDs as tiebreaker, if two or more switches have the same CIST Root External Path Cost. MSTP blocks all redundant boundary “uplinks” marking them as “alternate” paths to the CIST Root. The boundary switches do so by receiving “extra” CIST BPDUs on top of IST BPDUs with external root path cost values and comparing them with the ones received on local boundary ports.
4) Inside a region, switches build regular IST, using the CIST Regional Root as the root of IST. Note that this tree uses so-called Internal Root Path Cost stored in the local IST BPDUs. This cost increments along all the links inside a region, but it never leaks out of the region. Between regions, switches exchange information about CIST External Root Path cost only.
Note that a switch with a non-optimal priority value may become the CIST Regional Root (local IST Root). For example, a switch configured with the lowest BID among all switches inside a region will not necessarily become the CIST Regional Root. Only if the switch has the lowest BID among all regions will it be elected as the CIST Root.
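The election rules in steps 1) to 4) can be sketched in Python as follows. This is a simplified model, not an MSTP implementation: the Bridge tuple and function names are hypothetical, and the external root path cost of each boundary switch is supplied as an input instead of being computed from BPDUs (None marks a switch with no boundary port).

```python
from typing import List, NamedTuple, Optional, Tuple


class Bridge(NamedTuple):
    name: str
    priority: int
    mac: str
    region: str
    ext_cost: Optional[int] = None   # CIST External Root Path Cost via own boundary port


def bid(bridge: Bridge) -> Tuple[int, str]:
    # Bridge ID = priority + MAC address; the lowest value wins.
    return (bridge.priority, bridge.mac.lower())


def elect_cist_root(bridges: List[Bridge]) -> Bridge:
    # One switch with the lowest BID among all regions.
    return min(bridges, key=bid)


def elect_regional_root(bridges: List[Bridge], region: str) -> Bridge:
    root = elect_cist_root(bridges)
    if root.region == region:
        return root   # the CIST Root is also the root of its own region's IST
    # Elsewhere: lowest external cost among boundary switches, BID as tiebreaker.
    candidates = [b for b in bridges if b.region == region and b.ext_cost is not None]
    return min(candidates, key=lambda b: (b.ext_cost, bid(b)))


bridges = [
    Bridge("A1", 0, "00:00:00:00:00:a1", "A"),
    Bridge("B1", 4096, "00:00:00:00:00:b1", "B"),            # best BID in region B, interior
    Bridge("B2", 32768, "00:00:00:00:00:b2", "B", 20000),
    Bridge("B3", 32768, "00:00:00:00:00:b3", "B", 20000),
    Bridge("B4", 8192, "00:00:00:00:00:b4", "B", 40000),
]
print(elect_cist_root(bridges).name)
print(elect_regional_root(bridges, "B").name)
```

In this made-up topology B1 has the best BID inside region B, yet B2 becomes the CIST Regional Root: it ties with B3 on external cost and wins on BID, which is the situation the note above describes.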
The concept of CST and STP interoperation
From the above information, we conclude that CIST essentially has organization of a two-level hierarchy. The first level treats all regions as “pseudo-bridges” and operates with the External Root Path Cost. The first-level spanning tree roots in CIST Root Bridge and encompasses the pseudo-bridges. They call it CST or Common Spanning Tree. Effectively, this has no idea of the internal MSTP regions structure, but sees each region as a virtual bridge.
This is the point where MSTP interoperates with legacy IEEE STP/RSTP regions as well. The legacy switch regions have no concept of IST, so they simply join their STP instance with the CST and perceive MSTP regions as “transparent” pseudo-bridges, staying unaware of their internal topology. (Note that it may happen so that a switch with the lowest BID belongs to RSTP/STP region. This situation results in all MSTP regions electing local CIST Regional Roots and considering the new CIST Root located outside MSTP “domain”). Naturally, MSTP detects the appropriate STP version on a boundary link and switches to the respective mode of operations (e.g. RSTP/STP).
The second level of the CIST hierarchy consists of the various regional ISTs. Every MSTP region builds its IST instance using the internal path costs and following the optimal “internal” topology. As you remember, this “internal topology” transparently transports the “external” BPDU information to the border switches, so that they can elect the Regional CIST Root. The first-level topology (CST) is somewhat independent of the second-level topology (IST) and is based upon the CIST Root and the cost of the boundary links. The second-level topology (“internal”, IST) may change if a boundary link cost changes or something happens to the CIST Root, since this affects the election of the Regional CIST Root.
MSTP pseudo-bridges do not strictly emulate real bridges. For example, different boundary switches send their own BID in BPDUs, so the pseudo-bridge may appear to have many BIDs on various boundary links. However, this has no impact on the process of transparent tunneling of CIST Root information across the pseudo-bridge. Other things that keep the pseudo-bridge from being fully transparent include the MSTP hop count and the MaxAge timer value, for they may change asynchronously as information travels along different paths inside a region.
MSTIs and CIST
Now, what could be said about the MSTIs – individual STP instances used inside regions? From what we learned so far, it is easy to conclude that the only logical solution is to map all MSTP instances to the CIST on the boundary links. This implies that you cannot load-balance VLAN traffic on the boundary links by mapping VLANs to different instances. All VLANs use the same non-blocking uplink that CIST elected as the optimal path to the CIST Root. But this only applies to the “CST” paths connecting the regional virtual bridges – inside any region VLANs follow the internal topology paths, based on the respective MSTI configurations.
It is important to note that MSTIs have no idea of the CIST Root whatsoever; they only use internal paths and the internal MSTI root to build the spanning tree. However, all MSTP instances see the root port (towards the CIST Root) of the CIST Regional Bridge as the special “Master Port” connecting them to the “outside” world. This port serves as the “gateway” linking MSTIs to other regions. Notice that switches do not send M-records (MSTI information) out of boundary ports, only CIST information. Thus, the CIST and MSTIs may converge independently and in parallel. The master port will only begin forwarding when all respective MSTI ports are in sync and forwarding, to avoid temporary bridging loops.
MSTP and Fault Isolation
Ethernet is known for its broadcast nature, which tends to propagate issues across the whole Layer 2 domain. There are three main problems with Ethernet that affect MSTP designs:
* Unknown unicast flooding results in traffic surges under topology changes. Every topology change may cause massive invalidation of MAC address tables and unicast traffic flooding. This process is the result of Ethernet topology unawareness: the bridges don’t know where MAC addresses are located.
* Broadcast and multicast flooding. This is a separate problem, as many core protocols (ARP, IGP, PIM) rely on multicasting or broadcasting. Those packets should be delivered to every node in a broadcast domain, and under intense load the network could be congested at every point.
* Spanning-tree convergence. MSTP uses the RSTP procedure for STP re-negotiation. Since it is based on distance-vector behavior, it is prone to convergence issues such as counting to infinity (circulation of old information). This is especially noticeable in larger topologies with 10 switches or more and under special conditions, such as failure of the root bridge.
The concept of MSTP region allows for bounding STP re-computations. Since MSTIs in every region are independent, any change affecting MSTI in one region will not affect MSTIs in other regions. This is a direct result of the fact that M-record information is not exchanged between the regions. However, CIST recalculations affect every region and might be slow converging. This is why it is a good idea not to map any VLAN to CIST.
Topology changes in MSTP are treated the same way as in RSTP. That is, only non-edge links going to forwarding state will cause a topology change. A single physical link may be forwarding for one MSTI and blocking for another. Thus, a single physical change may have different effect on MSTIs and the CIST. Topology changes in MSTIs are bounded to a single region, while topology changes to the CIST propagate through all regions. Every region treats the TC notification from another region as “external” and applies them to CIST-associated ports only.
A topology change to CST (the tree connecting the virtual bridges) will affect all MSTIs in all regions and the CIST. This is due to the fact that new link becoming forwarding between the virtual bridges may change all paths in the topology and thus require massive MAC address re-learning. Thus, from the standpoint of topology change, something happening to the CST will have most massive impact of flooding in the set of interconnected MSTP regions.
The above observations suggest a good design rule for MSTP networks: separate “meshy” topologies into their own regions and interconnect the regions using a “sparse” mesh, keeping in mind the balance between redundancy and the effect of topology changes. This is an adaptation of a well-known design principle: separate complexity from complexity to keep networks more stable and isolate fault domains.
Interoperating with PVST+
This task poses a tough issue. We know that PVST+ runs an STP instance for every VLAN. On the contrary, MSTP maps VLANs to MSTIs, so the one-to-one mapping between VLAN and STP instance no longer holds true. How should an MSTP switch operate on a border link connected to the PVST+ domain? As we remember, on the border with an IEEE STP domain, PVST+ simply joins the VLAN 1 STP with the IEEE STP and tunnels SSTP BPDUs across the IEEE STP domain (refer to the PVST+ Explained article for more information).
However, MSTP runs multiple MSTIs inside a region and maps them all to the CIST on the border link. That means we need to make sure that the internal MSTIs are aware of changes in the PVST+ trees. It’s hard to automatically map VLAN-based STPs to MSTIs, so the simplest way to accomplish the desired behavior is to join all PVST+ trees with the CIST. This way, changes in any of the PVST+ STP instances propagate to the CIST/IST and affect all MSTIs as a result. While not the optimal solution, it ensures that no changes go unnoticed and no black holes occur in a single VLAN due to topology changes.
The MSTP implementation simulates PVST+ by replicating CIST BPDUs on the link facing the PVST+ domain and sending the BPDUs on ALL VLANs active on the trunk. The MSTP switch consumes all BPDUs received from the PVST+ domain and processes them using the CIST/IST instance. The PVST+ side sees the MSTP domain as a special PVST+ domain with all per-VLAN instances claiming the CIST Root as the root of their STP. Note that PVST+ also interprets the whole region as a single pseudo-bridge, but one operating in PVST+ mode. Two options are possible here:
1) MSTP domain (either a single region or multiple regions) contains the root bridge for ALL VLANs. This is only true if CIST Root BID is better than any PVST+ STP root BID. This is the preferred design, for you can manipulate uplink costs on the PVST+ side and obtain optimal traffic engineering results.
2) The PVST+ domain contains the root bridges for ALL VLANs, including VLAN 1, which maps to the CST. This is only true if the BIDs of all PVST+ root bridges for all VLANs are better than the CIST Root BID. This is not the preferred design, since all MSTIs map to the CIST on the border link, and you cannot load-balance the MSTIs as they enter the PVST+ domain.
The Cisco implementation does not support the second option. The MSTP domain should contain the bridge with the best BID, to ensure that the CIST Root is also the root for all PVST+ trees. In any other case, the MSTP border switch will complain and place the ports that receive superior BPDUs from the PVST+ region in the root-inconsistent state. To fix this issue, ensure that the PVST+ domain does not have any bridges with BIDs better than the CIST Root Bridge ID.
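The consistency rule in the last paragraph can be written down as a short Python check. This is only a sketch of the rule as stated here, not of Cisco's actual PVST simulation logic; the function name, the state labels, and the representation of a BID as a (priority, MAC) tuple are assumptions made for the example.

```python
from typing import Dict, List, Tuple

BID = Tuple[int, str]   # (priority, MAC address); lower is better


def pvst_simulation_check(cist_root: BID, pvst_roots: Dict[int, BID]) -> Tuple[str, List[int]]:
    """Check the root BIDs received per VLAN from the PVST+ side on a border port.

    Returns the resulting port state and the VLANs whose BPDUs were superior
    to the CIST Root.
    """
    offending = sorted(vlan for vlan, root in pvst_roots.items() if root < cist_root)
    if offending:
        return ("root-inconsistent", offending)
    return ("consistent", [])


cist_root = (4096, "00:00:00:00:00:01")
received = {
    1: (32768, "00:00:00:00:00:10"),
    10: (0, "00:00:00:00:00:20"),      # better than the CIST Root: not allowed
}
print(pvst_simulation_check(cist_root, received))
```

Raising the priority value of the VLAN 10 root on the PVST+ side above that of the CIST Root would bring the port back to a consistent state, which is the fix recommended above.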
Every MSTP region runs special instance of spanning-tree known as IST or Internal Spanning Tree (=MSTI0). This instance is active on all links inside a region and serves the purpose of disseminating STP topology information for other STP instances. As usual, IST has a root bridge, elected based on the lowest bridge ID (priority/MAC address). However, situation changes when you have different regions in the network (e.g. switches with different region names, different revisions, etc). When a switch detects BPDU messages sourced from another region on any link, it marks the link as MSTP boundary.
Now two regions should build a common spanning tree known as CIST – Common and Internal Spanning Tree. This tree is result of joining ISTs of each region in a special manner. Here is a detailed description of the process.
1) In addition to sending IST configuration BPDUs, every switch initially declares itself as the root of CIST. The switches pass CIST configuration information along with IST information in additional BPDU fields. Switches inside a region never change the path cost to the CIST Root, known as CIST External Root Path Cost. Instead of that, the external path cost only changes on the boundary ports. Thus, this external cost only accounts for the cost of boundary links, not the cost of the paths inside a region. Essentially, CIST External Path Cost information “tunnels” across a region.
2) On boundary ports, switches exchange their CIST BPDU information ONLY. That is, switches hide IST information between regions, but pass CIST metrics. The usual RSTP synchronization process takes place between switches on a border link, and eventually ONE switch with the lowest BID (Bridge ID = Priority + MAC Address) among all regions is elected as the CIST Root. Note that each region still elects a local IST root, known as the CIST Regional Root, as described further.
3) The region that contains the CIST Root declares this switch as the root of the local IST as well. However, things differ for regions that don’t contain the CIST Root. Each of them elects one of its border switches (switches connected to other regions) as the IST root (aka CIST Regional Root). This procedure elects the IST Root (CIST Regional Root) based on the lowest CIST External Root Path Cost. Note that this procedure differs from elections based purely on BID, which take place inside a single region. Here the BID serves only as a tiebreaker when two or more switches have the same CIST External Root Path Cost. MSTP blocks all redundant boundary “uplinks”, marking them as “alternate” paths to the CIST Root. The boundary switches do so by receiving “extra” CIST BPDUs on top of IST BPDUs with external root path cost values and comparing them with the ones received on local boundary ports.
4) Inside a region, switches build a regular IST, using the CIST Regional Root as the root of the IST. Note that this tree uses the so-called Internal Root Path Cost stored in the local IST BPDUs. This cost increments along all the links inside a region, but it never leaks out of the region. Between regions, switches exchange information about the CIST External Root Path Cost only.
Note that a switch with a non-optimal priority value may become the CIST Regional Root (local IST Root). For example, a switch configured with the lowest BID among all switches inside a region will not necessarily become the CIST Regional Root. Only a switch with the lowest BID among all regions is elected as the CIST Root.
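The two elections described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names BridgeId, Switch, elect_cist_root and elect_regional_root are made up for this example, and each border switch is handed its CIST External Root Path Cost directly instead of learning it from BPDUs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class BridgeId:
    # Compared as (priority, mac); lower is better.
    priority: int
    mac: str  # same-format lowercase string, so string order matches numeric order


@dataclass
class Switch:
    bid: BridgeId
    region: str
    # Assumed to be known already; None means the switch has no boundary port.
    external_cost: Optional[int] = None


def elect_cist_root(switches):
    """The single CIST Root is the switch with the lowest BID among all regions."""
    return min(switches, key=lambda s: s.bid)


def elect_regional_root(switches, region):
    """CIST Regional Root (local IST root) for one region."""
    cist_root = elect_cist_root(switches)
    if cist_root.region == region:
        # The region that holds the CIST Root uses it as its IST root too.
        return cist_root
    border = [s for s in switches
              if s.region == region and s.external_cost is not None]
    # Lowest external root path cost wins; BID is only the tiebreaker.
    return min(border, key=lambda s: (s.external_cost, s.bid))
```

With a region whose best-BID border switch has the more expensive uplink, the sketch picks the other border switch as Regional Root, which is the situation described in the note above.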
The concept of CST and STP interoperation
From the above, we conclude that CIST is essentially organized as a two-level hierarchy. The first level treats all regions as “pseudo-bridges” and operates with the External Root Path Cost. The first-level spanning tree is rooted at the CIST Root Bridge and encompasses the pseudo-bridges. It is called the CST, or Common Spanning Tree. Effectively, the CST has no idea of the internal structure of the MSTP regions, but sees each region as a virtual bridge.
This is the point where MSTP interoperates with legacy IEEE STP/RSTP regions as well. The legacy switch regions have no concept of IST, so they simply join their STP instance with the CST and perceive MSTP regions as “transparent” pseudo-bridges, staying unaware of their internal topology. (Note that the switch with the lowest BID may belong to an RSTP/STP region. In that situation all MSTP regions elect local CIST Regional Roots and consider the CIST Root to be located outside the MSTP “domain”.) Naturally, MSTP detects the appropriate STP version on a boundary link and switches to the respective mode of operation (e.g. RSTP/STP).
The second level of the CIST hierarchy consists of the various regional ISTs. Every MSTP region builds its IST instance using the internal path costs and following the optimal “internal” topology. As you remember, this “internal topology” transparently transports the “external” BPDU information to the border switches, so that they can elect the Regional CIST Root. The first-level topology (CST) is somewhat independent of the second-level topology (IST) and is based on the CIST Root and the cost of the boundary links. The second-level topology (“internal”, IST) may change if a boundary link cost changes or something happens to the CIST Root, since this affects the election of the Regional CIST Root.
MSTP pseudo-bridges do not strictly emulate real bridges. For example, different boundary switches send their own BID in BPDUs, so the pseudo-bridge may appear to have many BIDs on various boundary links. However, this has no impact on the transparent tunneling of CIST Root information across the pseudo-bridge. Other things that keep the pseudo-bridge from looking like a single bridge include the MSTP hop count and the MaxAge timer value, for they may change asynchronously as information travels along different paths inside a region.
MSTIs and CIST
Now, what could be said about the MSTIs – individual STP instances used inside regions? From what we learned so far, it is easy to conclude that the only logical solution is to map all MSTP instances to the CIST on the boundary links. This implies that you cannot load-balance VLAN traffic on the boundary links by mapping VLANs to different instances. All VLANs use the same non-blocking uplink that CIST elected as the optimal path to the CIST Root. But this only applies to the “CST” paths connecting the regional virtual bridges – inside any region VLANs follow the internal topology paths, based on the respective MSTI configurations.
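A tiny Python sketch of this rule follows. The function name governing_instance and the constant CIST are hypothetical, and instance 0 is assumed to stand for the CIST/IST.

```python
CIST = 0  # instance 0 is assumed to represent the CIST/IST


def governing_instance(vlan, vlan_to_msti, port_is_boundary):
    """Return the spanning-tree instance that decides forwarding for a VLAN on a port."""
    if port_is_boundary:
        # On boundary links every MSTI collapses onto the CIST,
        # so the VLAN-to-instance mapping no longer matters.
        return CIST
    # Inside the region the VLAN follows its own MSTI (unmapped VLANs use the IST).
    return vlan_to_msti.get(vlan, CIST)
```

Because every VLAN gets the same answer on a boundary port, two VLANs mapped to different MSTIs still share one uplink between regions.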
It is important to note that MSTIs have no idea of the CIST Root whatsoever; they only use internal paths and the internal MSTI root to build the spanning tree. However, all MSTP instances see the root port (towards the CIST Root) of the CIST Regional Root as the special “Master Port” connecting them to the “outside” world. This port serves as the “gateway” linking MSTIs to other regions. Notice that switches do not send M-records (MSTI information) out of boundary ports, only CIST information. Thus, the CIST and MSTIs may converge independently and in parallel. The master port will only begin forwarding when all respective MSTI ports are in sync and forwarding, to avoid temporary bridging loops.
MSTP and Fault Isolation
Ethernet is known for its broadcast nature, which tends to propagate issues across the whole Layer 2 domain. There are three main problems with Ethernet that affect MSTP designs:
* Unknown unicast flooding results in traffic surges under topology changes. Every topology change may cause massive invalidation of MAC address tables and unicast traffic flooding. This process is the result of Ethernet topology unawareness: the bridges don’t know where MAC addresses are located.
* Broadcast and Multicast flooding. This is a separate problem, as many core protocols (ARP, IGP, PIM) rely on multicasting or broadcasting. Those packets must be delivered to every node in a broadcast domain, and under intense load the network could be congested at every point.
* Spanning-Tree Convergence. MSTP uses the RSTP procedure for STP re-negotiation. Since it is based on distance-vector behavior, it is prone to convergence issues, such as counting to infinity (old information circulation). This is especially noticeable in larger topologies with 10 switches or more and under special conditions, such as failure of the root bridge.
The concept of an MSTP region allows for bounding STP re-computations. Since MSTIs in every region are independent, any change affecting an MSTI in one region will not affect MSTIs in other regions. This is a direct result of the fact that M-record information is not exchanged between the regions. However, CIST recalculations affect every region and may converge slowly. This is why it is a good idea not to map any VLAN to CIST.
Topology changes in MSTP are treated the same way as in RSTP. That is, only non-edge links going to the forwarding state cause a topology change. A single physical link may be forwarding for one MSTI and blocking for another. Thus, a single physical change may have a different effect on the MSTIs and the CIST. Topology changes in MSTIs are confined to a single region, while topology changes to the CIST propagate through all regions. Every region treats a TC notification from another region as “external” and applies it to CIST-associated ports only.
A topology change to the CST (the tree connecting the virtual bridges) will affect all MSTIs in all regions and the CIST. This is because a new link becoming forwarding between the virtual bridges may change all paths in the topology and thus require massive MAC address re-learning. Thus, from the standpoint of topology change, something happening to the CST has the largest flooding impact across the set of interconnected MSTP regions.
The above observations suggest a good design rule for MSTP networks: separate “meshy” topologies into their own regions and interconnect regions using a “sparse” mesh, keeping in mind the balance between redundancy and the effect of topology changes. This is an adaptation of a well-known design principle: separate complexity from complexity to keep networks more stable and isolate fault domains.
Interoperating with PVST+
This task poses a tough issue. We know that PVST+ runs an STP instance for every VLAN. On the contrary, MSTP maps VLANs to MSTIs, so the one-to-one mapping between VLAN and STP instance no longer holds. How should an MSTP switch operate on a border link connected to the PVST+ domain? As we remember, on the border with an IEEE STP domain, PVST+ simply joins VLAN 1 STP with IEEE STP and tunnels SSTP BPDUs across the IEEE STP domain (refer to the PVST+ Explained article for more information).
However, MSTP runs multiple MSTIs inside a region and maps them all to CIST on the border link. That means we need to make sure that internal MSTIs are aware of changes in PVST+ trees. It’s hard to automatically map VLAN-based STPs to MSTIs, so the simplest way to accomplish the desired behavior is to join all PVST+ trees with CIST. This way, changes in any of the PVST+ STP instances propagate to CIST/IST and affect all MSTIs as a result. While not the optimal solution, it ensures that no changes go unnoticed and no black holes occur in a single VLAN due to topology changes.
The MSTP implementation simulates PVST+ by replicating CIST BPDUs on the link facing the PVST+ domain and sending the BPDUs on ALL VLANs active on the trunk. The MSTP switch consumes all BPDUs received from the PVST+ domain and processes them using the CIST/IST instance. The PVST+ side sees the MSTP domain as a special PVST+ domain with all per-VLAN instances claiming the CIST Root as the root of their STP. Note that PVST+ also interprets the whole region as a single pseudo-bridge, but one operating in PVST+ mode. Two options are possible here:
1) MSTP domain (either a single region or multiple regions) contains the root bridge for ALL VLANs. This is only true if CIST Root BID is better than any PVST+ STP root BID. This is the preferred design, for you can manipulate uplink costs on the PVST+ side and obtain optimal traffic engineering results.
2) PVST+ contains the root bridges for ALL VLANs, including VLAN 1, which maps to the CST of STP. This is only true if the PVST+ root bridge BIDs for all VLANs are better than the CIST Root BID. This is not the preferred design, since all MSTIs map to CIST on the border link, and you cannot load-balance the MSTIs as they enter the PVST+ domain.
The Cisco implementation does not support the second option. The MSTP domain should contain the bridge with the best BID, to ensure that the CIST Root is also the root for all PVST+ trees. In any other case, the MSTP border switch will complain and place the ports that receive superior BPDUs from the PVST+ region in the root-inconsistent state. To fix this issue, ensure that the PVST+ domain does not have any bridges with BIDs better than the CIST Root Bridge ID.
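The check a border switch applies, as described above, could be modelled roughly as follows. This is a simplified sketch and not the actual switch logic: BIDs are plain (priority, mac) tuples, and the function name check_pvst_border_port and the "consistent" label are invented for the example.

```python
def check_pvst_border_port(cist_root_bid, received_pvst_roots):
    """Classify a border port facing a PVST+ domain.

    cist_root_bid:       (priority, mac) of the CIST Root.
    received_pvst_roots: dict mapping VLAN -> root BID claimed by the PVST+ side.
    Returns (state, vlans_with_superior_bpdus).
    """
    # A lower BID is better, so any PVST+ root BID below the CIST Root BID
    # means a superior BPDU arrived from the PVST+ side.
    superior = sorted(vlan for vlan, bid in received_pvst_roots.items()
                      if bid < cist_root_bid)
    state = "root-inconsistent" if superior else "consistent"
    return state, superior
```

The list of offending VLANs points at the PVST+ bridges whose priority has to be raised (made numerically larger) to clear the condition.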
MSTP I: Inside a Region
In the beginning, there was the IEEE STP protocol (originally there were also the DEC variant, the original one invented by Radia Perlman, and the IBM STP protocol, but those are fossils now), which was adapted for use with multiple VLANs and 802.1q trunks. A single shared tree, sometimes called Mono Spanning Tree by Cisco, or more often Common Spanning Tree, is shared by all VLANs. The obvious drawback of this design is that you cannot perform VLAN traffic engineering across redundant links: if a link is blocked, it is blocked for all VLANs. To overcome this, Cisco introduced its proprietary PVST/PVST+ solution, running a separate STP instance for each VLAN. This solution permits a different logical topology for each VLAN, effectively allowing for L2 traffic engineering. However, as the number of VLANs grows, PVST becomes a waste of switch resources and a management burden, for the number of logical topologies is usually much smaller than the number of active VLANs.
As time passed, STP evolved into RSTP and Cisco answered with Rapid-PVST+: the fast STP, but with the same per-VLAN instance concept. The single spanning-tree instance used by IEEE and the per-VLAN STP implemented by Cisco represent two poles in the space of possible solutions. Seeing the limitations of the PVST approach, Cisco came up with the idea of decoupling the STP instance from a VLAN (they were bound together in PVST). The initial implementation was called MISTP (Multiple Instances Spanning Tree) and later evolved into the new IEEE 802.1s standard called MSTP (Multiple Spanning Tree Protocol).
Logical and Physical Topologies
Instead of running an STP instance for each VLAN, run a number of VLAN-independent STP instances (representing logical topologies) and then map each VLAN to the most appropriate logical topology (instance). Thus, the number of STP instances is kept to minimum (saving switch resources), but the network capacity is utilized in optimal fashion, by using all possible paths for VLAN traffic. The switch forwarding logic for VLAN traffic was changed a little bit. In order for a frame to be forwarded out of a port, two conditions must be met: first, VLAN must be active on this port (e.g. not filtered) and second, the STP instance the VLAN maps to, must be in non-discarding state for this port. Obviously, due to multiple logical topologies a single port could be blocking for one instance and forwarding for another.
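The two-condition forwarding check can be written down as a short Python sketch. All names here (may_forward, allowed_vlans, port_state) are hypothetical, and the port states are plain strings. The sketch requires the "forwarding" state, since a port that is still learning does not forward frames either.

```python
def may_forward(vlan, port, vlan_to_instance, allowed_vlans, port_state):
    """Decide whether a frame in `vlan` may be sent out of `port`.

    vlan_to_instance: dict VLAN -> STP instance (unmapped VLANs use instance 0).
    allowed_vlans:    dict port -> set of VLANs active (not filtered) on the port.
    port_state:       dict (port, instance) -> "discarding" | "learning" | "forwarding".
    """
    # Condition 1: the VLAN must be active on the port.
    if vlan not in allowed_vlans.get(port, set()):
        return False
    # Condition 2: the instance the VLAN maps to must be forwarding on this port.
    instance = vlan_to_instance.get(vlan, 0)
    return port_state.get((port, instance), "discarding") == "forwarding"
```

The same port can therefore carry VLAN 10 while dropping VLAN 20, provided the two VLANs map to instances with different states on that port.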
Implementing MSTP
The following questions need to be answered:
* Topology Calculation. How to build multiple STP instances (logical topologies) in a single physical topology? Should we run multiple STP instances each with own BPDUs? If yes, then how would we distinguish every instance’s BPDUs: PVST+ uses VLAN tags for that, but now STP instances are independent of VLANs?
* Information Distribution. How to make all switches aware of VLAN to instance mappings? Should we distribute the instance ID along with the VLAN number? If yes, then how could we ensure all switches use consistent numbering?
* Consistency Check. How to ensure the above mapping is consistent across all switches? That is, would switch1 know that switch2 maps VLAN2 to the same instance 1 and not instance 2?
In IEEE’s implementation, an MSTP region is a collection of switches sharing the same view of how the physical topology is partitioned into a set of logical topologies. For two switches to become members of the same region, the following attributes must match:
* Configuration name
* Configuration revision number (16 bits)
* The table of 4096 elements which map the respective VLAN to STP instance number
The IEEE 802.1s implementation does not send a BPDU for each active STP instance, nor does it encapsulate the VLAN list in each configuration message. Instead, a special STP instance number 0 called the Internal Spanning Tree (IST or MSTI0) is designated to carry all “signaling” information. The BPDUs for IST contain all standard RSTP information for IST itself, and also carry additional informational fields. Among other fields there are the configuration name, the revision number and a hash value computed over the contents of the VLAN to STP instance mapping table. Using just this compact information it’s easy to detect misconfiguration on two neighboring switches.
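To make the idea concrete, here is a minimal Python sketch of comparing two switches by name, revision and a hash of the 4096-entry mapping table. Note the assumptions: plain MD5 over a packed table is only a stand-in for the digest the standard defines, and the function names mapping_digest, region_id and same_region are invented for this example.

```python
import hashlib
import struct


def mapping_digest(vlan_to_instance):
    """Hash the full 4096-entry VLAN-to-instance table (unmapped VLANs -> instance 0)."""
    table = [vlan_to_instance.get(vlan, 0) for vlan in range(4096)]
    packed = struct.pack(">4096H", *table)  # one 16-bit entry per VLAN
    # Stand-in hash; not the exact digest algorithm used on the wire.
    return hashlib.md5(packed).hexdigest()


def region_id(name, revision, vlan_to_instance):
    """The compact identity a switch would advertise in its IST BPDUs."""
    return (name, revision, mapping_digest(vlan_to_instance))


def same_region(a, b):
    """Two switches share a region only if all three attributes match."""
    return a == b
```

A single VLAN mapped differently on one switch changes the digest, so the neighbors see each other as belonging to different regions without ever exchanging the full table.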
What about other instances, besides the IST? Well, obviously, all VLANs could be mapped to IST, and this is the default configuration. Effectively, this represents the case of classic IEEE RSTP with all VLANs sharing the same spanning tree. Of course, other instances also exist, and they are called MSTIs (multiple spanning tree instances). Each MSTI may assign different priorities to switches, may have different link costs and port priorities, and thus end up with its own logical topology. Now if the 802.1s standard implementation does not send separate BPDUs for each MSTI, how does it accomplish separate topologies? The MSTI information is piggybacked into IST BPDUs in special MRecord fields (one for every active MSTI), which carry root priority, designated bridge priority, port priority and root path cost, among others. Let’s see how this whole thing works.
First of all, since the MSTP convergence mechanism stems from RSTP, there is no BPDU relaying process downstream from the root bridge. Every switch emits configuration BPDUs on its own, every Hello interval. Every BPDU has full information about IST, and also an MRecord for every MSTI. Using the RSTP convergence mechanics, separate STP instances are built for IST and every MSTI, using the information from the IST BPDU and the MRecords (root/designated bridge priorities, port priority, root path cost etc.). Note that STP timers such as Hello, ForwardTime and MaxAge can only be tuned for IST, the instance 0. All other instances (MSTIs) inherit the timers from IST, which is the natural result of all MSTI information being piggybacked in IST BPDUs. Just as a side note, MSTP does not use the MaxAge timer to age out old information, like RSTP/STP do. Instead, IST BPDUs have a special field called MaxHops. The IST root sends BPDUs with the hop count equal to MaxHops and every other downstream switch decrements the hop count field on reception of an IST BPDU. As soon as the hop count becomes zero, the information in the BPDU is ignored, and the switch may start declaring itself as the new IST root. The old MaxAge/ForwardDelay timers are still used when MSTP interacts with RSTP, STP or (R)PVST+ bridges.
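The hop-count aging can be illustrated with a small Python sketch. It follows the description above literally (decrement on reception, ignore at zero); the exact off-by-one behavior of a given implementation may differ, and the function name reachable_depth is made up for the example.

```python
def reachable_depth(max_hops, chain_length):
    """Count how many switches in a chain below the IST root accept its information.

    max_hops:     hop count the IST root places in its BPDUs.
    chain_length: number of switches connected in a line below the root.
    """
    remaining = max_hops
    accepted = 0
    for _ in range(chain_length):
        remaining -= 1          # each switch decrements on reception
        if remaining <= 0:
            break               # information is ignored from here on
        accepted += 1
    return accepted
```

For example, reachable_depth(3, 10) gives 2: in this model the third switch sees the count hit zero and ignores the root's information, which is why the hop limit bounds the diameter of a region.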
Caveats arising from VLAN/STP decoupling
Some issues may arise from the fact that spanning-tree instances are no longer directly tied to VLANs. The general rule is as follows: “If a VLAN is active on a particular primary link (e.g. this link is non-backup in your logical topology), ensure the STP instance it maps to is forwarding on this link”.
So, do not use “VLAN pruning” static method of distributing VLANs across trunks when you have MSTP enabled.
Use separate STP for each logical topology and avoid mapping VLANs to IST. Keep IST only for information distribution, but load-balance traffic using MSTIs.
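The general rule quoted above lends itself to a simple audit, sketched here in Python. The inputs are assumptions made for the example: primary_links lists the links each VLAN is meant to use in the design, forwarding_links lists the links on which each instance is currently forwarding, and the function name find_violations is hypothetical.

```python
def find_violations(vlan_to_instance, primary_links, forwarding_links):
    """Report (vlan, instance, link) triples that break the rule.

    vlan_to_instance: dict VLAN -> STP instance (unmapped VLANs use instance 0).
    primary_links:    dict VLAN -> set of links the VLAN is expected to use.
    forwarding_links: dict instance -> set of links forwarding for that instance.
    """
    violations = []
    for vlan in sorted(primary_links):
        instance = vlan_to_instance.get(vlan, 0)
        for link in sorted(primary_links[vlan]):
            # The VLAN relies on this link, but its instance blocks it.
            if link not in forwarding_links.get(instance, set()):
                violations.append((vlan, instance, link))
    return violations
```

If VLANs 10 and 20 share instance 1 but are statically pruned onto different trunks, the instance can forward on only one of them and the audit flags the other VLAN; giving each logical topology its own MSTI clears the report.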
MSTP
CST
The original IEEE 802.1q standard defines much more than simply trunking. This standard defines a Common Spanning Tree (CST) that only assumes one spanning tree instance for the entire bridged network, regardless of the number of VLANs.
In a network running the CST, these statements are true:
. No load balancing is possible; one Uplink needs to block for all VLANs.
. The CPU is spared; only one instance needs to be computed.
MST Case
Several VLANs can be mapped to a reduced number of spanning tree instances because most networks do not need more than a few logical topologies.
. The desired load balancing scheme can still be achieved.
. The CPU is spared because only limited instances are computed.
MST Region
The main enhancement introduced by MST is that several VLANs can be mapped to a single spanning tree instance. This raises the problem of how to determine which VLAN is to be associated with which instance. More precisely, how to tag BPDUs so that the receiving devices can identify the instances and the VLANs to which each device applies.
The IEEE 802.1s committee adopted a much simpler approach that introduced MST regions. Think of a region as the equivalent of a Border Gateway Protocol (BGP) Autonomous System, that is, a group of switches placed under a common administration.
MST Configuration and MST Region
Each switch running MST in the network has a single MST configuration that consists of these three attributes:
1. An alphanumeric configuration name (32 bytes)
2. A configuration revision number (two bytes)
3. A 4096-element table that associates each of the potential 4096 VLANs supported on the chassis to a given instance
In order to be part of a common MST region, a group of switches must share the same configuration attributes.
Note: If for any reason two switches differ on one or more configuration attribute, the switches are part of different regions.
Region Boundary
In order to ensure consistent VLAN-to-instance mapping, it is necessary for the protocol to be able to exactly identify the boundaries of the regions. For that purpose, the characteristics of the region are included in the BPDUs. The exact VLANs-to-instance mapping is not propagated in the BPDU, because the switches only need to know whether they are in the same region as a neighbor. Therefore, only a digest of the VLANs-to-instance mapping table is sent, along with the revision number and the name. Once a switch receives a BPDU, the switch extracts the digest (a numerical value derived from the VLAN-to-instance mapping table through a mathematical function) and compares this digest with its own computed digest. If the digests differ, the port on which the BPDU was received is at the boundary of a region.
In generic terms, a port is at the boundary of a region if the designated bridge on its segment is in a different region or if it receives legacy 802.1d BPDUs.
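A rough Python sketch of this classification is shown below. It assumes the received BPDU has already been parsed into a dict; the keys "version" and "region" and the function name is_boundary_port are invented for the example, while the protocol version values (0 for STP, 2 for RSTP, 3 for MSTP) are the ones carried in the BPDU header.

```python
MSTP_VERSION = 3  # protocol version identifier used by MST BPDUs


def is_boundary_port(local_region, bpdu):
    """Return True if the port that received `bpdu` sits at a region boundary.

    local_region: (name, revision, digest) tuple of the local switch.
    bpdu:         dict with "version" and, for MST BPDUs, a "region" tuple.
    """
    if bpdu["version"] < MSTP_VERSION:
        # A non-MST neighbor is by definition outside the region.
        return True
    # MST neighbor: compare name, revision and digest in one step.
    return bpdu["region"] != local_region
```

Only the compact region identity is compared, so the check costs the same no matter how many VLANs are mapped.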
MST Instances
According to the IEEE 802.1s specification, an MST bridge must be able to handle at least these two instances:
. One Internal Spanning Tree (IST)
. One or more Multiple Spanning Tree Instance(s) (MSTIs)
The terminology continues to evolve, as 802.1s is actually in a pre-standard phase. It is likely these names will change in the final release of 802.1s.
IST Instances
In order to clearly understand the role of the IST instance, remember that MST originates from the IEEE. Therefore, MST must be able to interact with 802.1q-based networks, because 802.1q is another IEEE standard. For 802.1q, a bridged network only implements a single spanning tree (CST). The IST instance is simply an RSTP instance that extends the CST inside the MST region.
The IST instance receives and sends BPDUs to the CST. The IST can represent the entire MST region as a CST virtual bridge to the outside world.
These are two functionally equivalent diagrams. Notice the location of the different blocked ports. In a typically bridged network, you expect to see a blocked port between Switches M and B. Instead of blocking on D, you expect to have the second loop broken by a blocked port somewhere in the middle of the MST region. However, due to the IST, the entire region appears as one virtual bridge that runs a single spanning tree (CST). This makes it possible to understand that the virtual bridge blocks an alternate port on B. Also, that virtual bridge is on the C to D segment and leads Switch D to block its port.
MSTIs
The MSTIs are simple RSTP instances that only exist inside a region. These instances run the RSTP automatically by default, without any extra configuration work. Unlike the IST, MSTIs never interact with the outside of the region. Remember that MST only runs one spanning tree outside of the region, so except for the IST instance, regular instances inside of the region have no outside counterpart. Additionally, MSTIs do not send BPDUs outside a region, only the IST does.
MSTIs do not send independent individual BPDUs. Inside the MST region, bridges exchange MST BPDUs that can be seen as normal RSTP BPDUs for the IST while containing additional information for each MSTI. Each switch only sends one BPDU, but each includes one MRecord per MSTI present on the ports.
Note: The first information field carried by an MST BPDU contains data about the IST. This implies that the IST (instance 0) is always present everywhere inside an MST region. However, the network administrator does not have to map VLANs onto instance 0, and therefore this is not a source of concern.
Unlike a regular converged spanning tree topology, both ends of a link can send and receive BPDUs simultaneously. This is because each bridge can be designated for one or more instances and needs to transmit BPDUs. As soon as a single MST instance is designated on a port, a BPDU that contains the information for all instances (IST + MSTIs) is to be sent. The diagram shown here demonstrates MST BPDUs sent inside and outside of an MST region.
The MRecord contains enough information (mostly root bridge and sender bridge priority parameters) for the corresponding instance to calculate its final topology. The MRecord does not need any timer-related parameters such as hello time, forward delay, and max age that are typically found in a regular IEEE 802.1d or 802.1q CST BPDU. The only instance in the MST region to use these parameters is the IST; the hello time determines how frequently BPDUs are sent, and the forward delay parameter is mainly used when rapid transition is not possible (remember that rapid transitions do not occur on shared links). As MSTIs depend on the IST to transmit their information, MSTIs do not need those timers.
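The split between timer-carrying IST data and timer-free MRecords can be pictured with a couple of Python dataclasses. This is a conceptual sketch only: the field names are illustrative and do not reproduce the real BPDU layout, the field list is not exhaustive, and the timer values are just example defaults.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MRecord:
    # Per-MSTI data: priorities and cost only, no timers.
    instance: int
    regional_root_priority: int
    internal_root_path_cost: int
    bridge_priority: int
    port_priority: int


@dataclass
class MstBpdu:
    # IST (instance 0) part: the only place where timers live.
    root_priority: int
    hello_time: int = 2
    forward_delay: int = 15
    max_age: int = 20
    remaining_hops: int = 20
    mrecords: List[MRecord] = field(default_factory=list)


def build_bpdu(ist_root_priority, msti_info):
    """Build one BPDU carrying the IST data plus one MRecord per MSTI."""
    records = [MRecord(instance=i, **info) for i, info in sorted(msti_info.items())]
    return MstBpdu(root_priority=ist_root_priority, mrecords=records)
```

However many MSTIs exist, a switch still emits a single BPDU per port; the instances only add records to it.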
The original IEEE 802.1q standard defines much more than simply trunking. This standard defines a Common Spanning Tree (CST) that only assumes one spanning tree instance for the entire bridged network, regardless of the number of VLANs.
In a network running the CST, these statements are true:
. No load balancing is possible; one Uplink needs to block for all VLANs.
. The CPU is spared; only one instance needs to be computed.
MST Case
Several VLANs can be mapped to a reduced number of spanning tree instances because most networks do not need more than a few logical topologies.
. The desired load balancing scheme can still be achieved.
. The CPU is spared because only limited instances are computed.
MST Region
The main enhancement introduced by MST is that several VLANs can be mapped to a single spanning tree instance. This raises the problem of how to determine which VLAN is associated with which instance. More precisely, the problem is how to tag BPDUs so that receiving devices can identify the instances and the VLANs to which each BPDU applies.
The IEEE 802.1s committee adopted a much simpler approach that introduced MST regions. Think of a region as the equivalent of a Border Gateway Protocol (BGP) autonomous system: a group of switches placed under a common administration.
MST Configuration and MST Region
Each switch running MST in the network has a single MST configuration that consists of these three attributes:
1. An alphanumeric configuration name (32 bytes)
2. A configuration revision number (two bytes)
3. A 4096-element table that associates each of the potential 4096 VLANs supported on the chassis with a given instance
In order to be part of a common MST region, a group of switches must share the same configuration attributes.
Note: If for any reason two switches differ on one or more configuration attributes, the switches are part of different regions.
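The three attributes and the region membership rule can be modeled with a small sketch. The class name, field names, and the `build_table` and `same_region` helpers are hypothetical; only the sizes (32 bytes, two bytes, 4096 elements) come from the list above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

NAME_MAX_BYTES = 32
REVISION_MAX = 0xFFFF  # largest value that fits in two bytes
VLAN_TABLE_SIZE = 4096


@dataclass(frozen=True)
class MstConfiguration:
    name: str
    revision: int
    vlan_table: Tuple[int, ...]  # vlan_table[vlan_id] is the instance number

    def __post_init__(self):
        # Enforce the sizes stated for each attribute.
        if len(self.name.encode("ascii")) > NAME_MAX_BYTES:
            raise ValueError("configuration name exceeds 32 bytes")
        if not 0 <= self.revision <= REVISION_MAX:
            raise ValueError("revision must fit in two bytes")
        if len(self.vlan_table) != VLAN_TABLE_SIZE:
            raise ValueError("VLAN table must have 4096 elements")


def build_table(mapping: Dict[int, int]) -> Tuple[int, ...]:
    """Build the 4096-element table; unmapped VLANs stay on instance 0."""
    table = [0] * VLAN_TABLE_SIZE
    for vlan, instance in mapping.items():
        table[vlan] = instance
    return tuple(table)


def same_region(a: MstConfiguration, b: MstConfiguration) -> bool:
    """Two switches share a region only if all three attributes match."""
    return a == b
```

For example, two switches configured with the name "campus", revision 1, and identical tables compare equal, while bumping the revision on one of them places it in a different region.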
Region Boundary
In order to ensure consistent VLAN-to-instance mapping, it is necessary for the protocol to be able to exactly identify the boundaries of the regions. For that purpose, the characteristics of the region are included in the BPDUs. The exact VLANs-to-instance mapping is not propagated in the BPDU, because the switches only need to know whether they are in the same region as a neighbor. Therefore, only a digest of the VLANs-to-instance mapping table is sent, along with the revision number and the name. Once a switch receives a BPDU, the switch extracts the digest (a numerical value derived from the VLAN-to-instance mapping table through a mathematical function) and compares this digest with its own computed digest. If the digests differ, the port on which the BPDU was received is at the boundary of a region.
In generic terms, a port is at the boundary of a region if the designated bridge on its segment is in a different region or if it receives legacy 802.1d BPDUs.
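The digest comparison can be sketched as below. HMAC-MD5 from the Python standard library stands in for the "mathematical function"; the key is a placeholder and the byte layout is an assumption for illustration, so the digests produced here will not match those of real switches. The boundary check is also simplified: it looks at any received BPDU instead of only the one from the designated bridge on the segment.

```python
import hashlib
import hmac
import struct

# Placeholder key, for illustration only. A real implementation must use
# the fixed key defined by the standard.
ILLUSTRATIVE_KEY = b"not-the-real-key"


def table_digest(vlan_table):
    """Condense the 4096-element VLAN-to-instance table into 16 bytes."""
    packed = struct.pack(">4096H", *vlan_table)  # two bytes per VLAN entry
    return hmac.new(ILLUSTRATIVE_KEY, packed, hashlib.md5).digest()


def region_fields(name, revision, vlan_table):
    """What a BPDU carries about the region: name, revision, and digest."""
    return (name, revision, table_digest(vlan_table))


def is_boundary_port(local_fields, received_fields):
    """Decide whether the receiving port sits at a region boundary.

    received_fields is None when the neighbor sent a legacy 802.1d BPDU,
    which carries no region information at all.
    """
    if received_fields is None:
        return True
    return local_fields != received_fields
```

Because only the digest travels in the BPDU, a neighbor can detect a mapping mismatch without ever seeing the full table.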
MST Instances
According to the IEEE 802.1s specification, an MST bridge must be able to handle at least these two instances:
. One Internal Spanning Tree (IST)
. One or more Multiple Spanning Tree Instance(s) (MSTIs)
The terminology continues to evolve, as 802.1s is actually in a pre-standard phase. It is likely these names will change in the final release of 802.1s.
IST Instances
In order to clearly understand the role of the IST instance, remember that MST originates from the IEEE. Therefore, MST must be able to interact with 802.1q-based networks, because 802.1q is another IEEE standard. For 802.1q, a bridged network only implements a single spanning tree (CST). The IST instance is simply an RSTP instance that extends the CST inside the MST region.
The IST instance receives and sends BPDUs to the CST. The IST can represent the entire MST region as a CST virtual bridge to the outside world.
These are two functionally equivalent diagrams. Notice the location of the different blocked ports. In a typical bridged network, you expect to see a blocked port between Switches M and B. Instead of blocking on D, you expect to have the second loop broken by a blocked port somewhere in the middle of the MST region. However, due to the IST, the entire region appears as one virtual bridge that runs a single spanning tree (CST). This makes it possible to understand that the virtual bridge blocks an alternate port on B. That virtual bridge is also on the C to D segment, which leads Switch D to block its port.
MSTIs
The MSTIs are simple RSTP instances that only exist inside a region. These instances run RSTP automatically by default, without any extra configuration work. Unlike the IST, MSTIs never interact with the outside of the region. Remember that MST only runs one spanning tree outside of the region, so except for the IST instance, regular instances inside of the region have no outside counterpart. Additionally, MSTIs do not send BPDUs outside a region; only the IST does.
MSTIs do not send independent individual BPDUs. Inside the MST region, bridges exchange MST BPDUs that can be seen as normal RSTP BPDUs for the IST while containing additional information for each MSTI. Each switch only sends one BPDU, but each includes one MRecord per MSTI present on the ports.
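A minimal sketch of this layout: one BPDU holds the IST information plus a list with one MRecord per MSTI. The class names, field names, and priority values are hypothetical and do not reflect the real frame encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MRecord:
    """Per-MSTI information carried inside the single MST BPDU."""
    instance: int
    root_priority: int
    sender_priority: int


@dataclass
class MstBpdu:
    """One BPDU: IST information first, then one MRecord per MSTI."""
    ist_root_priority: int
    ist_sender_priority: int
    mrecords: List[MRecord] = field(default_factory=list)


def build_bpdu(ist: Tuple[int, int],
               mstis_on_port: Dict[int, Tuple[int, int]]) -> MstBpdu:
    """Pack the IST data and every MSTI present on the port into one BPDU."""
    root, sender = ist
    records = [MRecord(instance, r, s)
               for instance, (r, s) in sorted(mstis_on_port.items())]
    return MstBpdu(root, sender, records)


# A port that carries two MSTIs sends a single BPDU with two MRecords.
bpdu = build_bpdu((32768, 32768), {1: (4096, 32768), 2: (8192, 4096)})
```

The number of BPDUs on the wire therefore stays at one per port, regardless of how many MSTIs are configured.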