Wednesday, July 7, 2010

MSTP II: Outside a Region

The concept of CIST
Every MSTP region runs special instance of spanning-tree known as IST or Internal Spanning Tree (=MSTI0). This instance is active on all links inside a region and serves the purpose of disseminating STP topology information for other STP instances. As usual, IST has a root bridge, elected based on the lowest bridge ID (priority/MAC address). However, situation changes when you have different regions in the network (e.g. switches with different region names, different revisions, etc). When a switch detects BPDU messages sourced from another region on any link, it marks the link as MSTP boundary.

Now two regions should build a common spanning tree known as CIST – Common and Internal Spanning Tree. This tree is result of joining ISTs of each region in a special manner. Here is a detailed description of the process.

1) In addition to sending IST configuration BPDUs, every switch initially declares itself as the root of CIST. The switches pass CIST configuration information along with IST information in additional BPDU fields. Switches inside a region never change the path cost to the CIST Root, known as CIST External Root Path Cost. Instead of that, the external path cost only changes on the boundary ports. Thus, this external cost only accounts for the cost of boundary links, not the cost of the paths inside a region. Essentially, CIST External Path Cost information “tunnels” across a region.

2) On boundary ports, switches exchange their CIST BPDU information ONLY. That is, switches hide IST information between regions, but pass CIST metrics. The usual RSTP synchronization process takes places between switches on a border link, and eventually ONE switch with the lowest BID (Bridge ID = Priority + MAC Address) among all regions is elected as the CIST Root. Note that each region still elects a local IST root, known as CIST Regional Root, as described further.

3) The region that contains the CIST Root, declares this switch as the root of local IST as well. However, things start to differ for regions that don’t contain the CIST root. All of them elect one of the border switches (switches connected to other regions) as IST root (aka CIST Regional Root). This procedure elects the IST Root (CIST Regional Root) based on the lowest CIST External Root Path Cost. Note that this procedure differs from elections based purely on BID, which take place inside a single region. In this case, the procedure uses BIDs as tiebreaker, if two or more switches have the same CIST Root External Path Cost. MSTP blocks all redundant boundary “uplinks” marking them as “alternate” paths to the CIST Root. The boundary switches do so by receiving “extra” CIST BPDUs on top of IST BPDUs with external root path cost values and comparing them with the ones received on local boundary ports.

4) Inside a region, switches build regular IST, using the CIST Regional Root as the root of IST. Note that this tree uses so-called Internal Root Path Cost stored in the local IST BPDUs. This cost increments along all the links inside a region, but it never leaks out of the region. Between regions, switches exchange information about CIST External Root Path cost only.


Note that switch with non-optimal priority value may become the CIST Regional Root (local IST Root). For example, if you configure a switch inside a region with a lowest BID among all switches it may not necessarily become the CIST Regional Root. Only if the switch has the lowest BID among all regions it would be elected as CIST Root.


The concept of CST and STP interoperation

From the above information, we conclude that CIST essentially has organization of a two-level hierarchy. The first level treats all regions as “pseudo-bridges” and operates with the External Root Path Cost. The first-level spanning tree roots in CIST Root Bridge and encompasses the pseudo-bridges. They call it CST or Common Spanning Tree. Effectively, this has no idea of the internal MSTP regions structure, but sees each region as a virtual bridge.

This is the point where MSTP interoperates with legacy IEEE STP/RSTP regions as well. The legacy switch regions have no concept of IST, so they simply join their STP instance with the CST and perceive MSTP regions as “transparent” pseudo-bridges, staying unaware of their internal topology. (Note that it may happen so that a switch with the lowest BID belongs to RSTP/STP region. This situation results in all MSTP regions electing local CIST Regional Roots and considering the new CIST Root located outside MSTP “domain”). Naturally, MSTP detects the appropriate STP version on a boundary link and switches to the respective mode of operations (e.g. RSTP/STP).

The second level of CIST hierarchy consists of various regional ISTs. Every MSTP region builds IST instance using the internal path costs and following the optimal “internal” topology. As you remember, this “internal topology” transparently transports the “external” BPDU information to the border switches, so that they can elect Regional CIST Root. The fist level topology (CST) is somewhat independent of the second level topology (IST) and bases upon the CIST root and cost of the boundary links. The second level topology (“internal”, IST) may change in case if boundary link cost changes or something happens to the CIST Root, since this affect the election of the Regional CIST Root.

MSTP pseudo-bridges do not strictly emulate the real bridges. For example, different boundary switches send their own BID in BPDUs, so the pseudo-bridge may appear to have many BIDs on various boundary links. However, this has no impact on the process of transparent tunneling of CIST Root information across the pseudo-bridge. Other things that don’t see pseudo-bridge as non-transparent include MSTP hop count or MaxAge timer value, for they may change asynchronously, as information travels along different paths inside a region.

MSTIs and CIST

Now, what could be said about the MSTIs – individual STP instances used inside regions? From what we learned so far, it is easy to conclude that the only logical solution is to map all MSTP instances to the CIST on the boundary links. This implies that you cannot load-balance VLAN traffic on the boundary links by mapping VLANs to different instances. All VLANs use the same non-blocking uplink that CIST elected as the optimal path to the CIST Root. But this only applies to the “CST” paths connecting the regional virtual bridges – inside any region VLANs follow the internal topology paths, based on the respective MSTI configurations.

It is important to note that MSTIs have no idea of the CIST Root whatsoever; they only use internal paths and internal MSTI root to build the spanning tree. However, all MSTP instances see the root port (towards the CIST Root) of the CIST Regional Bridge as the special “Master Port” connecting them to the “outside” world. This port serves the purpose of the “gateway” linking MSTI’s to other regions. Notice that switches do not send M-records (MSTI information) out of boundary ports, only CIST information and thus . Thus, the CIST and MSTI’s may converge independently and in parallel. The master port will ony beging forwarding when all respective MSTI ports are in sync and forwarding to avoid temporary bridging loops.

MSTP and Fault Isolation

Ethernet is known for its broadcast nature that tends propagating issues across the whole Layer 2 domain. There are tree main problems with Ethernet that affect MSTP designs:

* Unknown unicast flooding results in traffic surges under topology changes. Every topology change may cause massive invalidation of MAC address tables and unicast traffic flooding. This process is the result of Ethernet topology unawareness – the bridges don’t know MAC addresses location.
* Broadcas and Multicast flooding. This is a separate problem as many core protocols (ARP, IGP, PIM) rely on multicasting or broadcasting. Thos packets should be delivered to every node in a broadcast domain and under intense load network could be congested at every point.
* Spanning-Tree Convergence. MSTP uses RSTP procedure for STP re-negotiation. Since it is based on distance-vector behavior, it is prone to convergence issue, such as counting to infinity (old information circulation). This is especially noticeable in larger topologies with 10 switches and more and under special conditions, such as failure of the root bridge.

The concept of MSTP region allows for bounding STP re-computations. Since MSTIs in every region are independent, any change affecting MSTI in one region will not affect MSTIs in other regions. This is a direct result of the fact that M-record information is not exchanged between the regions. However, CIST recalculations affect every region and might be slow converging. This is why it is a good idea not to map any VLAN to CIST.

Topology changes in MSTP are treated the same way as in RSTP. That is, only non-edge links going to forwarding state will cause a topology change. A single physical link may be forwarding for one MSTI and blocking for another. Thus, a single physical change may have different effect on MSTIs and the CIST. Topology changes in MSTIs are bounded to a single region, while topology changes to the CIST propagate through all regions. Every region treats the TC notification from another region as “external” and applies them to CIST-associated ports only.

A topology change to CST (the tree connecting the virtual bridges) will affect all MSTIs in all regions and the CIST. This is due to the fact that new link becoming forwarding between the virtual bridges may change all paths in the topology and thus require massive MAC address re-learning. Thus, from the standpoint of topology change, something happening to the CST will have most massive impact of flooding in the set of interconnected MSTP regions.

The above observations advise a good design rule for MSTP networks – separated “meshy” topologies in their own regions and interconnect regions using “sparse” mesh, keeping in mind balance between redundancy and topology changes effect. This is an adaptation of well-know design principle – separate complexity from complexity to keep networks more stable and isolate fault domains.

Interoperating with PVST+

This task poses a tough issue. We know that PSVST+ runs an STP instance for every VLAN. On a contrary, MSTP maps VLANs to MSTIs, so one-to-one mapping between VLAN and STP instance no longer holds true. How should an MSTP switch operate on a border link connected to the PVST+ domain? As we remember, on the border with IEEE STP domain, PVST+ simply joins VLAN 1 STP with IEEE STP and tunnels SSTP BPDUs across the IEEE STP domain (refer to PVST+ Explained article for more information).

However, MSTP runs multiple MSTIs inside a region and maps them all to CIST on the border link. That means we need to make sure that internal MSTIs could be aware of changes in PVST+ trees. It’s hard to automatically map VLAN-based STPs to MSTI and so the simplest way to accomplish the desired behavior is to join all PVST+ trees with CIST. This way, changes in any of PVST+ STP instances propagate to CIST/IST and affect all MSTIs in result. While not the optimal solution, it ensure that no changes go unnoticed and no black holes occur in a single VLAN due to the topology changes.

The MSTP implementation simulates PVST+ by replicating CIST BPDUs on the link facing the PVST+ domain and sending the BPDUs on ALL VLANs active on the trunk. The MSTP switch consumes all BDPUs received from PVST+ domain and processes them using the CIST/IST instance. The PSVT+ side sees the MSTP domain as a special PVST+ domain with all per-VLAN instances claiming the CIST Root as the root of their STP. Note that PVST+ also interprets the whole region as a single pseudo-bridge, but operating in PSVT+ mode. The two possible options are allowed here:

1) MSTP domain (either a single region or multiple regions) contains the root bridge for ALL VLANs. This is only true if CIST Root BID is better than any PVST+ STP root BID. This is the preferred design, for you can manipulate uplink costs on the PVST+ side and obtain optimal traffic engineering results.

2) PVST+ contains the root bridges for ALL VLANs, including VLAN1, which maps to CST of STP. This is only true is all PVST+ root bridges BIDs for all VLANs are better than CIST Root BID. This is not the preferred design, since all MSTIs map to CIST on the border link, and you cannot load-balance the MSTIs as the enter the PVST+ domain.

Cisco implementation does not support the second option. MSTP domain should contain the bridge with the best BID, to ensure that the CIST Root is also the root for all PVST+ trees. If any other case, MSTP border switch will complain and place the ports that receive superior BPDUs from PVST+ region in root-inconsistent state. To fix this issue, ensure that PVST+ domain does not have any bridges with BIDs better than the CIST Root Bridge ID.

No comments:

Post a Comment