RBFS Redundancy for Subscriber Groups
Overview
Node outages and link failures that may occur on an access network can bring down the subscriber services. These network outages affect critical workloads and continuity of business. So, it is essential to set up a network that is resilient and responds quickly to the events and protect the network from outages.
The following diagram represents a simple access network without redundancy.
RBFS Redundancy protects subscriber services from node or link outages. It provides mechanisms to enhance network resiliency that enables subscriber workloads to remain functional by ensuring a reliable switchover in the event of a node or link outage. With RBFS Redundancy, if one node goes down due to node or link failure, another node can automatically take over the services.
RBFS currently supports redundancy for IPoE subscribers through a hot-standby mode. |
Understanding RBFS Redundancy
RBFS Redundancy protects subscriber groups using an active-standby node cluster model. In the active-standby node cluster, the active node (for a subscriber group) performs subscriber services. The standby device mirrors concurrent subscriber state data from the active peer (for that redundancy session). Both the nodes, paired for redundancy, keep sending 'keepalive' messages to each other to check the health status. RBFS Redundancy is centered around subscriber groups, known as redundancy sessions.
The document uses the term C-BNG throughout the document. C-BNG stands for consolidated BNG. Unlike the spine-leaf topology, where the functionalities are distributed to spine platforms and leaf platforms separately based on their roles, C-BNG platform includes all functionalities together in a single platform. |
RBFS Redundancy is not supported on spine-leaf network topology as the RBFS spine-leaf topological architecture itself innately provides a redundant and resilient network. |
Inter-BNG Connectivity
C-BNG platforms, paired for redundancy, are connected with a Redundancy (RD) TCP link. This RD TCP connection can be formed either directly or through the core network. It uses IS-IS (for unicast reachability) and LDP (for labeled unicast reachability) between the active and standby nodes. The link between the C-BNG pairs establishes connectivity and the link is used to send 'keepalive' messages and data mirroring for subscriber state synchronization.
If the link goes down, the Keepalive messages cannot be exchanged between the C-BNGs and the C-BNGs move to the standalone state only after the hold-timer expires and the previously synchronized subscriber session data becomes invalid.
If the C-BNG pairs are connected through the core network, then a core network failure can impact redundancy. However, if the inter-BNG connection is direct or through a different path that doesn’t rely on the core, redundancy is not impacted by core network failure.
In a scenario, when a new lag is configured for redundancy with the same redundancy session ID, it works for the same session. If any of the lag goes down, a switchover happens.
Redundancy Session
Redundancy session is a binding mechanism that is used to pair C-BNGs for redundancy. RBFS Redundancy allows grouping of subscribers, under a redundancy session and each redundancy session is represented by a redundancy session ID. Simply, an RBFS Redundancy session represents a redundancy group of subscribers. A redundancy session enables linking the LAG with that particular redundancy session (subscriber group). When you define a value for the redundancy session ID, this ID should be unique and the same for both redundancy pairs. When two nodes get the same session ID, they recognize each other as the peer nodes for a particular redundancy session (subscriber group). The TCP session establishment between the nodes occurs after the pairing with the redundancy session ID. Once the TCP session is established, the nodes use this channel for subscriber data mirroring and synchronization and for health status monitoring.
A C-BNG chassis can contain multiple redundancy sessions. Multiple C-BNG nodes can be active nodes for one or more subscriber groups (redundancy sessions) that serve the subscribers and they can, at the same time, be standby nodes for other subscriber groups.
One node, which is paired for redundancy, can perform subscriber services for more than one redundancy session (subscriber group). The peer node, which is identical to the first node, contains the same subscriber group (redundancy session) as a standby. And in the event of that first node goes down due to any outage, the standby node can take over subscriber service for this redundancy session.
RBFS Redundancy allows running a redundancy session actively on one node and back up the same redundancy session (subscriber group) on a different (standby) C-BNG node. In RBFS redundancy, in fact, there is no active node or standby node. It is active subscriber group and standby subscriber group. A C-BNG node can be active for a subscriber group and at the same time it can be a standby for a different subscriber group. You can park a maximum number of 64 redundancy sessions (either active or standby) on a C-BNG node.
RBFS Redundancy Architecture
The following architectural diagram provides a high-level view of RBFS in redundancy mode. It shows two RBFS nodes, paired for redundancy, deployed in an active-standby node cluster, with their interfaces are connected with an RD TCP connection. These peer nodes use the RD TCP connection for sending 'keepalive' messages and data mirroring for subscriber state synchronization.
Both of the nodes are connected to an access device (OLT device in this scenario) on one end from where it receives subscriber traffic and sends traffic. The nodes are also connected to the core network on the other end.
The node 'C-BNG 1' is in active state and performs subscriber services for the 'Session 1' (redundancy subscriber group). 'Session 1' is also mirrored in the C-BNG 2 in standby mode. The standby device mirrors concurrent subscriber state data for 'Session 1' from the active peer. If an active node goes down due to any reason, the peer node detects the outage and uses mirrored 'Session 1' to perform subscriber services.
One C-BNG node acts as active node for one or more sessions (subscriber redundancy groups) and as a standby C-BNG for other subscriber redundancy groups at the same time. The following diagram illustrates the scenario.
RBFS Redundancy can mitigate the following types of failures:
-
Link failure Between Active RBFS Node and Access Node (OLT, DSLAM or MSAN)
-
Node Outage
Redundancy for Node Outage
A node outage, which can bring down the subscriber services, can occur due to many reasons on a network. RBFS Redundancy helps to minimize the impact and reduce interruptions and downtime by providing a resilient system. In the event of a node outage, RBFS Redundancy triggers switchover in which the standby node takes over from the active node with very minimal impact on the subscriber services.
The diagram shows cbng1
as an active node serving subscribers and cbng2
stays as a standby. When an active node goes down, the standby node detects the same and takes over from the active RBFS node. In a node outage scenario, the node becomes unresponsive and cannot maintain communication with its peer node the RD TCP link.
Redundancy for Link Failure
In RBFS Redundancy, Link Aggregation combines multiple physical links into a single logical link. If a member link, which is part of a LAG, goes down, the LAG fails. The diagram shows the 'cbng1' and 'cbng2' deployed in redundancy mode and is connected to the access device with LAGs. When the LAG between 'cbng1' and the access node goes down, the 'cbng1', which is running 'Session 1', becomes inactive for the subscriber group. So the 'cbng2', which is the standby for 'Session 1', detects the failure of 'cbng1' and starts performing subscriber services for 'Session 1' by providing a quick recovery from the disruption.
In this link or LAG failure scenario, cbng1
is in a healthy state, only the LAG interface went down. It cannot perform subscriber services for the particular redundancy session. However, it can keep communication with its peer node as the RD TCP channel.
Subscriber Data Synchronization
RBFS Redundancy subscriber data from the active node is always synced to the standby node. So that if the active node goes down, the standby node takes over and restores traffic forwarding which was previously performed by its peer node. It ensures that traffic can keep flowing even in the event of an outage.
Node States in Redundancy
In RBFS redundancy, C-BNG nodes have different states for various redundancy sessions. Typically, RBFS redundancy nodes encounter the following states for redundancy sessions:
Active: All subscribers are served by active node in the RBFS active-standby node cluster. One node which is active for a redundancy session can be a standby node for a different session. Nodes, that are paired for redundancy, send 'keepalive' messages to each other and also synchronize all subscriber state data with the peer node. The priority values that you specify for the redundancy nodes determine the roles of active and standby. The node that receives the higher priority value for the session ID assumes the role of active for that subscriber group. To set one device as 'active', you must specify higher priority value for the redundancy session for that node.
Standby: Standby node is identical with the active node and synchronizes subscriber data concurrently from peer node. It keeps communication with the peer node to monitor node health status using 'keepalive' messages. The node that gets the lower priority value for the redundancy session ID assumes the role of standby. Standby node for a subscriber group does not perform any subscriber services for that group unless or until the active node encounters an outage.
Down: When a node becomes inactive due to an outage, it is considered as 'down'. In the event of a node outage, it is completely down and cannot perform subscriber services and any communication with its peer node. But in the case of a LAG (between the node and access node) failure, the node cannot perform subscriber services, but it can communicate with the peer node through the RD TCP connection. So that the subscriber state synchronization occurs without any interruption.
Stand Alone: When the active node goes down, the switchover occurs and standby takes over the subscriber service. In this scenario, the serving node is in 'stand alone' state (for that redundancy session) as it has no peer node for redundancy.
Revert or Rollback
In RBFS Redundancy, after a node or link failure and the subsequent switchover, the standby takes over and continues the subscriber service for that subscriber group even after the other node (previously active) recovers from the failure. There is no automated rollback or revert to the previously active router. However, administrators can perform a manual switchover.
Monitoring Node Health Status
The RBFS nodes, which have switchover capacities, monitor each other for the health status. RBFS Redundancy uses 'keepalive' messages that check on the health of the RBFS nodes. Both of the devices send 'keepalive' messages to each other in every five seconds. One Node can detect a failure if it does not receive 'keepalive' messages for a period of 20 seconds from the other node.
Redundancy Clients
There are multiple RBFS daemons that participate in providing redundancy. They include redundancy daemon, (rd
), LAG daemon (lagd
), interface daemon (ifmd
), subscriber daemon (subscriberd
), IPoE daemon (ipoed
). These daemons, which perform various roles, are known as redundancy clients.
Redundancy Daemon
Redundancy Daemon is responsible for establishing high-availability connections. It monitors the ecosystem and detects any outages that may happen on the network. It assigns the roles of active or standby to the nodes depending on the priority configured on the node. The daemon triggers a switchover to the standby node if a failure occurs. It responds to the failure events which are reported locally by daemons who are the redundancy clients. It also cleanses the data after switch-hover from the node that went down.
Supported Hardware Platforms
Currently, RBFS can be deployed in redundancy mode using the RBFS C-BNG (Consolidated BNG) switches. RBFS C-BNG software provides complete BNG functionalities on a single compact hardware switch. You can use the following hardware platforms to deploy RBFS in redundancy mode.
-
UfiSpace S9600-72XC: The UfiSpace S9600-72XC is a multi-function, disaggregated white box aggregation routing platform that is equipped with Broadcom’s Qumran2c chipset. It features 64x25GE and 8x100GE high-speed ports with a switching capacity of up to 2.4Tbs.
-
Edgecore AGR420: AGR420 is a high performance 25GbE aggregation router that consists of fixed 64 x 10G/25G SFP28, 8 x 100GE QSFP28 and 2 x 100G QSFP-DD network interface configurations.
RBFS Redundancy Requirements
The following are the requirements for setting up RBFS Redundancy.
-
Ensure that both of the platform devices, on which RBFS software runs, must be the same model.
-
Ensure that the devices should run the same version of RBFS software. RBFS software 23.2.1 and later versions support deployment in redundancy mode.
-
NTP must be configured on both devices to match the timestamps.