Logging Overview

RBFS logging writes log messages as events occur and reports on events across the entire RBFS ecosystem, spanning different functional areas. On a hardware switch, logging happens at two levels:

Inside the RBFS container
In the RtBrick Host

All these logs, generated by different components, can be exported to the log management server, where you can view and analyze real-time data.

RBFS provides logging for the entire RBFS ecosystem, including Brick Daemons (BD) and other (non-BD) daemons. Brick Daemons are built on top of the Brick Data Store (BDS), whereas non-BD daemons (such as Prometheus) do not depend on BDS. BDS provides a built-in logging infrastructure that all BDS applications can use.

The diagram below shows how logs generated within the RBFS container (from Brick Daemon, Prometheus, and Syslog) and from API Gateway Daemon (APIGWD) are sent to Controller Daemon (CTRLD) on the RtBrick host. CTRLD aggregates these logs and exports them to an external Log Management Server for centralized monitoring and analysis.

Figure 1. Logging Overview

Logging in RBFS Container

The RBFS container logging infrastructure (BDS Logging) provides low-latency, in-memory logging for large-scale systems without significant performance impact.

BDS Logging

Log Tables

BDS logs are stored in BDS tables. BDS creates a log table for each module in a BD, but only once that module has logged at least one event. One entry is added to the log table for every log, and the oldest entries are removed when the number of entries exceeds 10,000. Each log table is named in the format <modulename>.<bd-name>.log.
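
The entry limit and naming format described above can be pictured as a bounded buffer. The following Python sketch is only an illustration and not the BDS implementation: the names LogTable, MAX_ENTRIES, and table_name are hypothetical, and the module and BD names passed in are example values.

```python
from collections import deque

MAX_ENTRIES = 10_000  # entry limit stated in this section


def table_name(module: str, bd_name: str) -> str:
    """Build a log table name in the <modulename>.<bd-name>.log format."""
    return f"{module}.{bd_name}.log"


class LogTable:
    """Hypothetical in-memory log table that drops its oldest entries first."""

    def __init__(self, module: str, bd_name: str, max_entries: int = MAX_ENTRIES):
        self.name = table_name(module, bd_name)
        # A deque with maxlen discards the oldest item when a new one overflows it.
        self.entries = deque(maxlen=max_entries)

    def add(self, entry: dict) -> None:
        self.entries.append(entry)


# A small limit is used here so the eviction is easy to see.
table = LogTable("bgp", "bgp.iod.1", max_entries=3)
for seq in range(5):
    table.add({"seq": seq})

print(table.name)                         # bgp.bgp.iod.1.log
print([e["seq"] for e in table.entries])  # [2, 3, 4]
```

With the limit set to 3, the first two entries are evicted as entries 3 and 4 arrive, which mirrors how older log entries make room for newer ones.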

Log Maps

Each log is mapped to a single event logged by the application. To optimize memory usage, RBFS does not store verbose strings; instead, it stores the log map as an identifier to the actual string message.

The log map and the log ID refer to the same entity.
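
The idea of storing an identifier instead of the verbose string can be sketched as follows. This is a minimal Python illustration under stated assumptions: the numeric IDs, the message templates, and the record and render helpers are all hypothetical and do not reflect actual RBFS log IDs or internals.

```python
# Hypothetical log map: log ID -> message template (IDs and text are illustrative).
LOG_MAP = {
    1001: "Peer {peer} changed state to {state}",
    1002: "Session {session_id} deleted",
}


def record(log_id: int, **args) -> dict:
    """Store only the log ID and its arguments, not the rendered string."""
    return {"log_id": log_id, "args": args}


def render(entry: dict) -> str:
    """Expand a stored entry into its full message when it is displayed."""
    return LOG_MAP[entry["log_id"]].format(**entry["args"])


entry = record(1001, peer="198.51.100.1", state="Established")
print(entry)
print(render(entry))  # Peer 198.51.100.1 changed state to Established
```

The stored entry holds only a small integer and the variable values; the full text is produced on demand, which is where the memory saving comes from.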

Log Modules

Every BDS application consists of multiple modules. Logging can be configured for each BDS module separately.

The log modules supported by RBFS are: access_service, bcm_q2c, bd, bds, bds_mock, bgp, fib, fwdinfra, hostconfd, ifm, igmp, ipoe, isis, l2tp, lag, ldp, license, lldp, lwip_contrib, mrib, nat, oam, ospf, pim, policy, pool, poold, pppoe, pubsub, resmon, rib, rtbrick-cli, secure_management, snapshot, static, subsMgmt, time_series, validation, vpp.

Log Groups

A log group is a collection of log maps or log IDs. Groups have been introduced to simplify the log configuration tasks. For example, to debug a BGP peer issue, instead of enabling the entire bgp module logs, you can enable logging for the BGP peer log group.

Guidelines and Limitations for BDS Logging

The following are some guidelines and limitations for BDS logging:

By default, BDS logging is enabled, and the log level is set to 'Error'.
By default, logging for the BDS and PUBSUB modules is disabled.
You can configure log levels per BD or per module.
In a scaled setup, logging should not be kept enabled for an extended period.
When you set the log level to "None", logging is disabled for the specific module, group, or globally.

Non-BDS Logging

Syslog

Syslog messages are generated by the Linux API-based logging mechanism, which the RBFS Linux container uses for logging. Syslog can also be exported to Graylog. Currently, RBFS supports exporting syslog messages from the Linux system facilities, such as auth, authpriv, daemon, and kern, to Graylog.

TSDB (Prometheus) Alert Messages

Prometheus is the system and service monitoring application that can be deployed in RBFS to collect and process metrics. In RBFS, Prometheus-generated alert messages are forwarded to CtrlD, and from there they can be exported to the configured log management servers.

Logging in the RtBrick Host

In RBFS, there are daemons that are not part of the RBFS container but run on the RtBrick host. RBFS provides logging for these daemons.

CTRLD

CtrlD logging provides log messages of events related to elements, ZTP, and security.

APIGWD

ApiGwD logs contain details about who accessed the API and how they accessed it.

Plugin Alias

Any log in RBFS can be exported to an external logging destination. Currently, CtrlD supports GELF and syslog as external plugins.

CtrlD is the egress node for all GELF (Graylog Extended Log Format) messages. The brick daemons are configured to send GELF messages to CtrlD, which forwards them to the configured endpoints, such as syslog or a GELF endpoint. The messages pass through CtrlD because it enhances each GELF message with switch-global settings (such as the switch’s serial number).
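
The enrichment step can be sketched as follows. This Python example is an illustration only, not CtrlD's implementation: the enrich function, the _serial_number and _daemon field names, and all values are assumptions. It relies only on the GELF conventions that a message carries version, host, and short_message, and that additional fields are prefixed with an underscore.

```python
import json


def enrich(gelf_message: dict, switch_settings: dict) -> dict:
    """Return a copy of a GELF message with switch-global fields added.

    GELF additional fields carry a leading underscore. Fields already
    present in the message are not overwritten.
    """
    enriched = dict(gelf_message)
    for key, value in switch_settings.items():
        enriched.setdefault(f"_{key}", value)
    return enriched


# Message as a brick daemon might emit it (all values are illustrative).
from_daemon = {
    "version": "1.1",
    "host": "rbfs-container",
    "short_message": "example event",
    "level": 3,
    "_daemon": "example.bd",
}

forwarded = enrich(from_daemon, {"serial_number": "EXAMPLE-SERIAL"})
print(json.dumps(forwarded, indent=2))
```

Adding the switch-wide fields in one place means individual daemons do not each need to know settings such as the serial number.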

Supported Logging Formats

RBFS supports two types of logging formats: GELF and Syslog.

GELF: GELF is a logging format for Graylog. It uses JSON to structure log messages, making them searchable and more structured. The supported protocol for GELF is HTTP.

The diagram below shows an example of a GELF message on the Graylog Web Interface.

Figure 2. GELF Message Shown on the Graylog Web Interface

Syslog: Syslog is a format used by Linux systems and many networking devices for logging. It is a plain text protocol with a defined structure. The supported Syslog transport protocols are TCP and UDP.

The diagram below shows an example of a message in syslog format as displayed on the Graylog Web Interface.

Figure 3. Syslog Message Format Shown on the Graylog Web Interface
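
As background on the syslog structure: a syslog message starts with a priority value in angle brackets, calculated as the facility code multiplied by 8 plus the severity. The Python sketch below shows that calculation for the facilities named in this document. The priority and format_line helpers are hypothetical, and the line they produce is deliberately simplified (no timestamp or hostname), so it is not the exact format RBFS emits.

```python
# Standard syslog facility codes for the facilities named in this document.
FACILITY = {"kern": 0, "daemon": 3, "auth": 4, "authpriv": 10}


def priority(facility: str, severity: int) -> int:
    """Compute the syslog PRI value: facility code * 8 + severity (0-7)."""
    if not 0 <= severity <= 7:
        raise ValueError("severity must be between 0 and 7")
    return FACILITY[facility] * 8 + severity


def format_line(facility: str, severity: int, tag: str, text: str) -> str:
    """Build a bare '<PRI>tag: text' line; timestamp and hostname are omitted."""
    return f"<{priority(facility, severity)}>{tag}: {text}"


print(format_line("daemon", 3, "example", "something failed"))
# <27>example: something failed
print(priority("authpriv", 6))  # 86
```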

Log Levels

The log level indicates the severity of an event (log). You can configure logging by severity level. The diagram below shows the log levels supported by RBFS in order of priority:

Figure 4. Log Levels

Level	Level Name	Description
0	Emergency	Events that cause the application to terminate.
1	Alert	A condition that should be corrected immediately. For example, “ISIS_AUTHENTICATION_FAIL” indicates an IS-IS authentication failure.
2	Critical	Severe error that will prevent the application from continuing. For example, “LICENSE_EXPIRY_CRITICAL” indicates that the current license has expired.
3	Error	Events that cause issues. For example, “LICENSE_EXPIRY_ERROR” indicates that the current license is about to expire.
4	Warning	Events that might cause issues. For example, “LICENSE_EXPIRY_WARNING” indicates that the current license is about to expire.
5	Notice	Conditions that are not error conditions, but that may require special handling. For example, “PPPOE_SESSION_LIMIT_REACHED” indicates that the PPPoE session limit has been reached.
6	Info	Events showing normal application behaviour. For example, “PPPOE_SESSION_DELETE” indicates that the PPPoE session deletion has occurred.
7	Debug	Diagnostic information used for debugging. For example, “PPPOE_LCP_START_ECHO” indicates that the PPP LCP echo request interval has started.

A configured log level includes that level and all more severe levels, that is, all levels with a lower number. For example, if the level is "Warning" (4), logs at "Warning" (4), "Error" (3), "Critical" (2), "Alert" (1), and "Emergency" (0) are included.
Use levels more verbose than "Warning" (Notice, Info, and Debug) with caution, as they may cause system instability in a scaled environment.
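
The threshold rule can be expressed as a simple numeric comparison. The following Python sketch uses the level numbers from the table above; the is_logged and included_levels helpers are hypothetical names for illustration, and the "None" case reflects the guideline that setting the level to "None" disables logging.

```python
LEVELS = {
    "Emergency": 0, "Alert": 1, "Critical": 2, "Error": 3,
    "Warning": 4, "Notice": 5, "Info": 6, "Debug": 7,
}


def is_logged(event_level: str, configured_level: str) -> bool:
    """Return True when an event passes the configured severity threshold."""
    if configured_level == "None":
        return False  # "None" disables logging
    # Lower numbers are more severe, so an event passes if its number
    # does not exceed the configured level's number.
    return LEVELS[event_level] <= LEVELS[configured_level]


def included_levels(configured_level: str) -> list:
    """List every level name that would be logged at the configured level."""
    return [name for name in LEVELS if is_logged(name, configured_level)]


print(included_levels("Warning"))
# ['Emergency', 'Alert', 'Critical', 'Error', 'Warning']
print(included_levels("None"))  # []
```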

Supported Platforms

Not all features are necessarily supported on each hardware platform. Refer to the Platform Guide for the features and sub-features supported by each platform.