1. Introduction
Visibility into the operational state is key for troubleshooting, testing, monitoring, and capacity management. This requires sampling router metrics periodically. Once the time-series data is ingested, it can be queried to answer operational questions.
Examples:
- A slowly increasing memory consumption over time while the overall PPPoE session count has not changed is an indication of a memory leak.
- If the 5-minute chassis temperature is too high, this might be an indication of an imminent hardware breakdown, in which case the switch hardware must be replaced.
- If the utilization of all fabric interfaces constantly reaches the 80% saturation level, new fabric links must be commissioned.
- High input traffic combined with degrading optical receive levels might be an indication of running very close to the optical budget.
The challenge is to sample all this information efficiently in terms of disk, memory, and CPU utilization while providing comprehensive query and reporting functionality.
1.1. Supported Platforms
Not all features are necessarily supported on each hardware platform. Refer to the Platform Guide for the features and the sub-features that are or are not supported by each platform.
1.2. Architectural Overview
The RBFS telemetry architecture is based on Prometheus, an open-source systems monitoring and alerting toolkit. Prometheus is designed to pull metrics periodically and store them efficiently. The metrics can be analyzed with a powerful query language called PromQL. Optional alert management is also available, and Prometheus can be tied together with your own services to integrate it into the system landscape. Data is intended to have short retention times (default 15d).
This fits the needs of BDS well. The figure below shows how Prometheus fits into the overall architecture.

The short retention time suits BDS but not an overall telemetry process. To mitigate this, the data can be stored in a centralized storage database (for example, Influx), either by federation or via remote storage adapters. To distribute the alert messages from Prometheus, CTRLD functions as an "alertmanager webhook receiver", which takes each alert and distributes it to a log management tool.
1.2.1. Router deployment model
Prometheus DB runs on the router as a dedicated process. It ships with a package-time configuration to poll each BDS-capable speaker at periodic intervals. Initially, the polling interval is 1 second. The Prometheus exposition format is a very simple HTTP-based GET query that asks a given BD speaker "Give me all your metrics". Each BD subscribes to the global.time-series.metric.config table, which contains an operator-configurable list of BDS targets. Only the BDS that is the master of a table responds. Prometheus then polls the BD using the /metrics URL.

1.2.2. Storage efficiency
On average, Prometheus uses only around 1-2 bytes per sample. Thus, to plan the capacity of a Prometheus server, you can use the following rough formula:
needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
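As a worked example of the formula above, the following minimal sketch computes the estimate for a hypothetical workload. The ingestion rate of 1,000 samples per second is an assumption chosen for illustration only; the 15-day retention and the 2 bytes per sample are taken from the figures mentioned in this section.

```python
def needed_disk_space_bytes(retention_time_seconds: float,
                            ingested_samples_per_second: float,
                            bytes_per_sample: float = 2.0) -> float:
    """Rough capacity estimate, following the formula in this section."""
    return retention_time_seconds * ingested_samples_per_second * bytes_per_sample


# Hypothetical workload: 15 days of retention, 1,000 samples/s, 2 bytes/sample.
retention = 15 * 24 * 60 * 60  # 15 days = 1,296,000 seconds
estimate = needed_disk_space_bytes(retention, 1_000, 2.0)
print(f"{estimate / 1e9:.2f} GB")  # prints "2.59 GB"
```

The result is only a rough planning figure; the real footprint depends on how well the samples compress.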
Disk space used by the individual binaries:
-rwxr-xr-x 1 root root 27M Sep 2 22:51 alertmanager
-rwxr-xr-x 1 root root 81M Sep 2 22:51 prometheus
-rwxr-xr-x 1 root root 49M Sep 3 19:55 promtool
Promtool is needed to test the configurations before they are applied to Prometheus.
1.2.3. Alerting
The alerting is configured through Prometheus. For more information, see alertmanager.
1.2.4. Role of CTRLD
Figure-4 provides an overview of the role of CTRLD.
Prometheus and Alertmanager register themselves in CTRLD, so that CTRLD is aware of these two services.
1.2.4.1. Service state and Proxy
Registering the services has two advantages:
- The operational state indicates whether the service is up and running.
- The proxy functionality of CTRLD can be used for Prometheus and Alertmanager.
The proxy functionality can be used to query Prometheus directly:
curl 'http://198.51.100.125:19091/api/v1/rbfs/elements/rtbrick/services/PROMETHEUS/proxy/api/v1/query?query=up' | jq .
It is also used for federation, for which the following URL is used:
http://198.51.100.125:19091/api/v1/rbfs/elements/rtbrick/services/PROMETHEUS/proxy/federate
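The two URLs above share a common prefix. The following sketch builds them programmatically. The helper name `ctrld_proxy_url` is hypothetical, and the URL layout (element name `rtbrick`, service name `PROMETHEUS`) is inferred only from the two examples in this section, so treat it as an assumption rather than a documented API contract.

```python
from typing import Optional
from urllib.parse import urlencode


def ctrld_proxy_url(host: str, port: int, element: str, service: str,
                    path: str, params: Optional[dict] = None) -> str:
    """Build a URL that reaches a registered service through the CTRLD proxy.

    Hypothetical helper: the path layout is copied from the example URLs.
    """
    base = (f"http://{host}:{port}/api/v1/rbfs/elements/{element}"
            f"/services/{service}/proxy")
    url = f"{base}/{path.lstrip('/')}"
    if params:
        # doseq=True allows list values such as repeated match[] selectors.
        url += "?" + urlencode(params, doseq=True)
    return url


# Instant query "up", equivalent to the curl example.
query_url = ctrld_proxy_url("198.51.100.125", 19091, "rtbrick", "PROMETHEUS",
                            "api/v1/query", {"query": "up"})
# Federation endpoint, as used by a global Prometheus.
federate_url = ctrld_proxy_url("198.51.100.125", 19091, "rtbrick",
                               "PROMETHEUS", "federate")
print(query_url)
print(federate_url)
```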
1.2.4.2. Alert distribution
CTRLD can forward the alerts from the Alertmanager to Graylog or any other REST endpoint.
1.2.4.3. API for Configuration
CTRLD provides a REST API Endpoint for configuration of alerts and metrics.
1.2.5. Federation deployment model

Prometheus is usually deployed with at least one instance per data center, plus a global Prometheus instance for global graphing or alerting. Federation allows metrics and aggregations to be pulled up the hierarchy.
In the global Prometheus configuration, the time series are pulled as follows:
prometheus.yml:
global:
  scrape_interval: 60s # Global default scrape interval; the job below overrides it.

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  - job_name: "federate"
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="bds"}'
    scrape_interval: 15s
    # Patterns for files from which target groups are extracted.
    file_sd_configs:
      - files:
          - ./bds.target.yml
        refresh_interval: 5m
The match[] parameter here requests all BDS job time series. By following this job naming convention, you do not have to adjust the configuration every time a new aggregation rule is added.
The targets themselves can be configured in a separate file.
bds.target.yml:
- targets: ['198.51.100.125:19091']
  labels:
    __metrics_path__: "/api/v1/rbfs/elements/rtbrick/services/PROMETHEUS/proxy/federate"
    box: 125_rtbrick
2. Installation
The RtBrick fullstack comes with a ready-to-use TSDB instance, so no further installation is required on RBFS.
For federation of metrics, a global Prometheus instance is needed. To visualize the metrics, a Grafana instance has to be installed, and to receive the alert messages, a Graylog instance has to be set up. This document does not contain an installation guide for those systems.
Configuring a federation Prometheus to scrape metrics from an RBFS installation is described in the Federation deployment model section.
3. Configuring Time Series Database
The following section describes how to configure the system to gather metrics and alerts out of the system.
3.1. Metric
To better understand the data model, see the Prometheus Data Model.
3.1.1. Metric Data Model
In RBFS it is possible to turn each table attribute into a metric.
Note: When you export the time-series metric data for an attribute that has more than 50 label values (user-defined and default labels), you may see truncated data in the exported metric.
The following table describes the configuration model:
Metric
Attribute | Description |
---|---|
metric_name | Name of the metric (metric name conventions). This is the unique identifier for the metric. |
table_name | Name of the table for which the metric is designed; this can also be a regular expression. |
append_timestamp | The timestamp is the epoch rendered in milliseconds; its value equals the creation time of the current metric value in RBFS. |
bds_metric_type | Type of the BDS metric: object-metric or index-metric. |
index_name | Name of the index, if the bds_metric_type is index-metric. |
metric_type | Prometheus metric type: counter or gauge. |
metric_description | Description of the metric. |
attributes | List of Attributes (see the Attribute table) that will be streamed as metrics. |
filters | List of AttributeFilters (see the AttributeFilter table) that filter the table rows to be considered for metric generation. Each filter in this list has to match in order to generate the metric, so the list implies an implicit AND. |
Attribute
Attribute | Description |
---|---|
attribute_name | Name of the attribute that should be streamed as a metric. This attribute has to be of a numeric type, or of a type that has a numeric converter. |
filters | List of AttributeFilters (see the AttributeFilter table) that filter the table rows to be considered for metric generation. Each filter in this list has to match in order to generate the metric, so the list implies an implicit AND. |
labels | List of AttributeLabels (see the AttributeLabel table) that are attached to that metric. |
AttributeFilter
Attribute | Description |
---|---|
match_attribute_name | Attribute of the table that is used to match against. |
match_type | Type of the match: exact or regular-expression. |
match_value | The value that the attribute has to match against. |
AttributeLabel
Caution: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
Attribute | Description |
---|---|
label_name | Name of the label (label name conventions). |
dynamic | Boolean. If the label is dynamic, the label_value is treated as an attribute name, and the value of that attribute is used as the label value; otherwise, the label_value is used directly. |
label_value | The value of the label, or the attribute that should be used as the label value. |
filters | List of AttributeFilters (see the AttributeFilter table) that filter the table rows to be considered for label generation. Each filter in this list has to match in order to generate the label, so the list implies an implicit AND. |
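To illustrate how filters (implicit AND) and static or dynamic labels interact, here is a minimal sketch of the model in Python. It is not the RBFS implementation: the class and function names are hypothetical, the row is a plain dictionary standing in for a table row, and the assumption that a regular expression must match the whole attribute value is ours. The attribute names and values are borrowed from the examples in this guide.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AttributeFilter:
    match_attribute_name: str
    match_type: str  # "exact" or "regular-expression"
    match_value: str

    def matches(self, row: dict) -> bool:
        value = str(row.get(self.match_attribute_name, ""))
        if self.match_type == "exact":
            return value == self.match_value
        # Assumption: the pattern has to match the whole value.
        return re.fullmatch(self.match_value, value) is not None


@dataclass
class AttributeLabel:
    label_name: str
    label_value: str
    dynamic: bool = False
    filters: list = field(default_factory=list)


def sample(row: dict, attribute_name: str, filters: list, labels: list):
    """Return (value, labels) for a row, or None if any filter fails."""
    if not all(f.matches(row) for f in filters):  # implicit AND
        return None
    out = {}
    for label in labels:
        if all(f.matches(row) for f in label.filters):
            # Dynamic labels read their value from the named attribute.
            out[label.label_name] = (str(row.get(label.label_value, ""))
                                     if label.dynamic else label.label_value)
    return float(row[attribute_name]), out


row = {"resource_type": "fan", "resource_name": "Chassis Fan - 0", "rpm": 6100}
filters = [AttributeFilter("resource_type", "exact", "fan"),
           AttributeFilter("resource_name", "regular-expression", "Chassis.*")]
labels = [AttributeLabel("fan", "resource_name", dynamic=True),
          AttributeLabel("vendor", "rtbrick")]
print(sample(row, "rpm", filters, labels))
# (6100.0, {'fan': 'Chassis Fan - 0', 'vendor': 'rtbrick'})
```

Because each distinct label combination becomes its own time series, a dynamic label on an attribute with many distinct values multiplies the number of series accordingly.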
3.1.2. Configuring Metrics
The configuration of the Metrics can be done in various ways.
3.1.2.1. Configuring Metrics using Command Line Interface
To configure the Time Series Database, perform the following steps:
- Define the metric configuration.
- Define the attribute configuration.
- Optionally, define filters at the metric level and the attribute level.
- Define the labels to be attached to the exported metric.
3.1.2.1.1. Metric Configuration
Metric configuration is used to configure the parameters of the metric data being exported.
Note: Depending on the platform, the exact name of the resource to be monitored can be found in global.chassis_0.resource.sensor. Adjust the Prometheus/Grafana configuration accordingly.
Syntax
Command arguments
Argument | Description |
---|---|
<metric-name> | Specifies the name of the exported metric, as reflected in Prometheus. Use the naming conventions recommended by Prometheus. |
<128 character description about the metric-name> | Description of the metric. |
<counter / gauge> | Configures the metric data type. Currently, the supported Prometheus metric data types are counter and gauge. |
<object-metric / index-metric> | Specifies the type of attribute that is scraped and exported. There are two types: object-metric and index-metric. |
<table-name> | Specifies the target table from which the data is scraped and exported. |
<attribute-name> | Specifies the name of the attribute in the target table to be scraped and exported. |
<append-timestamp> | Set append-timestamp to true to export the metric values with a timestamp. By default, the value is 'false'. |
<index-name> | Specifies the index name of the index-metric attribute. This configuration applies to index-metric only. |
<match-attribute-name> | Specifies the matching attribute name for the filter. |
include-subscribed-tables [true / false] | Specifies whether the configuration needs to be applied to subscribed tables as well. Default: false. |
Example
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm table-name global.chassis_0.resource.sensor
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm bds-type object-metric
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm prometheus-type gauge
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm description "Chassis fan speed in rpm"
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm attribute rpm
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm include-subscribed-table false
Allowed Attribute Types (Type Converters)
Normally, only attributes of a numeric type are allowed. For some other types, however, built-in type converters allow attributes of those types to be used as well.
BDS provides built-in type converters for the following BDS types. In line with the Prometheus data model, a type converter converts the BDS type into a 64-bit floating-point number.
BDS data type | Outcome number represents |
---|---|
unix-wallclock-timestamp | Seconds |
unix-usec-wallclock-timestamp | Seconds |
unix-usec-monotonic-timestamp | Seconds |
unix-usec-coarse-wallclock-timestamp | Seconds |
bandwidth | bps (bits per second) |
temperature | Degrees Celsius |
3.1.2.1.2. Metric Filter Configuration
Metric filter configuration is used to configure the parameters of the filter. It is used to filter the exported metric. This is an optional configuration.
Syntax
Command arguments
Argument | Description |
---|---|
<match-attribute-name> | Specifies the attribute used to filter the exported metric, based on the specified criteria. This configuration is optional. |
<exact / regular-expression> | Specifies the match type to be used. There are two options: exact and regular-expression. |
<match-attribute-value> | Specifies the attribute value used for the match: a fixed value for exact, a regex pattern for regular-expression. |
Example
Exact Value
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm filter resource_type match-attribute-value fan
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm filter resource_type match-type exact
Regular Expression
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm filter resource_name match-attribute-value Chassis.*
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm filter resource_name match-type regular-expression
3.1.2.1.3. Metric Attribute Label Configuration
Metric attribute config is used to configure the labels to be attached to the exported metric.
Syntax
Command arguments
Argument | Description |
---|---|
<label-name> | Specifies the name of the label. The name is user-definable; use the naming conventions recommended by Prometheus. |
<dynamic / static> | Specifies the type of label: a static value or a dynamic value to be added. |
<label-value> | Specifies the label value to be used. |
Example
Dynamic Label
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm attribute rpm label fan
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm attribute rpm label fan label-value resource_name
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm attribute rpm label fan label-type dynamic
Static Label
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm attribute rpm label vendor
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm attribute rpm label vendor label-value rtbrick
admin@rtbrick: cfg> set time-series metric chassis_fan_speed_rpm attribute rpm label vendor label-type static
3.1.2.1.4. Metric Attribute Filter Configuration
Attribute filter config is used to configure the parameters of Attribute filter. It is used to filter the exported metric based on certain fields of the attribute. This is an optional configuration.
Syntax
Command arguments
Argument | Description |
---|---|
<attribute name> | Specifies the attribute used to filter the exported metric, based on criteria of the attribute. This configuration is optional. |
<exact / regular-expression> | Specifies the match type to be used. There are two options: exact and regular-expression. |
<match-attribute-value> | Specifies the attribute value used for the match: a fixed value for exact, a regex pattern for regular-expression. |
Example
In the following example, the metric attribute is exported only if port_stat_if_in_discards is exactly 0.
admin@rtbrick: cfg> set time-series metric interface_statistics_data attribute port_stat_if_in_ucast_pkts filter port_stat_if_in_discards
admin@rtbrick: cfg> set time-series metric interface_statistics_data attribute port_stat_if_in_ucast_pkts filter port_stat_if_in_discards match-type exact
admin@rtbrick: cfg> set time-series metric interface_statistics_data attribute port_stat_if_in_ucast_pkts filter port_stat_if_in_discards match-attribute-value 0
3.1.2.1.5. Metric Label Filter Configuration
Label filter configuration is used to set filter parameters that determine whether a label is attached, based on certain criteria. This is an optional configuration.
Syntax
Command arguments
Argument | Description |
---|---|
<match-attribute-name> | Specifies the attribute used to filter the exported metric, based on its value. This configuration is optional. |
<exact / regular-expression> | Specifies the match type to be used. There are two options: exact and regular-expression. |
<match-attribute-value> | Specifies the attribute value used for the match: a fixed value for exact, a regex pattern for regular-expression. |
Example
The following example attaches the label interface_orientation to the exported data only if interface_name matches ifp-0/0/50.
admin@rtbrick: cfg> set time-series metric interface_statistics_data attribute port_stat_if_in_ucast_pkts label interface_orientation
admin@rtbrick: cfg> set time-series metric interface_statistics_data attribute port_stat_if_in_ucast_pkts label interface_orientation filter interface_name
admin@rtbrick: cfg> set time-series metric interface_statistics_data attribute port_stat_if_in_ucast_pkts label interface_orientation filter interface_name match-type exact
admin@rtbrick: cfg> set time-series metric interface_statistics_data attribute port_stat_if_in_ucast_pkts label interface_orientation filter interface_name match-attribute-value ifp-0/0/50
3.2. Alert
RBFS uses the Prometheus alerting feature to generate alerts. These alerts are forwarded to an Alertmanager instance inside the RBFS container. The Alertmanager instance sends the alert to CTRLD, which distributes the alert to an HTTP endpoint.
Alerts are also configured in a BDS table, and they are exported to Prometheus by the system.
3.2.1. Alert Data Model
Alert
Attribute | Description |
---|---|
name | The name of the alert rule. |
group | Name of the alert group the alert belongs to. |
interval | How often the rule should be evaluated. Pattern: "[0-9]+(ms\|[smhdwy])". Example: "5s". In Prometheus, the interval can be specified per alert group, so the alert group name for Prometheus is built as {alert_group}_{interval}. |
expr | Alert evaluation expression in PromQL. |
labels | Key-value pairs of labels that should be applied. The labels clause allows specifying a set of additional labels to be attached to the alert. Any existing conflicting labels will be overwritten. The label values can be templated (see templating). |
annotations | Key-value pairs of annotations that should be applied. The annotations clause specifies a set of informational labels that can be used to store longer additional information such as alert descriptions or runbook links. The annotation values can be templated (see templating). |
for | Alerts are considered firing once they have been returned for this long. Alerts that have not yet fired for long enough are considered pending. Pattern: "[0-9]+(ms\|[smhdwy])". Example: "30s". |
level | This is an explicit annotation label with the label name level. It is used to specify the severity. |
summary | This is an explicit annotation label with the label name summary. The annotation values can be templated (see templating). |
description | This is an explicit annotation label with the label name description. The annotation values can be templated (see templating). |
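The duration pattern and the group naming rule from the table can be sketched as follows. This assumes that the documented pattern is meant to read `[0-9]+(ms|[smhdwy])` and that the whole value has to match it; the function names are hypothetical and only illustrate the rule {alert_group}_{interval}.

```python
import re

# Duration pattern as documented for "interval" and "for".
DURATION = re.compile(r"[0-9]+(ms|[smhdwy])")


def is_valid_duration(text: str) -> bool:
    """True if the whole string is a number followed by one time unit."""
    return DURATION.fullmatch(text) is not None


def prometheus_group_name(alert_group: str, interval: str) -> str:
    """Derive the Prometheus alert group name as {alert_group}_{interval}."""
    if not is_valid_duration(interval):
        raise ValueError(f"invalid interval: {interval!r}")
    return f"{alert_group}_{interval}"


print(is_valid_duration("30s"))    # True
print(is_valid_duration("5 min"))  # False
print(prometheus_group_name("hardware_metrics", "5s"))  # hardware_metrics_5s
```

One consequence of this naming rule is that two alerts in the same configured group but with different intervals end up in different Prometheus groups.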
3.2.2. Configuration
Alerts can be configured in various ways.
3.2.2.1. Configuring Alert Using CLI
Syntax
Command arguments
Argument | Description |
---|---|
<name> | The name of the alert rule. This is the unique identifier for the rule. |
<group> | Name of the alert group the alert belongs to. The alert group helps to structure the alerts. |
<interval> | How often the rule should be evaluated. Pattern: "[0-9]+(ms\|[smhdwy])". Example: "5s". In Prometheus, the interval can be specified per alert group, so the alert group name for Prometheus is built as {alert_group}_{interval}. |
<expr> | Alert evaluation expression in PromQL. |
<label> | Key-value pairs of labels that should be applied. The labels clause allows specifying a set of additional labels to be attached to the alert. Any existing conflicting labels will be overwritten. The label values can be templated (see templating). |
<annotations> | Key-value pairs of annotations that should be applied. The annotations clause specifies a set of informational labels that can be used to store longer additional information such as alert descriptions or runbook links. The annotation values can be templated (see templating). |
<for> | Alerts are considered firing once they have been returned for this long. Alerts that have not yet fired for long enough are considered pending. Pattern: "[0-9]+(ms\|[smhdwy])". Example: "30s". |
<level> | This is an explicit annotation label with the label name level. It is used to specify the severity: 1. Alert. The annotation value can be templated (see templating). |
<summary> | This is an explicit annotation label with the label name summary. The annotation values can be templated (see templating). |
<description> | This is an explicit annotation label with the label name description. The annotation values can be templated (see templating). |
Example
admin@rtbrick: cfg> set time-series alert sample_alert
admin@rtbrick: cfg> set time-series alert sample_alert group hardware_metrics
admin@rtbrick: cfg> set time-series alert sample_alert for 30s
admin@rtbrick: cfg> set time-series alert sample_alert interval 5s
admin@rtbrick: cfg> set time-series alert sample_alert expr avg_over_time(cpu_temperature_celcius[1m])>100
admin@rtbrick: cfg> set time-series alert sample_alert level 2
admin@rtbrick: cfg> set time-series alert sample_alert summary "Element {{ $labels.element_name }} CPU {{$labels.cpu}} HIGH temperature"
admin@rtbrick: cfg> set time-series alert sample_alert description "Cpu {{ $labels.cpu }} of element {{ $labels.element_name }} has a temperature over 100 for more than 30 seconds"
admin@rtbrick: cfg> set time-series alert sample_alert labels device:leaf1
admin@rtbrick: cfg> set time-series alert sample_alert annotations "sample-annotation-key:sample-value"
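For orientation, the sketch below maps the alert configured above onto the standard Prometheus rule-group layout (a group with name and interval, containing rules with alert, expr, for, labels, and annotations). How RBFS actually exports the rule to Prometheus is not described in this guide, so the function `to_rule_group` is purely hypothetical; in particular, placing level, summary, and description under annotations follows the data model description but is an assumption about the final result. The output is printed as JSON for brevity.

```python
import json


def to_rule_group(alert: dict) -> dict:
    """Hypothetical mapping of an RBFS alert to a Prometheus rule group."""
    annotations = dict(alert.get("annotations", {}))
    # level, summary and description are described as explicit annotations.
    for key in ("level", "summary", "description"):
        if key in alert:
            annotations[key] = str(alert[key])
    return {
        # Group name built as {alert_group}_{interval}.
        "name": f"{alert['group']}_{alert['interval']}",
        "interval": alert["interval"],
        "rules": [{
            "alert": alert["name"],
            "expr": alert["expr"],
            "for": alert["for"],
            "labels": dict(alert.get("labels", {})),
            "annotations": annotations,
        }],
    }


sample_alert = {
    "name": "sample_alert",
    "group": "hardware_metrics",
    "for": "30s",
    "interval": "5s",
    "expr": "avg_over_time(cpu_temperature_celcius[1m])>100",
    "level": 2,
    "summary": "Element {{ $labels.element_name }} CPU {{$labels.cpu}} HIGH temperature",
    "labels": {"device": "leaf1"},
    "annotations": {"sample-annotation-key": "sample-value"},
}
group = to_rule_group(sample_alert)
print(json.dumps({"groups": [group]}, indent=2))
```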
3.3. Enabling/Disabling Time Series Database History
In every Brick Daemon, the history of time series databases can be enabled. By default, time series database history is disabled.
Syntax
set time-series history-status <option>
Attribute | Description |
---|---|
[disable / enable] | Enables or disables time series database history. By default, time series database history is disabled. |
Example:
supervisor@S2-STD-7-7006>bm06-tst.fsn.rtbrick.net: cfg> show datastore confd table global.time-series.config
Object: 0, Sequence 3, Last update: Mon May 23 09:02:28 GMT +0000 2022
Attribute Type Length Value
configuration_name (1) string (9) 8 rtbrick
time-series-history-enable (2) boolean (6) 1 False
3.3.1. Graylog Alert Distribution
The alertmanager on RBFS is configured to send alerts to CTRLD.

CTRLD therefore provides an endpoint to which the alerts are sent. CTRLD translates the notification and forwards the message to the configured log management system. The instance used for forwarding is "prometheus".
4. TSDB Operational Commands
The TSDB show commands provide detailed information about the TSDB operations.
4.1. TSDB Show Commands
The TSDB show commands display data about the alerts and metrics of time-series.
4.1.1. Time-series Alert
This command displays time-series alert information in a tabular format. Key information is displayed in the summary output.
Syntax:
show time-series alert <options>
Option | Description |
---|---|
- | Without any option, the command lists all firing alerts. |
<alert-rule> | Lists all alerts for the given alert rule. |
Example 1: Summary view of all firing alerts
supervisor@rtbrick>SPINE01: op> show time-series alert
Since Level Summary
26-APR-2023 05:50:00 Warning Chassis temperature is 63°C
26-APR-2023 05:49:44 Error Fan speed 0rpm of fan PSU-1-Fan is below 1000rpm.
supervisor@rtbrick>SPINE01: op>
Example 2: Summary view of the alert rule "chassis_temperature_warning"
supervisor@rtbrick>SPINE01: op> show time-series alert chassis_temperature_warning
Alert: chassis_temperature_warning
Firing since 26-APR-2023 05:50:00
Level: Warning
Summary: Chassis temperature is 62°C
Description: At least one chassis temperature sensor reports 62°C which exceeds threshold of 50°C.
Labels:
  alertname : chassis_temperature_warning
  bd_name : resmond
  element_name : ufi08.q2c.u23.r4.nbg.rtbrick.net
Annotations:
  clear_threshold : 50
  description : At least one chassis temperature sensor reports 62°C which exceeds threshold of 50°C.
  level : 4
  summary : Chassis temperature is 62°C
  threshold : 55
  unit : CELSIUS
  value : 62
supervisor@rtbrick>SPINE01: op>
4.1.2. Time-series Metric
This command displays metrics of time-series.
Syntax:
show time-series metric <options>
Attribute | Description |
---|---|
<metric-name> | Lists the most recently sampled values for the specified metric. |
Example: Summary view of the metric for chassis_fan_speed_rpm
supervisor@rtbrick>SPINE01: op> show time-series metric chassis_fan_speed_rpm
Metric: chassis_fan_speed_rpm
Value: 6100 sampled at 26-APR-2023 10:19:54
Labels:
  bd : resmond
  bd_name : resmond
  element_name : ufi05.q2c.u11.r4.nbg.rtbrick.net
  fan : Chassis Fan - 0
  instance : localhost:11012
  job : bds
  pod_name : DTSL
  status : 9
Value: 6200 sampled at 26-APR-2023 10:19:54
Labels:
  bd : resmond
  bd_name : resmond
  element_name : ufi05.q2c.u11.r4.nbg.rtbrick.net
  fan : Chassis Fan - 1
  instance : localhost:11012
  job : bds
  pod_name : DTSL
  status : 9
Value: 6100 sampled at 26-APR-2023 10:19:54
Labels:
  bd : resmond
  bd_name : resmond
  element_name : ufi05.q2c.u11.r4.nbg.rtbrick.net
  fan : Chassis Fan - 2
  instance : localhost:11012
  job : bds
  pod_name : DTSL
  status : 9
Value: 6100 sampled at 26-APR-2023 10:19:54
Labels:
  bd : resmond
  bd_name : resmond
  element_name : ufi05.q2c.u11.r4.nbg.rtbrick.net
  fan : Chassis Fan - 3
  instance : localhost:11012
  job : bds
  pod_name : DTSL
  status : 9
©Copyright 2023 RtBrick, Inc. All rights reserved. The information contained herein is subject to change without notice. The trademarks, logos and service marks ("Marks") displayed in this documentation are the property of RtBrick in the United States and other countries. Use of the Marks are subject to RtBrick's Term of Use Policy, available at https://www.rtbrick.com/privacy. Use of marks belonging to other parties is for informational purposes only.