NVSDM API Reference
The user guide for the NVSDM library.
1. Introduction
NVSwitch Device Monitoring (NVSDM) is a library for monitoring NVSwitch devices on NVIDIA Blackwell systems. The NVSDM API provides a wide range of telemetry, including but not limited to device health, port counters, and PCIe statistics.
The NVSDM package also contains the experimental nvsdm_cli utility. This utility provides a convenient way to utilize the NVSDM library.
Note
nvsdm_cli is an experimental tool and may be changed or removed without notice.
Note
NVSDM does not currently support Ethernet devices.
1.1. Change log of NVSDM library
This section lists the API changes and bug fixes introduced in each release of the library.
1.1.1. Changes between NVSDM v1.2.0 and v1.3.0
Added new ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_INBOUND_BYTES
Added new ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_OUTBOUND_BYTES
Updated the Doxygen documentation to fix warnings and improve grouping support.
1.1.2. Changes between NVSDM v1.1.0 and v1.2.0
Added a new API to retrieve the “local” port number: nvsdmPortGetLocalNum
Modified nvsdmDeviceGetFirmwareVersion to also retrieve firmware versions for ConnectX HCAs in addition to switches
Added support for four “extended” (i.e., 64-bit) PMA counters:
NVSDM_PORT_TELEM_CTR_EXT_XMIT_DATA
NVSDM_PORT_TELEM_CTR_EXT_RCV_DATA
NVSDM_PORT_TELEM_CTR_EXT_XMIT_PKTS
NVSDM_PORT_TELEM_CTR_EXT_RCV_PKTS
1.1.3. Changes between NVSDM v1.0 and v1.1.0
Added nvsdmSetLogFile to specify a log file.
Added nvsdmDeviceGetFirmwareVersion to retrieve the firmware version for a given switch.
Added nvsdmDeviceGetTelemetryValues to retrieve telemetry from a device.
Added a new telemetry type: NVSDM_TELEM_TYPE_CONNECTX for ConnectX device telemetry.
1.2. Known issues in the current version of NVSDM library
This is a list of known NVSDM issues in the current release:
The following ConnectX inbound and outbound byte counters are currently calculated over a very short period of time. The intended behavior is for them to be calculated over the lifetime of the NVSDM library.
ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_INBOUND_BYTES
ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_OUTBOUND_BYTES
2. Getting Started
2.1. System Requirements
NVSDM is supported on Linux x86_64 platforms.
The following software dependencies are required to run NVSDM:
libmnl-dev
libibmad-dev
libibumad-dev
The following software dependencies are optional and are used to query additional ConnectX telemetry:
libdoca-sdk-telemetry-dev
fwctl-dkms
Note: These optional dependencies are provided by the NVIDIA DOCA repository.
Note: NVSDM does not depend on any other libraries or headers from the CUDA Toolkit.
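To confirm that the required dependencies listed above are present before installing NVSDM, a check along the following lines may help. This is a minimal sketch that assumes a Debian or Ubuntu system where dpkg-query is available. The helper names pkg_installed and missing_pkgs are hypothetical and are not part of NVSDM.

```shell
#!/bin/sh
# Required packages, as listed in this section.
REQUIRED_PKGS="libmnl-dev libibmad-dev libibumad-dev"

# pkg_installed NAME: succeeds if dpkg reports NAME as fully installed.
# (Hypothetical helper; assumes a dpkg-based distribution.)
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

# missing_pkgs: prints one line per required package that is not installed.
missing_pkgs() {
    for pkg in $REQUIRED_PKGS; do
        pkg_installed "$pkg" || echo "$pkg"
    done
}

missing=$(missing_pkgs)
if [ -n "$missing" ]; then
    # $missing is intentionally unquoted so the names print on one line.
    echo "Missing required packages:" $missing
else
    echo "All required NVSDM dependencies are installed."
fi
```

On RPM-based distributions the package names and the query command differ, so pkg_installed would need to be adapted.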
2.2. Installation
NVSDM can be installed from the CUDA Toolkit Installer. Once you have added the CUDA package repository to your system, you can install NVSDM as follows:
For Debian- and Ubuntu-based distributions:
sudo apt-get install -y libnvsdm-<driver-branch>
For Red Hat Enterprise Linux 8/9-based distributions:
sudo dnf install libnvsdm-<driver-branch>
The libnvsdm installer package installs the following components:
/usr/lib/x86_64-linux-gnu/include/nvsdm.h
/usr/lib/x86_64-linux-gnu-bin/nvsdm_cli
/usr/lib/x86_64-linux-gnu/libnvsdm.so.1
/usr/lib/x86_64-linux-gnu/libnvsdm.so
/usr/share/doc/libnvsdm-<version>/README
/usr/share/doc/libnvsdm-<version>/third-party-notices.txt
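After installation, one way to confirm that the package contents are in place is to test for the files listed above. The following is a minimal sketch, not an official verification procedure. The paths are copied from the list above, the two documentation files are omitted because their location depends on the installed version, and the helper name report_files is hypothetical.

```shell
#!/bin/sh
# Files the libnvsdm package is documented to install (paths taken from
# the component list; the <version>-dependent doc files are left out).
NVSDM_FILES="/usr/lib/x86_64-linux-gnu/include/nvsdm.h
/usr/lib/x86_64-linux-gnu-bin/nvsdm_cli
/usr/lib/x86_64-linux-gnu/libnvsdm.so.1
/usr/lib/x86_64-linux-gnu/libnvsdm.so"

# report_files LIST: prints "ok: <path>" or "missing: <path>" for each
# whitespace-separated path in LIST; returns 1 if any path is missing.
# (Hypothetical helper; paths must not contain spaces.)
report_files() {
    rc=0
    for f in $1; do
        if [ -e "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            rc=1
        fi
    done
    return $rc
}

report_files "$NVSDM_FILES" ||
    echo "Some NVSDM files were not found; check the package installation."
```

A file reported as missing does not necessarily mean the installation failed, since the install locations may differ between distributions and package versions.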
2.3. Using the NVSDM API
Please see the API reference for details on how to use NVSDM.