InfiniBand

InfiniBand refers to two distinct things:

  - The physical link-layer protocol for InfiniBand networks
  - The InfiniBand Verbs API, an implementation of the remote direct memory access (RDMA) technology

Reference: [Day25] Device Plugin - RDMA

RDMA provides direct access between the main memory of two computers without involving either operating system, the CPU caches, or intermediate storage. As a result, RDMA data transfers achieve high throughput and low latency with low CPU utilization.
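As a quick end-to-end check that the RDMA/Verbs path works, the ibv_rc_pingpong example shipped with libibverbs can be run between two hosts (a minimal sketch, assuming the device is mlx5_0 and using a placeholder server address):

``` bash
# Server side: listen on the assumed device mlx5_0, GID index 0 (adjust for your fabric)
ibv_rc_pingpong -d mlx5_0 -g 0

# Client side: connect to the server (192.0.2.10 is a placeholder address)
ibv_rc_pingpong -d mlx5_0 -g 0 192.0.2.10
```

If the ping-pong completes and reports a bandwidth figure, the verbs stack, the adapter, and the fabric path between the two hosts are all functional.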

Troubleshoot

To troubleshoot InfiniBand network issues on Mellanox hardware, follow these steps and use the commands below (a combined example session follows the list):

  1. Verify the Mellanox InfiniBand network configuration:
     - Run ibstatus to check the overall status of the InfiniBand fabric and the connected nodes.
     - Use ibhosts to list the hosts in the InfiniBand fabric and their GUIDs (Globally Unique Identifiers).
     - Run ibnetdiscover to discover the topology of the InfiniBand fabric and identify any connectivity issues.

  2. Check the link status and performance:
     - Use ibstatus to check the link status of the InfiniBand adapters.
     - Run ibcheckstate -l to check the state of the links in the fabric.
     - Use iblinkinfo to gather information about the state and quality of the links in the fabric.
     - Run ibdiagnet to perform a comprehensive diagnostic of the fabric, including link and cable testing.

  3. Analyze performance and latency:
     - Use ibperf to measure the throughput and latency of the InfiniBand fabric.
     - Run ib_read_bw and ib_write_bw to measure read and write bandwidth respectively.
     - Use ib_read_lat and ib_write_lat to measure read and write latency respectively.
     - Run perfquery to read performance counters from the InfiniBand adapters.

  4. Diagnose errors and issues:
     - Use ibcheckerrors to check for InfiniBand-specific errors and correctable errors.
     - Run ibdiagnet to perform a comprehensive diagnostic and identify potential issues within the fabric.
     - Use ibstat and iblinkinfo to check for link errors or problems with the InfiniBand adapters.
``` bash
# Local adapter and fabric overview
ibstat
sudo ibnodes
sudo ibswitches
sudo ibnetdiscover
sudo iblinkinfo

# Performance and error counters (14 is the example target LID)
sudo perfquery -a 14
sudo ibqueryerrors --report-port --data
sudo ibqueryerrors --counters

# Subnet manager information
sudo sminfo

# Verbs-level device information
ibv_devices
ibv_devinfo -d mlx5_0

# Port and adapter status
ibstatus
ibstat
ibstat mlx5_0

# Error and topology queries
sudo ibqueryerrors
sudo ibqueryerrors --counters
sudo ibnetdiscover
sudo ibnodes

# SMP queries against LID 14, port 1
sudo perfquery -a 14
sudo smpquery nodeinfo 14
sudo smpquery portinfo 14 1
# ibportstate needs a target, e.g.: ibportstate 14 1
ibportstate
sudo ibswitches
```
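When re-reading counters, it helps to reset them first so that new errors stand out; perfquery supports this with its reset-after-read flag (same example LID 14 as above):

``` bash
# Read all port counters on LID 14, then reset them
sudo perfquery -a -R 14
```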

command not found

The following commands from the steps above may be missing on a stock installation and fail with "command not found":

ibcheckstate
ibdiagnet
ibperf
ibcheckerrors
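These tools come from vendor packages rather than the base distribution: ibdiagnet ships with MLNX OFED (and historically with the ibutils package), while the ibcheck* scripts were deprecated and dropped from newer infiniband-diags releases. On a Debian/Ubuntu-style system, something like the following may provide some of them (package names are an assumption and vary by release):

``` bash
# infiniband-diags provides most ib* tools; ibutils historically provided ibdiagnet
sudo apt-get install infiniband-diags ibutils
```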

Bandwidth test: the first command starts the server, the second runs on the client; on the client, specify the server's address.

``` bash
# Server
ib_read_bw -a -R -z -d mlx5_0
# Client: specify the server's address
ib_read_bw -a -R -z -d mlx5_0 167.123.200.1
```
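The matching latency tests use the same server/client pattern (a sketch; the address below is the same example server as above):

``` bash
# Server
ib_read_lat -d mlx5_0
# Client: measure read latency against the server
ib_read_lat -d mlx5_0 167.123.200.1
```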

Check the current NIC driver version and firmware version:

``` bash
interface_name=ib0
driver_version=$(ethtool -i ${interface_name} | grep version | head -n 1 | awk '{print $2}')
firmware_version=$(ethtool -i ${interface_name} | grep firmware-version | awk '{print $2}')
```
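A trivial usage sketch to display the captured values:

``` bash
echo "driver version:   ${driver_version}"
echo "firmware version: ${firmware_version}"
```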

Install the NIC driver:

``` bash
# install the nic driver
os=LINUX
vendor=MLNX_OFED
install_driver_version=5.4-3.6.8.1
install_driver_version_full=${vendor}_${os}-${install_driver_version}
# anchor the greps to the line start so VERSION_ID does not match the ID lookup
distribution_name=$(cat /etc/os-release | grep ^ID= | head -n 1 | awk -F = '{print $2}' | tr -d '"')
distribution_version=$(cat /etc/os-release | grep ^VERSION_ID= | awk -F = '{print $2}' | tr -d '"')
distribution=${distribution_name}${distribution_version}
platform=$(uname -m)

wget https://content.mellanox.com/ofed/${vendor}-${install_driver_version}/${install_driver_version_full}-${distribution}-${platform}.tgz
tar zxvf ${install_driver_version_full}-${distribution}-${platform}.tgz && cd ${install_driver_version_full}-${distribution}-${platform}/
sudo ./mlnxofedinstall --force --with-nfsrdma --add-kernel-support
# To load the new driver, run:
sudo /etc/init.d/openibd restart
```
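After the restart, the installed stack can be verified; ofed_info is installed as part of MLNX OFED (a short sketch):

``` bash
# Print the installed MLNX OFED version string
ofed_info -s
# Re-check that the adapter is up with the new driver
ibstat
```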

Install the NIC firmware:

``` bash
# install the nic firmware
pci_bus_id=81:00.0
install_firmware_version=16_25_1020
ordering_part_numbers=MCX516A-CCA
firmware_name=fw-ConnectX5-rel-${install_firmware_version}-${ordering_part_numbers}_Ax-UEFI-14.18.19-FlexBoot-3.5.701.bin
wget -Nq -O ${firmware_name}.zip http://www.mellanox.com/downloads/firmware/${firmware_name}.zip
unzip -o ${firmware_name}.zip
# burn the new image
sudo mstflint -d ${pci_bus_id} -i ${firmware_name} -y b
# reset the firmware so the new image takes effect
sudo mstfwreset -d ${pci_bus_id} -y r
```
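To confirm the card is actually running the new image after the reset (reusing the pci_bus_id variable from above):

``` bash
# Query the device; the FW Version line should match install_firmware_version
sudo mstflint -d ${pci_bus_id} q | grep -i 'fw version'
```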

MTU

Here are the instructions for changing the MTU of the InfiniBand network. Documentation: https://linux.die.net/man/8/opensm, https://docs.nvidia.com/networking/display/MLNXOFEDv461000/OpenSM

Modify the OpenSM partitions configuration file. This only needs to be done on the node running opensm. Change the mtu value from 4 (2048 bytes) to 5 (4096 bytes).

``` bash
$ vim /etc/rdma/partitions.conf
```

Relevant excerpt of the file after the change:

```
# mtu =
#   1 = 256
#   2 = 512
#   3 = 1024
#   4 = 2048
#   5 = 4096
#
# rate =
#   2  = 2.5   GBit/s (SDR 1x)
#   3  =  10   GBit/s (SDR 4x/QDR 1x)
#   4  =  30   GBit/s (SDR 12x)
#   5  =   5   GBit/s (DDR 1x)
#   6  =  20   GBit/s (DDR 4x)
#   7  =  40   GBit/s (QDR 4x)
#   8  =  60   GBit/s (DDR 12x)
#   9  =  80   GBit/s (QDR 8x)
#   10 = 120   GBit/s (QDR 12x)
# If ExtendedLinkSpeeds are supported, then these rate values are valid too
#   11 =  14   GBit/s (FDR 1x)
#   12 =  56   GBit/s (FDR 4x)
#   13 = 112   GBit/s (FDR 8x)
#   14 = 168   GBit/s (FDR 12x)
#   15 =  25   GBit/s (EDR 1x)
#   16 = 100   GBit/s (EDR 4x)
#   17 = 200   GBit/s (EDR 8x)
#   18 = 300   GBit/s (EDR 12x)

Default=0x7fff, rate=3, mtu=5, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
Default=0x7fff, ipoib, rate=3, mtu=5, scope=2:
```

``` bash
# restart the opensm
$ sudo systemctl restart opensm

# ensure the opensm is working
$ sudo systemctl status opensm

# ensure the mtu becomes 4096
$ ip a
```
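The change can also be confirmed at the verbs layer (assuming the same mlx5_0 device as earlier):

``` bash
# active_mtu should report 4096 (5) once the new partition config is active
ibv_devinfo -d mlx5_0 | grep active_mtu
```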

Unmanaged Switch

Unmanaged (externally managed) switches have no management interface of their own, but they can still be reached in-band through the fabric. List the switches to find the target LID, then query the switch firmware with flint (part of the NVIDIA MFT tools) addressed by LID:

``` bash
# list switches in the fabric and note the target's LID
ibswitches
# query the firmware of the switch at LID 103 in-band
flint -d lid-103 q
```