NVIDIA UFM

NVIDIA® UFM is a host-based solution that provides all the management functionalities required for managing InfiniBand fabrics. It comes in three flavors:

  • NVIDIA UFM Enterprise (Software) - software-only platform, deployable via containers or virtual machines
  • NVIDIA UFM Enterprise Appliance - dedicated 1U rack-mountable hardware appliance pre-installed with UFM Enterprise software
  • NVIDIA UFM XDR Enterprise Appliance - dedicated 1U rack-mountable hardware appliance (Gen 3.5) pre-installed with UFM Enterprise software, optimized for XDR

DOCA

DOCA is NVIDIA’s "Data Center on a Chip Architecture" software platform for BlueField DPUs. It provides SDKs, drivers, libraries, and tools to offload networking, security, storage, and telemetry workloads from the CPU to the DPU, improving performance and isolation.

Standalone mode

The default UFM® installation directory is /opt/ufm. Notable paths:

  • /opt/ufm/conf/gv.cfg - main configuration file (optional configurations)
  • /opt/ufm/files/licenses - license files
  • /opt/ufm/files/log - log directory
  • /opt/ufm/files/log/ufm.log - main UFM log file

Ubuntu 24.04

cat <<'EOF' > install_ufm.sh
#!/bin/bash
mkdir -p /tmp/ufm
wget http://172.19.30.2/ISO/nvidia/ufm/ufm-6.19.4-3.ubuntu24.x86_64.tgz
tar xvf ufm-6.19.4-3.ubuntu24.x86_64.tgz -C /tmp/ufm

# Prerequisites for UFM Server Software Installation
sudo apt-get install -y acl apache2 bc chrpath cron dos2unix gawk lftp libcairo2 libcurl4 logrotate python3 qperf rsync snmpd sqlite3 sshpass ssl-cert sudo telnet zip

# NVIDIA Host InfiniBand Networking packages provided by DOCA package
# https://developer.nvidia.com/doca-downloads?deployment_platform=Host-Server&deployment_package=DOCA-Host&target_os=Linux&Architecture=x86_64&Profile=doca-all&Distribution=Ubuntu&version=24.04&installer_type=deb_local
wget https://www.mellanox.com/downloads/DOCA/DOCA_v3.1.0/host/doca-host_3.1.0-091000-25.07-ubuntu2404_amd64.deb
sudo dpkg -i doca-host_3.1.0-091000-25.07-ubuntu2404_amd64.deb
sudo apt-get update
sudo apt-get install -y doca-ufm doca-kernel

cd /tmp/ufm/ufm-6.19.4-3.ubuntu24.x86_64 && sudo ./install.sh -q
EOF

bash install_ufm.sh

Configure the InfiniBand interface name in the configuration file:

grep 'fabric_interface' /opt/ufm/conf/gv.cfg
# fabric_interface = ib0
sudo /opt/ufm/scripts/change_fabric_config.sh -i ibp129s0
grep 'fabric_interface' /opt/ufm/conf/gv.cfg
# fabric_interface = ibp129s0
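If you are unsure which netdev name the HCA uses on your host (ibp129s0 above is specific to this machine), you can list InfiniBand-type interfaces straight from sysfs; a minimal sketch (the kernel reports ARPHRD_INFINIBAND as link type 32):

```shell
#!/bin/bash
# List network interfaces whose link type is InfiniBand (ARPHRD_INFINIBAND == 32).
for dev in /sys/class/net/*; do
    [ -r "$dev/type" ] || continue
    if [ "$(cat "$dev/type")" -eq 32 ] 2>/dev/null; then
        echo "InfiniBand interface: $(basename "$dev")"
    fi
done
```

With the DOCA/MLNX_OFED stack installed, ibdev2netdev prints the same IB-device-to-netdev mapping.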

sudo systemctl start ufm-enterprise.service

RHEL 9.7

Register the node with subscription-manager:

sudo subscription-manager register --username <yourusername>
sudo subscription-manager identity

Disable SELinux:

sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
sudo reboot

cat <<'EOF' > install_ufm.sh
#!/bin/bash

mkdir -p /tmp/ufm
wget http://172.31.59.4/ISO/nvidia/ufm/ufm-6.19.4-3.el9.x86_64.tgz
tar xvf ufm-6.19.4-3.el9.x86_64.tgz -C /tmp/ufm

# Prerequisites for UFM Server Software Installation
sudo dnf install -y \
acl \
apr-util-openssl \
bc \
dos2unix \
gnutls \
httpd \
iptables-nft \
jansson \
lftp \
libnsl \
libxml2 \
python3 \
libxslt \
mod_session \
rsync \
mod_ssl \
net-snmp \
net-snmp-libs \
net-snmp-utils \
net-tools \
sudo \
php \
telnet \
psmisc \
zip \
sshpass \
sqlite
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo dnf install -y qperf

# Remove MLNX_OFED OpenSM before UFM installation
pkgs=$(rpm -qa | grep -i opensm || true)
if [[ -n "${pkgs}" ]]; then
  echo "Removing OpenSM packages:"
  echo "${pkgs}"
  sudo rpm -e ${pkgs}
else
  echo "No OpenSM packages installed."
fi

# NVIDIA Host InfiniBand Networking packages provided by DOCA package

# https://developer.nvidia.com/doca-downloads?deployment_platform=Host-Server&deployment_package=DOCA-Host&target_os=Linux&Architecture=x86_64&Profile=doca-all&Distribution=RHEL-Rocky&version=9&installer_type=rpm_local
wget https://www.mellanox.com/downloads/DOCA/DOCA_v3.2.1/host/doca-host-3.2.1-044000_25.10_rhel9.x86_64.rpm
sudo rpm -i doca-host-3.2.1-044000_25.10_rhel9.x86_64.rpm
sudo dnf clean all
sudo dnf -y install doca-ufm doca-kernel

cd /tmp/ufm/ufm-6.19.4-3.el9.x86_64/ && sudo ./install.sh -q
EOF

bash install_ufm.sh

Installing doca-kernel from doca-host-3.2.1-044000_25.10_rhel9.x86_64.rpm triggers a kernel upgrade from 5.14.0-611.5.1.el9_7.x86_64 to 5.14.0-611.27.1.el9_7.x86_64 after reboot,

and you will encounter the message below:

Autoinstall on 5.14.0-611.27.1.el9_7.x86_64 succeeded for module(s) kernel-mft knem mlnx-ofa_kernel srp xpmem.
Autoinstall on 5.14.0-611.27.1.el9_7.x86_64 failed for module(s) iser(10) isert(10).

Rebuild doca-kernel for the specific Linux kernel version:

cat <<'EOF' > install_ufm.sh
#!/bin/bash

mkdir -p /tmp/ufm
wget http://172.31.59.4/ISO/nvidia/ufm/ufm-6.19.4-3.el9.x86_64.tgz
tar xvf ufm-6.19.4-3.el9.x86_64.tgz -C /tmp/ufm

# Prerequisites for UFM Server Software Installation
sudo dnf install -y acl apr-util-openssl bc dos2unix gnutls httpd iptables-nft jansson lftp libnsl libxml2 python3 libxslt mod_session rsync mod_ssl net-snmp net-snmp-libs net-snmp-utils net-tools sudo php telnet psmisc zip sshpass sqlite
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo dnf install -y qperf

# Remove MLNX_OFED OpenSM before UFM installation
pkgs=$(rpm -qa | grep -i opensm || true)
if [[ -n "${pkgs}" ]]; then
    echo "Removing OpenSM packages:"
    echo "${pkgs}"
    sudo rpm -e ${pkgs}
else
    echo "No OpenSM packages installed."
fi

# NVIDIA Host InfiniBand Networking packages provided by DOCA package
# https://developer.nvidia.com/doca-downloads?deployment_platform=Host-Server&deployment_package=DOCA-Host&target_os=Linux&Architecture=x86_64&Profile=doca-all&Distribution=RHEL-Rocky&version=9&installer_type=rpm_local
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
wget https://www.mellanox.com/downloads/DOCA/DOCA_v3.2.1/host/doca-host-3.2.1-044000_25.10_rhel9.x86_64.rpm
sudo rpm -i doca-host-3.2.1-044000_25.10_rhel9.x86_64.rpm
# https://docs.nvidia.com/doca/sdk/doca-host-installation-and-upgrade/index.html#src-4410449037_id-.DOCAHostInstallationandUpgradev3.2.0LC-DOCAExtraPackageanddoca-kernel-support
# rebuild the doca-kernel for the specific Linux kernel version
sudo dnf install -y doca-extra
sudo /opt/mellanox/doca/tools/doca-kernel-support

# handle multiple build directories; take the first match
IFS= read -r -d '' RPM < <(find /tmp -maxdepth 2 -type f -path '/tmp/DOCA.*/*' -name 'doca-kernel-repo-*.rpm' -print0 2>/dev/null)
echo "$RPM"
BASE=$(basename "$RPM")
VERSION=${BASE#*kver.}          # drop everything through "kver."
VERSION=${VERSION%.x86_64.rpm}  # drop the trailing ".x86_64.rpm"
echo "${VERSION}"
sudo rpm -Uvh "$RPM"
sudo dnf makecache
sudo dnf install -y --disablerepo=doca doca-kernel-${VERSION}
sudo dnf -y install doca-ufm

cd /tmp/ufm/ufm-6.19.4-3.el9.x86_64/ && sudo ./install.sh -q
EOF
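The VERSION extraction in the script is plain Bash parameter expansion: strip everything up to and including kver. from the front, then strip the trailing .x86_64.rpm. Walked through on a made-up repo RPM file name:

```shell
#!/bin/bash
# Demonstrate the VERSION extraction on a hypothetical repo RPM file name.
BASE="doca-kernel-repo-3.2.1-044000.kver.5.14.0-611.27.1.el9_7.x86_64.rpm"
VERSION=${BASE#*kver.}          # -> 5.14.0-611.27.1.el9_7.x86_64.rpm
VERSION=${VERSION%.x86_64.rpm}  # -> 5.14.0-611.27.1.el9_7
echo "$VERSION"                 # prints: 5.14.0-611.27.1.el9_7
```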

FAQ

  • Device "ib0" does not exist.
    • Make sure an InfiniBand HCA is actually installed in the machine.
    • If an HCA is present but its interface name is not 'ib0', set the actual name with /opt/ufm/scripts/change_fabric_config.sh -i <interface>.
  • Other SM is in the fabric
    • Stop the Subnet Manager (SM) on the managed InfiniBand switch.
  • Cannot access the web console
    • Check the firewall.
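For the web-console item, it helps to first confirm that anything is listening locally on the HTTPS port before looking at the network path; a small sketch, assuming the UFM web UI is served over Apache on port 443:

```shell
#!/bin/bash
# Check whether anything is listening on the HTTPS port locally.
if ss -ltn 2>/dev/null | grep -q ':443 '; then
    echo "port 443 is listening"
else
    echo "port 443 is not listening - check apache/ufm and the firewall"
fi
# On RHEL with firewalld, HTTPS can be allowed with:
#   sudo firewall-cmd --permanent --add-service=https && sudo firewall-cmd --reload
```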

HA mode

Ubuntu 24.04

cat <<EOF > install_ufm_ha.sh
#!/bin/bash
sudo apt install -y pcs pacemaker drbd-utils resource-agents-extra
wget https://www.mellanox.com/downloads/UFM/ufm_ha_6.0.1-1.tgz
tar xvf ufm_ha_6.0.1-1.tgz -C /tmp
cd /tmp/ufm_ha_6.0.1-1
if [ -b /dev/nvme0n1p3 ]; then
  sudo ./install.sh -l /opt/ufm/files/ -d /dev/nvme0n1p3 -p enterprise
elif [ -b /dev/sda3 ]; then
  sudo ./install.sh -l /opt/ufm/files/ -d /dev/sda3 -p enterprise
else
  echo "Error: neither /dev/nvme0n1p3 nor /dev/sda3 found."
  exit 1
fi
sudo systemctl daemon-reload
EOF
bash install_ufm_ha.sh
Validating DRBD disk /dev/nvme0n1p3
Validating sync directory /opt/ufm/files/
================================================
Mon Jan 26 06:18:12 AM UTC 2026
================================================
 [*] Installing UFM HA for enterprise [*]
 -----------------------------------------------
[*] Disabling pcs UI
[*] Installing package files
Installing Debian Package (OS=ubuntu).
Selecting previously unselected package ufm-ha.
(Reading database ... 135555 files and directories currently installed.)
Preparing to unpack ufm-ha_6.0.1-1_all.deb ...
 -D- Checking package: pcs
 -D- Checking package: pacemaker
 -D- Checking package: corosync
 -D- Checking package: e2fsprogs
 -D- Checking package: bind9-host
 -D- Checking package: resource-agents-base
 -D- Checking package: resource-agents-extra
 -D- Checking package: drbd-utils
 -D- Checking package: python3/python
Unpacking ufm-ha (6.0.1-1) ...
Setting up ufm-ha (6.0.1-1) ...
--------------------
Setting configuration for product: enterprise
--------------------
Backing up package file(s): ufm-ha_6.0.1-1_all.deb ==> /opt/backup/ufm_ha/backup_6.0.1-1
Installation files successfully backed up to /opt/backup/ufm_ha/backup_6.0.1-1.
Done. Please run: 'systemctl daemon-reload' to load new HA services
To See full installation logs please check: /var/log/ufm_ha/ufm_ha_install_20260126_0618.log

RHEL 9.7

Unmount the default DRBD partition (mounted at /drdb) before running the installation configuration:

cat <<'EOF' > unmount_drdb_partition.sh
sudo umount /drdb 2>/dev/null; echo "Done"
sudo sed -i '/\/drdb/s/^/#/' /etc/fstab
cat /etc/fstab
sudo systemctl daemon-reload
lsblk
sudo rmdir /drdb
EOF
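The sed expression above comments out every /etc/fstab line mentioning /drdb. A safe way to see what it does is to run it against a throwaway sample file:

```shell
#!/bin/bash
# Demonstrate the fstab edit on a throwaway copy instead of /etc/fstab.
tmp=$(mktemp)
cat > "$tmp" <<'SAMPLE'
UUID=abcd-1234 / xfs defaults 0 0
/dev/sda3 /drdb ext4 defaults 0 0
SAMPLE
sed -i '/\/drdb/s/^/#/' "$tmp"   # prepend '#' to every line containing /drdb
grep '/drdb' "$tmp"              # the /drdb line is now commented out
rm -f "$tmp"
```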

bash unmount_drdb_partition.sh
cat <<'EOF' > install_ufm_ha.sh
#!/bin/bash
# Step 1: Enable the High Availability repository (for pcs, pacemaker, resource-agents)
sudo subscription-manager repos --enable=rhel-9-for-x86_64-highavailability-rpms
# Step 2: Install ELRepo for DRBD packages
sudo dnf install -y https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
sudo dnf --enablerepo=elrepo install -y drbd9x-utils kmod-drbd9x
# Step 3: Install HA cluster packages (after enabling HA repo)
sudo dnf install -y pcs pacemaker resource-agents
wget https://www.mellanox.com/downloads/UFM/ufm_ha_6.0.1-1.tgz
tar xvf ufm_ha_6.0.1-1.tgz -C /tmp
cd /tmp/ufm_ha_6.0.1-1
if [ -b /dev/nvme0n1p3 ]; then
    sudo ./install.sh -l /opt/ufm/files/ -d /dev/nvme0n1p3 -p enterprise
elif [ -b /dev/sda3 ]; then
    sudo ./install.sh -l /opt/ufm/files/ -d /dev/sda3 -p enterprise
else
    echo "Error: neither /dev/nvme0n1p3 nor /dev/sda3 found."
    exit 1
fi
sudo systemctl daemon-reload
EOF
bash install_ufm_ha.sh

Single-link configuration: can be used when only one network link is available between the two UFM HA nodes/servers.

secondary node

foo@ufm-ubuntu-secondary:~$ sudo ufm_ha_cluster config \
--role standby \
--peer-primary-ip 172.31.32.170 \
--local-primary-ip 172.31.32.113 \
--no-vip \
--enable-single-link
[sudo] password for foo:

Warning: It is recommended to use a multilink connection for optimal stability and performance.
By using a single link, it may result in decreased system stability and increased risk of connection issues.
Do you want to proceed? [Y/n]: Y

Node: 1
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.31.32.113]
   SECONDARY_IP:              []
Node: 2
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.31.32.170]
   SECONDARY_IP:              []
Virtual IP(v4):               []
Virtual IP(v6):               []
HACLUSTER_PWD:                [******]
PRIMARY_IP_TYPE:              [ipv4]
SECONDARY_IP_TYPE:            []

DRBD_RESOURCE:                [ha_data]
DRBD_PORT:                    [7788]
DRBD_DISK:                    [/dev/sda3]
DRBD_DEVICE:                  [/dev/drbd0]
DATA_SYNC_DIR:                [/opt/ufm/files/]
DRBD_DATA_MODE:               [ordered]

SCRIPT_PRE_START:             []
SCRIPT_PRE_CONFIG:            []
SCRIPT_POST_CONFIG:           []
SCRIPT_POST_CLEANUP:          []
GENERAL_SETTINGS_DIR:         [/opt/ufm/files//ufm_ha]
GENERAL_UFM_CONF:             [/opt/ufm/files//conf/gv.cfg]
Setting Cluster Password
New password: Retype new password: passwd: password updated successfully
Checking filesystem on drbd disk
fsck from util-linux 2.39.3
e2fsck 1.47.0 (5-Feb-2023)
/dev/sda3: clean, 11/1310720 files, 126322/5242880 blocks
Checking drbd kernel module is installed
Running pre-configuration checks...
pcsd service is running
stop drbd service
Enabling service: pcsd
Synchronizing state of pcsd.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pcsd
Enabling service: pacemaker
Synchronizing state of pacemaker.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pacemaker
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
Enabling service: corosync
Synchronizing state of corosync.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync
Created symlink /etc/systemd/system/multi-user.target.wants/corosync.service → /usr/lib/systemd/system/corosync.service.
Disabling service ufm-ha-watcher, as it will be controlled by pacemaker!
Disabling service ufm-enterprise, as it will be controlled by pacemaker!
******************************
** Configuring DRBD Service
******************************
Set global drbd conf file
Creating drbd resources
Creating drbd resource /tmp/ha_data.res file
excluding devices drbd0-drbd9 from multipathd service configuration
Copy drbd resource file
cp -f /tmp/ha_data.res /etc/drbd.d/
Preparing drbd disk
unmount partition /opt/ufm/files/
Create metadata file
files=/opt/ufm/files/
Copy files from /opt/ufm/files/ to /ha_backup/20260126_0748/
Deleting files from /opt/ufm/files/
total 8
drwxr-xr-x  2 ufmapp ufmapp 4096 Jan 26 07:48 .
drwxr-xr-x 31 root   root   4096 Jan 26 07:39 ..
umount: /opt/ufm/files/: not mounted.
1+0 records in
1+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0320172 s, 328 MB/s
load drbd module
modprobe drbd
bringup drbd resources
initializing bitmap (640 KB) to all zero
initializing activity log
Writing meta data...
New drbd meta data block successfully created.
ha_data role:Secondary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:Inconsistent
      size:20970844 read:0 written:0 al-writes:8 bm-writes:0 upper-pending:0
      lower-pending:0 al-suspended:yes blocked:no
  peer connection:Connecting role:Unknown congested:no
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
        received:0 sent:0 out-of-sync:20970844 pending:0 unacked:0

ha_data role:Secondary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:Inconsistent
      size:20970844 read:0 written:0 al-writes:8 bm-writes:0 upper-pending:0
      lower-pending:0 al-suspended:yes blocked:no
  peer connection:Connecting role:Unknown congested:no
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
        received:0 sent:0 out-of-sync:20970844 pending:0 unacked:0

remove_ha_data_from_fstab
Reserve current node IP address: 172.31.32.113
[*] Configuring HA cluster done.
--------------------------------
To check HA cluster status please use: ufm_ha_cluster status
foo@ufm-ubuntu-secondary:~$

primary node

foo@ufm-ubuntu-primary:~$ sudo ufm_ha_cluster config \
--role master \
--peer-primary-ip 172.31.32.113 \
--local-primary-ip 172.31.32.170 \
--no-vip \
--enable-single-link

Warning: It is recommended to use a multilink connection for optimal stability and performance.
By using a single link, it may result in decreased system stability and increased risk of connection issues.
Do you want to proceed? [Y/n]: Y

Check if IP address 172.31.32.170 is listening to pcsd's port number 2224.
Check if IP address 172.31.32.113 is listening to pcsd's port number 2224.
Node: 1
   NODE_ROLE:                 [master]
   PRIMARY_IP:                [172.31.32.170]
   SECONDARY_IP:              []
Node: 2
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.31.32.113]
   SECONDARY_IP:              []
Virtual IP(v4):               []
Virtual IP(v6):               []
HACLUSTER_PWD:                [******]
PRIMARY_IP_TYPE:              [ipv4]
SECONDARY_IP_TYPE:            []

DRBD_RESOURCE:                [ha_data]
DRBD_PORT:                    [7788]
DRBD_DISK:                    [/dev/nvme0n1p3]
DRBD_DEVICE:                  [/dev/drbd0]
DATA_SYNC_DIR:                [/opt/ufm/files/]
DRBD_DATA_MODE:               [ordered]

SCRIPT_PRE_START:             []
SCRIPT_PRE_CONFIG:            []
SCRIPT_POST_CONFIG:           []
SCRIPT_POST_CLEANUP:          []
GENERAL_SETTINGS_DIR:         [/opt/ufm/files//ufm_ha]
GENERAL_UFM_CONF:             [/opt/ufm/files//conf/gv.cfg]
Creating HA Settings File
Done
Setting Cluster Password
New password: Retype new password: passwd: password updated successfully
Checking filesystem on drbd disk
fsck from util-linux 2.39.3
e2fsck 1.47.0 (5-Feb-2023)
/dev/nvme0n1p3: clean, 11/1310720 files, 126322/5242880 blocks
Checking drbd kernel module is installed
Running pre-configuration checks...
pcsd service is running
stop drbd service
Enabling service: pcsd
Synchronizing state of pcsd.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pcsd
Enabling service: pacemaker
Synchronizing state of pacemaker.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pacemaker
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
Enabling service: corosync
Synchronizing state of corosync.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync
Created symlink /etc/systemd/system/multi-user.target.wants/corosync.service → /usr/lib/systemd/system/corosync.service.
Disabling service ufm-ha-watcher, as it will be controlled by pacemaker!
Disabling service ufm-enterprise, as it will be controlled by pacemaker!
******************************
** Configuring DRBD Service
******************************
Set global drbd conf file
Creating drbd resources
Creating drbd resource /tmp/ha_data.res file
excluding devices drbd0-drbd9 from multipathd service configuration
Copy drbd resource file
cp -f /tmp/ha_data.res /etc/drbd.d/
Preparing drbd disk
unmount partition /opt/ufm/files/
Create metadata file
files=/opt/ufm/files/
Copy files from /opt/ufm/files/ to /ha_backup/20260126_0750/
Deleting files from /opt/ufm/files/
total 8
drwxr-xr-x  2 ufmapp ufmapp 4096 Jan 26 07:50 .
drwxr-xr-x 31 root   root   4096 Jan 26 07:38 ..
umount: /opt/ufm/files/: not mounted.
1+0 records in
1+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0149752 s, 700 MB/s
load drbd module
modprobe drbd
bringup drbd resources
initializing bitmap (640 KB) to all zero
initializing activity log
Writing meta data...
New drbd meta data block successfully created.
ha_data role:Primary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:UpToDate
      size:20970844 read:3180 written:0 al-writes:8 bm-writes:0 upper-pending:0
      lower-pending:0 al-suspended:yes blocked:no
  peer connection:Connecting role:Unknown congested:no
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
        received:0 sent:0 out-of-sync:20970844 pending:0 unacked:0

mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 5242711 4k blocks and 1310720 inodes
Filesystem UUID: 68915829-68aa-445f-830a-94053dc8b698
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

mount -o data=ordered /dev/drbd0 /opt/ufm/files/
ha_data role:Primary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:UpToDate
      size:20970844 read:5453 written:139628 al-writes:8 bm-writes:0
      upper-pending:0 lower-pending:0 al-suspended:yes blocked:no
  peer connection:Connecting role:Unknown congested:no
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
        received:0 sent:0 out-of-sync:20970844 pending:0 unacked:0

******************************
** Cluster Nodes authentication
******************************
172.31.32.170: Authorized
172.31.32.113: Authorized
******************************
** Cluster Setup
******************************
No addresses specified for host '172.31.32.170', using '172.31.32.170'
No addresses specified for host '172.31.32.113', using '172.31.32.113'
Destroying cluster on hosts: '172.31.32.113', '172.31.32.170'...
172.31.32.170: Successfully destroyed cluster
172.31.32.113: Successfully destroyed cluster
Requesting remove 'pcsd settings' from '172.31.32.113', '172.31.32.170'
172.31.32.170: successful removal of the file 'pcsd settings'
172.31.32.113: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to '172.31.32.113', '172.31.32.170'
172.31.32.170: successful distribution of the file 'corosync authkey'
172.31.32.170: successful distribution of the file 'pacemaker authkey'
172.31.32.113: successful distribution of the file 'corosync authkey'
172.31.32.113: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to '172.31.32.113', '172.31.32.170'
172.31.32.170: successful distribution of the file 'corosync.conf'
172.31.32.113: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
******************************
** Starting Cluster on Nodes
******************************
172.31.32.170: Starting Cluster...
172.31.32.113: Starting Cluster...
172.31.32.170: Cluster Enabled
172.31.32.113: Cluster Enabled
******************************
** Setting Cluster Properties
******************************
Creating drbd pacemaker resources
******************************
** Creating DRBD pacemaker resources
******************************
CIB updated
Adding ha_data_drbd-clone ha_data_file_system (kind: Mandatory) (Options: first-action=promote then-action=start)
CIB updated
******************************
** Adding Systemd Services
******************************
Adding Systemd Service: ufm-ha-watcher
Adding Systemd Service: ufm-enterprise
remove_ha_data_from_fstab
Reserve current node IP address: 172.31.32.170
[*] Configuring HA cluster done.
--------------------------------
To check HA cluster status please use: ufm_ha_cluster status
Once DRBD sync is completed, you can start HA cluster using: ufm_ha_cluster start
foo@ufm-ubuntu-primary:~$
foo@ufm-ubuntu-primary:~$ sudo ufm_ha_cluster status
Cluster name: ufmcluster
Cluster Summary:
  * Stack: unknown (Pacemaker is running)
  * Current DC: ufm-ubuntu-primary (version unknown) - partition with quorum
  * Last updated: Mon Jan 26 06:33:25 2026 on ufm-ubuntu-primary
  * Last change:  Mon Jan 26 06:25:29 2026 by root via cibadmin on ufm-ubuntu-primary
  * 2 nodes configured
  * 5 resource instances configured (2 DISABLED)

Node List:
  * Online: [ ufm-ubuntu-primary US-6828-NB1 ]

Full List of Resources:
  * Clone Set: ha_data_drbd-clone [ha_data_drbd] (promotable):
    * Promoted: [ ufm-ubuntu-primary ]
    * Unpromoted: [ US-6828-NB1 ]
  * Resource Group: ufmcluster-grp:
    * ha_data_file_system       (ocf:heartbeat:Filesystem):      Started ufm-ubuntu-primary
    * ufm-ha-watcher    (systemd:ufm-ha-watcher):        Stopped (disabled)
    * ufm-enterprise    (systemd:ufm-enterprise):        Stopped (disabled)

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
DRBD_RESOURCE:                [ha_data]
DRBD_CONNECTIVITY:            [SyncSource]
DISK_STATE:                   [UpToDate]
DRBD_ROLE:                    [Primary]
PEER_DISK_STATE:              [Inconsistent]
PEER_DRBD_ROLE:               [Secondary]
DRBD Sync Status:
ha_data role:Primary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:UpToDate
      size:2147418076 read:59172037 written:1709148 al-writes:226 bm-writes:0
      upper-pending:0 lower-pending:1 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:yes
    volume:0 replication:SyncSource peer-disk:Inconsistent done:2.77
        resync-suspended:no
        received:0 sent:59163556 out-of-sync:2088000000 pending:2 unacked:32
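The done:2.77 field above is the resync percentage. Instead of eyeballing the full status, you can pull out that one number; a sketch that parses a captured sample line (on a live node you would feed it the actual status output, e.g. from ufm_ha_cluster status, instead):

```shell
#!/bin/bash
# Extract the resync percentage from a DRBD status line such as:
#   volume:0 replication:SyncSource peer-disk:Inconsistent done:2.77
status='volume:0 replication:SyncSource peer-disk:Inconsistent done:2.77'   # captured sample
pct=$(printf '%s\n' "$status" | sed -n 's/.*done:\([0-9.]*\).*/\1/p')
echo "resync ${pct}% complete"   # prints: resync 2.77% complete
```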

This configuration does not require a passwordless SSH connection between the servers, but it does require running the configuration commands on both servers.

secondary node

root@ufm-ubuntu-secondary:~# ufm_ha_cluster config \
--role standby \
--local-primary-ip 172.19.23.167 \
--peer-primary-ip 172.19.23.173 \
--local-secondary-ip 172.31.32.113 \
--peer-secondary-ip 172.31.32.170 \
--hacluster-pwd 123456789 \
--virtual-ip 172.19.23.50

Node: 1
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.19.23.167]
   SECONDARY_IP:              [172.31.32.113]
Node: 2
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.19.23.173]
   SECONDARY_IP:              [172.31.32.170]
Virtual IP(v4):               [172.19.23.50]
Virtual IP(v6):               []
HACLUSTER_PWD:                [******]
PRIMARY_IP_TYPE:              [ipv4]
SECONDARY_IP_TYPE:            [ipv4]

DRBD_RESOURCE:                [ha_data]
DRBD_PORT:                    [7788]
DRBD_DISK:                    [/dev/sda3]
DRBD_DEVICE:                  [/dev/drbd0]
DATA_SYNC_DIR:                [/opt/ufm/files/]
DRBD_DATA_MODE:               [ordered]

SCRIPT_PRE_START:             []
SCRIPT_PRE_CONFIG:            []
SCRIPT_POST_CONFIG:           []
SCRIPT_POST_CLEANUP:          []
GENERAL_SETTINGS_DIR:         [/opt/ufm/files//ufm_ha]
GENERAL_UFM_CONF:             [/opt/ufm/files//conf/gv.cfg]
Setting Cluster Password
New password: Retype new password: passwd: password updated successfully
Checking filesystem on drbd disk
fsck from util-linux 2.39.3
Checking drbd kernel module is installed
Running pre-configuration checks...
pcsd service is running
stop drbd service
Enabling service: pcsd
Synchronizing state of pcsd.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pcsd
Enabling service: pacemaker
Synchronizing state of pacemaker.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pacemaker
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
Enabling service: corosync
Synchronizing state of corosync.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync
Created symlink /etc/systemd/system/multi-user.target.wants/corosync.service → /usr/lib/systemd/system/corosync.service.
Disabling service ufm-ha-watcher, as it will be controlled by pacemaker!
Disabling service ufm-enterprise, as it will be controlled by pacemaker!
******************************
** Configuring DRBD Service
******************************
Set global drbd conf file
Creating drbd resources
Creating drbd resource /tmp/ha_data.res file
Copy drbd resource file
cp -f /tmp/ha_data.res /etc/drbd.d/
remove_ha_data_from_fstab
Reserve current node IP address: 172.19.23.167
[*] Configuring HA cluster done.
--------------------------------
To check HA cluster status please use: ufm_ha_cluster status

primary node

root@ufm-ubuntu-primary:~# ufm_ha_cluster config \
--role master \
--local-primary-ip 172.19.23.173 \
--peer-primary-ip 172.19.23.167 \
--local-secondary-ip 172.31.32.170 \
--peer-secondary-ip 172.31.32.113 \
--hacluster-pwd 123456789 \
--virtual-ip 172.19.23.50

Validate virtual IP: 172.19.23.50 (ipv4).
Check if IP address 172.19.23.173 is listening to pcsd's port number 2224.
Check if IP address 172.19.23.167 is listening to pcsd's port number 2224.
Check if IP address 172.31.32.170 is listening to pcsd's port number 2224.
Check if IP address 172.31.32.113 is listening to pcsd's port number 2224.
Node: 1
   NODE_ROLE:                 [master]
   PRIMARY_IP:                [172.19.23.173]
   SECONDARY_IP:              [172.31.32.170]
Node: 2
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.19.23.167]
   SECONDARY_IP:              [172.31.32.113]
Virtual IP(v4):               [172.19.23.50]
Virtual IP(v6):               []
HACLUSTER_PWD:                [******]
PRIMARY_IP_TYPE:              [ipv4]
SECONDARY_IP_TYPE:            [ipv4]

DRBD_RESOURCE:                [ha_data]
DRBD_PORT:                    [7788]
DRBD_DISK:                    [/dev/nvme0n1p3]
DRBD_DEVICE:                  [/dev/drbd0]
DATA_SYNC_DIR:                [/opt/ufm/files/]
DRBD_DATA_MODE:               [ordered]

SCRIPT_PRE_START:             []
SCRIPT_PRE_CONFIG:            []
SCRIPT_POST_CONFIG:           []
SCRIPT_POST_CLEANUP:          []
GENERAL_SETTINGS_DIR:         [/opt/ufm/files//ufm_ha]
GENERAL_UFM_CONF:             [/opt/ufm/files//conf/gv.cfg]
Creating HA Settings File
Done
Setting Cluster Password
New password: Retype new password: passwd: password updated successfully
Checking filesystem on drbd disk
fsck from util-linux 2.39.3
Checking drbd kernel module is installed
Running pre-configuration checks...
pcsd service is running
stop drbd service
Enabling service: pcsd
Synchronizing state of pcsd.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pcsd
Enabling service: pacemaker
Synchronizing state of pacemaker.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pacemaker
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
Enabling service: corosync
Synchronizing state of corosync.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync
Created symlink /etc/systemd/system/multi-user.target.wants/corosync.service → /usr/lib/systemd/system/corosync.service.
Disabling service ufm-ha-watcher, as it will be controlled by pacemaker!
Disabling service ufm-enterprise, as it will be controlled by pacemaker!
******************************
** Configuring DRBD Service
******************************
Set global drbd conf file
Creating drbd resources
Creating drbd resource /tmp/ha_data.res file
Copy drbd resource file
cp -f /tmp/ha_data.res /etc/drbd.d/
******************************
** Cluster Nodes authentication
******************************
172.19.23.173: Authorized
172.19.23.167: Authorized
******************************
** Cluster Setup
******************************
Destroying cluster on hosts: '172.19.23.167', '172.19.23.173'...
172.19.23.173: Successfully destroyed cluster
172.19.23.167: Successfully destroyed cluster
Requesting remove 'pcsd settings' from '172.19.23.167', '172.19.23.173'
172.19.23.167: successful removal of the file 'pcsd settings'
172.19.23.173: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to '172.19.23.167', '172.19.23.173'
172.19.23.167: successful distribution of the file 'corosync authkey'
172.19.23.167: successful distribution of the file 'pacemaker authkey'
172.19.23.173: successful distribution of the file 'corosync authkey'
172.19.23.173: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to '172.19.23.167', '172.19.23.173'
172.19.23.167: successful distribution of the file 'corosync.conf'
172.19.23.173: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
******************************
** Starting Cluster on Nodes
******************************
172.19.23.173: Starting Cluster...
172.19.23.167: Starting Cluster...
172.19.23.173: Cluster Enabled
172.19.23.167: Cluster Enabled
******************************
** Setting Cluster Properties
******************************
Creating drbd pacemaker resources
******************************
** Creating DRBD pacemaker resources
******************************
CIB updated
Adding ha_data_drbd-clone ha_data_file_system (kind: Mandatory) (Options: first-action=promote then-action=start)
CIB updated
******************************
** Creating Virtual IPv4 Resource
******************************
vip resource on-fail = restart
******************************
** Adding Systemd Services
******************************
Adding Systemd Service: ufm-ha-watcher
Adding Systemd Service: ufm-enterprise
remove_ha_data_from_fstab
Reserve current node IP address: 172.19.23.173
[*] Configuring HA cluster done.
--------------------------------
To check HA cluster status please use: ufm_ha_cluster status
Once DRBD sync is completed, you can start HA cluster using: ufm_ha_cluster start

How to know when the DRBD sync is done

Not done yet: the peer disk is still Inconsistent and the sync is at ~51% (done:51.38)

DRBD_RESOURCE:                [ha_data]
DRBD_CONNECTIVITY:            [SyncSource]
DISK_STATE:                   [UpToDate]
DRBD_ROLE:                    [Primary]
PEER_DISK_STATE:              [Inconsistent]
PEER_DRBD_ROLE:               [Secondary]
DRBD Sync Status:
ha_data role:Primary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:UpToDate
      size:20970844 read:10574373 written:448164 al-writes:78 bm-writes:0
      upper-pending:1 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:yes
    volume:0 replication:SyncSource peer-disk:Inconsistent done:51.38
        resync-suspended:no
        received:0 sent:10507388 out-of-sync:10195276 pending:3 unacked:64

Done: the peer disk is UpToDate and nothing is out of sync (out-of-sync:0)

DRBD_RESOURCE:                [ha_data]
DRBD_CONNECTIVITY:            [Connected]
DISK_STATE:                   [UpToDate]
DRBD_ROLE:                    [Primary]
PEER_DISK_STATE:              [UpToDate]
PEER_DRBD_ROLE:               [Secondary]
DRBD Sync Status:
ha_data role:Primary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:UpToDate
      size:20970844 read:20680569 written:466752 al-writes:83 bm-writes:0
      upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:0 sent:20678740 out-of-sync:0 pending:0 unacked:0
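The difference between the two states above can be checked programmatically. A minimal sketch, assuming status text in the `drbdsetup status --verbose --statistics` format shown above (the grep patterns are keyed to the `peer-disk:` field):

```shell
# Return 0 once the peer disk is fully synchronized.
# $1 is DRBD status text, e.g. captured from:
#   drbdsetup status ha_data --verbose --statistics
drbd_sync_done() {
  # synced:  the peer disk reports UpToDate
  # syncing: the peer disk still reports Inconsistent
  echo "$1" | grep -q 'peer-disk:UpToDate' && \
    ! echo "$1" | grep -q 'peer-disk:Inconsistent'
}
```

A wait loop would then poll this every few seconds and proceed to `ufm_ha_cluster start` once it returns 0.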

Configuring the HA nodes requires a passwordless SSH connection between the servers, in both directions.

Passwordless SSH from the primary node to the secondary node

root@ufm-ubuntu-primary:~# ssh-keygen -t ed25519 -C "foo@bar.com" -f "$HOME/.ssh/id_ed25519" -N ""
root@ufm-ubuntu-primary:~# ssh-copy-id -i $HOME/.ssh/id_ed25519.pub root@172.31.32.113

Passwordless SSH from the secondary node to the primary node

root@ufm-ubuntu-secondary:~# ssh-keygen -t ed25519 -C "foo@bar.com" -f "$HOME/.ssh/id_ed25519" -N ""
root@ufm-ubuntu-secondary:~# ssh-copy-id -i $HOME/.ssh/id_ed25519.pub root@172.31.32.170
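Before running the configuration script, it is worth confirming the keys actually work without a prompt. A small sketch (`BatchMode=yes` makes ssh exit non-zero instead of asking for a password; the IP in the usage comment is the example peer address from above):

```shell
# Exit 0 only if root@<host> is reachable without a password prompt.
verify_passwordless() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Example (run from the primary node):
#   verify_passwordless 172.31.32.113 && echo OK
```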

Run the HA configuration script on the primary node:

root@ufm-ubuntu-primary:~# configure_ha_nodes.sh \
--cluster-password 12345678 \
--master-primary-ip 172.19.23.173 \
--standby-primary-ip 172.19.23.167 \
--master-secondary-ip 172.31.32.170 \
--standby-secondary-ip 172.31.32.113 \
--virtual-ip 172.19.23.50

===================
 Configure HA Nodes
===================
Master IP:                 172.19.23.173
Standby IP:                172.19.23.167
Master SECONDARY IP:       172.31.32.170
Standby SECONDARY IP:      172.31.32.113
Virtual IP:                172.19.23.50
Virtual IP6:
Ignore mgmt intf status:   false
Node: 1
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.19.23.167]
   SECONDARY_IP:              [172.31.32.113]
Node: 2
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.19.23.173]
   SECONDARY_IP:              [172.31.32.170]
Virtual IP(v4):               []
Virtual IP(v6):               []
HACLUSTER_PWD:                [******]
PRIMARY_IP_TYPE:              [ipv4]
SECONDARY_IP_TYPE:            [ipv4]

DRBD_RESOURCE:                [ha_data]
DRBD_PORT:                    [7788]
DRBD_DISK:                    [/dev/sda3]
DRBD_DEVICE:                  [/dev/drbd0]
DATA_SYNC_DIR:                [/opt/ufm/files/]
DRBD_DATA_MODE:               [ordered]

SCRIPT_PRE_START:             []
SCRIPT_PRE_CONFIG:            []
SCRIPT_POST_CONFIG:           []
SCRIPT_POST_CLEANUP:          []
GENERAL_SETTINGS_DIR:         [/opt/ufm/files//ufm_ha]
GENERAL_UFM_CONF:             [/opt/ufm/files//conf/gv.cfg]
Setting Cluster Password
New password: Retype new password: passwd: password updated successfully
Checking filesystem on drbd disk
fsck from util-linux 2.39.3
Checking drbd kernel module is installed
Running pre-configuration checks...
pcsd service is running
stop drbd service
Enabling service: pcsd
Synchronizing state of pcsd.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pcsd
Enabling service: pacemaker
Synchronizing state of pacemaker.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pacemaker
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
Enabling service: corosync
Synchronizing state of corosync.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync
Created symlink /etc/systemd/system/multi-user.target.wants/corosync.service → /usr/lib/systemd/system/corosync.service.
Disabling service ufm-ha-watcher, as it will be controlled by pacemaker!
Disabling service ufm-enterprise, as it will be controlled by pacemaker!
******************************
** Configuring DRBD Service
******************************
Set global drbd conf file
Creating drbd resources
Creating drbd resource /tmp/ha_data.res file
drbd is already configured and no changes in drbd configuration.
remove_ha_data_from_fstab
Reserve current node IP address: 172.19.23.167
[*] Configuring HA cluster done.
--------------------------------
To check HA cluster status please use: ufm_ha_cluster status
Validate virtual IP: 172.19.23.50 (ipv4).
Check if IP address 172.19.23.173 is listening to pcsd's port number 2224.
Check if IP address 172.19.23.167 is listening to pcsd's port number 2224.
Check if IP address 172.31.32.170 is listening to pcsd's port number 2224.
Check if IP address 172.31.32.113 is listening to pcsd's port number 2224.
Node: 1
   NODE_ROLE:                 [master]
   PRIMARY_IP:                [172.19.23.173]
   SECONDARY_IP:              [172.31.32.170]
Node: 2
   NODE_ROLE:                 [standby]
   PRIMARY_IP:                [172.19.23.167]
   SECONDARY_IP:              [172.31.32.113]
Virtual IP(v4):               [172.19.23.50]
Virtual IP(v6):               []
HACLUSTER_PWD:                [******]
PRIMARY_IP_TYPE:              [ipv4]
SECONDARY_IP_TYPE:            [ipv4]

DRBD_RESOURCE:                [ha_data]
DRBD_PORT:                    [7788]
DRBD_DISK:                    [/dev/nvme0n1p3]
DRBD_DEVICE:                  [/dev/drbd0]
DATA_SYNC_DIR:                [/opt/ufm/files/]
DRBD_DATA_MODE:               [ordered]

SCRIPT_PRE_START:             []
SCRIPT_PRE_CONFIG:            []
SCRIPT_POST_CONFIG:           []
SCRIPT_POST_CLEANUP:          []
GENERAL_SETTINGS_DIR:         [/opt/ufm/files//ufm_ha]
GENERAL_UFM_CONF:             [/opt/ufm/files//conf/gv.cfg]
Creating HA Settings File
Done
Setting Cluster Password
New password: Retype new password: passwd: password updated successfully
Checking filesystem on drbd disk
fsck from util-linux 2.39.3
Checking drbd kernel module is installed
Running pre-configuration checks...
pcsd service is running
stop drbd service
Enabling service: pcsd
Synchronizing state of pcsd.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pcsd
Enabling service: pacemaker
Synchronizing state of pacemaker.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable pacemaker
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
Enabling service: corosync
Synchronizing state of corosync.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync
Created symlink /etc/systemd/system/multi-user.target.wants/corosync.service → /usr/lib/systemd/system/corosync.service.
Disabling service ufm-ha-watcher, as it will be controlled by pacemaker!
Disabling service ufm-enterprise, as it will be controlled by pacemaker!
******************************
** Configuring DRBD Service
******************************
Set global drbd conf file
Creating drbd resources
Creating drbd resource /tmp/ha_data.res file
drbd is already configured and no changes in drbd configuration.
******************************
** Cluster Nodes authentication
******************************
172.19.23.173: Authorized
172.19.23.167: Authorized
******************************
** Cluster Setup
******************************
Destroying cluster on hosts: '172.19.23.167', '172.19.23.173'...
172.19.23.173: Successfully destroyed cluster
172.19.23.167: Successfully destroyed cluster
Requesting remove 'pcsd settings' from '172.19.23.167', '172.19.23.173'
172.19.23.167: successful removal of the file 'pcsd settings'
172.19.23.173: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to '172.19.23.167', '172.19.23.173'
172.19.23.167: successful distribution of the file 'corosync authkey'
172.19.23.167: successful distribution of the file 'pacemaker authkey'
172.19.23.173: successful distribution of the file 'corosync authkey'
172.19.23.173: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to '172.19.23.167', '172.19.23.173'
172.19.23.173: successful distribution of the file 'corosync.conf'
172.19.23.167: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
******************************
** Starting Cluster on Nodes
******************************
172.19.23.173: Starting Cluster...
172.19.23.167: Starting Cluster...
172.19.23.173: Cluster Enabled
172.19.23.167: Cluster Enabled
******************************
** Setting Cluster Properties
******************************
Creating drbd pacemaker resources
******************************
** Creating DRBD pacemaker resources
******************************
CIB updated
Adding ha_data_drbd-clone ha_data_file_system (kind: Mandatory) (Options: first-action=promote then-action=start)
CIB updated
******************************
** Creating Virtual IPv4 Resource
******************************
vip resource on-fail = restart
******************************
** Adding Systemd Services
******************************
Adding Systemd Service: ufm-ha-watcher
Adding Systemd Service: ufm-enterprise
remove_ha_data_from_fstab
Reserve current node IP address: 172.19.23.173
[*] Configuring HA cluster done.
--------------------------------
To check HA cluster status please use: ufm_ha_cluster status
Once DRBD sync is completed, you can start HA cluster using: ufm_ha_cluster start

Start the HA cluster on the primary node:

foo@ufm-ubuntu-primary:~$ sudo ufm_ha_cluster start
Cleaned up all resources on all nodes
enabling resource ufm-ha-watcher
enabling resource ufm-enterprise
foo@ufm-ubuntu-primary:~$
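After the start completes, a quick sanity check is that the virtual IP answers. A minimal sketch, assuming the example VIP 172.19.23.50 from above and Linux iputils `ping`:

```shell
# Check that the cluster's virtual IP is reachable.
vip_up() {
  # one ping with a 2-second timeout; the exit status signals reachability
  ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

if vip_up 172.19.23.50; then
  echo "VIP is up"
else
  echo "VIP not reachable yet"
fi
```

`ufm_ha_cluster status` (mentioned in the installer output above) then shows whether the resources came up on the master.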