This guide describes the creation of a simple two-node HA cluster that exports an iSCSI target, with ZFS as the storage backend.
Architecture
Hardware
The most common hardware for such a solution is two servers and one JBOD connected through SAS HBAs.
A similar system can be built on a Cluster-in-a-Box (CiB) class system – two servers and a JBOD in a single enclosure, as mentioned in the article about Shared DAS.
Such a system serves as the SUT (system under test) for this article.
Software
The software architecture is the typical one for high availability (HA) clusters on GNU/Linux: Corosync and Pacemaker handle cluster membership and resource management, ZFS provides the storage pool, and the LIO target (targetcli) exports it over iSCSI.
Initial GNU/Linux setup
NOTE: All steps must be performed on both nodes.
Install a GNU/Linux OS with OpenSSH on both nodes. This article is written for Red Hat Enterprise Linux 7, applies equally to CentOS 7, and will work (with minor differences) on other modern GNU/Linux distributions.
Networking
How to disable automatic network management (for true Linux gurus)
systemctl stop NetworkManager
systemctl disable NetworkManager
yum erase NetworkManager*
The system has three physical 10 Gbps interfaces: two external ports aggregated into team0 and one internal port for the cluster interconnect.
Interface bonding is done via the team driver (teamd).
External interfaces:
# cat ifcfg-ens11f0
DEVICETYPE=TeamPort
BOOTPROTO=none
USERCTL=no
ONBOOT=no
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"prio":100}'
NAME="ens11f0"
UUID="704d85d9-7430-4d8f-b920-792263d192ba"
HWADDR="00:8C:FA:E5:6D:E0"
# cat ifcfg-ens11f1
DEVICETYPE=TeamPort
BOOTPROTO=none
USERCTL=no
ONBOOT=no
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"prio":100}'
NAME=ens11f1
UUID=4bd90873-9097-442a-8ac8-7971756b0fc5
HWADDR=00:8C:FA:E5:6D:E1
team0 interface:
# cat ./ifcfg-team0
DEVICE=team0
DEVICETYPE=Team
BOOTPROTO=static
USERCTL=no
ONBOOT=yes
IPADDR=10.3.254.64
NETMASK=255.255.255.0
GATEWAY=10.3.254.1
DNS1=192.168.10.107
TEAM_CONFIG='{"runner":{"name":"activebackup"},"link_watch":{"name":"ethtool"}}'
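Once the configuration is applied (for example, by restarting the network service), the state of the aggregate can be verified with teamdctl and the usual ip tooling – a quick check, assuming the teamd utilities were installed along with Team support:
systemctl restart network
teamdctl team0 state
ip addr show team0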
Internal interface:
# cat ./ifcfg-enp130s0
TYPE=Ethernet
BOOTPROTO=none
NAME=enp130s0
UUID=2933ee35-eb16-485e-b65c-e186d772b480
ONBOOT=yes
HWADDR=00:8C:FA:CE:56:DB
IPADDR=172.30.0.1
PREFIX=28
/etc/hosts
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.0.1 node1
172.30.0.2 node2
10.3.254.64 node1-ext
10.3.254.12 node2-ext
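With /etc/hosts in place on both nodes, a quick sanity check of name resolution and reachability over both networks (run from node1, using the names defined above):
ping -c 2 node2
ping -c 2 node2-ext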
Notes
It is recommended to disable the firewall for the initial setup:
systemctl stop firewalld
systemctl disable firewalld
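If you later want to re-enable firewalld, the cluster and iSCSI traffic must be allowed explicitly. A minimal sketch, assuming the stock high-availability service definition shipped with the CentOS 7 firewalld package:
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --permanent --add-port=3260/tcp
firewall-cmd --reload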
Multipath
If a block device is connected to the SAS HBA over more than a single path (as in a CiB with dual expander access), it will show up twice in the OS device list. To handle this correctly we need the multipathd daemon.
yum install device-mapper-multipath.x86_64
touch /etc/multipath.conf
systemctl start multipathd
systemctl enable multipathd
If everything is correct, the output of multipath -l will look like:
35000c50077ad9a3f dm-6 SEAGATE ,ST2000NX0263
size=1.8T features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:0:0 sdb 8:16 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
`- 0:0:6:0 sdg 8:96 active undef running
35000c50077580317 dm-4 SEAGATE ,ST2000NX0273
size=1.8T features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:1:0 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
`- 0:0:7:0 sdh 8:112 active undef running
The drives will now be addressed via their multipath devices:
/dev/mapper/35000c50077580317
/dev/mapper/35000c50077ad9287
/dev/mapper/35000c50077ad9a3f
/dev/mapper/35000c50077ad8aab
/dev/mapper/35000c50077ad92ef
ZFS
ZFS is used as an alternative to the MD+LVM stack. It reduces the variety of resources the cluster has to manage and simplifies administration: block access (iSCSI/FC) or file access (NFS), SSD caching and deduplication are all available without additional tools or extra resources on the cluster side.
ZFS deployment on CentOS is simple – http://zfsonlinux.org/epel.html:
yum install ftp://ftp.yandex.ru/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
yum install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum install kernel-devel zfs
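Before going further it is worth checking that the ZFS kernel module was built for the running kernel and loads cleanly – a quick check:
modprobe zfs
lsmod | grep zfs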
After deployment, create the drive pool and a volume, then export the pool (the cluster will import it later):
zpool create -o cachefile=none pool72 raidz1 /dev/mapper/35000c50077*
zfs create -s -V 1T pool72/vol1T
zpool export pool72
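Before handing the pool over to the cluster, it can be useful to confirm that it imports cleanly by multipath device path and that the volume exists – a manual round trip on either node, while the cluster does not yet manage the pool:
zpool import -d /dev/mapper/ pool72
zpool status pool72
zfs list pool72/vol1T
zpool export pool72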
The pool will be exposed via iSCSI. Since we cannot know what data and metadata the initiators will put on it, it is better to prevent udev from scanning the volumes of that pool:
cp /lib/udev/rules.d/60-persistent-storage.rules /etc/udev/rules.d/60-persistent-storage.rules
sed -i '1s/^/KERNEL=="zd*" SUBSYSTEM=="block" GOTO="persistent_storage_end"\n/' /etc/udev/rules.d/60-persistent-storage.rules
systemctl restart systemd-udevd
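A quick check that the skip rule was prepended to the copied rules file:
head -n 1 /etc/udev/rules.d/60-persistent-storage.rules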
iSCSI
yum install targetcli.noarch
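No manual target configuration is done here – the iSCSITarget/iSCSILogicalUnit resource agents (with the lio-t implementation selected below) are expected to drive LIO themselves. You can still confirm the tooling works:
targetcli ls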
Corosync+Pacemaker
Installation (on all cluster nodes):
Packages required:
yum install pcs fence-agents-all
Pacemaker resource agents for ZFS and iSCSI:
cd /usr/lib/ocf/resource.d/heartbeat/
wget https://github.com/skiselkov/stmf-ha/raw/master/heartbeat/ZFS
wget https://github.com/ClusterLabs/resource-agents/raw/master/heartbeat/iSCSITarget
wget https://github.com/ClusterLabs/resource-agents/raw/master/heartbeat/iSCSILogicalUnit
chmod a+x ./ZFS
chmod a+x ./iSCSILogicalUnit
chmod a+x ./iSCSITarget
passwd hacluster
systemctl enable pcsd
systemctl enable corosync
systemctl enable pacemaker
systemctl start pcsd
Cluster creation (on any node)
Node authorization:
pcs cluster auth node1 node2
The cluster is created with two Corosync “rings” – one on the internal and one on the external interfaces:
pcs cluster setup --start --name cib node1,node1-ext node2,node2-ext
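Once corosync is up on both nodes, the state of the two rings can be verified from either node – a quick check:
corosync-cfgtool -s
pcs status corosync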
pcs status
Cluster name: cib
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Thu Mar 12 14:38:37 2015
Last change: Thu Mar 12 14:38:24 2015 via crmd on node1
Current DC: NONE
2 Nodes configured
0 Resources configured
Node node1 (1): UNCLEAN (offline)
Node node2 (2): UNCLEAN (offline)
Full list of resources:
PCSD Status:
node1: Online
node2: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
The WARNING: no stonith devices and stonith-enabled is not false message means that no STONITH resources are configured. In our case we will only “fence” the storage subsystem using SCSI-3 Persistent Reservations: in a split-brain event, one node is blocked from writing to the storage. The fencing resource covers the drives of the ZFS pool:
pcs stonith create fence-pool72 fence_scsi \
devices="/dev/mapper/35000c50077580317, \
/dev/mapper/35000c50077ad8aab, \
/dev/mapper/35000c50077ad9287, \
/dev/mapper/35000c50077ad92ef, \
/dev/mapper/35000c50077ad9a3f" meta provides=unfencing
Since our cluster has only two nodes, we have to ignore the quorum policy:
pcs property set no-quorum-policy=ignore
Cluster resources
All resources serving the storage pool are added to a single group, group-pool72. The pool resource:
pcs resource create pool72 ZFS \
pool="pool72" importargs="-d /dev/mapper/" \
op start timeout="90" op stop timeout="90" --group group-pool72
iSCSI-target and LUN:
pcs resource create target-pool72 iSCSITarget \
portals="10.3.254.230" iqn="iqn.2005-05.com.etegro:cib.pool72" \
implementation="lio-t" --group group-pool72
pcs resource create lun1-pool72 iSCSILogicalUnit \
target_iqn="iqn.2005-05.com.etegro:cib.pool72" lun="1" \
path="/dev/pool72/vol1T" --group group-pool72
Note that the LIO target is used only because it is the default in CentOS; a discussion of target quality is outside the scope of this material. The IP address resource:
pcs resource create ip-pool72 IPaddr2 \
ip="10.3.254.230" cidr_netmask=24 --group group-pool72
Start order of the resources within the group:
pcs constraint order pool72 then target-pool72
pcs constraint order target-pool72 then lun1-pool72
pcs constraint order lun1-pool72 then ip-pool72
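At this point the cluster should bring the whole group up on one node. A final check from the cluster side, plus an iSCSI discovery from a separate initiator host (assuming iscsi-initiator-utils is installed there):
pcs status
pcs constraint show
iscsiadm -m discovery -t sendtargets -p 10.3.254.230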