Wednesday, December 17, 2008

Install RHEL 5.1u2 + heartbeat + DRBD

Installation will be done on two IBM x3650 servers with integrated RSA II SlimLine adapters. I started the installation after changing some parameters in the BIOS (OS type set to Linux OS, BMC serial port sharing disabled). After booting, I started the install with these parameters: linux pci=nommconf nohpet

I set up the disks through the IBM ServeRAID adapter software: /dev/sda as a RAID 1 array of 2 drives and /dev/sdb as a RAID 5 array of the remaining 4 drives. Linux will be installed on /dev/sda, and /dev/sdb will be the shared (replicated) device.

Don't forget to connect the eth2 adapters with a crossover cable and the serial ports with a null modem cable. I then set the production Ethernet interface to 100 Mbit/s full duplex (100 Mbit/s was a customer requirement) and the crossover link to 1000 Mbit/s full duplex.

Example lines from the interface config files on server1 are below.

/etc/sysconfig/network-scripts/ifcfg-eth0 (production, 100 Mbit/s):

ETHTOOL_OPTS="speed 100 duplex full autoneg off"

/etc/sysconfig/network-scripts/ifcfg-eth2 (crossover, 1000 Mbit/s):

ETHTOOL_OPTS="speed 1000 duplex full autoneg off"

/etc/hosts needs to be updated accordingly, with entries for server1, server2, the service IP, server1repl and server2repl (the replication link), and s1rsa and s2rsa (the RSA II adapters).
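A hypothetical layout (the addresses below are placeholders, not the real ones from this setup; the repl names should resolve to the crossover-cable subnet):

```
192.168.1.11   server1
192.168.1.12   server2
192.168.1.10   serviceIP
10.0.0.11      server1repl
10.0.0.12      server2repl
192.168.1.21   s1rsa
192.168.1.22   s2rsa
```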

You should install additional RPMs:

  • libnet-

  • perl-Net-SSLeay-1.30-4.fc6.x86_64.rpm

  • perl-TimeDate-1.16-5.el5.noarch.rpm

  • heartbeat-2.1.4-2.1.x86_64.rpm

  • heartbeat-devel-2.1.4-2.1.x86_64.rpm

  • heartbeat-pils-2.1.4-2.1.x86_64.rpm

  • heartbeat-stonith-2.1.4-2.1.x86_64.rpm

  • drbd82-8.2.6-1.el5.centos.x86_64.rpm

  • kmod-drbd82-8.2.6-
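Assuming all the downloaded RPMs sit in one directory, installing them in a single transaction lets rpm sort out the dependencies between the heartbeat and DRBD packages (a sketch; adjust the globs to the exact versions you have):

```shell
# one rpm transaction so inter-package dependencies resolve together
rpm -Uvh libnet-*.rpm perl-Net-SSLeay-*.rpm perl-TimeDate-*.rpm \
    heartbeat-*.rpm drbd82-*.rpm kmod-drbd82-*.rpm
```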

I manually created the haclient group and the hacluster user:

# groupadd -g 496 haclient
# useradd -M -g haclient -u 498 -d /var/lib/heartbeat/cores/hacluster hacluster

This install should be done on both server1 and server2. After installing DRBD you'll have to start it, which makes the /proc/drbd file available. The config files are shown a little further down.

Create meta data on both machines:

[root@server1]# drbdadm create-md r0
[root@server2]# drbdadm create-md r0

Start DRBD on both machines:

[root@server1]# /etc/init.d/drbd start
[root@server2]# /etc/init.d/drbd start

Now we can check the status:

[root@server1]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-21 08:48:13
0: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:425709436

Both nodes are Secondary and Inconsistent. Let's make server1 our primary:

[root@server1]# drbdsetup /dev/drbd0 primary -o
[root@server1]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-21 08:48:13
0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
ns:787924 nr:0 dw:0 dr:795968 al:0 bm:48 lo:2 pe:4 ua:253 ap:0 oos:424921628
[>....................] sync'ed: 0.2% (414962/415731)M
finish: 14:45:15 speed: 7,648 (7,800) K/sec

This will take some time to synchronise. If you reboot the machines after synchronisation, both will come up as Secondary but UpToDate, so promote one of them to primary with:

[root@server1]# drbdadm primary r0

Now that one of them is the master, make a filesystem:

[root@server1]# mke2fs -j /dev/drbd0

and add it to /etc/fstab on both machines:

/dev/drbd0 /u ext3 defaults,noauto 0 0
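With that entry in place (noauto keeps it out of the boot-time mounts, since heartbeat will do the mounting later), you can mount it by hand on whichever node is Primary:

```shell
# only works on the Primary; the Secondary's device is not accessible
mount /u
df -h /u
```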

Now you have a working DRBD configuration and you can mount /u filesystem on the primary node. This is the DRBD config file /etc/drbd.conf:

global {
    usage-count no;
}

common {
    syncer { rate 30M; }
}

resource r0 {
    protocol C;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
    }
    startup {
        degr-wfc-timeout 360;    # 6 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
        timeout 60;              # 6 seconds (unit = 0.1 seconds)
        connect-int 10;          # 10 seconds (unit = 1 second)
        ping-int 10;             # 10 seconds (unit = 1 second)
        ping-timeout 5;          # 500 ms (unit = 0.1 seconds)
        max-buffers 2048;
        unplug-watermark 128;
        max-epoch-size 2048;
        ko-count 4;
        cram-hmac-alg "sha1";
        shared-secret "SomeSecret";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
        data-integrity-alg "md5";
    }
    syncer {
        rate 30M;
        al-extents 257;
    }
    on server1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        flexible-meta-disk internal;
    }
    on server2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        meta-disk internal;
    }
}
To configure heartbeat, first unmount the newly created filesystem and make /dev/drbd0 Secondary on both nodes. Then go to the /etc/ha.d directory and create the main config file, ha.cf:

logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
baud 19200
serial /dev/ttyS0 # Linux
bcast eth0 eth2 # Linux
auto_failback off
stonith external/ibmrsa-telnet /etc/ha.d/stonith.ibmrsa
node server1
node server2
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

All you need now are a few more files...

/etc/ha.d/authkeys MUST be the same on both machines and MUST have mode 0600, or heartbeat will refuse to start:

auth 1
1 sha1 letsmakeitsecret

/etc/ha.d/haresources MUST be the same on both machines:

server1 drbddisk::r0 Filesystem::/dev/drbd0::/u::ext3
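Since these files must match, it's easiest to edit everything on server1 and copy it across (a sketch; note that the per-node stonith.ibmrsa files below are deliberately left out):

```shell
# keep the shared heartbeat configuration identical on both nodes
scp /etc/ha.d/ha.cf /etc/ha.d/authkeys /etc/ha.d/haresources \
    root@server2:/etc/ha.d/
```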

/etc/ha.d/stonith.ibmrsa on server1 (user and password are still the defaults):


/etc/ha.d/stonith.ibmrsa on server2 (user and password are still the defaults):


You can now start heartbeat on both nodes:

[root@server1]# /etc/init.d/heartbeat start
[root@server2]# /etc/init.d/heartbeat start

You can check what's going on by tailing /var/log/messages, and soon you'll have an eth0:0 alias carrying the service IP and the /u filesystem mounted.
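A rough way to confirm the takeover from the active node (the alias name and mount point follow the config above):

```shell
# the service IP appears as an eth0:0 alias on the active node
ifconfig eth0:0
# and the replicated filesystem should be mounted
mount | grep drbd0
```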

You should now do some testing; try at least the following:

  1. Turn the primary node off by holding the power switch.

    • Takeover happens OK, with STONITH turning the primary node back on.

  2. Unplug the power cables from the primary node.

    • No takeover in this environment: the RSA adapter is also without power, and heartbeat will not take over without assurance from STONITH that the node is down! Just make sure both nodes have redundant power supplies and everything will be OK.

  3. Unplug the production Ethernet cable from the eth0 adapter on the primary node.

    • Takeover happens OK, without STONITH resetting the primary machine.

To return the resources to the primary machine, turn it on (if it isn't already) and wait for DRBD to synchronise the disks. (DRBD should be set to start automatically at boot.)

[root@server1]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-21 08:48:13
0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
ns:787924 nr:0 dw:0 dr:795968 al:0 bm:48 lo:2 pe:4 ua:253 ap:0 oos:424921628
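Rather than eyeballing /proc/drbd while waiting, the connection state can be extracted with a small helper (a sketch that assumes the status format shown above):

```shell
# drbd_cs: pull the connection state (the cs: field) out of a
# /proc/drbd resource line
drbd_cs() {
    echo "$1" | sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p'
}

# example against the line shown above
line="0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---"
drbd_cs "$line"    # prints: Connected
```

On a live node you would feed it the resource line from /proc/drbd instead of a literal string, and wait for cs:Connected with ds:UpToDate/UpToDate.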

When the disks are synchronised, start heartbeat on server1 and wait for the cluster to stabilise.

[root@server1]# /etc/init.d/heartbeat start

Now both machines are up and running, but the resources are still on server2. To move them back to server1, stop heartbeat on server2, wait for the resources to migrate back, and then start it again.

[root@server2]# /etc/init.d/heartbeat stop

Wait for resources to move back...

[root@server2]# /etc/init.d/heartbeat start
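Heartbeat also ships a cl_status utility that answers the same questions without grepping logs (a sketch; run it on either node):

```shell
# is the local heartbeat daemon running?
cl_status hbstatus
# which resources does this node hold? (prints local/foreign/all/none)
cl_status rscstatus
```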
