Tuesday, May 17, 2011

What is a cluster?

■ Requirement : Red Hat Cluster Suite
■ OS Environment : Linux [RHEL, CentOS]
■ Clustering software : Red Hat Cluster Suite
■ Storage : SCSI, SAN, NAS
■ Storage Protocols : iSCSI (pronounced "eye-scuzzy") / FCP

■ Resolution : 

Cluster: A cluster is two or more interconnected computers that together provide higher availability, higher scalability, or both. For high availability, the advantage is that if one of the computers fails, another computer in the cluster can take over the workload of the failed one, so users of the system see little or no interruption of access.

iSCSI => protocol for connecting a server to storage over an IP network. The iSCSI initiator utility runs on the source (the server); the iSCSI target utility runs on the storage (target) machine.
FCP => Fibre Channel Protocol, for connecting a server to storage over optical (fibre) channel. This needs HBA (host bus adapter, comparable to a NIC) cards. The HBA driver (e.g. qla2xxx from QLogic, lpfc from Emulex) accesses the HBA, and the HBA communicates with the SAN switch/storage controller.

Concepts: SCSI is the command set and interface used to talk to storage disks; iSCSI is a protocol that carries SCSI commands over IP. It consists of an initiator (software + hardware) and a target. The initiator sends packets through the HBA (or NIC). The target resides on storage such as an EqualLogic array, a NetApp filer, an EMC NS-series or an HDS HNAS appliance. The target exposes LUNs backed by its drives; a LUN is a logical unit on the storage that the server treats as a device or drive.
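On RHEL 5 the initiator side comes from the iscsi-initiator-utils package and its iscsiadm tool. A minimal sketch of discovering a target and logging in to it is below; the portal IP and the IQN are placeholders, and the run/DRY_RUN wrapper is a hypothetical convenience so you can print the commands before running them for real.

```shell
#!/bin/sh
# Sketch: connect this server (initiator) to an iSCSI target.
# The portal IP and target IQN you pass in are placeholders.
# Hypothetical helper: with DRY_RUN set, print commands instead of running them.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

iscsi_connect() {
    portal=$1        # IP of the storage (target) machine
    target_iqn=$2    # IQN reported by the discovery step
    # Ask the target which IQNs it exports
    run iscsiadm -m discovery -t sendtargets -p "$portal"
    # Log in to the chosen target so its LUN appears as a local disk
    run iscsiadm -m node -T "$target_iqn" -p "$portal" --login
}
```

For example, `DRY_RUN=1 iscsi_connect 192.0.2.10 iqn.2001-05.com.example:disk1` only prints the two commands; without DRY_RUN they are executed, and after a successful login the LUN should show up as a new SCSI disk.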

Storage System Connection Types :

a) Active/active: all paths are active all the time.
b) Active/passive: one path is active and the other is a backup.
c) Virtual port storage system.

Multipathing and Path Failover: When transferring data between the host server and the storage, the SAN uses a multipathing technique, for which the package "device-mapper-multipath" has to be installed on each server/node. Its daemon, "multipathd", periodically checks the connection paths to the storage. Multipathing lets you have more than one physical path from the server host to a LUN (which the host treats as a device) on a storage system. If a path or any component along it (HBA or NIC, cable, switch or switch port, or storage processor) fails, the server selects another of the available paths. The process of detecting a failed path and switching to another is called path failover.
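To see which paths multipathd currently considers good or bad, run multipath -ll, which lists each LUN with its paths and their states. A minimal sketch for counting failed paths from that output (e.g. in a periodic check); it assumes a failed path is printed with the word "failed" on its line, so compare against the output of your multipath version first.

```shell
#!/bin/sh
# Count paths reported as failed in `multipath -ll` output read from stdin.
# Assumption: a failed path has the word "failed" on its line.
count_failed_paths() {
    # grep -c prints 0 (and exits 1) when nothing matches; keep exit status 0
    grep -c 'failed' || true
}

# Example: multipath -ll | count_failed_paths
```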

Installation of Red Hat Cluster Suite on RHEL 5 :

1. Register the system with RHN (requires a Red Hat subscription) and make sure it is subscribed to the cluster storage channel. The packages also come on the DVD. Skip this step if the system is already registered.


2. Use the following command:

$ yum groupinstall clustering cluster-storage

To install the packages individually:

  • For the standard kernel:

       $ yum install cman cman-kernel dlm dlm-kernel magma magma-plugins system-config-cluster rgmanager ccs fence modcluster

  • For the SMP kernel:

       $ yum install cman cman-kernel-smp dlm dlm-kernel-smp magma magma-plugins system-config-cluster rgmanager ccs fence modcluster
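After the install it is worth confirming that the packages actually landed. A small sketch using rpm -q; the function name is hypothetical and the package list you pass is up to you.

```shell
#!/bin/sh
# Print every package name given as an argument that rpm reports as not installed.
missing_pkgs() {
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example: missing_pkgs cman rgmanager system-config-cluster
```

With the example arguments, nothing is printed if all three packages are installed.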

3. Configuring Red Hat Cluster Suite (these steps should be followed on each node):

Configuration can be done in three ways:

a) Using the web interface (Conga tools), i.e. ricci and luci.
Conga: a comprehensive user interface for installing, configuring, and managing Red Hat clusters, computers, and the storage attached to clusters and computers.

a. Install luci on any system that can connect to each node:

$ yum install luci

b. Initialize luci:

$ luci_admin init

c. Install ricci on each node:

$ yum install ricci -y

d. Then open luci in a browser on the system where it is installed (call it A): http://IPof_A:port. You'll get the exact URL when you execute

$ luci_admin init
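The Conga steps above can be scripted roughly as follows. This is a sketch assuming the RHEL 5 init scripts named ricci and luci; chkconfig makes both start again after a reboot, and run/DRY_RUN is a hypothetical print-only switch. Run setup_ricci on every cluster node and setup_luci on the management system.

```shell
#!/bin/sh
# Sketch of the Conga setup; set DRY_RUN=1 to only print the commands.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# On each cluster node: the ricci agent that luci talks to
setup_ricci() {
    run yum install -y ricci
    run chkconfig ricci on
    run service ricci start
}

# On the management system: the luci web front end
setup_luci() {
    run yum install -y luci
    run luci_admin init          # asks for the luci admin password
    run chkconfig luci on
    run service luci restart
}
```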

4. Cluster services, in the order they are started manually:

a. ccsd, cman, fence, rgmanager.
b. If you use LVM with GFS: ccsd, cman, fence, clvmd, gfs, rgmanager.

5. Configuration files (identical on every node):
      /etc/cluster/cluster.conf and /etc/sysconfig/cluster. When you configure the cluster through the web interface, the configuration is copied to every node automatically. Make sure the required ports are open in the firewall (or the firewall is disabled) on all nodes as well as on the luci node.

6. Now log in to the luci web interface, create a new cluster and give it a name. Then add each node to this cluster one by one. In this cluster, add a failover domain, e.g. for httpd (make sure httpd is installed on each node and that its configuration files are the same on all of them). I'll describe this later and show the result of a real failover test.

7. Shared disk configuration (a minimum disk size of 10 MB is enough): why is it needed?

AA) The shared partitions hold cluster state information, including cluster lock states, service states and configuration information. The shared disk may be on any node or on storage (connected through an HBA or a RAID controller, preferably RAID 1, i.e. mirroring), and it holds two shared partitions: a primary and a shadow. Two raw devices must be created on the shared disk storage for the primary and the shadow shared partition, each with a minimum size of 10 MB. The amount of data in a shared partition is constant; it does not grow or shrink over time. Periodically, each member writes the state of its services to shared storage. The shared partitions also contain a copy of the cluster configuration file, which ensures that every member has a common view of the cluster configuration. If the primary shared partition is corrupted, the cluster members read the information from the shadow (backup) shared partition and simultaneously repair the primary partition. Data consistency is maintained through checksums, and any inconsistencies between the partitions are automatically corrected. If a member cannot write to both shared partitions at start-up, it is not allowed to join the cluster. Likewise, if an active member can no longer write to both shared partitions, it removes itself from the cluster by rebooting (and may be remotely power-cycled by a healthy member).

BB) The following are the shared partition requirements:

a) Both partitions must have a minimum size of 10 MB.
b) Shared partitions must be raw devices, so that no file cache sits in between. They cannot contain file systems.
c) Shared partitions can be used only for cluster state and configuration information.
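A quick way to check the 10 MB minimum before using a partition: blockdev --getsize64 (util-linux) prints a block device's size in bytes. The sketch below assumes 10 MB means 10 * 1024 * 1024 bytes; the function name is hypothetical and the device name in the example is a placeholder.

```shell
#!/bin/sh
MIN_BYTES=$((10 * 1024 * 1024))   # assumed 10 MB minimum per shared partition

# Succeeds if a size given in bytes meets the minimum.
big_enough() {
    [ "$1" -ge "$MIN_BYTES" ]
}

# Example (placeholder device):
#   big_enough "$(blockdev --getsize64 /dev/sdX1)" && echo "partition is large enough"
```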

CC) The following are Red Hat's recommended guidelines for configuring the shared partitions:

a) It is strongly recommended to set up a RAID subsystem for shared storage, and use RAID 1 (mirroring) to make the logical unit that contains the shared partitions highly available. Optionally, parity RAID can be used for high availability. Do not use RAID 0 (striping) alone for shared partitions.
b) Place both shared partitions on the same RAID set, or on the same disk if RAID is not employed, because both shared partitions must be available for the cluster to run.
c) Do not put the shared partitions on a disk that contains heavily accessed service data. If possible, place them on disks that contain rarely accessed service data.

DD) Create the shared partition and attach it to the cluster:

i) Initialize the quorum disk once, from any one node:

$ mkqdisk -c /dev/sdX -l myqdisk

ii) Add the quorum disk to the cluster configuration at the back end. (This can also be done in the web interface: log in to luci, go to the cluster, click the "Quorum Partition" tab and configure it from there.)

#expected votes = total node votes + quorum disk votes
#the health-check result is written to the quorum disk every 2 seconds
#if the health check fails for 5 cycles (tko), i.e. 10 (2*5) seconds, the node is rebooted by the quorum daemon
#each heuristic check runs every 2 seconds and earns a score of 1 if its shell command returns 0
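The notes above describe settings in /etc/cluster/cluster.conf. Below is a hedged sketch of how the relevant parts could look for the two-node example in this post (node names taken from the shell prompts, vm68 and vm86; quorum disk label myqdisk; 3 quorum disk votes; 2-second interval; tko of 5), written out with a shell here-document. The cluster name and the ping target 192.0.2.1 are placeholders, and fencing and services are left empty, so this is a skeleton rather than a working configuration.

```shell
#!/bin/sh
# Write a skeleton cluster.conf to the file given as $1 (placeholder values).
write_cluster_conf() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<cluster name="mycluster" config_version="1">
  <!-- expected votes = 2 node votes + 3 quorum disk votes -->
  <cman expected_votes="5"/>
  <clusternodes>
    <clusternode name="vm68" nodeid="1" votes="1"/>
    <clusternode name="vm86" nodeid="2" votes="1"/>
  </clusternodes>
  <!-- check every 2 s; after tko=5 missed cycles (10 s) the node is treated as failed -->
  <quorumd interval="2" tko="5" votes="3" label="myqdisk">
    <!-- scores 1 while the ping exits 0 -->
    <heuristic program="ping -c1 -t1 192.0.2.1" score="1" interval="2"/>
  </quorumd>
  <fencedevices/>
  <rm/>
</cluster>
EOF
}
```

For example, run write_cluster_conf /tmp/cluster.conf.new, review the result, and merge only the pieces you need into the real file.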

Note: If you edit the file by hand, you need to copy it to each node yourself. If you make the change in the web interface, it is copied automatically.

b) Increase config_version by 1 and run ccs_tool update /etc/cluster/cluster.conf.
c) Check that the quorum disk has been initialized correctly with # mkqdisk -L, and run clustat to check its availability.
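d) Note that total (expected) votes = 5 = 2 (node votes) + 3 (quorum disk votes), so quorum is 3 (5/2 + 1, rounded down). If one node fails, the surviving node (1 vote) plus the quorum disk (3 votes) still has 4 votes, so the cluster stays quorate. If the quorum disk had too few votes, the last surviving node could not reach quorum and the cluster would not survive.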
e) Typically, the heuristics should be snippets of shell code or commands that help determine a node's usefulness to the cluster or to clients. Ideally, add checks for all of your network paths (e.g. check links, or ping routers) and methods to detect the availability of shared storage. Only one master is present in the cluster at any time, regardless of how many partitions exist within the cluster itself. The master is elected by a simple voting scheme in which the lowest node that believes it is capable of running (i.e. scores high enough) bids for master status. If the other nodes agree, it becomes the master. This algorithm is run whenever no master is present. Here the heuristic is "ping -c1 -t1 <IP>", where the IP may be the SAN's IP, another node's IP, etc. (On Linux, ping -t1 sets the packet TTL to 1, so the target must be on the local network; -W1 would instead set a 1-second reply timeout.)
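The vote arithmetic in d) can be checked with a few lines of shell. This sketch assumes CMAN's rule that quorum is a simple majority of the expected votes (expected_votes / 2 + 1, integer division) and that every node has the same number of votes; the function names are made up for illustration.

```shell
#!/bin/sh
# quorum EXPECTED_VOTES -> votes needed for the cluster to stay quorate
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# survives_last_node NODES NODE_VOTES QDISK_VOTES
# Succeeds if ONE surviving node plus the quorum disk still has quorum.
survives_last_node() {
    nodes=$1; node_votes=$2; qdisk_votes=$3
    expected=$(( nodes * node_votes + qdisk_votes ))
    have=$(( node_votes + qdisk_votes ))
    [ "$have" -ge "$(quorum "$expected")" ]
}
```

For the example above, quorum 5 prints 3, and survives_last_node 2 1 3 succeeds because 1 + 3 = 4 is at least 3.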

8. Configuring Cluster Daemons:
The Red Hat Cluster Manager provides the following daemons to monitor cluster operation:
cluquorumd: quorum daemon
clusvcmgrd: service manager daemon
clurmtabd: synchronizes NFS mount entries in /var/lib/nfs/rmtab with a private copy on a service's mount point
clulockd: global lock manager (its only client is clusvcmgrd)
clumembd: membership daemon

9. Configuring storage (SAN/NAS, using multipath or NFS):
In the luci interface, click "Add a System", then go to the Storage tab and assign the storage to the cluster.

To start the cluster software on a member, type the following commands in this order:

1. service ccsd start
2. service lock_gulmd start or service cman start according to the type of lock manager used
3. service fenced start
4. service clvmd start
5. service gfs start, if you are using Red Hat GFS
6. service rgmanager start

To stop the cluster software on a member, type the following commands in this order:

1. service rgmanager stop
2. service gfs stop, if you are using Red Hat GFS
3. service clvmd stop
4. service fenced stop
5. service lock_gulmd stop or service cman stop according to the type of lock manager used
6. service ccsd stop

Stopping the cluster services on a member causes its services to fail over to an active member.
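Because the stop sequence is the start sequence reversed, a small wrapper can keep the two consistent. A sketch assuming the cman lock manager (not lock_gulmd) with CLVM and GFS in use; remove clvmd and gfs from CLUSTER_SERVICES if you don't use them. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Cluster init scripts in start order (cman variant, with CLVM and GFS).
CLUSTER_SERVICES="ccsd cman fenced clvmd gfs rgmanager"

run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

cluster_start() {
    for svc in $CLUSTER_SERVICES; do
        run service "$svc" start || return 1   # stop at the first failure
    done
}

cluster_stop() {
    # Build the reversed list, then stop in that order
    rev=""
    for svc in $CLUSTER_SERVICES; do rev="$svc $rev"; done
    for svc in $rev; do
        run service "$svc" stop
    done
}
```

DRY_RUN=1 cluster_stop prints the six stop commands, beginning with rgmanager and ending with ccsd.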

Testing the failover domain (high availability):

Pre-configuration: httpd installed on nodes 68 and 86.
Common document root: /var/www/html

Configure httpd as a failover domain in the cluster (in luci): Add Failover Domain > Add Resources > Add Service, then assign the failover domain and the resource to this service.

1. Initially, httpd_service was running on 86 (the resource allotted to the httpd service on the cluster is IP 67).

IP addresses on 86:

[root@vm86 ~]# ip add list|grep inet
inet scope host lo
inet6 ::1/128 scope host
inet brd scope global eth0
inet scope global secondary eth0
inet6 fe80::216:3eff:fe74:8d56/64 scope link
inet brd scope global virbr0
inet6 fe80::200:ff:fe00:0/64 scope link
[root@vm86 ~]#

2. Crashed server 86, i.e. brought it down.

3. The httpd service stayed up: it was relocated to 68, and the page was still accessible.

Proof that the IP floated to server 68:

[root@vm68 ~]# ip add list | grep inet
inet scope host lo
inet6 ::1/128 scope host
inet brd scope global eth0
inet scope global secondary eth0
inet6 fe80::216:3eff:fe74:8d44/64 scope link
inet brd scope global virbr0
inet6 fe80::20
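When repeating this test, it helps to check from a script which node currently holds the service IP. A sketch that reads ip addr list output on stdin and succeeds if a given IPv4 address is present; the function name is hypothetical and the address in the example is a placeholder, since the real addresses are not shown above. To move the service by hand instead of crashing a node, clusvcadm -r httpd_service -m <member> relocates it.

```shell
#!/bin/sh
# Succeeds if the IPv4 address $1 appears as an "inet" address in
# `ip addr list` output read from stdin.
holds_ip() {
    # Fixed-string match on "inet ADDR/" so 192.0.2.6 does not match 192.0.2.67
    grep -qF "inet $1/"
}

# Example (placeholder address):
#   ip addr list | holds_ip 192.0.2.67 && echo "service IP is on this node"
```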


  1. How to install Vertex (Clustering) for Linux Enterprise 6.0

  2. The comments were perfect & that helped a lot to understand iscsi in simple and finally able to configure successfully !!! Thanks kamal !!

  3. Thanks Mithran :) Stay tune for more comments..