Cluster 12. Configuring the Cluster Foundation.
Time sync
First we need to keep time in sync between both nodes. Install the ntp daemon, set the time from pool.ntp.org, and start the ntp daemon:
yum install ntp -y
ntpdate pool.ntp.org
systemctl -l enable ntpd.service
systemctl -l start ntpd.service
Set proper timezone:
timedatectl list-timezones | grep Baku # select the proper city for your location
timedatectl set-timezone Asia/Baku
Verify:
timedatectl
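To additionally confirm that ntpd is actually syncing (an extra check, assuming the ntpq tool shipped with the ntp package):
ntpq -p # the peer marked with '*' is the currently selected time source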
Packages
Needed packages:
Corosync - manages cluster communication, quorum and membership; uses the totem protocol for heartbeating. Prior to CentOS 7, corosync itself only cared about who was a cluster member and making sure all members got all totem messages. What happened after the cluster reformed was up to the cluster manager (cman) and the resource group manager (rgmanager). In CentOS 7, cman's work (mainly quorum: assigning quorum votes and controlling them) is done by corosync and the rgmanager work is done by pacemaker. (To be precise, cman's work is now done by the votequorum part of corosync.)
Pacemaker - cluster resource manager (its daemon is pacemakerd)
pcs - (ccs in CentOS 6) - CentOS 7 command line configuration utility (pcsd is its daemon)
psmisc - contains utilities for managing processes on your system: pstree, killall, and fuser.
policycoreutils-python - contains the core utilities that are required for the basic operation of a Security-Enhanced Linux (SELinux) system and its policies.
fence-agents - provides various agents for fencing (ipmi fence, cisco fence etc.)
dlm - Distributed Lock Manager
Install needed packages:
I prefer to use bash autocomplete: bash-autocomplete-setup-link
yum install -y corosync pacemaker pcs psmisc policycoreutils-python fence-agents dlm
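As a quick sanity check (not part of the original steps), confirm all the packages actually landed:
rpm -q corosync pacemaker pcs psmisc policycoreutils-python fence-agents dlm # each line should print a version, not "is not installed"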
Setup initial cluster
Related to pcsd.service:
systemctl -l enable pcsd.service
systemctl -l start pcsd.service
systemctl -l status pcsd.service #pcsd service automatically starts corosync and pacemaker services when needed
Set up a password for the cluster user (in CentOS 6 the cluster user was ricci):
passwd hacluster
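If you prefer to script this, RHEL/CentOS passwd can read the password from stdin (use the same password on both nodes; "YourSecret" is a placeholder):
echo "YourSecret" | passwd --stdin hacluster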
Setup and verify firewall:
Verify if firewalld is active:
firewall-cmd --state
firewall-cmd --permanent --add-service=high-availability # view /usr/lib/firewalld/services/high-availability.xml to see which ports are in the high-availability service
firewall-cmd --reload
firewall-cmd --list-all
Ports listed in /usr/lib/firewalld/services/high-availability.xml :
tcp 2224 - PCSD Web UI (High Availability Web Management)
tcp 3121 - Pacemaker Remote
tcp 5403 - needed for corosync-qnetd
udp 5404 - totem protocol multicast
udp 5405 - totem protocol
tcp 21064 - DLM
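Instead of opening the XML file, you can also ask firewalld directly which ports the service covers (assuming a firewalld version that supports --info-service, as newer CentOS 7 builds do):
firewall-cmd --info-service=high-availability # lists the ports above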
Log in to any of the cluster nodes and authenticate the "hacluster" user.
We must use names from /etc/hosts which resolve to our bcn-bond1 addresses (this step automatically sets up Corosync authentication):
pcs cluster auth agrp-c01n01 agrp-c01n02 #username is hacluster
To change hacluster password:
passwd hacluster
pcs cluster auth agrp-c01n01 agrp-c01n02 --force # "--force" forces authentication even if the node is already authenticated
To verify:
cat /var/lib/pcsd/tokens # both nodes info must be here
Create a new cluster named agrp-c01 (this step automatically sets up Corosync cluster membership and also synchronizes the configuration):
pcs cluster setup --name agrp-c01 agrp-c01n01 agrp-c01n02 --transport udpu # udpu transport - UDP Unicast
After all these steps a new file is created:
vi /etc/corosync/corosync.conf
The secauth: off attribute controls whether the cluster communications are encrypted or not. We can safely disable this because we're working on a known-private network, which yields two benefits: it's simpler to set up and it's a lot faster. If you must encrypt the cluster communications, you can do so here.
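For reference, the file pcs generates for this example looks roughly like the excerpt below (a sketch using this tutorial's names; the generated file on your nodes may differ slightly):
totem {
    version: 2
    secauth: off
    cluster_name: agrp-c01
    transport: udpu
}
nodelist {
    node {
        ring0_addr: agrp-c01n01
        nodeid: 1
    }
    node {
        ring0_addr: agrp-c01n02
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}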
pcs cluster start --all # the --all option starts the cluster on all nodes and also starts corosync.service and pacemaker.service
If you want corosync and pacemaker to start automatically on boot:
pcs cluster enable --all # this tutorial doesn't use this setting
Check that the cluster is functioning properly (on both nodes):
systemctl status corosync # must be active/disabled without any error
Use corosync-cfgtool -s to check whether cluster communication is happy (output must have local node's proper IP in id and "no faults" in status):
Printing ring status.
Local node ID 2
RING ID 0
id = 10.10.53.2
status = ring 0 active with no faults
Next, check the membership and quorum APIs (both nodes must join the cluster):
corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.10.53.1)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.10.53.2)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
Verify that corosync uses no multicast (udpu transport - UDP Unicast):
corosync-cmapctl | grep transport
totem.transport (str) = udpu
pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 agrp-c01n01 (local)
2 1 agrp-c01n02
systemctl status pacemaker # must be active/disabled without any error
Verify that all Pacemaker daemons (pacemaker itself + 6 daemons) are loaded:
ps axf | grep pacemaker
7588 ? Ss 0:00 /usr/sbin/pacemakerd -f
7589 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib
7590 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
7591 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd
7592 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
7593 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
7594 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd
Then check the overall cluster status:
pcs status # both nodes must be online (it can take several minutes to become online)
Finally, ensure there are no startup errors (aside from messages relating to not having STONITH configured, which are OK at this point):
journalctl | grep -i error
DC - this string is seen in the "pcs status" output
One CRM in the cluster is elected as the Designated Coordinator (DC). The DC is the only entity in the cluster that can decide that a cluster-wide change needs to be performed, such as fencing a node or moving resources around. The DC is also the node where the master copy of the CIB is kept. All other nodes get their configuration and resource allocation information from the current DC. The DC is elected from all nodes in the cluster after a membership change (lost nodes etc.).
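To see which node is currently the DC, grep the pcs status output (node name shown is illustrative):
pcs status | grep "Current DC" # e.g. "Current DC: agrp-c01n01 ... - partition with quorum"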
Quorum:
The votequorum service is part of the corosync project. This service can be optionally loaded into the nodes of a corosync cluster to avoid split-brain situations. It does this by having a number of votes assigned to each system in the cluster and ensuring that only when a majority of the votes are present, cluster operations are allowed to proceed. The service must be loaded into all nodes or none. If it is loaded into a subset of cluster nodes the results will be unpredictable. The following corosync.conf extract will enable votequorum service within corosync:
quorum { provider: corosync_votequorum } # verify with pcs cluster corosync | grep provider
votequorum reads its configuration from corosync.conf. Some values can be changed at runtime, others are only read at corosync startup. It is very important that those values are consistent across all the nodes participating in the cluster or votequorum behavior will be unpredictable.
The "two node cluster" is a use case that requires special consideration. With a standard two node cluster, each node with a single vote, there are 2 votes in the cluster. Using the simple majority calculation (50% of the votes + 1) to calculate quorum, the quorum would be 2. This means that the both nodes would always have to be alive for the cluster to be quorate and operate. Enabling two_node: 1, quorum is set artificially to 1. So simply saying, with two_node=1 one node will continue when the other node fails. The way it works is that in the event of a network outage both nodes race in an attempt to fence each other and the first to succeed continues in the cluster. The system administrator can also associate a delay with a fencing agent so that one node can be given priority in this situation so that it always wins the race. Also this delay will help to escape fence-looping.
pcs cluster corosync | grep two_node
and
pcs quorum status | grep "Flags\|Quorum" # it's like "Flags: 2Node" and "Quorum: 1"
two_node=1 requires expected_votes to be set to 2 (pcs quorum status | grep "Expected votes"), and it is set to this value (2) automatically when a two-node cluster is set up. Also note that this setting (two_node=1) assumes that you have proper fencing set up.
wait_for_all (pcs quorum status | grep Flags # shows WaitForAll) - When enabled, the cluster will be quorate for the FIRST TIME only after all nodes have been visible at least once at the same time. The wait_for_all option is automatically enabled when a cluster has two nodes, does not use a quorum device, and auto_tie_breaker is disabled. You can override this by explicitly setting wait_for_all to 0, but in a two-node cluster this is not recommended.
auto_tie_breaker - When enabled, the cluster can suffer up to 50% of the nodes failing at the same time, in a deterministic fashion. auto_tie_breaker is not compatible with two_node, as both are systems for determining what happens should there be an even split of nodes. If you have both enabled, an error message will be issued and two_node will be disabled.
You can verify the effect of the wait_for_all setting:
pcs cluster stop --all
pcs cluster start
pcs quorum status | grep "Quorum:" # you'll get "Quorum: Activity blocked"
Now:
pcs cluster start --all
pcs quorum status | grep "Quorum:" # you'll get "Quorum: 1"
pcs cluster stop <name_of_the_other_node>
pcs quorum status | grep "Quorum:" # you'll get "Quorum: 1"
So, as expected, wait_for_all only needs all nodes to be online when the cluster starts for the first time.
You can also start nodes when you know the cluster is inquorate but are confident that it should proceed with resource management regardless. This can happen when one node is powered off and the other node didn't start the cluster before the first node powered off. You must be sure that the other node doesn't have access to the resources:
pcs quorum unblock # this disables wait_for_all option and then re-enables it
Location
Overall cluster configuration in xml format can be retrieved:
pcs cluster cib scope=configuration # cluster XML dump, where scope is one of: configuration, crm_config, nodes, resources, constraints, status
Note: by default a symmetric cluster is created, meaning all resources can run anywhere:
There are two alternative strategies. One way is to say that, by default, resources can run anywhere, and then the location constraints specify nodes that are not allowed (an opt-out cluster). The other way is to start with nothing able to run anywhere, and use location constraints to selectively enable allowed nodes (an opt-in cluster).
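If you want the opt-in behavior instead, the default can be flipped via the symmetric-cluster property (a sketch; this tutorial keeps the symmetric default):
pcs property set symmetric-cluster=false # opt-in: resources run nowhere until a location constraint enables a node
pcs property list --all | grep symmetric # verify the current value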
Destroying cluster
If something went wrong, you can destroy the cluster and all of its configuration:
pcs cluster destroy --all
rm -rf /var/lib/pacemaker
rm -rf /var/lib/pcsd
rm -rf /etc/corosync
These tutorials were used to understand and set up clustering:
AN!Cluster
unixarena
people.redhat.com