Friday, April 27, 2018

Cluster 22. Testing Live-Migration, Overall Recovery and Fail-over

Setup Description

We have installed 4 VMs:
  • agrp-c01n01 is the primary node for vm01 & vm02 so these VMs may start on agrp-c01n02 only when agrp-c01n01 is unavailable
  • agrp-c01n02 is the primary node for vm03 & vm04 so these VMs may start on agrp-c01n01 only when agrp-c01n02 is unavailable

VM Live Migration tests (manual, on cluster stop, manual withdrawal)

Whenever you use a migrate command (pcs resource move), Pacemaker creates a permanent location constraint pinning the resource to that node. Something like:
pcs constraint --full | grep -E "Enabled.+vm[0-9]"
    Enabled on: agrp-c01n01 (score:INFINITY) (role: Started) (id:cli-prefer-vm02-www)
This is usually undesirable. To revoke the constraint once the resource migration has completed, or when the node is restored, issue: pcs resource clear vm02-www
Resource-stickiness would also keep the resource where it is, but by default:
pcs property list --defaults | grep stick
 default-resource-stickiness: 0
In this post I prefer the VMs to migrate back to their primary node automatically when it becomes available. If that behaviour is pointless for you, use resource-stickiness:
pcs resource update vm02-www meta resource-stickiness=INFINITY
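If you would rather rely on stickiness for every resource instead of setting it per resource, the cluster-wide default can be changed as well; a minimal sketch (INFINITY here is only an example value, pick what suits your policy):
pcs resource defaults resource-stickiness=INFINITY # set the default stickiness for all resources
pcs resource defaults # verify the newly set default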

Manual Live Migration

pcs status resources
virsh console vm02-www # from node agrp-c01n01
uptime # note the uptime
pcs resource move vm02-www agrp-c01n02 # moving vm02-www resource from agrp-c01n01 to the agrp-c01n02
virsh console vm02-www # from node agrp-c01n02
uptime # the uptime must be equal to or greater than the previous value, proving the VM was not rebooted during migration
pcs constraint --full | grep -E "Enabled.+Started.+vm[0-9]"
pcs resource clear vm02-www # from any node - this should cause vm02-www to migrate back to agrp-c01n01
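As an extra sanity check during this test, you can confirm on which node the domain is actually running after each move or clear; a small sketch, assuming virsh is usable on both nodes:
virsh list --all # on the target node - vm02-www must be listed as "running"
virsh domstate vm02-www # prints "running" on the node currently hosting the VM
virsh list --all # on the source node - vm02-www should no longer be running there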

Automatic Live Migration on cluster stop (on one node)

pcs status resources
virsh console vm02-www # from node agrp-c01n01
uptime # note the uptime
pcs cluster stop # stopping cluster on agrp-c01n01
pcs status # now vm01, vm02, vm03 and vm04 must be started on agrp-c01n02
virsh console vm02-www # from node agrp-c01n02
uptime # the uptime must be equal to or greater than the previous value, proving the VM was not rebooted during migration
pcs constraint --full | grep -E "Enabled.+vm[0-9]" # as you can see, no constraints were added, because the migration was initiated by the cluster itself when agrp-c01n01 went offline
pcs cluster start # starting cluster on agrp-c01n01
pcs status # now vm01 and vm02 should migrate back to agrp-c01n01 automatically
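To watch the failover and the automatic fail-back in real time, it helps to keep a live view of the cluster state open on the surviving node; a simple sketch:
crm_mon # continuously updated cluster status (press Ctrl+C to exit)
watch -n 2 'pcs status resources' # the same idea using pcs only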

Controlled Migration and Node Withdrawal

These steps must be repeated once for each node (first agrp-c01n01 & then agrp-c01n02):
agrp-c01n01 withdrawal:
resources=$(pcs resource | grep -E  "VirtualDomain.+n01" | awk '{print $1}')
for resource in $resources; do pcs resource move $resource agrp-c01n02; done
pcs status
pcs cluster stop # on agrp-c01n01
systemctl poweroff # on agrp-c01n01
resources=$(pcs constraint --full | grep -E "Enabled.+Started.+vm[0-9]" | awk '{print $7}' | cut -d\- -f3,4 | cut -d\) -f 1)
for resource in $resources; do pcs resource clear $resource; done
power on the powered-off node
pcs cluster start # all VMs should migrate back to their original positions

agrp-c01n02 withdrawal:
resources=$(pcs resource | grep -E  "VirtualDomain.+n02" | awk '{print $1}')
for resource in $resources; do pcs resource move $resource agrp-c01n01; done
pcs status
pcs cluster stop # on agrp-c01n02
systemctl poweroff # on agrp-c01n02
resources=$(pcs constraint --full | grep -E "Enabled.+Started.+vm[0-9]" | awk '{print $7}' | cut -d\- -f3,4 | cut -d\) -f 1)
for resource in $resources; do pcs resource clear $resource; done
power on the powered-off node
pcs cluster start # all VMs should migrate back to their original positions
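Because the two withdrawal procedures differ only in node names, the first step can be wrapped into a small helper; a sketch using a hypothetical function name (move_vms_away is not part of the original procedure):
# move every VirtualDomain resource currently reported on a node matching $1 to node $2
move_vms_away() {
  local suffix="$1" target="$2" resource
  for resource in $(pcs resource | grep -E "VirtualDomain.+${suffix}" | awk '{print $1}'); do
    pcs resource move "$resource" "$target"
  done
}
move_vms_away n01 agrp-c01n02 # before withdrawing agrp-c01n01
move_vms_away n02 agrp-c01n01 # before withdrawing agrp-c01n02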

VM resource restarting test

This process should be repeated for each VM on its primary node. Only the steps for vm01-nagios are shown here, but they are identical for the other VMs:
clear; tail -f -n 0 /var/log/messages
virsh console vm01-nagios
shutdown -h 0
In the output of /var/log/messages you will see lines like these:
....
Apr 26 16:08:16 agrp-c01n01 systemd-machined: Machine qemu-3-vm01-nagios terminated
Apr 26 16:08:17 agrp-c01n01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet0
Apr 26 16:08:24 agrp-c01n01 pengine[2795]: warning: Processing failed op monitor for vm01-nagios on agrp-c01n01: not running (7)
...
Apr 26 16:08:24 agrp-c01n01 crmd[2796]:  notice: Initiating start operation vm01-nagios_start_0 locally on agrp-c01n01
...
Apr 26 16:08:25 agrp-c01n01 crmd[2796]:  notice: Result of start operation for vm01-nagios on agrp-c01n01: 0 (ok)
pcs status # vm01-nagios should be "Started"
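The failed monitor operation also increments the resource's fail count; to inspect it and to clear the failure entry from pcs status, something like this can be used (a sketch):
pcs resource failcount show vm01-nagios # shows the accumulated fail count per node
pcs resource cleanup vm01-nagios # resets the fail count and removes the failed action from pcs status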

Nodes Crash-Test

Simulating Software Crash

Crashing agrp-c01n01:
clear; tail -f -n 0 /var/log/messages # on agrp-c01n02
echo c > /proc/sysrq-trigger # on agrp-c01n01
What is in the log (key points):
16:28:17 agrp-c01n02 corosync[2217]: [TOTEM ] A processor failed, forming new configuration.
16:28:18 agrp-c01n02 attrd[2228]:  notice: Node agrp-c01n01 state is now lost
16:28:18 agrp-c01n02 kernel: dlm: closing connection to node 1
agrp-c01n02 corosync[2217]: [MAIN  ] Completed service synchronization, ready to provide service.
16:28:18 agrp-c01n02 dlm_controld[3148]: 2203 fence request 1 pid 22241 nodedown time 1524745698 fence_all dlm_stonith
agrp-c01n02 stonith-ng[2226]:  notice: Requesting peer fencing (reboot) of agrp-c01n01
16:28:22 agrp-c01n02 kernel: drbd r0: PingAck did not arrive in time.
16:28:22 agrp-c01n02 kernel: drbd r0: helper command: /sbin/drbdadm fence-peer r0
16:28:23 agrp-c01n02 pengine[2229]: warning: Cluster node agrp-c01n01 will be fenced: peer is no longer part of the cluster
16:28:23 agrp-c01n02 pengine[2229]: warning: Node agrp-c01n01 is unclean
16:28:23 agrp-c01n02 crmd[2230]:  notice: Requesting fencing (reboot) of node agrp-c01n01
16:28:26 agrp-c01n02 kernel: drbd r1: PingAck did not arrive in time.
16:28:26 agrp-c01n02 kernel: drbd r1: helper command: /sbin/drbdadm fence-peer r1
16:28:47 agrp-c01n02 stonith-ng[2226]:  notice: Call to fence_ipmi_n01 for 'agrp-c01n01 reboot' on behalf of stonith-api.22241@agrp-c01n02: OK (0)
16:28:47 agrp-c01n02 crmd[2230]:  notice: Peer agrp-c01n01 was terminated (reboot) by agrp-c01n02 for agrp-c01n02: OK (ref=47de2c75-7e7c-49d3-9796-17779e46e0bd) by client crmd.2230
16:28:47 agrp-c01n02 pengine[2229]:  notice:  * Start      vm02-www          (       agrp-c01n02 )
16:28:47 agrp-c01n02 pengine[2229]:  notice:  * Start      vm01-nagios        (       agrp-c01n02 )
16:28:48 agrp-c01n02 crm-fence-peer.sh[22390]: INFO peer is fenced, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-r1-ms_drbd_r1'
16:28:48 agrp-c01n02 crm-fence-peer.sh[22271]: INFO peer is fenced, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-r0-ms_drbd_r0'
16:28:49 agrp-c01n02 kernel: GFS2: fsid=agrp-c01:shared.1: jid=0: Looking at journal...
16:28:49 agrp-c01n02 kernel: GFS2: fsid=agrp-c01:shared.1: recover generation 9 done
16:28:49 agrp-c01n02 crmd[2230]:  notice: Result of start operation for vm02-www on agrp-c01n02: 0 (ok)
16:28:49 agrp-c01n02 crmd[2230]:  notice: Result of start operation for vm01-nagios on agrp-c01n02: 0 (ok)

BE CAREFUL: on a production system it is better to start the cluster on the restored node, wait until r0 is UpToDate and only then delete the r0 constraint (r0 will be promoted to master on both nodes), then wait until r1 is UpToDate and delete the r1 constraint (r1 will be promoted to master on both nodes):
constraints=$(pcs constraint --full | grep -E "drbd-fence.+rule" | awk '{print $4}' | cut -d\: -f 2 | cut -d\) -f 1)
for constraint in $constraints; do pcs constraint remove $constraint; done
Login to the agrp-c01n01:
pcs cluster start # vm01 & vm02 will migrate to the agrp-c01n01
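Before deleting a drbd-fence constraint on a production system, the resync state can be checked on the nodes; a minimal check, assuming DRBD 8.4 where /proc/drbd is available:
cat /proc/drbd # wait until the disk states are shown as UpToDate/UpToDate for the resource
drbd-overview # an alternative summary view, if your drbd-utils version ships it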


Crashing agrp-c01n02:
clear; tail -f -n 0 /var/log/messages # on agrp-c01n01
echo c > /proc/sysrq-trigger # on agrp-c01n02
What is in the log (key points): 
mostly the same as for agrp-c01n01

BE CAREFUL: on a production system it is better to start the cluster on the restored node, wait until r0 is UpToDate and only then delete the r0 constraint (r0 will be promoted to master on both nodes), then wait until r1 is UpToDate and delete the r1 constraint (r1 will be promoted to master on both nodes):
constraints=$(pcs constraint --full | grep -E "drbd-fence.+rule" | awk '{print $4}' | cut -d\: -f 2 | cut -d\) -f 1)
for constraint in $constraints; do pcs constraint remove $constraint; done
Login to the agrp-c01n02:
pcs cluster start # vm03 & vm04 will migrate to the agrp-c01n02

Simulating Hardware Crash

Crashing agrp-c01n01:
clear; tail -f -n 0 /var/log/messages # on agrp-c01n02
power-off the node (pull power cord out of PSU)
What is in the log (key points): 
mostly the same as for agrp-c01n01 during the software crash test; the differing points are:
18:05:58 agrp-c01n02 corosync[4176]: [TOTEM ] A processor failed, forming new configuration.
18:06:34 agrp-c01n02 fence_ipmilan: Connection timed out
18:06:02 agrp-c01n02 crm-fence-peer.sh[6507]: No messages received in 3 seconds.. aborting
18:07:10 agrp-c01n02 stonith-ng[4190]:  notice: Call to fence_ipmi_n01 for 'agrp-c01n01 reboot' on behalf of stonith-api.6572@agrp-c01n02: Connection timed out (-110)
Apr 26 18:07:10 agrp-c01n02 stonith-ng[4190]: warning: Agent 'fence_ifmib' does not advertise support for 'reboot', performing 'off' action instead
18:07:28 agrp-c01n02 crm-fence-peer.sh[6507]: INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-r1-ms_drbd_r1'
18:07:28 agrp-c01n02 kernel: drbd r1: helper command: /sbin/drbdadm fence-peer r1 exit code 5 (0x500)
18:07:28 agrp-c01n02 kernel: drbd r1: fence-peer helper returned 5 (peer is unreachable, assumed to be dead)
18:07:30 agrp-c01n02 stonith-ng[4190]:  notice: Call to fence_ifmib_n01 for 'agrp-c01n01 reboot' on behalf of stonith-api.6572@agrp-c01n02: OK (0)
18:07:30 agrp-c01n02 kernel: drbd r0: fence-peer helper returned 7 (peer was stonithed)
18:07:32 agrp-c01n02 crmd[4194]:  notice: Result of start operation for vm01-nagios on agrp-c01n02: 0 (ok)
18:07:32 agrp-c01n02 crmd[4194]:  notice: Result of start operation for vm02-www on agrp-c01n02: 0 (ok)
power-on agrp-c01n01
fence_ifmib --ip agrp-stack01 --community agrp-c01-community --plug Port-channel2 --action status
fence_ifmib --ip agrp-stack01 --community agrp-c01-community --plug Port-channel2 --action on
fence_ifmib --ip agrp-stack01 --community agrp-c01-community --plug Port-channel2 --action status
Log in to agrp-c01n01 and verify it is operational
pcs stonith cleanup # on agrp-c01n02
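It is also worth confirming that both fence devices are configured and running again once the node is back; a quick check, assuming the stonith resources are named fence_ipmi_n01 and fence_ifmib_n01 as in the logs above:
pcs stonith show # list all stonith resources and the nodes they run on
pcs stonith show fence_ipmi_n01 # show the configuration of a single fence device
pcs status # both fence devices should be reported as Started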

BE CAREFUL: on a production system it is better to start the cluster on the restored node, wait until r0 is UpToDate and only then delete the r0 constraint (r0 will be promoted to master on both nodes), then wait until r1 is UpToDate and delete the r1 constraint (r1 will be promoted to master on both nodes):
constraints=$(pcs constraint --full | grep -E "drbd-fence.+rule" | awk '{print $4}' | cut -d\: -f 2 | cut -d\) -f 1)
for constraint in $constraints; do pcs constraint remove $constraint; done
Login to the agrp-c01n01:
pcs cluster start # vm01 & vm02 will migrate to the agrp-c01n01
pcs status


Crashing agrp-c01n02:
clear; tail -f -n 0 /var/log/messages # on agrp-c01n01
power-off the node (pull power cord out of PSU)
What is in the log (key points): 
mostly the same as for agrp-c01n01 (see Simulating Software Crash and Simulating Hardware Crash above)


power-on agrp-c01n02
fence_ifmib --ip agrp-stack01 --community agrp-c01-community --plug Port-channel3 --action status
fence_ifmib --ip agrp-stack01 --community agrp-c01-community --plug Port-channel3 --action on
fence_ifmib --ip agrp-stack01 --community agrp-c01-community --plug Port-channel3 --action status
Login to the agrp-c01n02 and verify it's operational
pcs stonith cleanup # on agrp-c01n01

BE CAREFUL: on a production system it is better to start the cluster on the restored node, wait until r0 is UpToDate and only then delete the r0 constraint (r0 will be promoted to master on both nodes), then wait until r1 is UpToDate and delete the r1 constraint (r1 will be promoted to master on both nodes):
constraints=$(pcs constraint --full | grep -E "drbd-fence.+rule" | awk '{print $4}' | cut -d\: -f 2 | cut -d\) -f 1)
for constraint in $constraints; do pcs constraint remove $constraint; done
Login to the agrp-c01n02:
pcs cluster start # vm03 & vm04 will migrate back to agrp-c01n02
pcs status


Administrative modes: standby, unmanaged, maintenance

Recurring monitor operations behave differently under various administrative settings:
  1. When a resource is unmanaged: No monitors will be stopped. If the unmanaged resource is stopped on a node where the cluster thinks it should be running, the cluster will detect and report that it is not, but it will not consider the monitor failed, and will not try to start the resource until it is managed again. Starting the unmanaged resource on a different node is strongly discouraged and will at least cause the cluster to consider the resource failed, and may require the resource’s target-role to be set to Stopped then Started to be recovered.
  2. When a node is put into standby: All resources will be moved away from the node, and all monitor operations will be stopped on the node, except those with role=Stopped. Monitor operations with role=Stopped will be started on the node if appropriate.
  3. When the cluster is put into maintenance mode: All resources will be marked as unmanaged. All monitor operations will be stopped, except those with role=Stopped. As with single unmanaged resources, starting a resource on a node other than where the cluster expects it to be will cause problems.
Maintenance mode and making a resource unmanaged are the preferred methods if you are doing online changes on the cluster nodes. Standby mode is preferred if you need to do hardware maintenance.

Managed/unmanaged resources

To make a resource unmanaged:
pcs resource unmanage libvirtd # it is the same as: pcs resource update libvirtd meta is-managed=false
pcs status
 Clone Set: libvirtd-clone [libvirtd]
     libvirtd (systemd:libvirtd): Started agrp-c01n01 (unmanaged)
     libvirtd (systemd:libvirtd): Started agrp-c01n02 (unmanaged)
To make the resource managed again:
pcs resource manage libvirtd # it is the same as: pcs resource update libvirtd meta is-managed=true
pcs status
 Clone Set: libvirtd-clone [libvirtd]
     Started: [ agrp-c01n01 agrp-c01n02 ]
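To double-check the current value of the is-managed meta attribute, pcs can display the resource definition directly; a small sketch:
pcs resource show libvirtd # the Meta Attrs line shows is-managed=false while the resource is unmanaged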

Standby/Unstandby 


Standby means that the node is not permitted to run any resources but still participates in voting (if it is not shut down).

To move node to the standby mode:
pcs node standby agrp-c01n02
After issuing the command, all VMs are migrated to agrp-c01n01, then all resources on agrp-c01n02 are stopped, and the node itself is shown as:
Node agrp-c01n02: standby
pcs quorum status # we will see that the standby node is still participating in voting
A node in standby mode can be restarted or shut down; the status will then change to:
Node agrp-c01n02: OFFLINE (standby)
The total number of votes will also drop to "1":
pcs quorum status

To clear standby mode after reboot or shutdown:
pcs cluster start # on node agrp-c01n02
pcs node unstandby agrp-c01n02
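A quick way to see which nodes are in standby at any moment, and a reminder that the node name may be omitted for the local node (a sketch):
pcs status nodes # lists the nodes grouped as Online / Standby / Offline
pcs node standby # without a node name the local node is put into standby
pcs node unstandby # and taken out of it again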


Maintenance

In a Pacemaker cluster, as in a standalone system, operators must complete maintenance tasks such as software upgrades and configuration changes. Here is what you need to know to keep Pacemaker's built-in monitoring features from creating unwanted side effects during such tasks.
With clone and master-slave resources the better way is to place the node into standby mode, because a node in maintenance mode can be fenced by the other node (e.g. a DRBD PingAck will not arrive in time and the node will be fenced).

Putting the entire cluster into maintenance mode:
pcs property list --defaults | grep mainte
pcs property set maintenance-mode=true
or
pcs node maintenance --all
pcs status
              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Online: [ agrp-c01n01 agrp-c01n02 ]

Full list of resources:
All resources are shown with " (unmanaged)" added to the end of the resource status line. 

In maintenance mode, you can stop or restart cluster resources at will. Pacemaker will not attempt to restart them. All resources automatically become unmanaged, that is, Pacemaker will cease monitoring them and hence be oblivious about their status. You can even stop all Pacemaker services on a node, and all the daemons and processes originally started as Pacemaker managed cluster resources will continue to run.
You should know that when you start Pacemaker services on a node while the cluster is in maintenance mode, Pacemaker will initiate a single one-shot monitor operation (a "probe") for every resource, just so it has an understanding of what resources are currently running on that node. It will, however, take no further action other than determining the resources' status.
Maintenance mode is something you enable before running other maintenance actions, not when you are already half-way through them. And unless you are very well versed in the interdependencies of the resources running on the cluster you are working on, it is usually the safest option. In short: when doing maintenance on your Pacemaker cluster, enable maintenance mode before you start and disable it after you are done.

To take the entire cluster out of maintenance mode:
pcs property set maintenance-mode=false
or
pcs node unmaintenance --all

Putting a single node into maintenance mode:
pcs node maintenance # will set local node into maintenance mode
pcs node maintenance agrp-c01n01 # will set agrp-c01n01 into maintenance mode
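To take nodes out of maintenance mode again and to verify the current state, something like this can be used (a sketch):
pcs node unmaintenance # take the local node out of maintenance mode
pcs node unmaintenance agrp-c01n01 # or a specific node
pcs status # while in maintenance, the node is flagged in the node list (similar to the standby status shown above)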


Resources disable/enable

If you need to disable (stop) a resource:
pcs resource disable vm01-nagios
pcs status
 vm01-nagios (ocf::heartbeat:VirtualDomain): Started agrp-c01n01 (disabled)
virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm02-www                       running

To enable resource:
pcs resource enable vm01-nagios
pcs status
 vm01-nagios (ocf::heartbeat:VirtualDomain): Started agrp-c01n01
virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm02-www                     running
 2     vm01-nagios                    running
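Under the hood, disable and enable just toggle the resource's target-role meta attribute and do not create any constraints; this can be verified with pcs (a small check):
pcs resource show vm01-nagios # while the resource is disabled, the Meta Attrs line shows target-role=Stopped
pcs constraint --full | grep vm01 # no location constraints are added by disable/enable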


These tutorials were used to understand and set up clustering:
AN!Cluster
unixarena
clusterlabs.org
hastexo.com
