Tuesday, March 27, 2018

Cluster 17. DLM & CLVM (Distributed Lock Manager & Clustered Logical Volume Manager).


Clustered LVM


With DRBD providing the raw storage for the cluster, we must next consider partitions. This is where Clustered LVM, known as CLVM, comes into play.
CLVM is ideal because it uses DLM, the distributed lock manager: it won't allow access to cluster members outside of corosync's closed process group, which, in turn, requires quorum.
LVM itself is ideal because it can take one or more raw devices, known as "physical volumes", or simply PVs, and combine their raw space into one or more "volume groups", known as VGs. These volume groups then act just like a typical hard drive and can be "partitioned" into one or more "logical volumes", known as LVs. These LVs are where KVM's virtual machine guests will exist and where we will create our GFS2 clustered file system (KVM and GFS2 will be set up in later posts).
LVM is particularly attractive because of how flexible it is. We can easily add new physical volumes later, and then grow an existing volume group to use the new space. This new space can then be given to existing logical volumes, or entirely new logical volumes can be created. This can all be done while the cluster is online, offering an upgrade path with no downtime.
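For example, a later capacity upgrade could look roughly like this (a minimal sketch; the new DRBD device /dev/drbd2 and the +50G size are hypothetical, and the VG/LV names are the ones we create later in this post):
pvcreate /dev/drbd2                      # turn the new (hypothetical) DRBD device into a PV
vgextend agrp-c01n01_vg0 /dev/drbd2      # add its space to an existing VG
lvextend -L +50G agrp-c01n01_vg0/shared  # grow an existing LV (the file system must be grown separately)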


Installation and initial setup

On both nodes (you can use ssh agrp-c01n01 <command> to execute the same commands on the remote node):
yum install dlm lvm2-cluster -y
rsync -av /etc/lvm /root/backups/

Before creating the clustered LVM, we first need to make some changes to the LVM configuration (vi /etc/lvm/lvm.conf); all of the settings below can be re-checked at once with the sketch shown after this list:
  1. We need to filter out the DRBD backing devices so that LVM doesn't see the same signature a second time on the DRBD resource's backing device. In other words, limit the block devices that are used by LVM commands:
    1. filter = [ "a|/dev/drbd|", "a|/dev/sdb|", "r|.*|" ]
    2. pvs # should only show drbd and sdb devices
  2. Switch from local locking to clustered locking:
    1. lvmconf --enable-cluster # Set locking_type to the default clustered type on this system
    2. verify: cat /etc/lvm/lvm.conf |grep locking_type |grep -v "#" # must be locking_type = 3 (clustered locking using DLM)
    3. Other than this setup, creating LVM logical volumes in a clustered environment is identical to creating LVM logical volumes on a single node. There is no difference in the LVM commands themselves, or in the LVM GUI interface.
  3. Apply this setting only if your OS itself doesn't use LVM: don't fall back to locking_type 1 (local) if locking_type 2 or 3 fails. (If an attempt to initialise type 2 or type 3 locking fails, perhaps because cluster components such as clvmd are not running, and this option is enabled (set to 1), an attempt will be made to use local file-based locking (type 1). If this succeeds, only commands against local VGs will proceed; VGs marked as clustered will be ignored.)
    1. fallback_to_local_locking = 0
    2. verify: cat /etc/lvm/lvm.conf |grep fallback_to_local_locking |grep -v "#"
  4. Disable the writing of LVM cache and remove any existing cache:
    1. write_cache_state = 0 # default is "1"
    2. rm /etc/lvm/cache/*
  5. With releases of lvm2 that provide support for lvm2-lvmetad, clusters sharing access to LVM volumes must have lvm2-lvmetad disabled in the configuration and as a service to prevent problems resulting from inconsistent metadata caching throughout the cluster:
    1. use_lvmetad = 0
    2. verify: cat /etc/lvm/lvm.conf |grep use_lvmetad |grep -v "#"
    3. systemctl disable lvm2-lvmetad.service
    4. systemctl disable lvm2-lvmetad.socket
    5. systemctl stop lvm2-lvmetad.service
    6. systemctl status lvm2-lvmetad
    7. Remove lvmetad socket file (if exists): rm '/etc/systemd/system/sockets.target.wants/lvm2-lvmetad.socket'
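To re-check all of the settings above in one go, a grep such as this should do (a sketch; it simply prints the active, non-commented values):
grep -E '^[[:space:]]*(locking_type|fallback_to_local_locking|write_cache_state|use_lvmetad)[[:space:]]*=' /etc/lvm/lvm.conf
# expected: locking_type = 3, fallback_to_local_locking = 0, write_cache_state = 0, use_lvmetad = 0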

Setup DLM and CLVM

Create the DLM and CLVMD clone cluster resources (the clone option allows a resource to run on both nodes; in other words, a clone runs in Active/Active mode):
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence
pcs resource clone dlm clone-max=2 clone-node-max=1 interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence
pcs resource clone clvmd clone-max=2 clone-node-max=1 interleave=true ordered=true
Verify:
pcs status
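As an aside, recent pcs versions can create and clone a resource in a single command; a sketch of the equivalent for dlm (the exact keyword may differ between pcs releases):
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence \
    clone clone-max=2 clone-node-max=1 interleave=true ordered=true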

Almost every decision in a Pacemaker cluster, like choosing where a resource should run, is done by comparing scores. Scores are calculated per resource, and the cluster resource manager chooses the node with the highest score for a particular resource. (If a node has a negative score for a resource, the resource cannot run on that node.)

We can manipulate the decisions of the cluster with constraints. Constraints have a score. If a constraint has a score lower than INFINITY, it is only a recommendation; a score of INFINITY means it is a must. The default score is INFINITY. To view current constraints: pcs constraint
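For example, a location constraint with a finite score is only a preference, while INFINITY makes it mandatory (just an illustration, not part of this setup; the scores are arbitrary):
pcs constraint location dlm-clone prefers agrp-c01n01=100       # preference: may still run elsewhere
pcs constraint location dlm-clone prefers agrp-c01n01=INFINITY  # must run here (while the node is available)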

Each DRBD device must be promoted before DLM starts on that node. Configure the resource order (we want DRBD to start and be promoted to Master first, then dlm to start, and only if that succeeds will clvmd be started):
pcs constraint order promote ms_drbd_r0 then promote ms_drbd_r1 kind=Mandatory
pcs constraint order promote ms_drbd_r1 then start dlm-clone kind=Mandatory
pcs constraint order start dlm-clone then clvmd-clone kind=Mandatory 

Ordering constraints affect only the ordering of resources; they do not require that the resources be placed on the same node. If you want resources to be started on the same node and in a specific order, you need both an ordering constraint and a colocation constraint.
A colocation constraint determines that the location of one resource depends on the location of another resource. We need clvmd-clone to start on the same node where dlm-clone is started (these resources are cloned, so both nodes will have dlm and clvmd started):
pcs constraint colocation add ms_drbd_r1 with ms_drbd_r0
pcs constraint colocation add dlm-clone with ms_drbd_r1
pcs constraint colocation add clvmd-clone with dlm-clone
Verify:
pcs constraint

Options described:

  • These two settings are the defaults; we use them only for clarity:
    • clone-max=2 - how many copies of the resource to start (we'll have 2 copies of both dlm and clvmd)
    • clone-node-max=1 - how many copies of the resource can be started on a single node (because of the previous setting we'll have 2 copies of each resource, and because of this setting only one copy of each resource can run per node, so we'll have 4 resources in total, 2 on each node - dlm & clvmd)
  • on-fail=fence - STONITH the node on which the resource failed.
  • interleave=true - if this clone depends on another clone via an ordering constraint, it is allowed to start after the local instance of the other clone starts, rather than waiting for all instances of the other clone to start (so node1 will start dlm and then clvmd; but if we set "interleave=false", node1 will wait until dlm is started on node2 too, and only then start clvmd)
  • ordered=true - will start copies in serial rather than in parallel
  • kind=Mandatory - always. If the first resource (dlm-clone) does not perform its first-action (the default first-action is start), then the second (clvmd-clone) will not be allowed to perform its then-action (the default then-action equals the value of first-action, so the default is start). If the first (dlm-clone) is restarted, the second (clvmd-clone, if running) will be stopped beforehand and started afterward.
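To review these meta options on the already-created clones, something like the following should work (a sketch; pcs 0.9 uses "pcs resource show", newer releases renamed it to "pcs resource config"):
pcs resource show dlm-clone    # prints the clone's meta attributes (clone-max, clone-node-max, interleave, ordered)
pcs resource show clvmd-clone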

Check cluster status:
pcs status

Check that DLM is working properly:
dlm_tool ls # name clvmd / members 1 2 / seq 2,2 on one node and 1,1 on the other
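Optionally, dlm_tool status gives a broader view (a sketch; the exact output varies between dlm versions):
dlm_tool status    # shows the local node id, the member list and whether dlm considers the cluster quorate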

The clvm resource agent's clvmd_start() function calls clvmd_activate_all(), which is basically "ocf_run vgchange -ay", so if a clustered volume group is used, by default the clvm resource agent will activate it on all nodes.
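Manually, the same activation could be done per node with vgchange (a sketch; the VG name is one we create in the next section):
vgchange -ay                    # activate every VG visible to this node (what the resource agent effectively runs)
vgchange -ay agrp-c01n01_vg0    # or activate a single clustered VG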

Setup Clustered LV /shared

On one node: pvscan # scans all supported LVM block devices in the system for PVs - we must see only the OS's PVs
On one node: pvcreate /dev/drbd{0,1}
On both nodes: pvdisplay # verify on both nodes; something like '"/dev/drbd1" is a new physical volume of "<465.58 GiB"' will appear
On both nodes: vgscan # scans all supported LVM block devices in the system for VGs - we must see only the OS's VGs

Resource r0 will provide disk space for VMs that will normally run on agrp-c01n01.
Resource r1 will provide disk space for VMs that will normally run on agrp-c01n02.
So we'll use appropriate names while creating VGs:

  1. r0 (drbd0) will be in VG agrp-c01n01_vg0
  2. r1 (drbd1) will be in VG agrp-c01n02_vg0

On one node: vgcreate -Ay -cy agrp-c01n01_vg0 /dev/drbd0 
On one node: vgcreate -Ay -cy agrp-c01n02_vg0 /dev/drbd1

Options:

  • -A - Specifies if metadata should be backed up automatically after a change.  Enabling this is strongly advised! See vgcfgbackup(8) for more information.
  • -c - Create a clustered VG using clvmd if LVM is compiled with cluster support.  This allows multiple hosts to share a VG on shared devices.  clvmd and a lock manager must be configured and running.  (A clustered VG using clvmd is different from a shared VG using lvmlockd.)  See clvmd(8) for more information about clustered VGs.
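To confirm that the VGs really were created as clustered, the VG attribute string can be checked; the sixth character of vg_attr is 'c' for a clustered VG (a sketch):
vgs -o vg_name,vg_attr    # e.g. agrp-c01n01_vg0  wz--nc  (trailing 'c' = clustered)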


On both nodes: vgs and then pvs
On both nodes: lvscan # lists all logical volumes in all volume groups - we must see only the OS's LVs
On one node: lvcreate -L 20G -n shared agrp-c01n01_vg0 # we'll use this shared LV to store OS images and other service information
On both nodes: lvdisplay # we must see newly created "shared" named LV
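The same can be checked more compactly with lvs (a sketch):
lvs -o lv_name,vg_name,lv_size    # "shared" should be listed under agrp-c01n01_vg0 at 20 GiB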

Now we are done with DLM & CLVM.
Reboot both servers, then start the cluster and verify that the PVs, VGs and LVs are shown properly. Then stop the cluster on any node and verify that this node sees only the local LVs. If everything works as expected, proceed to the next part.
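Roughly, that verification could look like this (a minimal sketch; substitute your own node names):
pcs cluster start --all      # after the reboot, start the cluster on both nodes
pcs status                   # dlm-clone and clvmd-clone should be started on both nodes
pvs; vgs; lvs                # the DRBD-backed PVs, both VGs and the "shared" LV should be listed
pcs cluster stop agrp-c01n02 # stop the cluster on one node, then check on that node:
ssh agrp-c01n02 lvs          # only the local (OS) LVs should be visible there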


These tutorials were used to understand and set up clustering:
AN!Cluster
clusterlabs.org
redhat.com

