Monday, March 5, 2018

Cluster 14. Set up partitions for DRBD.

Overview

DRBD stands for Distributed Replicated Block Device. It is a technology that takes raw storage from two nodes and keeps their data synchronized in real time. It is sometimes described as "network RAID Level 1", and that is conceptually accurate. In this tutorial cluster, DRBD will be used to provide the back-end storage as a cost-effective alternative to a traditional SAN device.

With traditional RAID, you would take:
HDD1 + HDD2 -> /dev/sda

With DRBD, you have this:
node1:/dev/sda5 + node2:/dev/sda5 -> both:/dev/drbd0

In both cases, as soon as you create the new device, you pretend the member devices no longer exist. You format the newly created device with a file-system, use it as an LVM physical volume, and so on.
The main difference with DRBD is that /dev/drbd0 will always be the same on both nodes. If you write something on node 1, it's instantly available on node 2, and vice versa. Of course, this means that whatever you put on top of DRBD has to be "cluster aware". That is to say, the program or file system using the new /dev/drbd0 device has to understand that the contents of the disk might change because of another node.
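
To make this concrete, here is a minimal sketch of what a DRBD resource definition for such a device looks like (the backing partition, port and IP addresses are placeholders, not this cluster's actual values; the real resource configuration comes in a later post):
resource r0 {
    device    /dev/drbd0;
    disk      /dev/sda3;       # backing partition on each node (placeholder)
    meta-disk internal;        # DRBD metadata lives on the same partition
    on agrp-c01n01 {
        address 10.10.10.1:7788;   # replication-network IP (placeholder)
    }
    on agrp-c01n02 {
        address 10.10.10.2:7788;   # placeholder
    }
}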


Setting up partitions for DRBD

We're going to use a program called parted instead of fdisk. With fdisk, we would have to manually ensure that our partitions fell on 64 KiB block boundaries. With parted, we can use the -a option to tell it to use optimal alignment, saving us a lot of work. This matters for decent performance on our servers, and it is true for both traditional platter and modern solid-state drives.
For performance reasons, we want to ensure that the file systems created within a VM match the block alignment of the underlying storage stack, clear down to the base partitions on /dev/sda (or whatever your lowest-level block device is). By making our partitions always start on 64 KiB boundaries, we keep the guest OS's file system in line with the DRBD backing device's blocks. Thus, all reads and writes in the guest OS touch a matching number of real blocks, maximizing disk I/O efficiency.
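
As a quick sanity check (a sketch assuming a partition /dev/sda3 and the usual sysfs layout), a partition starts on a 64 KiB boundary when its byte offset is a multiple of 65536:
start=$(cat /sys/block/sda/sda3/start) # start sector, counted in 512-byte units
echo $(( start * 512 % 65536 )) # prints 0 if the partition is 64 KiB-aligned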

yum install parted -y # on both nodes

We will set up two DRBD resources:
  1. r0 - This resource will back the VMs designed to primarily run on agrp-c01n01
  2. r1 - This resource will back the VMs designed to primarily run on agrp-c01n02
In the case of a split brain (when both nodes are online and keep working, but lose contact with each other), we must know which node's data is more valid. We will consider each node to be more authoritative for a group of VMs - these VMs will be the default resources for that node, so we can easily recover:
  1. The DRBD resource hosting agrp-c01n01's servers can invalidate any changes on agrp-c01n02; we consider agrp-c01n01 to be more valid for the r0 resource
  2. The DRBD resource hosting agrp-c01n02's servers can invalidate any changes on agrp-c01n01; we consider agrp-c01n02 to be more valid for the r1 resource
LVM (Logical Volume Manager) - we have physical volumes (execute pvs to view them); every PV is added to a volume group (in the pvs output you can see the VG name next to the PV name); a VG can then be divided into logical volumes (execute vgs to see how many PVs and LVs each group has, and execute lvs to see details about each LV).
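
For orientation, the three commands side by side (the sample output is illustrative, not taken from this cluster):
pvs # e.g.: PV /dev/sda3  VG centos  PSize 930.00g  PFree 822.12g
vgs # e.g.: VG centos  #PV 1  #LV 3  VSize 930.00g  VFree 822.12g
lvs # e.g.: LV home  VG centos  LSize 50.00g (plus root and swap)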

We are going to use raw partitions. If your system has no spare raw partition (in the Cluster 1 post I advised leaving the space intended for VMs as a raw partition/device), the simplest way is to reinstall everything from scratch. Alternatively, use the steps below (they are not tested enough, so use them at your own risk; I assume that you have an additional partition with enough space to move /root, /home and swap to). At first you will reduce the size of the /dev/mapper/centos-home LV (yours may be named differently - choose a non-root LV that is big enough):

First, we will resize the /home partition:
umount /home
df -h # verify that /home is unmounted
parted -l # to find the type of the home LV's file system (xfs in my case)
lvremove /dev/centos/home # remove the LV from the VG (this destroys everything on /home, so back up anything you need first)
lvcreate -L 50G -n home centos # create a new, smaller LV for home in the VG centos
mkfs.xfs /dev/mapper/centos-home # format the new home LV as an XFS file-system
mount -a # remount all partitions listed in /etc/fstab
df -h # verify that /home is mounted
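
If /home holds data you want to keep, a simple tar round-trip preserves it across the lvremove/lvcreate above (a sketch: the archive path is an arbitrary choice, it must live outside /home and have enough free space for the contents):
tar -czpf /root/home-backup.tar.gz -C /home . # run this before the umount /home above
tar -xzpf /root/home-backup.tar.gz -C /home # restore after mount -a brings the new /home back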

Secondly, we will return the allocated free space from the PV to a raw partition.
Now determine how much space you want for your DRBD resources; I'll use 500GB for r0 and 500GB for r1:
pvs # find the name of the raw partition with enough free space (/dev/sda3 in my case)
pvdisplay /dev/sda3 # note the PE Size (4 MiB in my case) and Allocated PE (27616 in my case); PE stands for Physical Extent. Calculate the space already in use on /dev/sda3 (it is used for /root, /home and swap): 4*27616 = 110464 MiB (MiB = 1024 KiB, MB = 1000 KB), and 110464/1024 = 107.875 GiB. Or you can use:
pvs -o +used /dev/sda3 # this command shows 107.88g, which matches our calculated 107.875 GiB
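The same arithmetic with the bc calculator mentioned above:
echo "4*27616" | bc # 110464 MiB allocated
echo "scale=3; 110464/1024" | bc # 107.875 GiB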
pvs -v --segments /dev/sda3 # the free space must be only at the end of the PV; in my case there is some free space before the root segment, so I need to reallocate it:
pvmove --alloc anywhere /dev/sda3:940779-953578 # /dev/sda3:940779-953578 is the value from the PE ranges column of the pvs -v --segments /dev/sda3 output; after reallocating, recheck with pvs -v --segments /dev/sda3
After reallocating, when the free space sits only at the end of the PV, move the LV segments to the additional disk (/dev/sdb2 in my case):
pvmove  /dev/sda3:14816-27615 /dev/sdb2 # moving /root
pvmove  /dev/sda3:2016-14815 /dev/sdb2 # moving /home
pvmove  /dev/sda3:0-2015 /dev/sdb2 # moving swap
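Once the three moves finish, confirm that /dev/sda3 no longer carries any LV segments:
pvs -v --segments /dev/sda3 # should now show a single free segment and no LV names
pvs -v --segments /dev/sdb2 # the root, home and swap segments should now live here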
If you renamed the VG, or want to rename it, follow the vgrename instructions in this post.

vgreduce centos00 /dev/sda3 # remove /dev/sda3 from the volume group (substitute your own VG name)
pvremove /dev/sda3 # delete the LVM label from /dev/sda3, leaving it as a plain raw partition
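To confirm that the physical volume is gone:
pvs # /dev/sda3 should no longer appear in the list of physical volumes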

Let's start with parted:
parted -a optimal /dev/sda # open parted for /dev/sda, the disk holding the /dev/sda3 partition
rm 3 # remove /dev/sda3
print free # to see how much space we have to form the new partitions
mkpart extended 1076MB 501.076GB # 1076MB is the start point of the free space; 501.076GB is the start point + 500GB
print free # check that the new partition is created and that it is 500GB
Now we'll create the second 500GB partition (note that on a GPT disk "extended" in these mkpart commands is just a partition name - GPT has no extended/logical partitions; on an MBR disk you would instead create one extended partition and carve two logical partitions out of it):
mkpart extended 501GB 1001GB # 501GB is the start point of the remaining free space; 1001GB is the start point + 500GB
print free # check that the second partition is created and that it is 500GB
We created two partitions; let's check that they are aligned optimally:
align-check opt 3 # must return "3 aligned"
align-check opt 4 # must return "4 aligned"
quit # to exit parted
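
For reference, the same steps can be run non-interactively with parted's -s (script) flag - a sketch using the values from above; double-check the start and end points against your own print free output before running it:
parted -s -a optimal /dev/sda rm 3
parted -s -a optimal /dev/sda mkpart extended 1076MB 501.076GB
parted -s -a optimal /dev/sda mkpart extended 501GB 1001GB
parted -s -a optimal /dev/sda align-check opt 3 # exit status 0 means aligned
parted -s -a optimal /dev/sda align-check opt 4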

This tutorial was used to understand and set up clustering:
AN!Cluster
