Wednesday, February 21, 2018

Cluster 11. Pacemaker parts: CIB, PEngine, DC, CRMd, LRMd

Simplified cluster overview

We need to configure the cluster in two stages. This is because we have something of a chicken-and-egg problem:
  • We need clustered storage for our virtual machines.
  • Our clustered storage needs the cluster for fencing.
Conveniently, clustering has two logical parts:
  • Cluster communication and membership - the cluster manager (tracks which nodes are part of the cluster; handled by cman in CentOS 6 and corosync.service in CentOS 7)
  • Cluster resource management - the resource manager (manages clustered services, storage, and virtual servers; handled by rgmanager in CentOS 6 and pcsd.service in CentOS 7)
Right after a node fails, the cluster manager initiates a fence agent against the lost node.
After being told by the cluster manager that the node is lost, the resource manager checks which services might have been lost and decides what to do based on the resource management configuration. The cluster manager and resource manager run as separate services, so to start the cluster both services must be started.
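On CentOS 7 the membership layer is configured in /etc/corosync/corosync.conf. A minimal sketch of such a file is shown below; the cluster name and node addresses are made-up examples, and a real file is normally generated by pcs cluster setup:

```
totem {
    version: 2
    cluster_name: mycluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1.example.com
        nodeid: 1
    }
    node {
        ring0_addr: node2.example.com
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

The two_node: 1 setting is specific to two-node clusters, where a strict majority quorum would otherwise be impossible after losing one node.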

Pacemaker


Pacemaker is the cluster resource manager (pcsd.service is its daemon). It consists of five key components:

  1. CIB
  2. CRMd
  3. LRMd
  4. PEngine
  5. STONITHd

The CIB uses XML to represent both the cluster’s configuration and current state of all resources in the cluster. The contents of the CIB are automatically kept in sync across the entire cluster and are used by the PEngine (Policy Engine) to compute the ideal state of the cluster and how it should be achieved.
This list of instructions is then fed to the Designated Controller (DC; to find which node is currently selected as the DC: pcs status | grep DC | awk '{print $3}'). Pacemaker centralizes all cluster decision making by electing one of the CRMd (Cluster Resource Management daemon) instances to act as a master. Should the elected CRMd process (or the node it is on) fail, a new one is quickly established. The DC carries out the PEngine’s instructions in the required order by passing them to either the Local Resource Management daemon (LRMd) or CRMd peers on other nodes via the cluster messaging infrastructure (which in turn passes them on to their LRMd process).
The peer nodes all report the results of their operations back to the DC and, based on the expected and actual results, will either execute any actions that needed to wait for the previous one to complete, or abort processing and ask the PEngine to recalculate the ideal cluster state based on the unexpected results.
In some cases, it may be necessary to power off nodes in order to protect shared data or complete resource recovery. For this, Pacemaker comes with STONITHd (Fencing daemon).
In Pacemaker, STONITH devices are modeled as resources (and configured in the CIB) to enable them to be easily monitored for failure, however STONITHd takes care of understanding the STONITH topology such that its clients simply request a node be fenced, and it does the rest.
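The DC-lookup one-liner above simply parses the text output of pcs status. Since running it requires a live cluster, the sketch below feeds a canned sample of the relevant line (the node name node1.example.com is made up) through the same grep/awk pipeline:

```shell
# A sample of the "Current DC: ..." line as printed by `pcs status`.
sample_status='Current DC: node1.example.com (version 1.1.18-11.el7) - partition with quorum'

# Same pipeline as in the text: grep the DC line, print the third field.
dc=$(printf '%s\n' "$sample_status" | grep DC | awk '{print $3}')
echo "Designated Controller: $dc"
```

This prints "Designated Controller: node1.example.com". Note the parsing relies on the node name being the third whitespace-separated field, which can change between Pacemaker versions.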

CIB

The cluster is defined by the CIB, which uses XML notation. The major sections that make up a CIB (viewable with pcs cluster cib, a wrapper for the cibadmin --query utility):

  • cib: The entire CIB is enclosed with a cib tag. Certain fundamental settings are defined as attributes of this tag.
    • configuration: This section — the primary focus of this document — contains traditional configuration information such as what resources the cluster serves and the relationships among them. Can be checked with pcs cluster cib scope=configuration
      • crm_config: cluster-wide configuration options. Can be checked with pcs cluster cib scope=crm_config
      • nodes: the machines that host the cluster. Can be checked with pcs cluster cib scope=nodes
      • resources: the services run by the cluster. Can be checked with pcs cluster cib scope=resources
      • constraints: indications of how resources should be placed. Can be checked with pcs cluster cib scope=constraints
  • status: This section contains the history of each resource on each node. Based on this data, the cluster can construct the complete current state of the cluster. The authoritative source for this section is the local resource manager (lrmd process) on each cluster node, and the cluster will occasionally repopulate the entire section. For this reason, it is never written to disk, and administrators are advised against modifying it in any way.
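Putting these sections together, a skeletal CIB looks roughly like the following. This is a trimmed sketch; the attribute values and node names are illustrative, not taken from a real cluster:

```xml
<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="5" admin_epoch="0" num_updates="0">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="node1.example.com"/>
      <node id="2" uname="node2.example.com"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>
```

The status section is shown empty here because, as noted above, it is regenerated by the cluster itself and never written to disk.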

Normally, command-line utilities are used to set up the cluster. These tools abstract away the XML, but the overall picture of how the cluster works is clearer when you understand not only the configuration commands but also how those commands are translated into XML and propagated to all cluster nodes. To understand this abstraction:

  • Properties are XML attributes of an XML element.
  • Options are name-value pairs expressed as nvpair child elements of an XML element.
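As an example of the distinction, consider these two CIB fragments (the id values and node name are illustrative):

```xml
<!-- "Properties" are XML attributes of an element,
     e.g. id and uname on a node element: -->
<node id="1" uname="node1.example.com"/>

<!-- "Options" are name-value pairs expressed as nvpair children,
     e.g. a cluster option inside crm_config: -->
<cluster_property_set id="cib-bootstrap-options">
  <nvpair id="cib-bootstrap-options-no-quorum-policy"
          name="no-quorum-policy" value="stop"/>
</cluster_property_set>
```

A command such as pcs property set no-quorum-policy=stop is translated by the tooling into an nvpair like the one above and propagated to all nodes.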


These tutorials were used to understand and set up clustering:
