Let's play the cluster game

10.02.2021 | Tim Schmeling in howto

In the first two parts, you prepared your environment. But before we can move on, check your “homework” given at the end of the last article:

  1. Ensure that all VMs have the same timezone and time configuration.
  2. Check that all VMs can reach each other by IP and by (host/DNS) name, even if the DNS is not working.
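
A quick way to check both points on every node could look like this (just a sketch; chrony and the hostnames lb-1 and lb-2 are assumptions, adapt them to your environment):

# timedatectl                      # timezone and "System clock synchronized: yes"
# chronyc tracking                 # NTP offset, if you are using chrony
# ping -c1 lb-1 && ping -c1 lb-2   # reachability by name
# getent hosts lb-1 lb-2           # name resolution, e.g. via /etc/hosts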

If everything is fine (NTP is in sync and all VMs are reachable by name), we can start.

The following diagram gives you an overview of the infrastructure.

The client requests “wiki.intranet” and gets the IP 192.168.177.40 from the DNS (not covered by this series). This additional IP is a virtual IP (VIP) which we use for haproxy. The client then connects to this IP, which is managed by the cluster. The cluster ensures that the IP is running on the same node as haproxy. It makes no sense to have the IP on one node and haproxy on another.
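
If you want to confirm the DNS part, a quick lookup from the client should already return the VIP (dig is just one possible tool here):

# dig +short wiki.intranet    # should return 192.168.177.40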

The green box is the cluster itself with two nodes (lb-1 and lb-2). haproxy is only up and running on one node and the cluster ensures that the VIP is available on the same one.

Currently, haproxy is configured to listen on the individual IP of each machine, so you first have to change this to the VIP:

  1. Stop the haproxy process
# systemctl stop haproxy
  2. Replace the individual IP with the VIP on each loadbalancer in the file /etc/haproxy/haproxy.cfg:
frontend company.intranet
  bind 192.168.177.40:80
  default_backend company_webserver

Ensure that the haproxy process is stopped on both nodes.
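
You can check the changed configuration and the stopped services with something like the following (the ssh loop is only a sketch; you can just as well run the commands locally on each node):

# haproxy -c -f /etc/haproxy/haproxy.cfg                            # syntax check of the new config
# for h in lb-1 lb-2; do ssh $h systemctl is-active haproxy; done   # should print "inactive" twice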

Install the package libglue-devel and restart the cluster on each loadbalancer.

# zypper in libglue-devel
# systemctl restart pacemaker

Ensure that this package is installed on both loadbalancers (lb-1 and lb-2) and that the cluster is up and running.
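
A quick way to confirm that the cluster came back cleanly on both loadbalancers (a minimal check):

# systemctl is-active corosync pacemaker   # both should report "active"
# crm status                               # both nodes should be shown as online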

Now, let’s start to configure the cluster. Open a terminal to lb-2 and run the following command:

  1. Run on lb-2
# crm_mon -rnfj

You should leave this window open to watch what happens in the cluster. Open a second terminal to lb-1 and perform the following tasks.

  1. Start the crm shell on lb-1
# crm configure
  2. Show the current configuration of the cluster
crm(live/lb-1)configure# show
node 1084763156: lb-1
node 1084763157: lb-2
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version="2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a" \
        cluster-infrastructure=corosync \
        cluster-name=LB_Cluster
  3. Configure a stonith device and enable stonith
crm(live/lb-1)configure# primitive rsc_stonith_null stonith:null \
        params hostlist="lb-1 lb-2"
crm(live/lb-1)configure# property stonith-enabled=true

After entering these commands, you should verify the configuration and finally commit it to the CIB (Cluster Information Base):

crm(live/lb-1)configure# verify
crm(live/lb-1)configure# commit

After committing, you should see some changes in the terminal running crm_mon -rnfj:

Cluster Summary:
  * Stack: corosync
  * Current DC: lb-2 (version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with quorum
  * Last updated: Fri Feb  5 10:43:32 2021
  * Last change:  Fri Feb  5 10:43:30 2021 by root via cibadmin on lb-1
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Node lb-1: online:
    * Resources:
      * rsc_stonith_null        (stonith:null):  Started
  * Node lb-2: online:
    * Resources:

Inactive Resources:
  * No inactive resources

Migration Summary:

This output shows that your stonith dummy device (also called a resource) is started on node lb-1.

  4. The next step is to add the VIP to the cluster
crm(live/lb-1)configure# primitive rsc_vip IPaddr2 \
   > params ip=192.168.177.40 cidr_netmask=24

Don’t forget to verify your configuration and, if no warning or error occurs, commit the change.

crm(live/lb-1)configure# verify
crm(live/lb-1)configure# commit

After committing the change, you should see that the VIP is started in the cluster.
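
If you want to see the VIP outside of crm_mon, have a look at the interfaces on the node where rsc_vip is running (just a sketch; the interface name will differ in your setup):

# ip -brief addr show | grep 192.168.177.40   # the VIP appears as an additional address
# ping -c1 192.168.177.40                     # and is reachable from the network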

  5. Integrate haproxy into the cluster, then verify and commit the configuration
crm(live/lb-1)configure# primitive rsc_loadbalancer systemd:haproxy
crm(live/lb-1)configure# verify
crm(live/lb-1)configure# commit

Now both the VIP and haproxy are started, but you have to ensure that they always run on the same node. We do this with constraints.

  6. Ensure that the VIP and haproxy are always started on the same node and that the VIP is started before haproxy
crm(live/lb-1)configure# colocation col_vip_with_haproxy inf: rsc_vip rsc_loadbalancer
crm(live/lb-1)configure# order ord_vip_with_haproxy rsc_vip rsc_loadbalancer
crm(live/lb-1)configure# verify
crm(live/lb-1)configure# commit
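
It can be worth reviewing the complete configuration once more; besides the three primitives, you should now also see the col_vip_with_haproxy and ord_vip_with_haproxy constraints:

crm(live/lb-1)configure# show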

In my case, I see the following output from crm_mon -rnfj:

Cluster Summary:
  * Stack: corosync
  * Current DC: lb-2 (version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with quorum
  * Last updated: Fri Feb  5 15:55:06 2021
  * Last change:  Fri Feb  5 15:53:58 2021 by root via cibadmin on lb-1
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Node lb-1: online:
    * Resources:
      * rsc_stonith_null        (stonith:null):  Started
  * Node lb-2: online:
    * Resources:
      * rsc_vip (ocf::heartbeat:IPaddr2):        Started
      * rsc_loadbalancer        (systemd:haproxy):       Started

Inactive Resources:
  * No inactive resources

Migration Summary:

As we can see, the resources rsc_vip and rsc_loadbalancer were started on my second node (lb-2).
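
At this point you can also test the whole chain from the client side; an HTTP request against the name (or directly against the VIP) should be answered by one of the webservers behind haproxy (curl is just one possible client):

# curl -I http://wiki.intranet/    # or http://192.168.177.40/ if DNS is not involved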

Now check whether the cluster works as expected: when we move the VIP to the other node, the cluster should recognize this and move haproxy to that node as well.

  7. Move the VIP to the other node and check what happens with haproxy
crm(live/lb-1)configure# up
crm(live/lb-1)# resource
crm(live/lb-1)resource# move rsc_vip lb-1
INFO: Move constraint created for rsc_vip to lb-1

We see that both the VIP and haproxy have been moved to the other node (lb-1 in my case), and we are also informed that a move constraint was created. Clear that move constraint as follows:

  8. Clear the location constraint
crm(live/lb-1)resource# clear rsc_vip
INFO: Removed migration constraints for rsc_vip
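
If you want to be sure that the temporary constraint is really gone, you can grep for it from a normal shell on one of the nodes (pacemaker names these constraints cli-prefer-<resource>):

# crm configure show | grep cli-prefer    # no output means the constraint has been removed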

At this point you have a very basic working cluster in which a webserver and/or a loadbalancer can fail without your website going down. Please be aware that this setup is not recommended for production use! If you need a cluster for a production environment, have a deeper look at stonith/fencing/SBD mechanisms.

If you need a deep dive into pacemaker/corosync, feel free to read the official documentation [3] or, if you need a consulting expert, contact us [4].

Tim Schmeling
Tim has been at B1 since 2017 and runs a large cloud for an international company with his team. When he's not in the clouds or with customers, he works on high availability, clustering and SAP HANA. He also passes on his knowledge as a trainer.

 


Do you have comments or questions? Contact us at blog%b1-systems.de