Linuxha.net 1.0.x to 1.1.x Migration Howto

Introduction
The purpose of this document is to provide the information administrators need to migrate from the current "Production Ready" 1.0.x series of releases to the "Development" 1.1.x series. As the names suggest, this move is currently recommended only for non-production use, or where the new features are needed as soon as possible because of limitations in the current software.

Ascertain Current Version and Archive Configuration
The first step is to ascertain which version you are currently running. This can be done by running the following command on each node in question:

# cat /etc/cluster/VERSION
1.0.4
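
When several nodes need checking, the comparison can be scripted. The following is a minimal sketch only: version_series and report_series are hypothetical helper names, not Linuxha.net commands, and the only assumption about the product is that the VERSION file holds a dotted version string as shown above.

```shell
#!/bin/sh
# version_series and report_series are hypothetical helper names,
# not part of Linuxha.net.

# Print the major.minor series of a version string, e.g. 1.0.4 -> 1.0
version_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Read a VERSION file and say which release series it belongs to.
report_series() {
    file=${1:-/etc/cluster/VERSION}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    case "$(version_series "$(cat "$file")")" in
        1.0) echo "1.0.x installed" ;;
        1.1) echo "1.1.x installed" ;;
        *)   echo "unrecognised version" ;;
    esac
}

# Example: report_series /etc/cluster/VERSION
```

The same helper can be run again after the upgrade to confirm that the node reports the 1.1.x series.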

On each node the current cluster configuration should be archived. The archive will be used to back out to the current version if a problem occurs. On each node in the cluster run the following commands:

# cd /etc/cluster
# tar cvzf ~/old_valid_cluster_cfg.tgz .
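
Because the archive is the only route back to 1.0.x, it may be worth confirming that it can be read before uninstalling anything. A minimal sketch, using only standard tar options; archive_dir is a hypothetical helper name, not a Linuxha.net command:

```shell
#!/bin/sh
# archive_dir is a hypothetical helper: archive a directory, then verify
# that the resulting archive can be read back.
archive_dir() {
    src=$1
    dest=$2
    # -C changes into the source directory so archive paths are relative.
    tar czf "$dest" -C "$src" . || return 1
    # Listing the archive makes tar read it end to end; a failure here
    # means the archive is damaged or incomplete.
    tar tzf "$dest" >/dev/null || return 1
    echo "archived $src to $dest"
}

# Example (as root, on each node):
#   archive_dir /etc/cluster "$HOME/old_valid_cluster_cfg.tgz"
# To back out later, the archive could be unpacked with:
#   tar xzf "$HOME/old_valid_cluster_cfg.tgz" -C /etc/cluster
```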

Stop Cluster Services and Uninstall Existing Version
Run the following command to ensure the cluster is not currently running any applications:

# clhalt --force

At this point the current version should be uninstalled. For the RPM version this is a matter of running the following command on each node in the cluster:

# rpm -e linuxha

Note: The above command will leave the existing configuration of any packages intact. The configuration was archived earlier so that the configuration changes described later in this document can be reversed if necessary.

Install BETA 1.1.x Version
Assuming that the latest Beta version is available in /tmp on each node, run the following command from that directory on each node to install the software. The command below is for the RPM version; use the appropriate commands for the package type you are using.

# rpm -Uvh linuxha12-1.1.01-1.noarch.rpm

After installation, verify that the expected version has been installed:

# cat /etc/cluster/VERSION
1.1.01

Required /etc/clconf.xml Changes
As of version 1.1 the following "global" entries are required to define ports for the necessary daemons:
It is also highly recommended that a top-level section called "net_known_connections" is added. This should contain one or more IP addresses that are external to the cluster and that respond to ICMP ping requests. These are used to help detect network partitioning, and are typically routers, firewalls or even Internet sites.

A sample section to add might be:

        <net_known_connections>
        192.168.1.1
        192.168.1.104
        </net_known_connections>
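
Since these addresses are only useful if they actually answer pings, they can be checked from each node before the configuration is rebuilt. A minimal sketch, assuming the section is laid out one address per line as in the sample above and that the Linux iputils ping is in use; list_known_connections and check_known_connections are hypothetical helper names, not Linuxha.net commands:

```shell
#!/bin/sh
# list_known_connections is a hypothetical helper, not a Linuxha.net command.
# It prints the IPv4 addresses found between the net_known_connections
# tags of the file given as $1, one per line.
list_known_connections() {
    sed -n '/<net_known_connections>/,/<\/net_known_connections>/p' "$1" |
        grep -Eo '[0-9]{1,3}(\.[0-9]{1,3}){3}'
}

# check_known_connections sends one ping to each address (2 second timeout)
# and reports whether it answered.
check_known_connections() {
    for ip in $(list_known_connections "$1"); do
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip responds"
        else
            echo "$ip does NOT respond"
        fi
    done
}

# Example: check_known_connections /etc/clconf.xml
```

An address that does not respond from every node is a poor choice for this section, since it could be mistaken for a sign of network partitioning.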

Once the changes have been made, rebuild the cluster configuration by running the following command on one of the nodes:

# clbuild --verbose --force

Required appconf.xml Changes
At present no changes are strictly necessary to the application configuration files. However, in the "network" settings for each application the following attributes are no longer used and should be removed at some point:
If these attributes are removed, each affected application will need to be rebuilt by running the following commands on the node where the configuration file was modified (the example uses an application named apache):

# clbuildapp --force --vgbuild --verbose --application apache
# clbuildapp --force --build --verbose --application apache

Start Cluster Services
Once the changes are complete, start the cluster services, initially ensuring that no "auto-start" applications run:

# clform --noapps

Validate the cluster appears to be running:

# clstat

On each node run the following command, replacing clustername with the name of your cluster, to ensure all expected daemons are running:

# ps -ef | grep clustername

The output generated should appear similar to the following:

root      4359     1  0 23:29 ?        00:00:00 cldaemon-cluster1
root      4362     1  0 23:29 ?        00:00:00 cllockd-cluster1
root      4366     1  0 23:29 ?        00:00:00 clnetd-cluster1
root      4369     1  0 23:29 ?        00:00:00 clhbd-cluster1
root      4370  4369  0 23:29 ?        00:00:00 clhbd2-cluster1
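
The comparison against this list can also be automated. A minimal sketch, assuming the five daemon names shown in the sample output above are the complete expected set; missing_daemons is a hypothetical helper name, not a Linuxha.net command:

```shell
#!/bin/sh
# missing_daemons is a hypothetical helper, not a Linuxha.net command.
# It reads a process listing on stdin and prints each expected daemon
# (names taken from the sample output above) that is absent for the
# cluster named in $1. No output means every daemon was found.
missing_daemons() {
    cluster=$1
    listing=$(cat)
    for d in cldaemon cllockd clnetd clhbd clhbd2; do
        # -w matches whole words only, so a process belonging to
        # "cluster10" is not counted as belonging to "cluster1".
        printf '%s\n' "$listing" | grep -qw -- "$d-$cluster" || echo "$d-$cluster"
    done
}

# Example: ps -ef | missing_daemons cluster1
```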

If any of the above are not running, check for errors in the log files under "/var/log/cluster".

After this, start and use the cluster as you would under the 1.0.x series. Cluster formation and the handling of node failures should be faster, and the "clstat" command should continue to return results immediately even during fail-over (under 1.0.x it could time out intermittently during the actual fail-over).