From simon.edwards at linuxha.net Tue Jul 24 21:01:43 2007
From: simon.edwards at linuxha.net (Simon Edwards)
Date: Tue, 24 Jul 2007 21:01:43 +0100
Subject: [Linuxha-users] Linuxha v2 [aka truecl] is getting closer!
Message-ID: <1185307303.15437.20.camel@bongo>

Hello all,

The follow-up to Linuxha 1.x is now really taking shape. Due to my
day-to-day workload it has taken far longer than expected to get this
far - but now the software is living up to the intended design.

I've captured some logs from cluster forming, application starting,
status reporting, application stopping and cluster halting to give an
idea of how things are currently.

The "lha_form" routine is used to start the cluster - the example
cluster is a 4-node cluster running Slackware, though the distribution
does not matter. Notice the formation time is 5 seconds. An 8-node
cluster forms in less than 10 seconds.

root@slack10s1:/opt/truecl/log# lha_form --verbose
Date: 2007/07/24
143110 [ 4260] LOG Verbose logging mode selected.
143110 [ 4260] LOG Checking for available Request Daemons ...
143110 [ 4260] LOG Starting Support Daemons on
143110 [ 4260] LOG slack10s1,slack10s2,slack10s3,slack10s4 ...
143113 [ 4260] LOG slack10s1 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG slack10s2 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG slack10s3 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG slack10s4 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG Starting Cluster Daemons on
143113 [ 4260] LOG slack10s1,slack10s2,slack10s3,slack10s4 ...
143113 [ 4260] LOG slack10s1 OK STARTED
143113 [ 4260] LOG slack10s2 OK STARTED
143113 [ 4260] LOG slack10s3 OK STARTED
143113 [ 4260] LOG slack10s4 OK STARTED
143115 [ 4260] LOG slack10s1 acting as current cluster master.

The "lha_startapp" routine starts an application. By default it runs on
the current node if that node is suitable for the application.

root@slack10s1:/opt/truecl/log# lha_startapp -A test1 -V
Date: 2007/07/24
143443 [ 4279] WARN No configured or specified timeout for 'test1' - default
143443 [ 4279] WARN to 60.
143443 [ 4279] LOG Validated node 'slack10s1' is suitable for hosting
143443 [ 4279] LOG application 'test1'.
143443 [ 4279] LOG Attempting connection to Cluster Daemon on 'slack10s1'
143443 [ 4279] LOG ...
143443 [ 4279] LOG Connection to Cluster Daemon on 'slack10s1' successful.
143443 [ 4279] LOG Attempting connection to Master Cluster Daemon on
143443 [ 4279] LOG 'slack10s1' ...
143443 [ 4279] LOG Connection to Master Cluster Daemon on 'slack10s1'
143443 [ 4279] LOG successful.
143443 [ 4279] LOG Checking for available Request Daemons - please wait.
143443 [ 4279] LOG Required Request Daemons [slack10s1] running.
143443 [ 4279] LOG Attempting connection to Lock Daemon on 'slack10s1' ...
143443 [ 4279] LOG Connection to Lock Daemon on 'slack10s1' successful.
143443 [ 4279] LOG Attempting connection to Stat Daemon on 'slack10s1' ...
143443 [ 4279] LOG Connection to Stat Daemon on 'slack10s1' successful.
143443 [ 4279] LOG Stat Daemon on 'slack10s1' confirmed 'test1' is not
143443 [ 4279] LOG running.
143443 [ 4279] LOG Attempting storage activation on other nodes - please
143443 [ 4279] LOG wait...
143443 [ 4279] LOG other nodes:
143443 [ 4279] LOG slack10s2,slack10s1
143444 [ 4279] LOG Available relevant nodes have performed non-current
143444 [ 4279] LOG storage activation.
143444 [ 4279] LOG Attempting storage activation on node 'slack10s1' -
143444 [ 4279] LOG please wait...
143446 [ 4279] LOG Attempting final storage configuration on secondary nodes
143446 [ 4279] LOG - please wait...
143446 [ 4279] LOG Attempting to mount file systems on 'slack10s1' - please
143446 [ 4279] LOG wait...
143446 [ 4279] LOG File Systems mounted: OK=1, FAILED=0.
143446 [ 4279] LOG Application 'test1' IP configured successfully:
143446 [ 4279] LOG Configuring 192.168.1.243: /sbin/ifconfig eth0:1 inet
143446 [ 4279] LOG 192.168.1.243
143446 [ 4279] LOG Sending Builtin Gratuitous arp for eth0:1
143446 [ 4279] LOG Application Started successfully [RC=0].

Once the application is running, "lha_stat" gives an overview of the
cluster status:

root@slack10s1:/opt/truecl/log# lha_stat
cluster: slackcl - UP nodes: 4 [0 DOWN/4 UP]

Node       Status  Apps
slack10s1  UP      1
slack10s2  UP      0
slack10s3  UP      0
slack10s4  UP      0

Appname  Status  Node       F/O  Notes
test1    UP      slack10s1  2

"lha_stat" can be passed "-A appname" to give more details on a
particular application:

root@slack10s1:/opt/truecl/log# lha_stat -A test1
Application  Status   Node       Storage  Validated  Valid Nodes
test1        RUNNING  slack10s1  DRBD1    Y          slack10s1,slack10s2

VG/LV         Type  Mount Point  Size    Status
testvg/test1  ext3  /test1       131072  Active,Syncing[12Kb/Sec]

Applications are stopped using "lha_stopapp", which again works more
quickly than Linuxha 1.x:

root@slack10s1:/opt/truecl/log# lha_stopapp -A test1 -V
Date: 2007/07/24
143634 [ 4321] WARN No configured or specified timeout for 'test1' - default
143634 [ 4321] WARN to 60.
143634 [ 4321] LOG Ascertaining current node for 'test1' ...
143634 [ 4321] LOG Application 'test1' is running on 'slack10s1'.
143634 [ 4321] LOG Attempting connection to Cluster Daemon on 'slack10s1'
143634 [ 4321] LOG ...
143634 [ 4321] LOG Connection to Cluster Daemon on 'slack10s1' successful.
143634 [ 4321] LOG Attempting connection to Master Cluster Daemon on
143634 [ 4321] LOG 'slack10s1' ...
143634 [ 4321] LOG Connection to Master Cluster Daemon on 'slack10s1'
143634 [ 4321] LOG successful.
143634 [ 4321] LOG Checking for available Request Daemons - please wait.
143634 [ 4321] LOG Following Request Daemons running:slack10s1,slack10s2
143634 [ 4321] LOG Attempting connection to Lock Daemon on 'slack10s1' ...
143634 [ 4321] LOG Connection to Lock Daemon on 'slack10s1' successful.
143634 [ 4321] LOG Attempting connection to Stat Daemon on 'slack10s1' ...
143634 [ 4321] LOG Connection to Stat Daemon on 'slack10s1' successful.
143634 [ 4321] LOG Stat Daemon on 'slack10s1' confirmed 'test1' is running.
143635 [ 4321] LOG Stopping application 'test1' on 'slack10s1'...
143635 [ 4321] LOG Application Stopped successfully.
143635 [ 4321] LOG Deconfiguration IP configuration for 'test1' on
143635 [ 4321] LOG 'slack10s1'...
143635 [ 4321] LOG IP Addresses deconfigured successfully.
143635 [ 4321] LOG Attempting to un-mount file systems on 'slack10s1' -
143635 [ 4321] LOG please wait...
143635 [ 4321] LOG File Systems un-mounted: OK=1, FAILED=0.
143635 [ 4321] LOG Attempting storage deactivation on 'slack10s1' ...
143635 [ 4321] LOG Re-attempting remote storage deactivation ...
143635 [ 4321] WARN slack10s1 : ERROR - Status information for device 0 not
143635 [ 4321] WARN found in '/proc/drbd'. from
143635 [ 4321] WARN run_before_shutdown_on_non_current
143635 [ 4321] LOG Available relevant nodes have performed storage
143635 [ 4321] LOG deactivation.

Finally, dissolving the cluster requires running the "lha_dissolve"
command:

root@slack10s1:/opt/truecl/log# lha_dissolve --verbose
Date: 2007/07/24
143705 [ 4328] LOG Checking for available Request Daemons ...
143705 [ 4328] LOG Querying which nodes are running Cluster Daemons ...
143706 [ 4328] LOG Current cluster master is 'slack10s1'.
143706 [ 4328] LOG Number of applications currently running: 0.
143706 [ 4328] LOG Stopping all 'clusterd' processes ...
143706 [ 4328] LOG Stopping all 'hbd' processes ...
143706 [ 4328] LOG Stopping all 'syncd' processes ...
143707 [ 4328] LOG Stopping all 'netd' processes ...
143707 [ 4328] LOG Stopping all 'lockd' processes ...
143707 [ 4328] LOG Stopping all 'statd' processes ...
143707 [ 4328] LOG Cluster has been halted.

An "alpha" version is now not far off being released.
The network daemon still needs to be written, more tests and
functionality need to be added, and a huge amount of testing remains to
be done - but overall things are looking good!

Regards,

Simon.
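
For anyone wanting to check the timings quoted in the captures, the
following is a minimal Python sketch, not part of truecl itself. It
assumes each log line has the layout seen above: an HHMMSS timestamp, a
bracketed number (assumed here to be a process ID), a LOG or WARN level,
then the message. The names parse_line and elapsed_seconds are
hypothetical helpers introduced only for this illustration.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the captured logs:
#   HHMMSS [ pid] LEVEL message
LINE_RE = re.compile(r"^(\d{6}) \[\s*(\d+)\] (LOG|WARN)\s+(.*)$")


def parse_line(line):
    """Return (time, pid, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp, pid, level, message = m.groups()
    return datetime.strptime(stamp, "%H%M%S"), int(pid), level, message


def elapsed_seconds(lines):
    """Seconds between the earliest and latest timestamps in the given lines.

    The log carries no date per line, so a run crossing midnight is not handled.
    """
    stamps = [p[0] for p in map(parse_line, lines) if p is not None]
    if not stamps:
        return 0
    return int((max(stamps) - min(stamps)).total_seconds())


# First, middle and last timestamped lines of the lha_form capture.
sample = [
    "143110 [ 4260] LOG Verbose logging mode selected.",
    "143113 [ 4260] LOG slack10s1 OK STARTED",
    "143115 [ 4260] LOG slack10s1 acting as current cluster master.",
]
print(elapsed_seconds(sample))
```

Run against the lha_form lines shown, this prints 5, matching the
formation time mentioned earlier. Wrapped continuation lines carry their
own timestamp in these captures, so they need no special handling for
the timing calculation.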