Clustered filesystem GFS2
Hi,
I set up a clustered filesystem via iSCSI and GFS2, and everything works fine except for one small problem I don't understand yet.
When I start cman (the config files are identical on both servers), one server starts without any errors; on the second one I get this message:
root@domainc02:/etc# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Relax-NG validity error : Extra element fence in interleave
tempfile:4: element clusternodes: Relax-NG validity error : Element clusternode failed to validate content
tempfile:5: element clusternode: Relax-NG validity error : Element clusternodes has extra content: clusternode
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:21: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate
Mar 07 10:09:36 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
Mar 07 10:09:36 corosync [MAIN ] Corosync built-in features: nss
Mar 07 10:09:36 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Mar 07 10:09:36 corosync [MAIN ] Successfully parsed cman config
Mar 07 10:09:36 corosync [MAIN ] Successfully configured openais services to load
Mar 07 10:09:36 corosync [TOTEM ] Token Timeout (10000 ms) retransmit timeout (2380 ms)
Mar 07 10:09:36 corosync [TOTEM ] token hold (1894 ms) retransmits before loss (4 retrans)
Mar 07 10:09:36 corosync [TOTEM ] join (60 ms) send_join (0 ms) consensus (2000 ms) merge (200 ms)
Mar 07 10:09:36 corosync [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Mar 07 10:09:36 corosync [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1402
Mar 07 10:09:36 corosync [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar 07 10:09:36 corosync [TOTEM ] missed count const (5 messages)
Mar 07 10:09:36 corosync [TOTEM ] send threads (0 threads)
Mar 07 10:09:36 corosync [TOTEM ] RRP token expired timeout (2380 ms)
Mar 07 10:09:36 corosync [TOTEM ] RRP token problem counter (2000 ms)
Mar 07 10:09:36 corosync [TOTEM ] RRP threshold (10 problem count)
Mar 07 10:09:36 corosync [TOTEM ] RRP multicast threshold (100 problem count)
Mar 07 10:09:36 corosync [TOTEM ] RRP automatic recovery check timeout (1000 ms)
Mar 07 10:09:36 corosync [TOTEM ] RRP mode set to none.
Mar 07 10:09:36 corosync [TOTEM ] heartbeat_failures_allowed (0)
Mar 07 10:09:36 corosync [TOTEM ] max_network_delay (50 ms)
Mar 07 10:09:36 corosync [TOTEM ] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Mar 07 10:09:36 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Mar 07 10:09:36 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 07 10:09:36 corosync [IPC ] you are using ipc api v2
[ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
This worked fine, but only once: when I started it again it gave me the errors above. I don't think my config file is wrong, because otherwise the other server would report errors as well.
Here is the cluster.conf file:
<?xml version="1.0"?>
<cluster name="domain_cluster" config_version="3">
  <logging debug="on"/>
  <clusternodes>
    <clusternode name="domainc01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="SCSI_fence" nodename="domainc01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="domainc02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="SCSI_fence" nodename="domainc02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_scsi" name="SCSI_fence"/>
  </fencedevices>
  <rm>
    <failoverdomains>
    </failoverdomains>
  </rm>
  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>
</cluster>
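To rule out a problem with the file itself, here is a quick sanity check I sketched with Python's stdlib (the function and its checks are my own, not anything cman provides; it only verifies that the XML is well-formed, that nodeids and node names are unique, and that every fence device a node references is actually declared — it does not perform the RelaxNG schema validation that cman does at startup, which ccs_config_validate should cover):

```python
import xml.etree.ElementTree as ET

# The cluster.conf from above, inlined for a standalone check.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="domain_cluster" config_version="3">
  <logging debug="on"/>
  <clusternodes>
    <clusternode name="domainc01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="SCSI_fence" nodename="domainc01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="domainc02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="SCSI_fence" nodename="domainc02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_scsi" name="SCSI_fence"/>
  </fencedevices>
  <rm>
    <failoverdomains>
    </failoverdomains>
  </rm>
  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>
</cluster>"""

def check_cluster_conf(xml_text):
    # Raises xml.etree.ElementTree.ParseError if the file is not well-formed.
    root = ET.fromstring(xml_text)
    nodes = root.findall("./clusternodes/clusternode")
    ids = [n.get("nodeid") for n in nodes]
    names = [n.get("name") for n in nodes]
    assert len(ids) == len(set(ids)), "duplicate nodeid"
    assert len(names) == len(set(names)), "duplicate node name"
    # Every fence device referenced by a node must be declared
    # under <fencedevices>.
    declared = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    referenced = {d.get("name") for d in root.findall(".//fence/method/device")}
    assert referenced <= declared, "fence device referenced but not declared"
    return len(nodes)

print(check_cluster_conf(CLUSTER_CONF))  # prints 2
```

The file passes these basic checks on my end, which is why I suspect the "Configuration fails to validate" message is about the RelaxNG schema rather than the XML itself.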
Any ideas?
Thanks, Robert