LinuxQuestions.org

eantoranz 10-11-2012 03:52 PM

problems setting corosync/pacemaker to do virtual ip
 
Hi!

I'm giving corosync/pacemaker a try (after giving heartbeat/pacemaker a try). I want to do something as simple as providing a virtual IP.

I have built and installed corosync and pacemaker from source (the prefix for both is /usr/local/ha), and I would now like to start the services and then do the pacemaker configuration. I'm working on a VM with Ubuntu 10.04 installed, which is why I'm building from source: I want the latest version of both.

If I start corosync, everything looks normal (though I'm not sure how to confirm that the node is up, other than by seeing the multicast messages on the network):

Code:

Oct 11 15:12:39 ha3 corosync[3719]:  [MAIN  ] Corosync Cluster Engine ('1.4.4'): started and ready to provide service.
Oct 11 15:12:39 ha3 corosync[3719]:  [MAIN  ] Corosync built-in features: nss
Oct 11 15:12:39 ha3 corosync[3719]:  [MAIN  ] Successfully read main configuration file '/usr/local/ha/etc/corosync/corosync.conf'.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Token Timeout (5000 ms) retransmit timeout (247 ms)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] token hold (187 ms) retransmits before loss (20 retrans)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] join (1000 ms) send_join (0 ms) consensus (7500 ms) merge (200 ms)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1402
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (20 messages)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] missed count const (5 messages)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] send threads (0 threads)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] RRP token expired timeout (247 ms)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] RRP token problem counter (2000 ms)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] RRP threshold (10 problem count)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] RRP multicast threshold (100 problem count)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] RRP automatic recovery check timeout (1000 ms)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] RRP mode set to none.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] heartbeat_failures_allowed (0)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] max_network_delay (50 ms)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 11 15:12:39 ha3 corosync[3719]:  [IPC  ] you are using ipc api v2
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Receive multicast socket recv buffer size (225280 bytes).
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Transmit multicast socket send buffer size (225280 bytes).
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] The network interface [192.168.55.13] is now up.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Created or loaded sequence id c.192.168.55.13 for this ring.
Oct 11 15:12:39 ha3 corosync[3719]:  [pcmk  ] Logging: Initialized pcmk_startup
Oct 11 15:12:39 ha3 corosync[3719]:  [SERV  ] Service engine loaded: Pacemaker Cluster Manager 1.1.8
Oct 11 15:12:39 ha3 corosync[3719]:  [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Oct 11 15:12:39 ha3 corosync[3719]:  [SERV  ] Service engine loaded: corosync configuration service
Oct 11 15:12:39 ha3 corosync[3719]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Oct 11 15:12:39 ha3 corosync[3719]:  [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Oct 11 15:12:39 ha3 corosync[3719]:  [SERV  ] Service engine loaded: corosync profile loading service
Oct 11 15:12:39 ha3 corosync[3719]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Oct 11 15:12:39 ha3 corosync[3719]:  [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] entering GATHER state from 15.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Creating commit token because I am the rep.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Saving state aru 0 high seq received 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Storing new sequence id for ring 10
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] entering COMMIT state.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] got commit token
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] entering RECOVERY state.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] position [0] member 192.168.55.13:
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] previous ring seq c rep 192.168.55.13
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] aru 0 high delivered 0 received flag 1
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Did not need to originate any messages in recovery.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] got commit token
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Sending initial ORF token
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] install seq 0 aru 0 high seq received 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] install seq 0 aru 0 high seq received 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] install seq 0 aru 0 high seq received 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] install seq 0 aru 0 high seq received 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Resetting old ring state
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] recovery to regular 1-0
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering to app 1 to 0
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] This node is within the primary component and will provide service.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] entering OPERATIONAL state.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 0 to 1
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 1 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] confchg entries 1
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier Start Received From 221751488
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier completion status for nodeid 221751488 = 1.
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization barrier completed
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization actions starting for (dummy CLM service)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 1 to 2
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 2 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 1
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 2 to 3
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 3 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] confchg entries 1
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier Start Received From 221751488
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier completion status for nodeid 221751488 = 1.
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization barrier completed
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Committing synchronization for (dummy CLM service)
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization actions starting for (dummy AMF service)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 2
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 3
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 3 to 4
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 4 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] confchg entries 1
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier Start Received From 221751488
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier completion status for nodeid 221751488 = 1.
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization barrier completed
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Committing synchronization for (dummy AMF service)
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization actions starting for (dummy CKPT service)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 4
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 4 to 5
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 5 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] confchg entries 1
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier Start Received From 221751488
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier completion status for nodeid 221751488 = 1.
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization barrier completed
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Committing synchronization for (dummy CKPT service)
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization actions starting for (dummy EVT service)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 5 to 6
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 6 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 5
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 6 to 7
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 7 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 6
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 7
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 7 to 8
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 8 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] confchg entries 1
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier Start Received From 221751488
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier completion status for nodeid 221751488 = 1.
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization barrier completed
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Committing synchronization for (dummy EVT service)
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization actions starting for (corosync cluster closed process group service v1.01)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering 8 to a
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq 9 to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq a to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [CPG  ] comparing: sender r(0) ip(192.168.55.13) ; members(old:0 left:0)
Oct 11 15:12:39 ha3 corosync[3719]:  [CPG  ] chosen downlist: sender r(0) ip(192.168.55.13) ; members(old:0 left:0)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including 8
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering a to b
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq b to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] confchg entries 1
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier Start Received From 221751488
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Barrier completion status for nodeid 221751488 = 1.
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Synchronization barrier completed
Oct 11 15:12:39 ha3 corosync[3719]:  [SYNC  ] Committing synchronization for (corosync cluster closed process group service v1.01)
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] mcasted message added to pending queue
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including a
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering b to c
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] Delivering MCAST message with seq c to pending delivery queue
Oct 11 15:12:39 ha3 corosync[3719]:  [MAIN  ] Completed service synchronization, ready to provide service.
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including b
Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] releasing messages up to and including c
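
One rough way to answer the "is the node up?" question from the log alone is to look for the milestone messages corosync prints during startup. The sketch below does that in Python. The function name and the choice of three milestones are assumptions made for illustration; the marker strings are copied from the log above. (corosync also ships the corosync-cfgtool utility, whose -s option prints ring status, which is probably the more direct check.)

```python
# Marker strings copied from the corosync 1.4.4 log above.
# Which messages count as "milestones" is an assumption of this sketch.
MARKERS = {
    "started": "started and ready to provide service",
    "operational": "entering OPERATIONAL state",
    "synced": "Completed service synchronization, ready to provide service",
}

def corosync_milestones(lines):
    """Return which startup milestones appear in the given syslog lines."""
    seen = {name: False for name in MARKERS}
    for line in lines:
        if "corosync[" not in line:
            continue  # ignore lines logged by other daemons
        for name, text in MARKERS.items():
            if text in line:
                seen[name] = True
    return seen

sample = [
    "Oct 11 15:12:39 ha3 corosync[3719]:  [MAIN  ] Corosync Cluster Engine ('1.4.4'): started and ready to provide service.",
    "Oct 11 15:12:39 ha3 corosync[3719]:  [TOTEM ] entering OPERATIONAL state.",
    "Oct 11 15:12:39 ha3 corosync[3719]:  [MAIN  ] Completed service synchronization, ready to provide service.",
]
print(corosync_milestones(sample))
```

All three milestones are present in the log above, which suggests corosync itself came up cleanly on ha3.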

Then, when I start pacemaker, I see this in corosync's log:
Code:

Oct 11 15:14:09 ha3 crmd[3967]:    error: crmd_ais_dispatch: Recieving messages from a node we think is dead: ha3[221751488]
Oct 11 15:14:09 ha3 crmd[3967]:    error: do_log: FSA: Input I_ERROR from check_dead_member() received in state S_STARTING
Oct 11 15:14:09 ha3 crmd[3967]:  warning: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=check_dead_member ]
Oct 11 15:14:09 ha3 crmd[3967]:    error: do_recover: Action A_RECOVER (0000000001000000) not supported
Oct 11 15:14:09 ha3 crmd[3967]:    error: do_started: Start cancelled... S_RECOVERY
Oct 11 15:14:09 ha3 crmd[3967]:    error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Oct 11 15:14:09 ha3 crmd[3967]:  notice: terminate_cs_connection: Disconnecting from Corosync
Oct 11 15:14:09 ha3 crmd[3967]:    error: do_exit: Could not recover from internal error
Oct 11 15:14:09 ha3 pacemakerd[3748]:    error: pcmk_child_exit: Child process crmd exited (pid=3967, rc=2)
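
To make the sequence of events in that crmd output easier to follow, the state transitions can be pulled out with a small regular expression. This is only a sketch for reading the log: the pattern is written against the single "State transition" line shown above, and the function name is hypothetical.

```python
import re

# Pattern written against the do_state_transition line in the log above;
# other Pacemaker versions may format this line differently.
TRANSITION = re.compile(
    r"State transition (?P<old>\S+) -> (?P<new>\S+) "
    r"\[ input=(?P<input>\S+) cause=(?P<cause>\S+) origin=(?P<origin>\S+) \]"
)

def crmd_transitions(lines):
    """Yield (old_state, new_state, input, origin) for each logged transition."""
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            yield m.group("old"), m.group("new"), m.group("input"), m.group("origin")

sample = [
    "Oct 11 15:14:09 ha3 crmd[3967]:  warning: do_state_transition: State transition "
    "S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=check_dead_member ]",
]
print(list(crmd_transitions(sample)))
```

For the log above this reports a single transition, from S_STARTING to S_RECOVERY, triggered by I_ERROR from check_dead_member, which is the point where crmd gives up.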


eantoranz 10-11-2012 03:54 PM

By the way, ha3 is the node where I'm running the test... and it's the only node I'm running at the moment.

eantoranz 10-11-2012 03:55 PM

corosync.conf:

Code:

totem {
 
        version: 2
 
        # How long before declaring a token lost (ms)
        token:          5000
 
        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 20
 
        # How long to wait for join messages in the membership protocol (ms)
        join:          1000
 
        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus:      7500
 
        # Turn off the virtual synchrony filter
        vsftype:        none
 
        # Number of messages that may be sent by one processor on receipt of the token
        max_messages:  20
 
        # Disable encryption
        secauth:        off
 
        # How many threads to use for encryption/decryption
        threads:        0
 
        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes
 
        # Optionally assign a fixed node id (integer)
        # nodeid:        1234
 
        interface {
                ringnumber: 0
 
                # The following three values need to be set based on your environment
                bindnetaddr: 192.168.55.13
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        fileline: off
        to_syslog: yes
        to_stderr: no
        syslog_facility: daemon
        debug: on
        timestamp: on
}

amf {
        mode: disabled
}
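
Since no nodeid is set in this configuration, the node id 221751488 that shows up in the logs is generated automatically. The Python sketch below reproduces that number from the ring 0 address. It rests on an assumption inferred from this one data point: that the id is the IPv4 address's four network-order bytes read as a little-endian 32-bit integer, with the top bit cleared when clear_node_high_bit is enabled. The function name is hypothetical, and this is not meant as a description of corosync's actual code.

```python
import socket
import struct

def corosync_nodeid(ipv4, clear_high_bit=True):
    """Reproduce the auto-generated nodeid seen in the log from an IPv4 address.

    Assumption: the address bytes (network order) are read as a
    little-endian uint32, as an x86 host would do, and the top bit is
    cleared when clear_node_high_bit is set.
    """
    raw = socket.inet_aton(ipv4)           # 4 bytes in network order
    (nodeid,) = struct.unpack("<I", raw)   # reinterpret as little-endian uint32
    if clear_high_bit:
        nodeid &= 0x7FFFFFFF               # keep the id within 31 bits
    return nodeid

print(corosync_nodeid("192.168.55.13"))  # 221751488
```

The result matches the id in "ha3[221751488]", so the node that crmd thinks is dead is ha3 itself at 192.168.55.13.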

/usr/local/ha/etc/corosync/service.d/pcmk:

Code:

service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  1
}


eantoranz 10-11-2012 04:33 PM

By the way, this is what I get when running crm_mon:

Code:

$ sudo ../../sbin/crm_mon -1
[sudo] password for cps:
Last updated: Thu Oct 11 16:01:49 2012
Last change: Thu Oct 11 11:36:56 2012
Current DC: NONE
0 Nodes configured, unknown expected votes
0 Resources configured.


