LinuxQuestions.org > Linux - Server

eantoranz 12-28-2012 02:24 PM

pacemaker - iscsi: how to set up iscsi targets/logical units?
 
Hi!

I have a cluster configuration I'm using for tests (everything on virtual machines... even the SAN).

I have a Pacemaker configuration that doesn't manage the target/LUs: the iSCSI login happens when the machine boots, and my configuration just assumes the storage will be there (which has worked fairly well so far). Now, however, I would like to add the target/LU configuration to Pacemaker as well.

The configuration is committed, but I can't get Pacemaker to start the target resource:

Code:

Resource Group: sanos
    ip_flotante      (ocf::heartbeat:IPaddr2)           Started
    san              (ocf::heartbeat:iSCSITarget)       Stopped
    sanwwwsanos      (ocf::heartbeat:iSCSILogicalUnit)  Stopped
    sandatapostgres  (ocf::heartbeat:iSCSILogicalUnit)  Stopped
    sanwwwsesion     (ocf::heartbeat:iSCSILogicalUnit)  Stopped
    datapostgres     (ocf::heartbeat:Filesystem)        Stopped
    wwwsanos         (ocf::heartbeat:Filesystem)        Stopped
    wwwsesion        (ocf::heartbeat:Filesystem)        Stopped
    postgres         (lsb:postgresql-8.4)               Stopped
    pgbouncer        (lsb:pgbouncer)                    Stopped
    apache           (lsb:apache2)                      Stopped

I'd like to know what's going on when it tries to start the san resource, but syslog doesn't provide much information. When I try to start the san resource (crm resource san start), this is what I get:

Code:

Dec 28 15:48:38 cluster1 cibadmin: [3671]: info: Invoked: cibadmin -Ql -o resources
Dec 28 15:48:38 cluster1 cibadmin: [3673]: info: Invoked: cibadmin -p -R -o resources
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="182" num_updates="2" >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -  <configuration >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -    <resources >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -      <group id="sanos" >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -        <primitive id="san" >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -          <meta_attributes id="san-meta_attributes" >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -            <nvpair value="Stopped" id="san-meta_attributes-target-role" />
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -          </meta_attributes>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -        </primitive>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -      </group>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -    </resources>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: -  </configuration>
Dec 28 15:48:38 cluster1 crmd: [823]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: - </cib>
Dec 28 15:48:38 cluster1 crmd: [823]: info: need_abort: Aborting on change to admin_epoch
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: + <cib admin_epoch="0" epoch="183" num_updates="1" >
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +  <configuration >
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +    <resources >
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_pe_invoke: Query 235: Requesting the current CIB: S_POLICY_ENGINE
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +      <group id="sanos" >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +        <primitive id="san" >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +          <meta_attributes id="san-meta_attributes" >
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +            <nvpair value="Started" id="san-meta_attributes-target-role" />
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +          </meta_attributes>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +        </primitive>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +      </group>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +    </resources>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: +  </configuration>
Dec 28 15:48:38 cluster1 cib: [819]: info: log_data_element: cib:diff: + </cib>
Dec 28 15:48:38 cluster1 cib: [819]: info: cib_process_request: Operation complete: op cib_replace for section resources (origin=local/cibadmin/2, version=0.183.1): ok (rc=0)
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_pe_invoke_callback: Invoking the PE: query=235, ref=pe_calc-dc-1356725918-121, seq=192, quorate=0
Dec 28 15:48:38 cluster1 pengine: [822]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 28 15:48:38 cluster1 pengine: [822]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 28 15:48:38 cluster1 pengine: [822]: info: determine_online_status: Node cluster1 is online
Dec 28 15:48:38 cluster1 pengine: [822]: ERROR: unpack_rsc_op: Hard error - san:0_monitor_0 failed with rc=6: Preventing san:0 from re-starting anywhere in the cluster
Dec 28 15:48:38 cluster1 pengine: [822]: ERROR: unpack_rsc_op: Hard error - sanwwwsesion_monitor_0 failed with rc=6: Preventing sanwwwsesion from re-starting anywhere in the cluster
Dec 28 15:48:38 cluster1 pengine: [822]: ERROR: unpack_rsc_op: Hard error - sandatapostgres_monitor_0 failed with rc=6: Preventing sandatapostgres from re-starting anywhere in the cluster
Dec 28 15:48:38 cluster1 pengine: [822]: WARN: unpack_rsc_op: Processing failed op pgbouncer_monitor_0 on cluster1: unknown error (1)
Dec 28 15:48:38 cluster1 pengine: [822]: ERROR: unpack_rsc_op: Hard error - sanwwwsanos_monitor_0 failed with rc=6: Preventing sanwwwsanos from re-starting anywhere in the cluster
Dec 28 15:48:38 cluster1 pengine: [822]: ERROR: unpack_rsc_op: Hard error - san_monitor_0 failed with rc=6: Preventing san from re-starting anywhere in the cluster
Dec 28 15:48:38 cluster1 pengine: [822]: notice: group_print:  Resource Group: sanos
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      ip_flotante#011(ocf::heartbeat:IPaddr2):#011Started cluster1
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      san#011(ocf::heartbeat:iSCSITarget):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      sanwwwsanos#011(ocf::heartbeat:iSCSILogicalUnit):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      sandatapostgres#011(ocf::heartbeat:iSCSILogicalUnit):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      sanwwwsesion#011(ocf::heartbeat:iSCSILogicalUnit):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      datapostgres#011(ocf::heartbeat:Filesystem):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      wwwsanos#011(ocf::heartbeat:Filesystem):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      wwwsesion#011(ocf::heartbeat:Filesystem):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      postgres#011(lsb:postgresql-8.4):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      pgbouncer#011(lsb:pgbouncer):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: notice: native_print:      apache#011(lsb:apache2):#011Stopped
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: ip_flotante: Rolling back scores from san
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: san: Rolling back scores from sanwwwsanos
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource san cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: sanwwwsanos: Rolling back scores from sandatapostgres
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource sanwwwsanos cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: sandatapostgres: Rolling back scores from sanwwwsesion
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource sandatapostgres cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: sanwwwsesion: Rolling back scores from datapostgres
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource sanwwwsesion cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: datapostgres: Rolling back scores from wwwsanos
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource datapostgres cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: wwwsanos: Rolling back scores from wwwsesion
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource wwwsanos cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: wwwsesion: Rolling back scores from postgres
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource wwwsesion cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: postgres: Rolling back scores from pgbouncer
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource postgres cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_merge_weights: pgbouncer: Rolling back scores from apache
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource pgbouncer cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: info: native_color: Resource apache cannot run anywhere
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource ip_flotante#011(Started cluster1)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource san#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource sanwwwsanos#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource sandatapostgres#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource sanwwwsesion#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource datapostgres#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource wwwsanos#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource wwwsesion#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource postgres#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource pgbouncer#011(Stopped)
Dec 28 15:48:38 cluster1 pengine: [822]: notice: LogActions: Leave resource apache#011(Stopped)
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 28 15:48:38 cluster1 crmd: [823]: info: unpack_graph: Unpacked transition 40: 0 actions in 0 synapses
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_te_invoke: Processing graph 40 (ref=pe_calc-dc-1356725918-121) derived from /var/lib/pengine/pe-input-327.bz2
Dec 28 15:48:38 cluster1 crmd: [823]: info: run_graph: ====================================================
Dec 28 15:48:38 cluster1 crmd: [823]: notice: run_graph: Transition 40 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-327.bz2): Complete
Dec 28 15:48:38 cluster1 crmd: [823]: info: te_graph_trigger: Transition 40 is now complete
Dec 28 15:48:38 cluster1 crmd: [823]: info: notify_crmd: Transition 40 status: done - <null>
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 28 15:48:38 cluster1 crmd: [823]: info: do_state_transition: Starting PEngine Recheck Timer
Dec 28 15:48:38 cluster1 cib: [3674]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-47.raw
Dec 28 15:48:38 cluster1 pengine: [822]: info: process_pe_message: Transition 40: PEngine Input stored in: /var/lib/pengine/pe-input-327.bz2
Dec 28 15:48:38 cluster1 cib: [3674]: info: write_cib_contents: Wrote version 0.183.0 of the CIB to disk (digest: 0364d6b6e5a2b2b40c5d9f0eddd87737)
Dec 28 15:48:38 cluster1 cib: [3674]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.V9P0Mi (digest: /var/lib/heartbeat/crm/cib.qKbFRh)

I can see this message:
Code:

san:0_monitor_0 failed with rc=6: Preventing san:0 from re-starting anywhere in the cluster
But what does that rc=6 mean for an ocf:heartbeat:iSCSITarget resource?
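
From what I can find, rc=6 in the OCF return code convention is OCF_ERR_CONFIGURED, i.e. the agent itself decided its configuration or environment is invalid during the initial probe. If I've understood it right, the probe can be reproduced by hand by exporting the same variables the cluster passes to the agent and calling it directly; this is only a sketch, using the parameter names from the primitive definition below and the default Ubuntu agent path:

Code:

# hypothetical manual probe; the OCF_RESKEY_* names mirror the primitive's params
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_iqn="iqn.2012-12.san:disk1"
export OCF_RESKEY_portals="192.168.55.11"
$OCF_ROOT/resource.d/heartbeat/iSCSITarget monitor; echo "rc=$?"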

This is the definition of the resource:
Code:

primitive san ocf:heartbeat:iSCSITarget \
        params iqn="iqn.2012-12.san:disk1" portals="192.168.55.11" \
        meta target-role="Started"

These are Ubuntu Server 10.04 machines.
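
By the way, the crm shell can print the agent's full parameter list and defaults, which is a quick way to double-check the parameter names used above:

Code:

crm ra info ocf:heartbeat:iSCSITarget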

eantoranz 12-28-2012 02:27 PM

I started tcpdump to check for traffic to the SAN and there was none, which makes me think that either I'm missing something in the configuration or there's a problem with the iSCSITarget script (or something along those lines).

eantoranz 12-28-2012 02:38 PM

In case it helps the readers, this is what crm_mon shows:

Code:

Failed actions:
    san:0_monitor_0 (node=cluster1, call=26, rc=6, status=complete): not configured
    sanwwwsesion_monitor_0 (node=cluster1, call=30, rc=6, status=complete): not configured
    sandatapostgres_monitor_0 (node=cluster1, call=29, rc=6, status=complete): not configured
    pgbouncer_monitor_0 (node=cluster1, call=7, rc=1, status=complete): unknown error
    sanwwwsanos_monitor_0 (node=cluster1, call=28, rc=6, status=complete): not configured
    san_monitor_0 (node=cluster1, call=35, rc=6, status=complete): not configured


eantoranz 12-28-2012 03:03 PM

I hate it when things are hidden from me. Is it possible to call iSCSITarget (/usr/lib/ocf/resource.d/heartbeat/iSCSITarget) manually so that I can see what's going on?

When I call it directly without any params, this is what I get:

Code:

/usr/lib/ocf/resource.d/heartbeat/iSCSITarget: line 32: /resource.d/heartbeat/.ocf-shellfuncs: No such file or directory
/usr/lib/ocf/resource.d/heartbeat/iSCSITarget: line 38: have_binary: command not found
/usr/lib/ocf/resource.d/heartbeat/iSCSITarget: line 40: have_binary: command not found
/usr/lib/ocf/resource.d/heartbeat/iSCSITarget: line 42: have_binary: command not found
/usr/lib/ocf/resource.d/heartbeat/iSCSITarget: line 506: ocf_log: command not found

Looks like some initialization is missing, but then what do I have to do to set the params and so on?

eantoranz 12-28-2012 03:08 PM

Just as I thought. I guess I'll find a way to call it manually:

Code:

root@cluster1:/usr/lib/ocf# export OCF_ROOT=$PWD
root@cluster1:/usr/lib/ocf# ./resource.d/heartbeat/iSCSITarget
iSCSITarget[4554]: ERROR: Unsupported iSCSI target implementation ""!

If I crack it, I'll let you know.
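
With OCF_ROOT exported, the remaining parameters can apparently be passed the same way the cluster does it, as OCF_RESKEY_<name> environment variables. The "Unsupported iSCSI target implementation" error seems to come from the agent failing to auto-detect which target software is installed (it probes for the various admin tools), so pinning the implementation explicitly looks like the next thing to try. A sketch, assuming the IET userland is what's installed on this node:

Code:

# hypothetical manual start of the target agent, implementation pinned to IET
export OCF_ROOT=/usr/lib/ocf
OCF_RESKEY_iqn="iqn.2012-12.san:disk1" \
OCF_RESKEY_portals="192.168.55.11" \
OCF_RESKEY_implementation="iet" \
$OCF_ROOT/resource.d/heartbeat/iSCSITarget start; echo "rc=$?"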

eantoranz 12-28-2012 03:11 PM

A good start:

Code:

./resource.d/heartbeat/iSCSITarget meta-data

eantoranz 12-28-2012 03:42 PM

OK... first: enable logging in corosync.conf when trying to solve these kinds of problems... you will see the error messages right there.

The target primitive now works, but the LUNs won't start. Let's see how far I can get.
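
For anyone else landing here: the logging I mean is the stanza in /etc/corosync/corosync.conf; something along these lines (just an example, adjust the file location and verbosity to taste):

Code:

logging {
        to_syslog: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        timestamp: on
        debug: off
}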

eantoranz 12-28-2012 04:06 PM

OK... many steps closer:

Code:

OCF_RESKEY_target_iqn="iqn.2012-12.san:disk1" OCF_RESKEY_lun=1 OCF_RESKEY_path=/dev/sdc OCF_RESKEY_implementation="iet" ./resource.d/heartbeat/iSCSILogicalUnit start
iSCSILogicalUnit[7728]: DEBUG: Calling ietadm --op new --tid=1 --lun=1 --params Path=/dev/sdc,Type=fileio,ScsiId=default,ScsiSN=c21f969b,
iSCSILogicalUnit[7728]: ERROR: Called "ietadm --op new --tid=1 --lun=1 --params Path=/dev/sdc,Type=fileio,ScsiId=default,ScsiSN=c21f969b,"
iSCSILogicalUnit[7728]: ERROR: Exit code 255
iSCSILogicalUnit[7728]: ERROR: Command output: "Operation not permitted."
Operation not permitted.
iSCSILogicalUnit[7728]: DEBUG: default start : 1

What happened there?
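
I can't tell from the ietadm output alone why it refused, but two things worth checking on that node are whether the target (tid=1) actually exists yet and whether that LUN is already mapped; if I remember correctly, IET exposes both through procfs:

Code:

# hypothetical sanity checks for the IET kernel target
cat /proc/net/iet/volume    # tids and the LUNs already attached to them
cat /proc/net/iet/session   # initiators currently logged in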

eantoranz 01-02-2013 08:23 AM

I think I got it all wrong. This iSCSITarget/iSCSILogicalUnit stuff is meant to be used on the SAN (target) side and not on the client/initiators, right? Damn!

Which Pacemaker primitive do I have to use on the client to connect to a SAN resource?

eantoranz 01-02-2013 08:38 AM

ocf:heartbeat:iscsi solved it.
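
For the record, ocf:heartbeat:iscsi is the initiator-side agent (it logs in to an existing target through open-iscsi). A minimal sketch of what the primitive can look like, reusing the IQN and portal from above; the primitive name is arbitrary, and the parameter names come from the agent's meta-data, so double-check them with crm ra info ocf:heartbeat:iscsi:

Code:

primitive san_login ocf:heartbeat:iscsi \
        params portal="192.168.55.11:3260" target="iqn.2012-12.san:disk1" \
        op monitor interval="30s"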

