LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
-   -   galera cluster 2nd does not start (https://www.linuxquestions.org/questions/linux-server-73/galera-cluster-2nd-does-not-start-4175615142/)

chtsalid 10-06-2017 03:07 AM

galera cluster 2nd does not start
 
Hi all,

I am running galera cluster on centos7 and my 2nd node fails to start.

1st node is running


[root@os-db1 ~]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 638dd3db-9ecd-11e7-8c45-136f872211b0
seqno: -1
safe_to_bootstrap: 1
[root@os-db1 ~]systemctl status mariadbat
● mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Thu 2017-10-05 22:33:34 CEST; 5min ago
Process: 1884 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 1772 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 1770 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 1851 (mysqld)
Status: "Taking your SQL requests now..."
CGroup: /system.slice/mariadb.service
└─1851 /usr/sbin/mysqld --wsrep_start_position=638dd3db-9ecd-11e7-8c45-136f872211b0:1366

Oct 05 22:37:00 os-db1.lab.local mysqld[1851]: group UUID = 638dd3db-9ecd-11e7-8c45-136f872211b0
Oct 05 22:37:00 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:00 140605334218496 [Note] WSREP: Flow-control interval: [16, 16]
Oct 05 22:37:00 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:00 140605634771712 [Note] WSREP: New cluster view: global ...sion 3
Oct 05 22:37:00 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:00 140605634771712 [Note] WSREP: wsrep_notify_cmd is not d...ation.
Oct 05 22:37:00 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:00 140605634771712 [Note] WSREP: REPL Protocols: 7 (3, 2)
Oct 05 22:37:00 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:00 140605634771712 [Note] WSREP: Assign initial position f...ion: 3
Oct 05 22:37:00 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:00 140605392942848 [Note] WSREP: Service thread queue flushed.
Oct 05 22:37:04 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:04 140605342611200 [Note] WSREP: (75852029, 'tcp://0.0.0.0...1:4567
Oct 05 22:37:04 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:04 140605342611200 [Warning] WSREP: discarding established...:4567)
Oct 05 22:37:05 os-db1.lab.local mysqld[1851]: 2017-10-05 22:37:05 140605342611200 [Note] WSREP: cleaning up ed995d35 (tc...:4567)
Hint: Some lines were ellipsized, use -l to show in full.


however the 2nd one has still the problem



[root@os-db2 ~]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 638dd3db-9ecd-11e7-8c45-136f872211b0
seqno: -1
safe_to_bootstrap: 1

[root@os-db2 ~]# systemctl status mariadb -l
● mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: exit-code) since Thu 2017-10-05 22:37:10 CEST; 3min 28s ago
Process: 1786 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
Process: 1710 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 1708 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 1786 (code=exited, status=1/FAILURE)
Status: "MariaDB server is down"

Oct 05 22:36:57 os-db2.lab.local mysqld[1786]: 2017-10-05 22:36:57 140047168820992 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '10.0.10.21' --datadir '/var/lib/mysql/' --parent '1786' ''
Oct 05 22:36:57 os-db2.lab.local mysqld[1786]: Read: '(null)'
Oct 05 22:36:57 os-db2.lab.local mysqld[1786]: 2017-10-05 22:36:57 140047168820992 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '10.0.10.21' --datadir '/var/lib/mysql/' --parent '1786' '' : 2 (No such file or directory)
Oct 05 22:36:57 os-db2.lab.local mysqld[1786]: 2017-10-05 22:36:57 140047479429888 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
Oct 05 22:36:57 os-db2.lab.local mysqld[1786]: 2017-10-05 22:36:57 140047479429888 [ERROR] Aborting
Oct 05 22:37:10 os-db2.lab.local mysqld[1786]: Error in my_thread_global_end(): 1 threads didn't exit
Oct 05 22:37:10 os-db2.lab.local systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
Oct 05 22:37:10 os-db2.lab.local systemd[1]: Failed to start MariaDB database server.
Oct 05 22:37:10 os-db2.lab.local systemd[1]: Unit mariadb.service entered failed state.
Oct 05 22:37:10 os-db2.lab.local systemd[1]: mariadb.service failed.


cat /var/log/messages

Oct 5 22:41:33 os-db2 systemd: Starting Cleanup of Temporary Directories...
Oct 5 22:41:33 os-db2 systemd: Started Cleanup of Temporary Directories.
Oct 5 22:41:33 os-db2 systemd: Starting MariaDB database server...
Oct 5 22:41:33 os-db2 sh: 2017-10-05 22:41:33 139893839038720 [Note] /usr/sbin/mysqld (mysqld 10.1.26-MariaDB) starting as process 1853 ...
Oct 5 22:41:36 os-db2 sh: WSREP: Recovered position 638dd3db-9ecd-11e7-8c45-136f872211b0:1083
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] /usr/sbin/mysqld (mysqld 10.1.26-MariaDB) starting as process 1885 ...
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: Read nil XID from storage engines, skipping position init
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy <info@codership.com> loaded successfully.
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: CRC-32C: using hardware acceleration.
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: Found saved state: 638dd3db-9ecd-11e7-8c45-136f872211b0:-1, safe_to_bootsrap: 1
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.0.10.21; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ign
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: GCache history reset: old(638dd3db-9ecd-11e7-8c45-136f872211b0:0) -> new(638dd3db-9ecd-11e7-8c45-136f872211b0:1083)
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: Assign initial position for certification: 1083, protocol version: -1
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: wsrep_sst_grab()
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: Start replication
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: Setting initial position to 638dd3db-9ecd-11e7-8c45-136f872211b0:1083
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: protonet asio version 0
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: Using CRC-32C for message checksums.
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: backend: asio
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: gcomm thread scheduling priority set to other:0
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: restore pc from disk failed
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: GMCast version 0
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: (94ec7371, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: (94ec7371, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: EVS version 0
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: gcomm: connecting to group 'galera', peer '10.0.10.20:,10.0.10.21:'
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: (94ec7371, 'tcp://0.0.0.0:4567') connection established to 94ec7371 tcp://10.0.10.21:4567
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Warning] WSREP: (94ec7371, 'tcp://0.0.0.0:4567') address 'tcp://10.0.10.21:4567' points to own listening address, blacklisting
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: (94ec7371, 'tcp://0.0.0.0:4567') connection established to 75852029 tcp://10.0.10.20:4567
Oct 5 22:41:36 os-db2 mysqld: 2017-10-05 22:41:36 140106319649024 [Note] WSREP: (94ec7371, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: declaring 75852029 at tcp://10.0.10.20:4567 stable
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: Node 75852029 state prim
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: view(view_id(PRIM,75852029,6) memb {
Oct 5 22:41:37 os-db2 mysqld: 75852029,0
Oct 5 22:41:37 os-db2 mysqld: 94ec7371,0
Oct 5 22:41:37 os-db2 mysqld: } joined {
Oct 5 22:41:37 os-db2 mysqld: } left {
Oct 5 22:41:37 os-db2 mysqld: } partitioned {
Oct 5 22:41:37 os-db2 mysqld: })
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: save pc into disk
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: gcomm: connected
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: Opened channel 'galera'
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319649024 [Note] WSREP: Waiting for SST to complete.
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: STATE EXCHANGE: sent state msg: 9587a7c7-aa0d-11e7-b1c3-377185d9ee32
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: STATE EXCHANGE: got state msg: 9587a7c7-aa0d-11e7-b1c3-377185d9ee32 from 0 (galera1)
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: STATE EXCHANGE: got state msg: 9587a7c7-aa0d-11e7-b1c3-377185d9ee32 from 1 (galera2)
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: Quorum results:
Oct 5 22:41:37 os-db2 mysqld: version = 4,
Oct 5 22:41:37 os-db2 mysqld: component = PRIMARY,
Oct 5 22:41:37 os-db2 mysqld: conf_id = 5,
Oct 5 22:41:37 os-db2 mysqld: members = 1/2 (joined/total),
Oct 5 22:41:37 os-db2 mysqld: act_id = 1414,
Oct 5 22:41:37 os-db2 mysqld: last_appl. = -1,
Oct 5 22:41:37 os-db2 mysqld: protocols = 0/7/3 (gcs/repl/appl),
Oct 5 22:41:37 os-db2 mysqld: group UUID = 638dd3db-9ecd-11e7-8c45-136f872211b0
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: Flow-control interval: [23, 23]
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106019104512 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1414)
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319330048 [Note] WSREP: State transfer required:
Oct 5 22:41:37 os-db2 mysqld: Group state: 638dd3db-9ecd-11e7-8c45-136f872211b0:1414
Oct 5 22:41:37 os-db2 mysqld: Local state: 638dd3db-9ecd-11e7-8c45-136f872211b0:1083
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319330048 [Note] WSREP: New cluster view: global state: 638dd3db-9ecd-11e7-8c45-136f872211b0:1414, view# 6: Primary, number of nodes: 2, my index: 1, protocol version 3
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319330048 [Warning] WSREP: Gap in state sequence. Need state transfer.
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106010711808 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.0.10.21' --datadir '/var/lib/mysql/' --parent '1885' '' '
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106010711808 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '10.0.10.21' --datadir '/var/lib/mysql/' --parent '1885' ''
Oct 5 22:41:37 os-db2 mysqld: Read: '(null)'
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106010711808 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '10.0.10.21' --datadir '/var/lib/mysql/' --parent '1885' '' : 2 (No such file or directory)
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319330048 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
Oct 5 22:41:37 os-db2 mysqld: 2017-10-05 22:41:37 140106319330048 [ERROR] Aborting
Oct 5 22:41:51 os-db2 mysqld: Error in my_thread_global_end(): 1 threads didn't exit
Oct 5 22:41:51 os-db2 systemd: mariadb.service: main process exited, code=exited, status=1/FAILURE
Oct 5 22:41:51 os-db2 systemd: Failed to start MariaDB database server.
Oct 5 22:41:51 os-db2 systemd: Unit mariadb.service entered failed state.
Oct 5 22:41:51 os-db2 systemd: mariadb.service failed.

Many thanks!


All times are GMT -5. The time now is 01:48 PM.