LinuxQuestions.org
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.

04-30-2015, 06:05 AM   #1
circus78
Member
Registered: Dec 2011
Posts: 273
Issue with 2-node MariaDB cluster


Hi,
after a crash, I am unable to start the second node of my MariaDB cluster (the cluster has 2 MariaDB hosts).
Fortunately, the first node is running fine with all data available.

When I try to start MySQL on node 2, I get this error:


Code:
150430 12:55:48  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
150430 12:55:48  InnoDB: Error: page 7 log sequence number 565808461991
InnoDB: is in the future! Current system log sequence number 565772838412.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: for more information.
150430 12:55:48  InnoDB: Error: page 2 log sequence number 565929024045
InnoDB: is in the future! Current system log sequence number 565772838412.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: for more information.
..
..
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150430 12:55:48 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.42-MariaDB-1~wheezy-wsrep-log
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=0
max_threads=4098
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 25383890 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f3cc9d0c53b]
/usr/sbin/mysqld(handle_fatal_signal+0x422)[0x7f3cc993a4f2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f3cc900f0a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f3cc786c165]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x180)[0x7f3cc786f3e0]
/usr/sbin/mysqld(+0x73b5dc)[0x7f3cc9b795dc]
/usr/sbin/mysqld(+0x709617)[0x7f3cc9b47617]
/usr/sbin/mysqld(+0x70a41a)[0x7f3cc9b4841a]
/usr/sbin/mysqld(+0x6f25d8)[0x7f3cc9b305d8]
/usr/sbin/mysqld(+0x6c4733)[0x7f3cc9b02733]
/usr/sbin/mysqld(+0x6b72d2)[0x7f3cc9af52d2]
/usr/sbin/mysqld(+0x6b812c)[0x7f3cc9af612c]
/usr/sbin/mysqld(+0x6bad41)[0x7f3cc9af8d41]
/usr/sbin/mysqld(+0x6a3db5)[0x7f3cc9ae1db5]
/usr/sbin/mysqld(+0x659d1f)[0x7f3cc9a97d1f]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x7f3cc993ce78]
/usr/sbin/mysqld(+0x3bb550)[0x7f3cc97f9550]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0x652)[0x7f3cc97fa592]
/usr/sbin/mysqld(+0x32b118)[0x7f3cc9769118]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x2036)[0x7f3cc976b9b6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f3cc7858ead]
/usr/sbin/mysqld(+0x321a4d)[0x7f3cc975fa4d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.'

I tried removing grastate.dat and starting again, but got the same result.
Any suggestions?
Thank you!
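The "is in the future!" errors come from a consistency check at startup: InnoDB compares the LSN stored in each page it reads with the current system LSN derived from the redo log files (ib_logfile*). A page newer than the redo log means the data files and the log files do not match, which is what the message itself suggests ("you may have copied the InnoDB tablespace but not the InnoDB log files"). A minimal sketch of that comparison using the numbers from the log above; `lsn_ahead` is a hypothetical helper, not a MariaDB tool, and it assumes a shell with 64-bit arithmetic (bash or dash on x86_64):

```shell
#!/bin/sh
# Hypothetical helper, not part of MariaDB: print how far a page's LSN is
# ahead of the system LSN. LSNs count bytes of redo log, so the result is
# in bytes; a positive value is the condition behind "is in the future!".
lsn_ahead() {
    page=$1
    sys=$2
    echo $((page - sys))
}

# Values copied from the node 2 error log.
system_lsn=565772838412
for page_lsn in 565808461991 565929024045; do
    ahead=$(lsn_ahead "$page_lsn" "$system_lsn")
    if [ "$ahead" -gt 0 ]; then
        echo "page LSN $page_lsn is $ahead bytes ahead of the redo log"
    fi
done
```

Run as is, it reports the first page as 35623579 bytes ahead and the second as 156185633 bytes ahead, so both pages were written after the point the redo log on node 2 describes.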
 
04-30-2015, 03:00 PM   #2
T3RM1NVT0R
Senior Member
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385
It would help if you could share more details about the setup. Do both nodes use the same shared storage location? Are they using the same database files, so that one host reports no problem while the other reports issues with the database? I doubt that is the case, but I want to confirm. Which filesystem are you using to host the database?

The error log clearly indicates that the database was corrupted by the crash. Have you tried restoring the database on this machine from a backup?
 
05-01-2015, 03:02 AM   #3
circus78
Member
Registered: Dec 2011
Posts: 273
Original Poster
Hi,
yes, both nodes are on the same network. I can confirm that only node 2 has the problem; node 1 is working fine. It's a cluster, so obviously they have the same databases.
The filesystem is ext4 on both nodes.
I take a backup of both nodes every night with mysqldump.
I haven't tried restoring the backup. I don't know what statements like "DROP TABLE IF EXISTS..." would do on node 1!
 
05-01-2015, 03:12 AM   #4
T3RM1NVT0R
Senior Member
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385
Since you are using ext4, I assume this is an active/passive cluster. Did you try shutting down both nodes, then starting node2 first and node1 after it? Did you notice any difference between node1 and node2 in how they access the database after the crash? Since one node reports an issue with the database and the other does not, there must be some difference in the way they see or access it.

Do you see any error messages in /var/log/messages or in the cluster log when you try to bring the resource online on node2?

Since this is a production environment, I would suggest scheduling downtime before doing any testing, because you might end up with neither node able to access the database.
 
05-01-2015, 08:30 AM   #5
circus78
Member
Registered: Dec 2011
Posts: 273
Original Poster
Hi T3RM1NVT0R,
I didn't try to start node2 first, because it's almost empty:

NODE1
# du -h /var/lib/mysql/
..
170 GB


NODE2
# du -h /var/lib/mysql/
..
2.0 GB

I noticed messages like these in mysql_error.log on node2:

Code:
150430 12:23:55 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 140979342)
150430 12:23:55 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [INFO] Proceeding with SST (20150430 12:23:56.583)
WSREP_SST: [INFO] Cleaning the existing datadir (20150430 12:23:56.585)
removed `/var/lib/mysql/owncloud/oc_filecache.ibd'
removed `/var/lib/mysql/owncloud/oc_appconfig.frm'
removed `/var/lib/mysql/owncloud/oc_groups.frm'
removed `/var/lib/mysql/owncloud/oc_jobs.frm'
removed `/var/lib/mysql/owncloud/oc_group_user.frm'
removed `/var/lib/mysql/owncloud/oc_mimetypes.ibd'
removed `/var/lib/mysql/owncloud/oc_share_external.ibd'
..
..
So I think node2 is basically trying to recover all databases from scratch.
Anyway, when node2 starts, I get the errors shown in my first post ("log sequence number ... is in the future").
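In Galera, "Shifting PRIMARY -> JOINER" followed by "Requesting state transfer" means the node has asked a donor for a full state snapshot (SST); a joiner that completes the transfer normally goes on to log shifts to JOINED and then SYNCED. A small sketch for checking which state node2 last reached; `last_wsrep_state` is a hypothetical helper, and it assumes the log line format shown in the excerpt above and that the error log path is passed as its argument (the thread names the file mysql_error.log but not its directory):

```shell
#!/bin/sh
# Hypothetical helper: print the most recent Galera state shift recorded in
# a MariaDB error log, e.g. "PRIMARY -> JOINER". Assumes lines shaped like
#   150430 12:23:55 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 140979342)
last_wsrep_state() {
    log_file=$1
    # Keep only state-shift lines, take the last one, and strip everything
    # except the "FROM -> TO" part.
    grep 'WSREP: Shifting ' "$log_file" \
        | tail -n 1 \
        | sed 's/.*WSREP: Shifting \(.*\) (TO:.*/\1/'
}
```

For the excerpt above it prints `PRIMARY -> JOINER`, meaning the node never got past requesting the SST; `JOINED -> SYNCED` would mean it finished joining.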
 
05-01-2015, 12:04 PM   #6
T3RM1NVT0R
Senior Member
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385
I am still not sure about the setup. How did you configure the resource group and service group? To put it simply: how have you configured the cluster so that if one node goes down, the other takes over? It would help if you could explain it in a bit more detail.

From the du output you shared, the data is gone from /var/lib/mysql/ on node2, and I doubt MySQL will be able to recover it; the best option is to restore it from backup.
 
05-04-2015, 04:11 AM   #7
circus78
Member
Registered: Dec 2011
Posts: 273
Original Poster
Hi,
I can confirm that I solved this problem by removing ALL content in /var/lib/mysql on node2 and restarting the daemon.
It took a long time, but the sync completed successfully (about 180 GB of MySQL data).
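Removing the whole datadir content, not just grastate.dat, leaves node2 with no local state at all, so it has to rebuild everything from the donor through a full SST. A more cautious variant of the same fix is to move the old datadir aside instead of deleting it, so nothing is lost if the transfer fails. The sketch below assumes Debian's sysvinit `mysql` service, the default datadir /var/lib/mysql, GNU coreutils (for `--reference`), and enough free disk to keep the old copy; `move_datadir_aside` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: move a MariaDB datadir out of the way and recreate it
# empty with the same owner and mode, so a Galera joiner has no local state
# and must take a full SST from the donor. Requires GNU chown/chmod
# (--reference). Run it only on the broken node, with mysqld stopped.
move_datadir_aside() {
    dir=${1%/}    # drop a trailing slash so the backup name is a sibling
    backup="$dir.pre-sst.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" || return 1
    mkdir "$dir" || return 1
    # Copy ownership (normally mysql:mysql) and permissions from the old dir.
    chown --reference="$backup" "$dir" || return 1
    chmod --reference="$backup" "$dir" || return 1
    # Print where the old contents went.
    printf '%s\n' "$backup"
}
```

On node2 only, the sequence would be `service mysql stop`, then `move_datadir_aside /var/lib/mysql`, then `service mysql start`; once the node reports SYNCED, the moved copy can be deleted. Because the backup sits next to the datadir on the same filesystem, the move is a rename and is instant.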

All times are GMT -5.