[SOLVED] Issue with 2-node MariaDB cluster

circus78 · 04-30-2015, 06:05 AM

Hi,
after a crash, I cannot start the second node of my cluster (the cluster has 2 MariaDB hosts).
Fortunately, the first node is running fine with all data available.

When I try to start mysql on node 2, the error I get is:

Code:

150430 12:55:48  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
150430 12:55:48  InnoDB: Error: page 7 log sequence number 565808461991
InnoDB: is in the future! Current system log sequence number 565772838412.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: for more information.
150430 12:55:48  InnoDB: Error: page 2 log sequence number 565929024045
InnoDB: is in the future! Current system log sequence number 565772838412.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: for more information.
..
..
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150430 12:55:48 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.42-MariaDB-1~wheezy-wsrep-log
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=0
max_threads=4098
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 25383890 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f3cc9d0c53b]
/usr/sbin/mysqld(handle_fatal_signal+0x422)[0x7f3cc993a4f2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f3cc900f0a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f3cc786c165]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x180)[0x7f3cc786f3e0]
/usr/sbin/mysqld(+0x73b5dc)[0x7f3cc9b795dc]
/usr/sbin/mysqld(+0x709617)[0x7f3cc9b47617]
/usr/sbin/mysqld(+0x70a41a)[0x7f3cc9b4841a]
/usr/sbin/mysqld(+0x6f25d8)[0x7f3cc9b305d8]
/usr/sbin/mysqld(+0x6c4733)[0x7f3cc9b02733]
/usr/sbin/mysqld(+0x6b72d2)[0x7f3cc9af52d2]
/usr/sbin/mysqld(+0x6b812c)[0x7f3cc9af612c]
/usr/sbin/mysqld(+0x6bad41)[0x7f3cc9af8d41]
/usr/sbin/mysqld(+0x6a3db5)[0x7f3cc9ae1db5]
/usr/sbin/mysqld(+0x659d1f)[0x7f3cc9a97d1f]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x7f3cc993ce78]
/usr/sbin/mysqld(+0x3bb550)[0x7f3cc97f9550]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0x652)[0x7f3cc97fa592]
/usr/sbin/mysqld(+0x32b118)[0x7f3cc9769118]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x2036)[0x7f3cc976b9b6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f3cc7858ead]
/usr/sbin/mysqld(+0x321a4d)[0x7f3cc975fa4d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.'

I tried removing grastate.dat and starting again, but got the same result.
Any suggestions?
Thank you!
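
For anyone triaging a similar log, here is a minimal Python sketch that pulls out each "is in the future" error and reports how far the page LSN is ahead of the system LSN. It is not part of the original post; the function name and the regular expression are illustrative assumptions based only on the two-line message format quoted above. A positive gap matches what the message itself warns about: data pages that are newer than the InnoDB log files on that node.

```python
import re

# Matches the two-line InnoDB error quoted above, for example:
#   InnoDB: Error: page 7 log sequence number 565808461991
#   InnoDB: is in the future! Current system log sequence number 565772838412.
# The pattern is an assumption derived from this log excerpt only.
FUTURE_LSN = re.compile(
    r"Error: page (\d+) log sequence number (\d+)[ \t]*\n"
    r"InnoDB: is in the future! Current system log sequence number (\d+)\."
)

def future_lsn_gaps(log_text):
    """Return (page_no, page_lsn, system_lsn, gap) for each matching error."""
    results = []
    for match in FUTURE_LSN.finditer(log_text):
        page_no, page_lsn, system_lsn = (int(g) for g in match.groups())
        results.append((page_no, page_lsn, system_lsn, page_lsn - system_lsn))
    return results

sample = """150430 12:55:48  InnoDB: Error: page 7 log sequence number 565808461991
InnoDB: is in the future! Current system log sequence number 565772838412.
150430 12:55:48  InnoDB: Error: page 2 log sequence number 565929024045
InnoDB: is in the future! Current system log sequence number 565772838412.
"""

for page_no, page_lsn, system_lsn, gap in future_lsn_gaps(sample):
    print(f"page {page_no}: page LSN is {gap} ahead of the system LSN")
```

To run it against a real file, read the error log into a string and pass that to `future_lsn_gaps` instead of `sample`.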

T3RM1NVT0R · 04-30-2015, 03:00 PM

It would help if you could share more details about the setup. Do both nodes point to the same shared storage location? Do they use the same database, with one host reporting no problem and the other reporting issues with it? I doubt that is the case, but I want to confirm. Which filesystem are you using to host the database?

The error log clearly says your database was corrupted by the crash. Did you try restoring the database on this machine from a backup?

circus78 · 05-01-2015, 03:02 AM

Hi,
yes, both nodes are on the same network. I confirm that only node two has the problem; node one is working fine. It's a cluster, so they obviously have the same databases.
The filesystem is ext4 on both nodes.
I take a backup from both nodes every night with mysqldump.
I didn't try to restore a backup. I don't know what would happen on node one with statements like "DROP TABLE IF EXISTS..."!

T3RM1NVT0R · 05-01-2015, 03:12 AM

Since you are using the ext4 filesystem, it has to be an active/passive cluster. Did you try shutting down both nodes and then restarting node2 first and then node1? Did you notice any difference between node1 and node2 in the way they try to access the database after the crash? Since one node reports no issue with the database and the other does, there has to be some difference in the way they access it.

Do you see any error message in /var/log/messages or the cluster log when you try to load the resource on node2?

Since it is a production environment, I would suggest scheduling downtime before doing any testing, because you might end up with neither node able to access the database.

circus78 · 05-01-2015, 08:30 AM

Hi T3RM1NVT0R,
I didn't try to start node2 first, because it's almost empty:

NODE1
# du -h /var/lib/mysql/
..
170 GB

NODE2
# du -h /var/lib/mysql/
..
2.0 GB

In mysql_error.log on node2 I noticed messages like these:

Code:

150430 12:23:55 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 140979342)
150430 12:23:55 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [INFO] Proceeding with SST (20150430 12:23:56.583)
WSREP_SST: [INFO] Cleaning the existing datadir (20150430 12:23:56.585)
removed `/var/lib/mysql/owncloud/oc_filecache.ibd'
removed `/var/lib/mysql/owncloud/oc_appconfig.frm'
removed `/var/lib/mysql/owncloud/oc_groups.frm'
removed `/var/lib/mysql/owncloud/oc_jobs.frm'
removed `/var/lib/mysql/owncloud/oc_group_user.frm'
removed `/var/lib/mysql/owncloud/oc_mimetypes.ibd'
removed `/var/lib/mysql/owncloud/oc_share_external.ibd'
..
..

So, basically, I think that node2 is trying to recover all databases from scratch.
Anyway, when node2 starts, I get the errors you can see in my first post ("log sequence number ... is in the future").
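
The excerpt above suggests that the state transfer (SST) had already started wiping the datadir on node2, which would explain why it holds only 2.0 GB. A rough Python sketch for checking that from a log file follows; the function name and both patterns are assumptions taken only from the lines quoted here, not from any documented log format.

```python
import re

# Patterns assumed from the log excerpt above, not from a documented format.
CLEANING = "WSREP_SST: [INFO] Cleaning the existing datadir"
REMOVED = re.compile(r"^removed [`'](.+)'$", re.MULTILINE)

def sst_cleanup_summary(log_text):
    """Return (cleaning_seen, removed_paths) for a chunk of error log."""
    cleaning_seen = CLEANING in log_text
    removed_paths = REMOVED.findall(log_text)
    return cleaning_seen, removed_paths

sample = """\
WSREP_SST: [INFO] Proceeding with SST (20150430 12:23:56.583)
WSREP_SST: [INFO] Cleaning the existing datadir (20150430 12:23:56.585)
removed `/var/lib/mysql/owncloud/oc_filecache.ibd'
removed `/var/lib/mysql/owncloud/oc_appconfig.frm'
"""

cleaning_seen, removed_paths = sst_cleanup_summary(sample)
print(f"cleanup started: {cleaning_seen}, files removed: {len(removed_paths)}")
```

If the cleanup line is present and many files were removed, the datadir on that node is a partial copy and should not be used to bootstrap the cluster.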

T3RM1NVT0R · 05-01-2015, 12:04 PM

I am still not sure about the setup. How did you configure the resource group and service group? To put it simply: how have you configured the cluster so that if one node goes down, the other takes over? It would help if you explained it in a bit more detail.

From the du output you shared, the data is gone from /var/lib/mysql/ on node2, and I doubt MySQL will recover it. The best option is to restore it from the backup.

circus78 · 05-04-2015, 04:11 AM

Hi,
I confirm that I solved this problem by removing ALL content in /var/lib/mysql on node2 and restarting the daemon.
It took a long time, but the sync completed successfully (about 180 GB of MySQL data).
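
A more cautious variant of that step, sketched in Python: instead of deleting the files, move everything out of the datadir into a backup directory, so the node still starts with an empty datadir but the old files can be inspected later. This is a swapped-in variant, not the exact procedure used above. It assumes mysqld is stopped on the broken node, the other node is healthy and can act as donor, and there is enough disk space for the backup. The function name and the backup path are hypothetical, and the demo below runs only on a temporary directory.

```python
import os
import shutil
import tempfile

def set_aside_datadir(datadir, backup_dir):
    """Move every entry from datadir into backup_dir; return the moved names.

    Hypothetical helper: run it only while mysqld is stopped, and keep
    backup_dir outside datadir.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(datadir)):
        shutil.move(os.path.join(datadir, name), os.path.join(backup_dir, name))
        moved.append(name)
    return moved

# Demo on a throwaway directory tree, not on /var/lib/mysql.
with tempfile.TemporaryDirectory() as tmp:
    datadir = os.path.join(tmp, "mysql")
    backup_dir = os.path.join(tmp, "mysql.bak")
    os.makedirs(os.path.join(datadir, "owncloud"))
    for name in ("ibdata1", "ib_logfile0", "grastate.dat"):
        open(os.path.join(datadir, name), "w").close()

    moved = set_aside_datadir(datadir, backup_dir)
    remaining = os.listdir(datadir)
    print("moved:", moved)
    print("left in datadir:", remaining)
```

After the datadir is empty, starting the daemon on that node lets it request a full state transfer from the healthy node, as described above.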