We are administering a MySQL server for one of our customers that has >100 databases with about 50 tables each, many of them InnoDB tables. The server crashed and I'm trying to find the culprit. When restarting with innodb_force_recovery = 2
, I can connect and don't see any errors in the error.log. More importantly, mysqlcheck --all-databases
reports "Ok" for all tables. But when I remove innodb_force_recovery
, the server crashes again, writes a stack trace to error.log and can only be stopped with kill -9
.
How am I supposed to find the offending database under these circumstances and what would cause mysqlcheck to miss the corrupted tables? Please don't tell me to just ignore it and restore all databases from dumps. That might be acceptable for one or two databases and if it occurs only once in a blue moon, but I've had problems with the same server more than once and restoring everything from dumps just requires too much time and manual work to do that every time.
The server version is 5.5.46 and innodb_file_per_table
is active.
Excerpt of error.log as requested (does the The tablespace free space info is corrupt
mean there is an error that is not in a specific table and that can not be corrected?):
180222 17:13:48 mysqld_safe Starting mysqld daemon with databases from /home/mysql
180222 17:13:48 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
180222 17:13:48 [Note] /usr/libexec/mysqld (mysqld 5.5.46) starting as process 26242 ...
180222 17:13:48 [Note] Plugin 'FEDERATED' is disabled.
180222 17:13:48 InnoDB: The InnoDB memory heap is disabled
180222 17:13:48 InnoDB: Mutexes and rw_locks use InnoDB's own implementation
180222 17:13:48 InnoDB: Compressed tables use zlib 1.2.3
180222 17:13:48 InnoDB: Using Linux native AIO
180222 17:13:48 InnoDB: Initializing buffer pool, size = 128.0M
180222 17:13:49 InnoDB: Completed initialization of buffer pool
180222 17:13:49 InnoDB: highest supported file format is Barracuda.
180222 17:13:49 InnoDB: Waiting for the background threads to start
180222 17:13:50 InnoDB: 5.5.46 started; log sequence number 1632912830888
180222 17:13:50 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
180222 17:13:50 [Note] - '0.0.0.0' resolves to '0.0.0.0';
180222 17:13:50 [Note] Server socket created on IP: '0.0.0.0'.
180222 17:13:50 [Note] Event Scheduler: Loaded 0 events
180222 17:13:50 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.46' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL) by Remi
InnoDB: Dump of the tablespace extent descriptor: len 40; hex 000000000000000200000000061600000000126e00000004ffffffffffffffffffffffffffffbfaa; asc n ;
InnoDB: Serious error! InnoDB is trying to free page 512
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
InnoDB: You may need to dump your InnoDB tables and recreate the whole
InnoDB: database!
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180222 17:13:50 InnoDB: Assertion failure in thread 2499464080 in file fsp0fsp.c line 3309
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:13:50 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=0
max_threads=512
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1314506 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/libexec/mysqld(my_print_stacktrace+0x33)[0x842a1f3]
/usr/libexec/mysqld(handle_fatal_signal+0x42b)[0x82d9d3b]
[0x7bc420]
[0x7bc410]
/lib/libc.so.6(gsignal+0x50)[0x626b10]
/lib/libc.so.6(abort+0x101)[0x628421]
/usr/libexec/mysqld[0x85012e7]
/usr/libexec/mysqld[0x850147e]
/usr/libexec/mysqld[0x849c0b1]
/usr/libexec/mysqld[0x84a8a61]
/usr/libexec/mysqld[0x8561fef]
/usr/libexec/mysqld[0x85570a9]
/usr/libexec/mysqld[0x847b082]
/usr/libexec/mysqld[0x846bf04]
/usr/libexec/mysqld[0x846dad4]
/lib/libpthread.so.0[0x50d912]
/lib/libc.so.6(clone+0x5e)[0x6d347e]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
180222 17:13:50 mysqld_safe Number of processes running now: 0
180222 17:13:50 mysqld_safe mysqld restarted
The error message itself is telling you why verifying the table data doesn't fix the issue. It tried to free a page (probably from a table or index); but that page was already marked free. In other words, one of your tables or indexes was using a page, when the page is considered available to give to another table or index. And, obviously, if InnoDB doesn't know what pages are actually free, Bad Things Can Happen.
Dumping all the table/index data, and reloading, means giving InnoDB a chance to rebuild its collection of free pages. Ideally, you'd do this on a fresh database. Why? Well, you should never assume there's only one, single, solitary corruption error. So if you move the data to a fresh install, you don't have to worry if there were additional, undetected, corruption issues.
As suggested, dumping and restoring from those dumps was the way I eventually chose to restore all databases. Luckily,
innodb_force_recovery = 2
allowed me to dump everything without errors, so that I didn't have to use dumps from backups. Of course I would rather have found out the real cause of the error, but MySQL offers no help apart from some mentions of "The tablespace free space info is corrupt" in bug reports. Without identifying and removing the cause of the error, I expect it to reoccur eventually - our customer will probably be even more pissed than I am.Maybe bad hardware is to blame, but the SMART data of all disks in the system look good and
/var/log/messages
does not contain anything suspicious at the time of the crash. There wasn't an unexpected power loss or reboot either.