一 背景
   某一创业的朋友的主机因为磁盘阵列损坏机器crash,重启MySQL服务时 报如下错误:

  • InnoDB: Reading tablespace information from the .ibd files...
  • InnoDB: Restoring possible half-written data pages from the doublewrite
  • InnoDB: buffer...
  • InnoDB: Doing recovery: scanned up to log sequence number 9120034833
  • 150125 16:12:51 InnoDB: Starting an apply batch of log records to the database...
  • InnoDB: Progress in percents: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 150125 16:12:51 [ERROR] mysqld got signal 11 ;
  • This could be because you hit a bug. It is also possible that this binary
  • or one of the libraries it was linked against is corrupt, improperly built,
  • or misconfigured. This error can also be caused by malfunctioning hardware.
  • To report this bug, see http://kb.askmonty.org/en/reporting-bugs
  • We will try our best to scrape up some info that will hopefully help
  • diagnose the problem, but since we have already crashed,
  • something is definitely wrong and this may fail.
  • Server version: 5.5.37-MariaDB-log
  • key_buffer_size=268435456
  • read_buffer_size=1048576
  • max_used_connections=0
  • max_threads=1002
  • thread_count=0
  • It is possible that mysqld could use up to
  • key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332093 K bytes of memory
  • 41 Hope that.
  • 二 分析
        主要关注 mysqld got signal 11 的问题,从日志内容分析来看,数据库在机器crash 导致日志文件损坏,重启之后无法正常恢复,更无法正常对外提供服务。

    三 解决
        因为日志已经损坏,这里采用非常规手段,首先修改innodb_force_recovery参数,使mysqld跳过恢复步骤,将mysqld 启动,将数据导出来然后重建数据库。
    innodb_force_recovery可以设置为1-6,大的数字包含前面所有数字的影响。
      1. (SRV_FORCE_IGNORE_CORRUPT):忽略检查到的corrupt页。
      2. (SRV_FORCE_NO_BACKGROUND):阻止主线程的运行,如主线程需要执行full purge操作,会导致crash。
      3. (SRV_FORCE_NO_TRX_UNDO):不执行事务回滚操作。
      4. (SRV_FORCE_NO_IBUF_MERGE):不执行插入缓冲的合并操作。
      5. (SRV_FORCE_NO_UNDO_LOG_SCAN):不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交。
      6. (SRV_FORCE_NO_LOG_REDO):不执行前滚的操作。
    注意 
      a 当设置参数值大于0后,可以对表进行select,create,drop操作,但insert,update或者delete这类操作是不允许的。
      b 当innodb_purge_threads 和 innodb_force_recovery一起设置会出现一种loop现象:   

  • 150125 17:07:42 InnoDB: Waiting for the background threads to start
  • 150125 17:07:43 InnoDB: Waiting for the background threads to start
  • 150125 17:07:44 InnoDB: Waiting for the background threads to start
  • 150125 17:07:45 InnoDB: Waiting for the background threads to start
  • 150125 17:07:46 InnoDB: Waiting for the background threads to start
  • 150125 17:07:47 InnoDB: Waiting for the background threads to start
  • 在my.cnf中修改以下两个参数
    innodb_force_recovery=6
    innodb_purge_thread=0

    重启MySQL 

  • 150125 17:10:47 [Note] Crash recovery finished.
  • 150125 17:10:47 [Note] Server socket created on IP: '0.0.0.0'.
  • 150125 17:10:47 [Note] Event Scheduler: Loaded 0 events
  • 150125 17:10:47 [Note] /vdata/webserver/mysql/bin/mysqld: ready for connections.
  • Version: '5.5.37-MariaDB-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution
  • 立即对数据库做逻辑导出 ,完成之后将innodb_force_recovery设置为0 ,innodb_purge_thread=1 ,然后重建数据库 。
    另外 MySQL 版本 5.5以及之前 ,当innodb_purge_threads =1,innodb_force_recovery >1 的情况会出现上文提到的循环报warning 问题(=1 没有问题),
    原因:
    MySQL 的源代码中显示  当innodb_purge_threads 和 innodb_force_recovery一起设置会出现loop循环

  • while (srv_shutdown_state == SRV_SHUTDOWN_NONE) {
  •       if (srv_thread_has_reserved_slot(SRV_MASTER) == ULINT_UNDEFINED
  •           || (srv_n_purge_threads == 1
  •           && srv_thread_has_reserved_slot(SRV_WORKER)
  •           == ULINT_UNDEFINED)) {
  •           ut_print_timestamp(stderr);
  •           fprintf(stderr, " InnoDB: Waiting for the background threads to start\n");
  •           os_thread_sleep(1000000);
  •       } else {
  •           break;
  •       }
  •   }
  • 所以当需要设置innodb_force_recovery>1的时候需要关闭 innodb_purge_threads,设置为0(默认)。

    四 小结 
       MySQL crash 或者 MySQL 数据库服务器 crash 会导致各种各样的问题 ,比如主备之间的error 1594 (5.6 版本开启crash-safe ,会最大程度上避免 error 1594的问题,以后会写5.6新特性介绍该功能 ),error 1236 日志损坏数据文件损坏 ,等等,本案例只是其中的一种,细心从日志中找的相关错误提示,逐步解决即可。

    相关文章: