MariaDB goes down when starting replication

【Question title】: MariaDB down on start replication
【Posted】: 2021-06-21 05:46:02
【Question description】:

I upgraded a MariaDB server that holds a replicated database from 10.3.12 to 10.3.29. When I start replication, I get this error:

2021-06-21 07:09:32 0x7f77400ab700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.29/storage/innobase/row/row0ins.cc line 221
InnoDB: Failing assertion: !cursor->index->is_committed()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
2021-06-21  7:09:32 0 [ERROR] InnoDB: Unable to find a record to delete-mark
210621  7:09:32 [ERROR] mysqld got signal 6 ;
InnoDB: tuple This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

DATA TUPLE: 2 fields;
To report this bug, see https://mariadb.com/kb/en/reporting-bugs

 0:We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

Server version: 10.3.29-MariaDB-log
 SQL NULLkey_buffer_size=16777216
;read_buffer_size=2097152

max_used_connections=1
 1:max_threads=502
 len 4; hex thread_count=9
80It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17505319 K  bytes of memory
42Hope that's ok; if not, decrease some variables in the equation.

fdThread pointer: 0x7f54b40012a8
2dAttempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
; asc  B -;;

SHOW SLAVE STATUS after restarting mysqld:

*************************** 1. row ***************************
                Slave_IO_State: 
                   Master_Host: eggplant.***
                   Master_User: replication
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: eggplant-bin.031680
           Read_Master_Log_Pos: 542851065
                Relay_Log_File: mysql-relay-bin.000002
                 Relay_Log_Pos: 306
         Relay_Master_Log_File: eggplant-bin.031675
              Slave_IO_Running: No
             Slave_SQL_Running: No
               Replicate_Do_DB: 
           Replicate_Ignore_DB: ***
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 4
               Relay_Log_Space: 74965917
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: NULL
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 0
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: 
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
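
For readers who want to check this kind of output in a script, here is a minimal Python sketch that parses the vertical SHOW SLAVE STATUS output into a dictionary and reports stopped threads. The function names `parse_slave_status` and `replication_problems` are hypothetical, and the sample is a shortened copy of the output above. Both threads being stopped with no recorded error is consistent with the server having crashed rather than replication stopping on an ordinary error.

```python
def parse_slave_status(text):
    """Parse the vertical output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        # Skip the "*** 1. row ***" separator and lines without a "Key: value" pair
        if line.lstrip().startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    """Return a list of human-readable findings for a parsed status dict."""
    findings = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            findings.append(f"{thread} is {status.get(thread)!r}")
    if not (status.get("Last_IO_Error") or status.get("Last_SQL_Error")):
        findings.append("no replication error recorded")
    return findings


# Shortened copy of the status output shown above
sample = """\
*************************** 1. row ***************************
               Master_Log_File: eggplant-bin.031680
              Slave_IO_Running: No
             Slave_SQL_Running: No
                 Last_IO_Error:
                Last_SQL_Error:
"""
for finding in replication_problems(parse_slave_status(sample)):
    print(finding)
```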

SHOW MASTER STATUS (on the replica):

+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mango-bin.000002 |      328 |              |                  |
+------------------+----------+--------------+------------------+

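I tried configuring replication both with GTID and with a binlog file and position. I also tried changing the master (version 10.3.12, or cascading through a 10.3.29 server).

But when I set log-slave-updates = 1, replication starts without errors.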

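【Comments】:

You say "replicated database from 10.3.12 to 10.3.29" and the error message you posted is from 10.3.29. Of the two, which is the master and which is the slave? When you say "When I start replication I get an error ..", do you mean that you start it with the START SLAVE command after the MariaDB service has started, or does the error (or the MariaDB crash) happen when you start the service?
This is the log from the replica. The replica is version 10.3.29, the master is 10.3.12, and the other master is 10.3.29. I start the service without starting replication because I have the skip-slave-start option set. When I run START SLAVE, the error occurs and the service restarts.
So one replica has two masters, right? Something like this?
Basically there is one master (version 10.3.12), scheme: master (10.3.12) -> slave (10.3.29). I just tried another master, version 10.3.29, scheme: master (10.3.12) -> New master (10.3.29) -> Replica (10.3.29).
So in ".. New master (10.3.29)->Replica (10.3.29)", which one crashed, and on which one did you set log-slave-updates = 1 so that replication started successfully?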

Tags: mysql mariadb database-replication

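【Solution 1】:

Assertions in the MariaDB code are events that even the developers did not expect. They should be filed as bug reports so the developers can fix them. Until a proper fix is implemented, there is sometimes a workaround.

A likely cause of this assertion is MDEV-22373. It was fixed in 10.3.28, but corruption introduced before the fix remains in the table's secondary indexes. I assume you ran an earlier version as a replica before upgrading it.

Unfortunately, to correct this error the secondary indexes on the replica need to be dropped and recreated.

An order of magnitude less likely is that this is a MariaDB-specific regression fixed in MDEV-24449, which would require a full logical dump and reload.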

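【Discussion】:

Can mysqlcheck show which tables have corrupted indexes? Can I recreate the indexes with the command ALTER TABLE db.table ENGINE=InnoDB;? Or do I need DROP/CREATE INDEX?
mysqlcheck is not sufficient. I think ALTER TABLE ... ENGINE=InnoDB should be enough.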
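
If many tables are affected, the rebuild suggested in the comments above could be scripted. The following is a minimal Python sketch that only generates the ALTER TABLE ... ENGINE=InnoDB statements for a list of tables; it does not connect to a server. The helper names `quote_ident` and `rebuild_statements` and the table list are hypothetical, and whether a rebuild is enough for a given table is the commenter's expectation, not something this sketch verifies.

```python
def quote_ident(name):
    """Quote a MariaDB identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def rebuild_statements(tables):
    """Build one ALTER TABLE ... ENGINE=InnoDB statement per (schema, table) pair."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} ENGINE=InnoDB;"
        for schema, table in tables
    ]


# Hypothetical table list; replace with the affected tables on the replica
affected = [("db", "table"), ("shop", "order_items")]
for stmt in rebuild_statements(affected):
    print(stmt)
```

The printed statements can be reviewed and then run on the replica with the mysql client while replication is stopped.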