找回密码
 立即注册
首页 业界区 安全 终止分区表变更操作时误删数据字典缓存导致MySQL崩溃分 ...

终止分区表变更操作时误删数据字典缓存导致MySQL崩溃分析

幽淆 2025-9-24 15:45:24
终止分区表变更操作时误删数据字典缓存导致MySQL崩溃分析

1. 问题简述

在 MySQL 中,当终止一个处于 committing alter table to storage engine 阶段的分区表操作时,InnoDB 会尝试进行回滚并清理数据字典缓存。不幸的是,过程中发生了误删表缓存对象的情况 —— InnoDB 错误地移除了另一张非目标表的缓存条目,导致引用计数不为 0,触发断言失败并导致 MySQL 崩溃。
2. 复现步骤

环境说明:
系统:CentOS 7
数据库:MySQL 8.0.32
2.1 建表准备
  1. CREATE DATABASE TEST;
  2. CREATE TABLE TEST.A ( X INT)
  3. PARTITION BY RANGE (X) (
  4.     PARTITION P0 VALUES LESS THAN (10000),
  5.     PARTITION PMAX VALUES LESS THAN MAXVALUE
  6. );
  7. CREATE TABLE TEST.A_1 LIKE TEST.A;
  8. SELECT COUNT(*) FROM TEST.A_1;
复制代码
2.2 反复尝试终止 ALTER TABLE 操作

在一个 Shell 中持续执行终止 "committing alter table to storage engine" 状态(KILL相应SQL请求)的线程:
  1. while true; do  {   mysql --login-path=mylogin -BNe  'SELECT CONCAT("KILL ",ID ,";") FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE = "COMMITTING ALTER TABLE TO STORAGE ENGINE";' | mysql --login-path=mylogin  -vvv ; } ; done
复制代码
2.3 循环执行 ALTER 操作

在另一个 Shell 中循环执行添加与删除分区的操作:
  1. while true; do  {   mysql --login-path=mylogin -BNe  "ALTER TABLE test.a ADD PARTITION (PARTITION pmax VALUES LESS THAN MAXVALUE);" ;   mysql --login-path=mylogin -BNe " ALTER TABLE test.a DROP PARTITION pmax;" ; }; done
复制代码
3. 崩溃日志与原因分析

当 DROP PARTITION 操作在关键阶段被 KILL 时,崩溃发生:
  1. 2025-06-05T17:03:19.270698+08:00 2975 [ERROR] [MY-013183] [InnoDB] Assertion failure: dict0dict.cc:1885:table->get_ref_count() == 0 thread 140327459395328
  2. InnoDB: We intentionally generate a memory trap.
  3. InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
  4. InnoDB: If you get repeated assertion failures or crashes, even
  5. InnoDB: immediately after the mysqld startup, there may be
  6. InnoDB: corruption in the InnoDB tablespace. Please refer to
  7. InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
  8. InnoDB: about forcing recovery.
  9. 2025-06-05T09:03:19Z UTC - mysqld got signal 6 ;
  10. Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
  11. BuildID[sha1]=7afc1fad28c808c287fa7599451d3355e1b3be73
  12. Thread pointer: 0x7fa054000f40
  13. Attempting backtrace. You can use the following information to find out
  14. where mysqld died. If you see no messages after this, something went
  15. terribly wrong...
  16. stack_bottom = 7fa0885e8a30 thread_stack 0x100000
  17. /usr/local/mysql-debug/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x48da47f]
  18. /usr/local/mysql-debug/bin/mysqld(print_fatal_signal(int)+0x3a2) [0x34cf2af]
  19. /usr/local/mysql-debug/bin/mysqld(my_server_abort()+0x6b) [0x34cf561]
  20. /usr/local/mysql-debug/bin/mysqld(my_abort()+0xd) [0x48d0eaf]
  21. /usr/local/mysql-debug/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4d9b051]
  22. /usr/local/mysql-debug/bin/mysqld() [0x4f15f0b]
  23. /usr/local/mysql-debug/bin/mysqld(dict_table_remove_from_cache(dict_table_t*)+0x1d) [0x4f166cd]
  24. /usr/local/mysql-debug/bin/mysqld(dict_partitioned_table_remove_from_cache(char const*)+0x159) [0x4f16829]
  25. /usr/local/mysql-debug/bin/mysqld() [0x4a07dc2]
  26. /usr/local/mysql-debug/bin/mysqld() [0x337ee49]
  27. /usr/local/mysql-debug/bin/mysqld(mysql_alter_table(THD*, char const*, char const*, HA_CREATE_INFO*, Table_ref*, Alter_info*)+0x3bf6) [0x338a12a]
  28. /usr/local/mysql-debug/bin/mysqld(Sql_cmd_alter_table::execute(THD*)+0x5d4) [0x39af486]
  29. /usr/local/mysql-debug/bin/mysqld(mysql_execute_command(THD*, bool)+0x54c8) [0x32a5677]
  30. /usr/local/mysql-debug/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x756) [0x32a785f]
  31. /usr/local/mysql-debug/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x15a8) [0x329d7c3]
  32. /usr/local/mysql-debug/bin/mysqld(do_command(THD*)+0x5bd) [0x329b853]
  33. /usr/local/mysql-debug/bin/mysqld() [0x34ba2df]
  34. /usr/local/mysql-debug/bin/mysqld() [0x51ed236]
  35. /lib64/libpthread.so.0(+0x7ea5) [0x7fa13cf41ea5]
  36. /lib64/libc.so.6(clone+0x6d) [0x7fa13b55bb0d]
  37. Trying to get some variables.
  38. Some pointers may be invalid and cause the dump to abort.
  39. Query (7fa054010390): ALTER TABLE test.a DROP PARTITION pmax
  40. Connection ID (thread ID): 2975
  41. Status: KILL_CONNECTION
复制代码
堆栈信息
  1. (gdb) bt
  2. #0  0x00007fa13cf46aa1 in pthread_kill () from /lib64/libpthread.so.0
  3. #1  0x00000000048da549 in my_write_core (sig=6) at /software/db/mysql-8.0.32/mysys/stacktrace.cc:295
  4. #2  0x00000000034cf4ec in handle_fatal_signal (sig=6) at /software/db/mysql-8.0.32/sql/signal_handler.cc:230
  5. #3  <signal handler called>
  6. #4  0x00007fa13b493387 in raise () from /lib64/libc.so.6
  7. #5  0x00007fa13b494a78 in abort () from /lib64/libc.so.6
  8. #6  0x00000000034cf671 in my_server_abort () at /software/db/mysql-8.0.32/sql/signal_handler.cc:286
  9. #7  0x00000000048d0eaf in my_abort () at /software/db/mysql-8.0.32/mysys/my_init.cc:258
  10. #8  0x0000000004d9b051 in ut_dbg_assertion_failed (expr=0x6d821d3 "table->get_ref_count() == 0",
  11.     file=0x6d81ab8 "/software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc", line=1885) at /software/db/mysql-8.0.32/storage/innobase/ut/ut0dbg.cc:99
  12. #9  0x0000000004f15f0b in dict_table_remove_from_cache_low (table=0x7fa0445c6e08, lru_evict=false)
  13.     at /software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc:1885
  14. #10 0x0000000004f166cd in dict_table_remove_from_cache (table=0x7fa0445c6e08) at /software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc:1969
  15. #11 0x0000000004f16829 in dict_partitioned_table_remove_from_cache (name=0x7fa0885e2bd0 "test/a")
  16.     at /software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc:1999
  17. #12 0x0000000004a07dc2 in innobase_dict_cache_reset (schema_name=0x7fa054010868 "test", table_name=0x7fa054010880 "a")
  18.     at /software/db/mysql-8.0.32/storage/innobase/handler/ha_innodb.cc:3991
  19. #13 0x000000000337ee49 in mysql_inplace_alter_table (thd=0x7fa054000f40, schema=..., new_schema=..., table_def=0x7fa0445af410,
  20.     altered_table_def=0x7fa054012d60, table_list=0x7fa054011020, table=0x0, altered_table=0x7fa05403e230, ha_alter_info=0x7fa0885e3710,
  21.     inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, alter_ctx=0x7fa0885e4630, columns=std::set with 0 elements, fk_key_info=0x7fa05403b720,
  22.     fk_key_count=0, fk_invalidator=0x7fa0885e4560) at /software/db/mysql-8.0.32/sql/sql_table.cc:13744
  23. #14 0x000000000338a12a in mysql_alter_table (thd=0x7fa054000f40, new_db=0x7fa054010868 "test", new_name=0x0, create_info=0x7fa0885e6170,
  24.     table_list=0x7fa054011020, alter_info=0x7fa0885e6000) at /software/db/mysql-8.0.32/sql/sql_table.cc:17405
  25. #15 0x00000000039af486 in Sql_cmd_alter_table::execute (this=0x7fa054011670, thd=0x7fa054000f40) at /software/db/mysql-8.0.32/sql/sql_alter.cc:349
  26. #16 0x00000000032a5677 in mysql_execute_command (thd=0x7fa054000f40, first_level=true) at /software/db/mysql-8.0.32/sql/sql_parse.cc:4688
  27. #17 0x00000000032a785f in dispatch_sql_command (thd=0x7fa054000f40, parser_state=0x7fa0885e7910) at /software/db/mysql-8.0.32/sql/sql_parse.cc:5322
  28. #18 0x000000000329d7c3 in dispatch_command (thd=0x7fa054000f40, com_data=0x7fa0885e8a00, command=COM_QUERY)
  29.     at /software/db/mysql-8.0.32/sql/sql_parse.cc:2036
  30. #19 0x000000000329b853 in do_command (thd=0x7fa054000f40) at /software/db/mysql-8.0.32/sql/sql_parse.cc:1439
  31. #20 0x00000000034ba2df in handle_connection (arg=0xbe191d0) at /software/db/mysql-8.0.32/sql/conn_handler/connection_handler_per_thread.cc:302
  32. #21 0x00000000051ed236 in pfs_spawn_thread (arg=0xbe23890) at /software/db/mysql-8.0.32/storage/perfschema/pfs.cc:2986
  33. #22 0x00007fa13cf41ea5 in start_thread () from /lib64/libpthread.so.0
  34. #23 0x00007fa13b55bb0d in clone () from /lib64/libc.so.6
复制代码
崩溃核心栈:
  1. dict_partitioned_table_remove_from_cache
  2. -> strncmp("test/a", "test/a_1#p#p0", 6) == 0  // 误判为匹配
  3. -> dict_table_remove_from_cache() 导致 ref_count != 0 触发断言
复制代码
GDB 调试关键值:
  1. (gdb) p name
  2. $1 = "test/a"
  3. (gdb) p prev_table->name.m_name
  4. $2 = "test/a_1#p#p0"
  5. (gdb) p strncmp("test/a", "test/a_1#p#p0", 6)
  6. $3 = 0   // 返回0,误认为匹配
复制代码
由于 "test/a" 是 "test/a_1#p#p0" 的前缀,使用 strncmp(name, m_name, name_len) 判断时发生误判,错误地将 test.a_1 的分区子表从缓存中移除。
4. 关键源码解读

崩溃函数在 InnoDB 源码中的定义如下(文件 dict0dict.cc):
  1. void dict_partitioned_table_remove_from_cache(const char *name) {
  2.   ut_ad(dict_sys_mutex_own());
  3.   size_t name_len = strlen(name);
  4.   for (uint32_t i = 0; i < hash_get_n_cells(dict_sys->table_id_hash); ++i) {
  5.     dict_table_t *table = static_cast<dict_table_t *>(hash_get_first(dict_sys->table_hash, i));
  6.     while (table != nullptr) {
  7.       dict_table_t *prev_table = table;
  8.       table = static_cast<dict_table_t *>(HASH_GET_NEXT(name_hash, prev_table));
  9.       if (prev_table->is_dd_table) {
  10.         continue;
  11.       }
  12.       if ((strncmp(name, prev_table->name.m_name, name_len) == 0) &&
  13.           dict_table_is_partition(prev_table)) {
  14.         btr_drop_ahi_for_table(prev_table);
  15.         dict_table_remove_from_cache(prev_table);  // 错误删除
  16.       }
  17.     }
  18.   }
  19. }
复制代码
核心问题:


  • 使用 strncmp(name, m_name, name_len) 导致前缀匹配误判为全匹配
  • 应该采用 精确表名匹配,如:
  1. if (strcmp(name, prev_table->name.m_name) == 0)
复制代码

  • 或使用更严格的逻辑,确保仅匹配当前表或其合法分区。
5. 结论与建议

问题根因:


  • 缓存移除逻辑基于前缀匹配,未验证完整表名,误删其他表结构对象;
  • 被删表正在被引用,引用计数 ref_count ≠ 0,触发 InnoDB 保护性断言;
  • 最终导致 MySQL 进程崩溃。
潜在影响:


  • 表结构类似(同名前缀)的表,在 DDL 并发或异常终止场景下可能被误删;
  • 可能存在更广泛的 InnoDB 缓存一致性漏洞,特别在并发和 kill 介入场景中。
建议修复方向:


  • 将 strncmp(name, m_name, name_len) 替换为更安全的 strcmp 或引入完整名判断;
  • 限制 dict_partitioned_table_remove_from_cache() 中删除的表名范围,仅处理严格匹配的表;
  • 增强 DDL 异常中断处理的健壮性,确保引用关系、缓存清理逻辑的一致性。

Enjoy GreatSQL
来源:程序园用户自行投稿发布,如果侵权,请联系站长删除
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!

相关推荐

您需要登录后才可以回帖 登录 | 立即注册