mysql 客户端访问崩溃排查

错误说明

近期在测试 shardingsphere-proxy 工具的时候, 接连碰到几例 mysql 客户端崩溃的情况, 各 mysql 客户端分布在不同主机, 以及不同版本上. 通过 gdb 直接跟踪可以看到如下错误所示:

# 所有版本以 gdb 方式连接:
gdb -ex run -ex bt --args  mysql -h ... -p ...


# ubuntu 22.04  - mysql-5.7.44

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
74      ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x0000555555595931 in my_strdup (key=0, from=0x0, my_flags=16)
    at /mnt/jenkins/workspace/ps5.7-autobuild-RELEASE/test/percona-server-5.7-5.7.44-48/mysys/my_malloc.c:320
#2  0x000055555558192f in init_username () at /mnt/jenkins/workspace/ps5.7-autobuild-RELEASE/test/percona-server-5.7-5.7.44-48/client/mysql.cc:5850
#3  init_username () at /mnt/jenkins/workspace/ps5.7-autobuild-RELEASE/test/percona-server-5.7-5.7.44-48/client/mysql.cc:5838
#4  0x000055555558b296 in construct_prompt () at /mnt/jenkins/workspace/ps5.7-autobuild-RELEASE/test/percona-server-5.7-5.7.44-48/client/mysql.cc:5741
#5  read_and_execute (interactive=<optimized out>) at /mnt/jenkins/workspace/ps5.7-autobuild-RELEASE/test/percona-server-5.7-5.7.44-48/client/mysql.cc:2282
#6  0x000055555557fd93 in main (argc=<optimized out>, argv=<optimized out>)
    at /mnt/jenkins/workspace/ps5.7-autobuild-RELEASE/test/percona-server-5.7-5.7.44-48/client/mysql.cc:1449


# centos 7 - mysql-5.7.28
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5fd98c1 in __strlen_sse2_pminub () from /lib64/libc.so.6
#0  0x00007ffff5fd98c1 in __strlen_sse2_pminub () from /lib64/libc.so.6
#1  0x000000000041cd6c in my_strdup (key=0, from=0x0, my_flags=16)
    at /mnt/workspace/percona-server-5.7-binaries-release-rocks-new/label_exp/min-centos-6-x64/test/percona-server-5.7.28-31/mysys/my_malloc.c:320
#2  0x0000000000408924 in init_username ()
    at /mnt/workspace/percona-server-5.7-binaries-release-rocks-new/label_exp/min-centos-6-x64/test/percona-server-5.7.28-31/client/mysql.cc:5846
#3  0x000000000040a515 in construct_prompt ()
    at /mnt/workspace/percona-server-5.7-binaries-release-rocks-new/label_exp/min-centos-6-x64/test/percona-server-5.7.28-31/client/mysql.cc:5736
#4  0x00000000004077ed in read_and_execute (interactive=true)
    at /mnt/workspace/percona-server-5.7-binaries-release-rocks-new/label_exp/min-centos-6-x64/test/percona-server-5.7.28-31/client/mysql.cc:2287
#5  main (argc=11, argv=0x9c2438)

处理分析

直接看 my_malloc.c 代码:

317 char *my_strdup(PSI_memory_key key, const char *from, myf my_flags)
318 {
319   char *ptr;
320   size_t length= strlen(from)+1;
321   if ((ptr= (char*) my_malloc(key, length, my_flags)))
322     memcpy(ptr, from, length);
323   return ptr;
324 }

可以对比堆栈信息来看:

#1  0x000000000041cd6c in my_strdup (key=0, from=0x0, my_flags=16)

可以看到 my_strdup 函数的 from 参数为 NULL, 相应的函数 strlen 就是对空指针进行计算, 最终导致 mysql 客户端崩溃.

备注: 堆栈中的 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory. 消息容易误导我们, 不同发行版和架构会有不同的提示. 我们可以在各主机下简单通过 c 的 strlen 函数进行验证, 如下代码:

#include <stdio.h>
#include <string.h>

int main()
{

  int length = strlen(NULL);
  printf("Length of string is : %d\n", length);

  return 0;
}

ubuntu 22.04 系统运行的时候出现了类似的崩溃信息:

Program received signal SIGSEGV, Segmentation fault.
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
77      ../sysdeps/x86_64/multiarch/strlen-evex.S: No such file or directory.

辅助分析

根据上述分析, 分别进行 mysql 通信时的抓包分析:

# mysql-sniffer -P 13099 -i eth0 -n -v
Initializing MySQL sniffing on eth0:3309...
2024/02/22 20:13:11.949926   10.10.0.3:35934:select @@version_comment limit 1 ## type: 3, bytes: 32, time: 0.00
2024/02/22 20:13:11.981929   10.10.0.3:35934:select USER() ## type: 3, bytes: 13, time: 0.00

tcpdump 抓包则未发现响应:

20:12:12.002638 IP 10.10.0.3.35738 > 10.10.0.46.3309: Flags [P.], seq 126:144, ack 205, win 229, length 18
E..:..@.9..B
...
.#...3+?.S..i..P............select USER()................
20:12:12.004388 IP 10.10.0.46.3309 > 10.10.0.3.35738: Flags [P.], seq 205:271, ack 144, win 229, length 66
E..j..@.@.$e

也就是 mysql 客户端在执行了 select USER() 语句后出现了崩溃的情况, 直接连接 shardingsphere-proxy 可以发现其直接返回了空数据:

mysql > select USER();
+--------+
| USER() |
+--------+
|        |
+--------+
1 row in set (0.09 sec)

这里的空不是空字符串

对应上述 strlen 函数的空指针.

如何处理

据此, 可以知晓有两种处理方式:

shardingsphere-proxy 支持语法

可以反馈给官方支持此类 SQL 语法, 这样 mysql 不会出现空指针的情况. 但这种反馈修复的时间一般都很长, 不适合我们用户侧.

mysql 客户端调整

另一种可以让 mysql 客户端不去执行不支持的语法, 如下所示, 之所以指定了 select USER(), 也是因为 my.cnf 里开启了 prompt 功能:

[mysql]
prompt = 'mysql \u@[\h:\p \d] > '
default-character-set = utf8mb4

这里的 \u 即让 mysql 在连接后执行了 init_username(select USER()) 操作, 配置中去掉 \u 即可解决该问题:

# 见 src/client/mysql.cc

5658 static const char* construct_prompt()
5659 {
......
5734       case 'u':
5735         if (!full_username)
5736           init_username();
......
......
5833 static void init_username()
5834 {
5835   my_free(full_username);
5836   my_free(part_username);
5837 
5838   MYSQL_RES *result= NULL;
5839   if (!mysql_query(&mysql,"select USER()") &&
5840       (result=mysql_use_result(&mysql)))

总结

大部分 mysql 协议兼容的工具产品(比如 clickhouse, shardingsphere) 等都没有完全兼容 mysql 的语法, 在通过 mysql 客户端测试的时候也更容易出现一些兼容性的问题. 另外通过应用程序去测试即便不会出现文章中的此类问题, 但也需要对业务用到的 SQL 语法做更多的访问测试.