glusterfs trash 功能引起进程崩溃退出处理

glusterfs trash 功能引起进程崩溃退出

glusterfs 的 trash 功能类似回收站, 用户删除的文件保存到指定的回收目录里, 该功能可以用于数据恢复. 系统环境:

Centos 6.4
Glusterfs-3.7.11

错误日志: 每隔几分钟, gluster 进程退出, 相应的报错信息如下:

[2016-06-30 15:59:59.281498] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-gv0-server: disconnecting connection from gw108.sysop.zongheng.com-22273-2016/06/30-16:00:00:221452-gv0-client-1-0-0
[2016-06-30 15:59:59.281546] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-gv0-server: Shutting down connection gw108.sysop.zongheng.com-22273-2016/06/30-16:00:00:221452-gv0-client-1-0-0
[2016-06-30 16:00:01.134074] E [MSGID: 113020] [posix.c:2651:posix_create] 0-gv0-posix: setting gfid on /web/brick1/gv0/.trashcan//zhcn-data2/static/upload/homeActivity/includeFile/homeActivityJs_1.js_2016-06-30_160001 failed
[2016-06-30 16:00:01.134166] E [posix.c:2996:_fill_writev_xdata] (-->/usr/lib64/glusterfs/3.7.11/xlator/features/trash.so(trash_truncate_readv_cbk+0x17a) [0x7f014a327bda] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(posix_writev+0x1c7) [0x7f014ab580b7] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(_fill_writev_xdata+0x1a7) [0x7f014ab50d87] ) 0-gv0-posix: fd: 0x7f01440c9178 inode: 0x7f012839c17cgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
[2016-06-30 16:00:01.134223] E [posix.c:2996:_fill_writev_xdata] (-->/usr/lib64/glusterfs/3.7.11/xlator/features/trash.so(trash_truncate_readv_cbk+0x17a) [0x7f014a327bda] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(posix_writev+0x1c7) [0x7f014ab580b7] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(_fill_writev_xdata+0x1a7) [0x7f014ab50d87] ) 0-gv0-posix: fd: 0x7f01440c9178 inode: 0x7f012839c17cgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
[2016-06-30 16:00:01.172025] E [MSGID: 113020] [posix.c:2651:posix_create] 0-gv0-posix: setting gfid on /web/brick1/gv0/.trashcan//zhcn-data2/static/upload/ad/column/19672573644.js_2016-06-30_160001 failed
[2016-06-30 16:00:01.172110] E [posix.c:2996:_fill_writev_xdata] (-->/usr/lib64/glusterfs/3.7.11/xlator/features/trash.so(trash_truncate_readv_cbk+0x17a) [0x7f014a327bda] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(posix_writev+0x1c7) [0x7f014ab580b7] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(_fill_writev_xdata+0x1a7) [0x7f014ab50d87] ) 0-gv0-posix: fd: 0x7f01440c9178 inode: 0x7f012839c17cgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
[2016-06-30 16:00:01.172157] E [posix.c:2996:_fill_writev_xdata] (-->/usr/lib64/glusterfs/3.7.11/xlator/features/trash.so(trash_truncate_readv_cbk+0x17a) [0x7f014a327bda] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(posix_writev+0x1c7) [0x7f014ab580b7] -->/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(_fill_writev_xdata+0x1a7) [0x7f014ab50d87] ) 0-gv0-posix: fd: 0x7f01440c9178 inode: 0x7f012839c17cgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
pending frames:
frame : type(0) op(24)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2016-06-30 16:00:01
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.11
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb8)[0x7f0158252a18]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x7f01582726af]
/lib64/libc.so.6(+0x32920)[0x7f0156bf3920]
/lib64/libc.so.6(+0x132d5f)[0x7f0156cf3d5f]
/usr/lib64/glusterfs/3.7.11/xlator/features/trash.so(+0x43fb)[0x7f014a3263fb]
/usr/lib64/glusterfs/3.7.11/xlator/features/trash.so(trash_truncate_stat_cbk+0x333)[0x7f014a32b0e3]
/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(posix_fstat+0x180)[0x7f014ab51210]
/usr/lib64/glusterfs/3.7.11/xlator/features/trash.so(trash_ftruncate+0x3e3)[0x7f014a32bf23]
/usr/lib64/glusterfs/3.7.11/xlator/features/changetimerecorder.so(ctr_ftruncate+0x167)[0x7f014a1118e7]
/usr/lib64/glusterfs/3.7.11/xlator/features/changelog.so(changelog_ftruncate+0x161)[0x7f0149a52181]
/usr/lib64/glusterfs/3.7.11/xlator/features/bitrot-stub.so(br_stub_ftruncate_resume+0x133)[0x7f0149625903]
/usr/lib64/libglusterfs.so.0(call_resume+0x80)[0x7f015827d740]
/usr/lib64/glusterfs/3.7.11/xlator/features/bitrot-stub.so(br_stub_fd_incversioning_cbk+0x9a)[0x7f0149628eea]
/usr/lib64/glusterfs/3.7.11/xlator/features/changelog.so(changelog_fsetxattr_cbk+0xf4)[0x7f0149a54144]
/usr/lib64/glusterfs/3.7.11/xlator/features/changetimerecorder.so(ctr_fsetxattr_cbk+0x139)[0x7f014a113f89]
/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(posix_fsetxattr+0x299)[0x7f014ab55329]
/usr/lib64/libglusterfs.so.0(default_fsetxattr+0x83)[0x7f015825c073]
/usr/lib64/glusterfs/3.7.11/xlator/features/changetimerecorder.so(ctr_fsetxattr+0x16d)[0x7f014a11297d]
/usr/lib64/glusterfs/3.7.11/xlator/features/changelog.so(changelog_fsetxattr+0x170)[0x7f0149a567a0]
/usr/lib64/glusterfs/3.7.11/xlator/features/bitrot-stub.so(br_stub_fd_versioning+0x1d4)[0x7f0149627834]
/usr/lib64/glusterfs/3.7.11/xlator/features/bitrot-stub.so(+0x87ff)[0x7f01496287ff]
/usr/lib64/glusterfs/3.7.11/xlator/features/bitrot-stub.so(br_stub_ftruncate+0x493)[0x7f014962c783]
/usr/lib64/libglusterfs.so.0(default_ftruncate+0x78)[0x7f015825b1d8]
/usr/lib64/glusterfs/3.7.11/xlator/features/locks.so(+0xae88)[0x7f01491ffe88]
/usr/lib64/glusterfs/3.7.11/xlator/storage/posix.so(posix_fstat+0x180)[0x7f014ab51210]
/usr/lib64/libglusterfs.so.0(default_fstat+0x6e)[0x7f015825bd0e]

故障分析: http://git.gluster.org/cgit/glusterfs.git/commit/?id=b5cfe948cb3569f034da80ac97b5d2f028b3b0e5

在早先的版本中, trash 处理器在每次调用 mkdir 系统调用的时候, 都需要返回 gfid-req 集合到字典信息中, 如果 gfid(gluster file identify describe, uuid 格式, 笔者猜测对应上面的 00000000-0000-0000-0000-000000000000 ) 无效, 则会引起 brick 进程的退出, 引起 crash 错误. 官方新的版本通过直接返回 EPERM (权限不足) 给 POSIX 处理器来避免 crash 的发生.

处理方式: 可以禁止 trash 功能, 也可以升级到 3.7.12 版本.