GlusterFS split-brain problem

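I have been facing performance problems with my GlusterFS installation. We adopted a new way of building our application, and suddenly all the GlusterFS clients, and the master servers as well, started showing high CPU utilization. This is causing real pain. My setup is as follows: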

I have two GlusterFS master servers running version 3.7.4:

    [root@gfs1 glusterfs]# gluster volume info

    Volume Name: repl-vol
    Type: Replicate
    Volume ID: 7535cfad-6bb9-4147-9fea-e869e7b8d565
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: gfs1.myhost.com:/GlusterFS/repl-data
    Brick2: gfs2.myhost.com:/GlusterFS/repl-data
    Options Reconfigured:
    cluster.self-heal-window-size: 100
    performance.cache-max-file-size: 2MB
    performance.cache-size: 256MB
    performance.write-behind-window-size: 4MB
    performance.io-thread-count: 32
    cluster.data-self-heal-algorithm: diff
    nfs.disable: off

    [root@gfs2 ec2-user]# gluster volume info

    Volume Name: repl-vol
    Type: Replicate
    Volume ID: 7535cfad-6bb9-4147-9fea-e869e7b8d565
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: gfs1.myhost.com:/GlusterFS/repl-data
    Brick2: gfs2.myhost.com:/GlusterFS/repl-data
    Options Reconfigured:
    cluster.self-heal-window-size: 100
    nfs.disable: off
    cluster.data-self-heal-algorithm: diff
    performance.io-thread-count: 32
    performance.write-behind-window-size: 4MB
    performance.cache-size: 256MB
    performance.cache-max-file-size: 2MB

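I have about 14 clients using GlusterFS. GlusterFS is hosting 1.2 TB of data, mostly static content (JS, CSS, images). We have been monitoring the servers and see sudden spikes in CPU utilization. Network I/O is also high, reaching 125 MB/s to 250 MB/s. I checked the logs and mostly found the following: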

    [2015-09-09 03:13:33.797655] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <3fd13508-b29e-4d52-8c9c-14ccd2f24b9f/100000130641_4.jpg>, ed715d52-4a39-46db-901b-16ae13f01898 on repl-vol-client-1 and 0bc0c058-b6a7-4f0d-9d46-96f7fcded0f3 on repl-vol-client-0. Skipping conservative merge on the file.
    [2015-09-09 03:13:36.074219] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <3fd13508-b29e-4d52-8c9c-14ccd2f24b9f/100000132992_4.jpg>, 8b67cc38-df53-43c7-ad42-b9c616b980b1 on repl-vol-client-1 and 41f393de-9d83-4f52-bfcf-832e31a27a87 on repl-vol-client-0. Skipping conservative merge on the file.
    [2015-09-09 03:13:36.076681] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <3fd13508-b29e-4d52-8c9c-14ccd2f24b9f/100000132995_4.jpg>, b1dd578b-3dfe-43dc-ad3a-d54c86298278 on repl-vol-client-1 and bd7c42b9-575f-46bc-9f56-804994f27ab0 on repl-vol-client-0. Skipping conservative merge on the file.
    [2015-09-09 04:00:50.975933] I [MSGID: 108026] [afr-self-heal-entry.c:589:afr_selfheal_entry_do] 0-repl-vol-replicate-0: performing entry selfheal on cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3
    [2015-09-09 04:00:51.005409] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
    [2015-09-09 04:00:51.011467] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
    [2015-09-09 04:00:51.014205] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
    [2015-09-09 04:00:51.046092] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
    [2015-09-09 04:10:53.125065] I [MSGID: 108026] [afr-self-heal-entry.c:589:afr_selfheal_entry_do] 0-repl-vol-replicate-0: performing entry selfheal on cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3
    [2015-09-09 04:10:53.225256] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
    [2015-09-09 04:10:53.232229] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
    [2015-09-09 04:10:53.236203] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
    [2015-09-09 04:10:53.343344] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
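Since the same entries show up repeatedly in the log, it helps to see how many distinct files are actually affected. The following is a minimal sketch that counts `Gfid mismatch` errors per entry. The function name `count_gfid_mismatches` is made up for illustration, and the log path in the usage comment is an assumption about where the self-heal daemon log lives on a given install.

```shell
# Count "Gfid mismatch" errors per affected entry in a GlusterFS log.
# Reads log text on stdin and prints "<count> <parent-gfid>/<name>",
# most frequent entry first.
count_gfid_mismatches() {
  grep 'Gfid mismatch detected for' \
    | sed -E 's/.*Gfid mismatch detected for <([^>]+)>.*/\1/' \
    | sort \
    | uniq -c \
    | sort -rn \
    | sed -E 's/^ +//'   # strip the padding that uniq -c adds before the count
}

# Usage (the path is an assumption; adjust to the actual log location):
#   count_gfid_mismatches < /var/log/glusterfs/glustershd.log
```

A short list with high counts would suggest that a handful of entries are being retried on every self-heal pass, rather than new mismatches appearing all the time.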

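The two main errors are `remote operation failed` and `Gfid mismatch`. I even tried to resolve the split-brain, but it seems that either I am doing something wrong or it is not working.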

Recovery steps:

    [root@gfs2 ec2-user]# gluster volume heal repl-vol info split-brain
    Brick gfs1.myhost.com:/GlusterFS/repl-data/
    <gfid:cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3>
    /media/klevu_images/1/0
    Number of entries in split-brain: 2

    Brick gfs2.myhost.com:/GlusterFS/repl-data/
    /media/klevu_images/1/0
    <gfid:cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3>
    Number of entries in split-brain: 2

So I simply deleted the files listed above and then ran `gluster volume heal repl-data`.
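One entry in the heal output is shown only as a GFID, so here is a minimal sketch of how a GFID maps to a path on the brick. It assumes the usual `.glusterfs` layout, where each GFID has a hard link (for files) or a symlink (for directories) under two directory levels named after its first four hex characters. The helper name `gfid_backend_path` is hypothetical.

```shell
# Map a GFID to its entry under a brick's .glusterfs directory.
# Arguments: <brick-path> <gfid>
gfid_backend_path() {
  brick=${1%/}                                      # drop a trailing slash, if any
  gfid=$2
  first=$(printf '%s' "$gfid" | cut -c1-2)          # first two hex characters
  second=$(printf '%s' "$gfid" | cut -c3-4)         # next two hex characters
  printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$first" "$second" "$gfid"
}

# Usage, with the brick path and GFID from the output above:
#   gfid_backend_path /GlusterFS/repl-data cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3
# The extended attributes (trusted.gfid, trusted.afr.*) of a file can then be
# compared on each brick with:
#   getfattr -d -m . -e hex /GlusterFS/repl-data/<path-to-file>
```

If only the named file is deleted from a brick and its `.glusterfs` entry is left behind, the old GFID may still be present on that brick, which could be one reason the same entries keep reappearing.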

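I am not sure that resolving the split-brain will fix my performance problem. Also, the split-brain entries keep coming back. My main goal is to fix the performance.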