Linux mdraid RAID 6: disks are randomly dropped every few days

I have a few servers running Debian 8, configured with 8x800GB SSDs in RAID 6. All of the disks are attached to an LSI-3008 flashed to IT mode. On each server I also have a 2-disk RAID 1 pair for the operating system.

Current state

 # dpkg -l|grep mdad
 ii  mdadm    3.3.2-5+deb8u1    amd64    tool to administer Linux MD arrays (software RAID)

 # uname -a
 Linux R5U32-B 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

 # more /proc/mdstat
 Personalities : [raid1] [raid6] [raid5] [raid4]
 md2 : active raid6 sde1[1](F) sdg1[3] sdf1[2] sdd1[0] sdh1[7] sdb1[6] sdj1[5] sdi1[4]
       4687678464 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/7] [U_UUUUUU]
       bitmap: 3/6 pages [12KB], 65536KB chunk

 md1 : active (auto-read-only) raid1 sda5[0] sdc5[1]
       62467072 blocks super 1.2 [2/2] [UU]
       resync=PENDING

 md0 : active raid1 sda2[0] sdc2[1]
       1890881536 blocks super 1.2 [2/2] [UU]
       bitmap: 2/15 pages [8KB], 65536KB chunk

 unused devices: <none>

 # mdadm --detail /dev/md2
 /dev/md2:
         Version : 1.2
   Creation Time : Fri Jun 24 04:35:18 2016
      Raid Level : raid6
      Array Size : 4687678464 (4470.52 GiB 4800.18 GB)
   Used Dev Size : 781279744 (745.09 GiB 800.03 GB)
    Raid Devices : 8
   Total Devices : 8
     Persistence : Superblock is persistent

   Intent Bitmap : Internal

     Update Time : Tue Jul 19 17:36:15 2016
           State : active, degraded
  Active Devices : 7
 Working Devices : 7
  Failed Devices : 1
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 512K

            Name : R5U32-B:2  (local to host R5U32-B)
            UUID : 24299038:57327536:4db96d98:d6e914e2
          Events : 2514191

     Number   Major   Minor   RaidDevice State
        0       8       49        0      active sync   /dev/sdd1
        2       0        0        2      removed
        2       8       81        2      active sync   /dev/sdf1
        3       8       97        3      active sync   /dev/sdg1
        4       8      129        4      active sync   /dev/sdi1
        5       8      145        5      active sync   /dev/sdj1
        6       8       17        6      active sync   /dev/sdb1
        7       8      113        7      active sync   /dev/sdh1

        1       8       65        -      faulty   /dev/sde1

Problem

The RAID 6 array degrades semi-regularly, roughly every 1-3 days. The cause is that one of the disks (it can be any of them) fails with the following errors:

 # dmesg -T
 [Sat Jul 16 05:38:45 2016] sd 0:0:3:0: attempting task abort! scmd(ffff8810350cbe00)
 [Sat Jul 16 05:38:45 2016] sd 0:0:3:0: [sde] CDB:
 [Sat Jul 16 05:38:45 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
 [Sat Jul 16 05:38:45 2016] scsi target0:0:3: handle(0x000d), sas_address(0x500304801707a443), phy(3)
 [Sat Jul 16 05:38:45 2016] scsi target0:0:3: enclosure_logical_id(0x500304801707a47f), slot(3)
 [Sat Jul 16 05:38:46 2016] sd 0:0:3:0: task abort: SUCCESS scmd(ffff8810350cbe00)
 [Sat Jul 16 05:38:46 2016] end_request: I/O error, dev sde, sector 2064
 [Sat Jul 16 05:38:46 2016] md: super_written gets error=-5, uptodate=0
 [Sat Jul 16 05:38:46 2016] md/raid:md2: Disk failure on sde1, disabling device.
                            md/raid:md2: Operation continuing on 7 devices.
 [Sat Jul 16 05:38:46 2016] RAID conf printout:
 [Sat Jul 16 05:38:46 2016]  --- level:6 rd:8 wd:7
 [Sat Jul 16 05:38:46 2016]  disk 0, o:1, dev:sdd1
 [Sat Jul 16 05:38:46 2016]  disk 1, o:0, dev:sde1
 [Sat Jul 16 05:38:46 2016]  disk 2, o:1, dev:sdf1
 [Sat Jul 16 05:38:46 2016]  disk 3, o:1, dev:sdg1
 [Sat Jul 16 05:38:46 2016]  disk 4, o:1, dev:sdi1
 [Sat Jul 16 05:38:46 2016]  disk 5, o:1, dev:sdj1
 [Sat Jul 16 05:38:46 2016]  disk 6, o:1, dev:sdb1
 [Sat Jul 16 05:38:46 2016]  disk 7, o:1, dev:sdh1
 [Sat Jul 16 05:38:46 2016] RAID conf printout:
 [Sat Jul 16 05:38:46 2016]  --- level:6 rd:8 wd:7
 [Sat Jul 16 05:38:46 2016]  disk 0, o:1, dev:sdd1
 [Sat Jul 16 05:38:46 2016]  disk 2, o:1, dev:sdf1
 [Sat Jul 16 05:38:46 2016]  disk 3, o:1, dev:sdg1
 [Sat Jul 16 05:38:46 2016]  disk 4, o:1, dev:sdi1
 [Sat Jul 16 05:38:46 2016]  disk 5, o:1, dev:sdj1
 [Sat Jul 16 05:38:46 2016]  disk 6, o:1, dev:sdb1
 [Sat Jul 16 05:38:46 2016]  disk 7, o:1, dev:sdh1
 [Sat Jul 16 12:40:00 2016] sd 0:0:7:0: attempting task abort! scmd(ffff88000d76eb00)

What I have already tried

I have already tried the following, without any improvement:

  • Increasing /sys/block/md2/md/stripe_cache_size from 256 to 16384
  • Increasing dev.raid.speed_limit_min from 1000 to 50000
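
For reference, both tunables were applied at runtime, roughly as follows (a sketch; md2 is the array name from /proc/mdstat above, and neither setting survives a reboot unless it is persisted in /etc/sysctl.conf or an init script):

 # per-array stripe cache size, in pages (default 256)
 echo 16384 > /sys/block/md2/md/stripe_cache_size
 # minimum resync/rebuild speed, in KiB/s per device (default 1000)
 sysctl -w dev.raid.speed_limit_min=50000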

Where I need your help

Are these errors caused by the mdadm configuration, or by the kernel or the controller?

Update 20160802

Following the suggestions from ppetraki and others:

  • Use raw disks instead of partitions

    This did not solve the problem.

  • Reduce the chunk size

    The chunk size was changed to 128KB and then to 64KB, but the RAID volume still degraded within a few days, and dmesg showed errors similar to the ones before. I forgot to try reducing the chunk size to 32KB.

  • Reduce the RAID to 6 disks

    I destroyed the existing RAID, zeroed the superblock on every disk, and created a new RAID 6 with 6 disks (raw disks, no partitions) and a 64KB chunk (a rough command sequence is sketched at the end of this update). Reducing the number of disks in the RAID seems to make the array live longer: it now takes about 4-7 days before it degrades.

  • Update the driver

    I just updated the driver to Linux_Driver_RHEL6-7_SLES11-12_P12 ( http://www.avagotech.com/products/server-storage/host-bus-adapters/sas-9300-8e ). The disk errors still show up, as below:

 [Tue Aug 2 17:57:48 2016] sd 0:0:6:0: attempting task abort! scmd(ffff880fc0dd1980)
 [Tue Aug 2 17:57:48 2016] sd 0:0:6:0: [sdg] CDB:
 [Tue Aug 2 17:57:48 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
 [Tue Aug 2 17:57:48 2016] scsi target0:0:6: handle(0x0010), sas_address(0x50030480173ee946), phy(6)
 [Tue Aug 2 17:57:48 2016] scsi target0:0:6: enclosure_logical_id(0x50030480173ee97f), slot(6)
 [Tue Aug 2 17:57:49 2016] sd 0:0:6:0: task abort: SUCCESS scmd(ffff880fc0dd1980)
 [Tue Aug 2 17:57:49 2016] end_request: I/O error, dev sdg, sector 0

Just now, my array degraded again. This time /dev/sdf and /dev/sdg showed the "attempting task abort! scmd" error:

 [Tue Aug 2 21:26:02 2016]
 [Tue Aug 2 21:26:02 2016] sd 0:0:5:0: [sdf] CDB:
 [Tue Aug 2 21:26:02 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
 [Tue Aug 2 21:26:02 2016] scsi target0:0:5: handle(0x000f), sas_address(0x50030480173ee945), phy(5)
 [Tue Aug 2 21:26:02 2016] scsi target0:0:5: enclosure logical id(0x50030480173ee97f), slot(5)
 [Tue Aug 2 21:26:02 2016] scsi target0:0:5: enclosure level(0x0000), connector name( ^A)
 [Tue Aug 2 21:26:03 2016] sd 0:0:5:0: task abort: SUCCESS scmd(ffff88103beb5240)
 [Tue Aug 2 21:26:03 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88107934e080)
 [Tue Aug 2 21:26:03 2016] sd 0:0:5:0: [sdf] CDB:
 [Tue Aug 2 21:26:03 2016] Read(10): 28 00 04 75 3b f8 00 00 08 00
 [Tue Aug 2 21:26:03 2016] scsi target0:0:5: handle(0x000f), sas_address(0x50030480173ee945), phy(5)
 [Tue Aug 2 21:26:03 2016] scsi target0:0:5: enclosure logical id(0x50030480173ee97f), slot(5)
 [Tue Aug 2 21:26:03 2016] scsi target0:0:5: enclosure level(0x0000), connector name( ^A)
 [Tue Aug 2 21:26:03 2016] sd 0:0:5:0: task abort: SUCCESS scmd(ffff88107934e080)
 [Tue Aug 2 21:26:04 2016] sd 0:0:5:0: [sdf] CDB:
 [Tue Aug 2 21:26:04 2016] Read(10): 28 00 04 75 3b f8 00 00 08 00
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: sas_address(0x50030480173ee945), phy(5)
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: enclosure logical id(0x50030480173ee97f), slot(5)
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: enclosure level(0x0000), connector name( ^A)
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: handle(0x000f), ioc_status(success)(0x0000), smid(35)
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: request_len(4096), underflow(4096), resid(-4096)
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: tag(65535), transfer_count(8192), sc->result(0x00000000)
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
 [Tue Aug 2 21:26:04 2016] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
 [Tue Aug 2 22:14:51 2016] sd 0:0:6:0: attempting task abort! scmd(ffff880931d8c840)
 [Tue Aug 2 22:14:51 2016] sd 0:0:6:0: [sdg] CDB:
 [Tue Aug 2 22:14:51 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
 [Tue Aug 2 22:14:51 2016] scsi target0:0:6: handle(0x0010), sas_address(0x50030480173ee946), phy(6)
 [Tue Aug 2 22:14:51 2016] scsi target0:0:6: enclosure logical id(0x50030480173ee97f), slot(6)
 [Tue Aug 2 22:14:51 2016] scsi target0:0:6: enclosure level(0x0000), connector name( ^A)
 [Tue Aug 2 22:14:51 2016] sd 0:0:6:0: task abort: SUCCESS scmd(ffff880931d8c840)
 [Tue Aug 2 22:14:52 2016] sd 0:0:6:0: [sdg] CDB:
 [Tue Aug 2 22:14:52 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: sas_address(0x50030480173ee946), phy(6)
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: enclosure logical id(0x50030480173ee97f), slot(6)
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: enclosure level(0x0000), connector name( ^A)
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: handle(0x0010), ioc_status(success)(0x0000), smid(85)
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: request_len(0), underflow(0), resid(-8192)
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: tag(65535), transfer_count(8192), sc->result(0x00000000)
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
 [Tue Aug 2 22:14:52 2016] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
 [Tue Aug 2 22:14:52 2016] end_request: I/O error, dev sdg, sector 16
 [Tue Aug 2 22:14:52 2016] md: super_written gets error=-5, uptodate=0
 [Tue Aug 2 22:14:52 2016] md/raid:md2: Disk failure on sdg, disabling device.
                           md/raid:md2: Operation continuing on 5 devices.
 [Tue Aug 2 22:14:52 2016] RAID conf printout:
 [Tue Aug 2 22:14:52 2016]  --- level:6 rd:6 wd:5
 [Tue Aug 2 22:14:52 2016]  disk 0, o:1, dev:sdc
 [Tue Aug 2 22:14:52 2016]  disk 1, o:1, dev:sdd
 [Tue Aug 2 22:14:52 2016]  disk 2, o:1, dev:sde
 [Tue Aug 2 22:14:52 2016]  disk 3, o:1, dev:sdf
 [Tue Aug 2 22:14:52 2016]  disk 4, o:0, dev:sdg
 [Tue Aug 2 22:14:52 2016]  disk 5, o:1, dev:sdh
 [Tue Aug 2 22:14:52 2016] RAID conf printout:
 [Tue Aug 2 22:14:52 2016]  --- level:6 rd:6 wd:5
 [Tue Aug 2 22:14:52 2016]  disk 0, o:1, dev:sdc
 [Tue Aug 2 22:14:52 2016]  disk 1, o:1, dev:sdd
 [Tue Aug 2 22:14:52 2016]  disk 2, o:1, dev:sde
 [Tue Aug 2 22:14:52 2016]  disk 3, o:1, dev:sdf
 [Tue Aug 2 22:14:52 2016]  disk 5, o:1, dev:sdh

I assume the "attempting task abort! scmd" errors are what cause the array to degrade, but I don't know what causes them in the first place.
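
For completeness, the 6-disk rebuild mentioned under "Reduce the RAID to 6 disks" above was done with roughly the following sequence (a sketch; the member names sdc-sdh and the /dev/md2 array name are assumptions based on the logs above):

 mdadm --stop /dev/md2                    # tear down the old array
 mdadm --zero-superblock /dev/sd[c-h]     # wipe the md superblock on each member disk
 mdadm --create /dev/md2 --level=6 --raid-devices=6 --chunk=64 /dev/sd[c-h]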

Update 20160806

I tried setting up another server with the same specification, this time without mdadm RAID: each disk is mounted directly with an ext4 filesystem. After a while the kernel log showed "attempting task abort! scmd" on some of the disks. This led to errors on /dev/sdd1, which was then remounted read-only:

 $ dmesg -T
 [Sat Aug 6 05:21:09 2016] sd 0:0:3:0: [sdd] CDB:
 [Sat Aug 6 05:21:09 2016] Read(10): 28 00 2d 29 21 00 00 00 20 00
 [Sat Aug 6 05:21:09 2016] scsi target0:0:3: handle(0x000a), sas_address(0x4433221103000000), phy(3)
 [Sat Aug 6 05:21:09 2016] scsi target0:0:3: enclosure_logical_id(0x500304801a5d3f01), slot(3)
 [Sat Aug 6 05:21:09 2016] sd 0:0:3:0: task abort: SUCCESS scmd(ffff88006b206800)
 [Sat Aug 6 05:21:09 2016] sd 0:0:3:0: attempting task abort! scmd(ffff88019a3a07c0)
 [Sat Aug 6 05:21:09 2016] sd 0:0:3:0: [sdd] CDB:
 [Sat Aug 6 05:21:09 2016] Read(10): 28 00 08 46 8f 80 00 00 20 00
 [Sat Aug 6 05:21:09 2016] scsi target0:0:3: handle(0x000a), sas_address(0x4433221103000000), phy(3)
 [Sat Aug 6 05:21:09 2016] scsi target0:0:3: enclosure_logical_id(0x500304801a5d3f01), slot(3)
 [Sat Aug 6 05:21:09 2016] sd 0:0:3:0: task abort: SUCCESS scmd(ffff88019a3a07c0)
 [Sat Aug 6 05:21:10 2016] sd 0:0:3:0: attempting device reset! scmd(ffff880f9a49ac80)
 [Sat Aug 6 05:21:10 2016] sd 0:0:3:0: [sdd] CDB:
 [Sat Aug 6 05:21:10 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
 [Sat Aug 6 05:21:10 2016] scsi target0:0:3: handle(0x000a), sas_address(0x4433221103000000), phy(3)
 [Sat Aug 6 05:21:10 2016] scsi target0:0:3: enclosure_logical_id(0x500304801a5d3f01), slot(3)
 [Sat Aug 6 05:21:10 2016] sd 0:0:3:0: device reset: SUCCESS scmd(ffff880f9a49ac80)
 [Sat Aug 6 05:21:10 2016] mpt3sas0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
 [Sat Aug 6 05:21:10 2016] mpt3sas0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
 [Sat Aug 6 05:21:10 2016] mpt3sas0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
 [Sat Aug 6 05:21:11 2016] end_request: I/O error, dev sdd, sector 780443696
 [Sat Aug 6 05:21:11 2016] Aborting journal on device sdd1-8.
 [Sat Aug 6 05:21:11 2016] EXT4-fs error (device sdd1): ext4_journal_check_start:56: Detected aborted journal
 [Sat Aug 6 05:21:11 2016] EXT4-fs (sdd1): Remounting filesystem read-only
 [Sat Aug 6 05:40:35 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88024fc08340)
 [Sat Aug 6 05:40:35 2016] sd 0:0:5:0: [sdf] CDB:
 [Sat Aug 6 05:40:35 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
 [Sat Aug 6 05:40:35 2016] scsi target0:0:5: handle(0x000c), sas_address(0x4433221105000000), phy(5)
 [Sat Aug 6 05:40:35 2016] scsi target0:0:5: enclosure_logical_id(0x500304801a5d3f01), slot(5)
 [Sat Aug 6 05:40:35 2016] sd 0:0:5:0: task abort: FAILED scmd(ffff88024fc08340)
 [Sat Aug 6 05:40:35 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88019a12ee00)
 [Sat Aug 6 05:40:35 2016] sd 0:0:5:0: [sdf] CDB:
 [Sat Aug 6 05:40:35 2016] Read(10): 28 00 27 c8 b4 e0 00 00 20 00
 [Sat Aug 6 05:40:35 2016] scsi target0:0:5: handle(0x000c), sas_address(0x4433221105000000), phy(5)
 [Sat Aug 6 05:40:35 2016] scsi target0:0:5: enclosure_logical_id(0x500304801a5d3f01), slot(5)
 [Sat Aug 6 05:40:35 2016] sd 0:0:5:0: task abort: SUCCESS scmd(ffff88019a12ee00)
 [Sat Aug 6 05:40:35 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88203eaddac0)

Update 20160930

After upgrading the controller firmware to the latest (currently 12.00.02) version, the problem disappeared.
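
For anyone hitting the same symptoms: the firmware version the HBA is actually running can be checked before and after flashing, for example as follows (a sketch; host0 and the sas3flash utility name are assumptions for an LSI/Avago SAS3008 in IT mode):

 cat /sys/class/scsi_host/host0/version_fw   # firmware version as reported by the mpt3sas driver
 sas3flash -listall                          # Avago/Broadcom flash utility: list adapters and firmware versions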

Conclusion

The problem is solved.

That is a rather large stripe: (8 - 2) = 6 data disks x 512K = 3MiB, and not an even number either. Either take the array up to 10 disks (8 data + 2 parity), or go down to 4 data + 2 parity, for a total stripe size of 256K or 64K per drive. It may well be the cache going crazy on your unaligned writes. Before trying to reconfigure the array, you could try setting all of the drives to write-through mode.
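
To make the arithmetic and the cache suggestion concrete (a minimal sketch; the device name is an example, and hdparm applies to SATA devices while sdparm is the SAS/SCSI equivalent):

 # full-stripe size = (raid devices - 2 parity) x chunk
 #   current layout:                  (8 - 2) x 512K = 3072K = 3MiB
 #   e.g. 4 data + 2 parity at 64K:   (6 - 2) x  64K =  256K
 # put a drive into write-through mode by disabling its volatile write cache
 hdparm -W 0 /dev/sdb
 sdparm --clear WCE /dev/sdb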

Update 7/20/16.

At this point I am convinced your RAID configuration is the problem. A 3MiB stripe is just odd; even though it is a multiple of your partition offset [1] (1MiB), it is simply a sub-optimal stripe size for any RAID, SSD or otherwise. It probably generates lots of unaligned writes, which force your SSDs to free far more pages than they have readily available, constantly pushing them into the garbage collector and shortening their lifespan. The drives cannot free up pages fast enough for the writes, so when the cache is finally flushed to disk (a synchronous write), it actually fails. You do not have a crash-consistent array, i.e. your data is not safe.

That is my theory, based on the available information and the time I could spend on it. You now have a "growth opportunity" to become a storage expert ;)

Start over. Do not use partitions. Set up a system and build an array with a 128K total stripe size (start off a little conservative). In a RAID 6 configuration with N total drives, only N-2 drives hold data at any one time; the remaining two hold parity information. So if N = 6, a 128K stripe requires a 32K chunk. You should be able to see why 8 is a somewhat odd number of drives to run RAID 6 on.
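
As a concrete sketch of that layout (example device names; whole disks, no partitions, 6 drives, 32K chunk so that 4 data disks x 32K = 128K per full stripe):

 mdadm --create /dev/md2 --level=6 --raid-devices=6 --chunk=32 \
       /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh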

Then run fio [2] in direct mode against the "raw disk" and beat on it until you are confident it is stable. Next, add the filesystem and tell it about the underlying stripe size (man mkfs.???). Run fio again, this time against files (or you will destroy the filesystem), and confirm that the array holds up.
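
Something along these lines, for example (a sketch; /dev/md2, the mount point and the fio job parameters are assumptions, and the ext4 stride/stripe-width values follow from a 32K chunk, 4 data disks and a 4K block size):

 # beat on the raw md device with direct, random writes (destroys any data on it)
 fio --name=rawtest --filename=/dev/md2 --direct=1 --rw=randwrite \
     --bs=32k --iodepth=32 --numjobs=4 --runtime=600 --time_based --group_reporting

 # create the filesystem and tell it about the stripe geometry:
 # stride = chunk / block size = 32K / 4K = 8; stripe-width = stride x data disks = 8 x 4 = 32
 mkfs.ext4 -b 4096 -E stride=8,stripe-width=32 /dev/md2
 mount /dev/md2 /mnt/raid

 # repeat the test against a file so the filesystem itself is not destroyed
 fio --name=fstest --directory=/mnt/raid --size=10G --direct=1 --rw=randwrite \
     --bs=32k --iodepth=32 --numjobs=4 --runtime=600 --time_based --group_reporting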

I know this is a lot of "stuff": start small, try to understand what it is doing, and stick with it. Tools like blktrace and iostat can help you understand how your application actually writes, which will tell you the best stripe/chunk size to use.
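
For instance (a sketch; iostat ships in the sysstat package, blktrace/blkparse in the blktrace package):

 iostat -x 1 /dev/md2                        # request sizes, queue depth and latency, once per second
 blktrace -d /dev/md2 -o - | blkparse -i -   # per-request trace of offsets and sizes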

  1. https://www.percona.com/blog/2011/06/09/aligning-io-on-a-hard-disk-raid-the-theory/

  2. https://wiki.mikejung.biz/Benchmarking#Fio_Random_Write_and_Random_Read_Command_Line_Examples (my fio cheatsheet)

Start by checking and posting the SMART readings. I suspect something is wrong with your disks: it looks like a timeout after an attempt to read/write a bad sector. It could also be a cabling problem (bad contact, broken cable, etc.). I have also seen disks with similar firmware issues. After the SMART data, I should have more suggestions.
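
To get started, something like the following per disk (a sketch; with the HBA flashed to IT mode the drives show up as plain /dev/sdX, so smartctl can address them directly):

 smartctl -a /dev/sde         # overall health, attributes and error counters
 smartctl -l error /dev/sde   # just the device error log
 smartctl -t short /dev/sde   # start a short self-test; read the result later with -a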