Dirty degraded array, unable to read superblock, kernel panic

Machine: Linux CentOS 5.4 with two hard disks in RAID 5 (yes, the third disk is missing).

The situation:

  1. Everything was running fine (with the third disk missing).
  2. Then the power went out (the system shut itself down while running on battery power).
  3. The machine did not come back up.

The messages on screen:

    Memory for crash kernel (0x0 to 0x0) notwithin permissible range
    PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
    PCI: Not using MMCONFIG.
    Red Hat nash version 5.1.19.6 starting
    insmod: error inserting '/lib/raid456.ko': -1 File exists
    md: md2: raid array is not clean -- starting background reconstruction
    raid5: cannot start dirty degraded array for md2
    raid5: failed to run raid set md2
    md: pers->run() failed ...
    md: md2: raid array is not clean -- starting background reconstruction
    raid5: cannot start dirty degraded array for md2
    raid5: failed to run raid set md2
    md: pers->run() failed ...
    EXT3-fs: unable to read superblock
    mount: error mounting /dev/root on /sysroot as ext3: Invalid argument
    setuproot: moving /dev failed: No such file or directory
    setuproot: error mounting /proc: No such file or directory
    setuproot: mount failed: No such file or directory
    Kernel panic - not syncing: Attempted to kill init!
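The key line is "raid5: cannot start dirty degraded array for md2": by default the md driver refuses to auto-start a RAID 5 that is both degraded (one member missing) and dirty (shut down in the middle of writes), because the parity may be inconsistent. As a hedged aside (not something tried here), the kernel has a documented switch to override this at boot, at the risk of reading stale parity:

    # a sketch, untried here: appending this to the kernel command line
    # tells the md driver to start a dirty degraded array anyway
    md-mod.start_dirty_degraded=1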

So I put sysresccd on a memory stick, booted from it, and ran these tests:

    smartctl -t short /dev/sda
    smartctl -X /dev/sda
    smartctl -l selftest /dev/sda

The same for sdb. The results were:

    sda: test=Short offline, status="Completed without error", remaining=00%, lifetime=19230, firsterror=-
    sdb: test=Short offline, status="Completed: read failure", remaining=90%, lifetime=19256, firsterror=67031516
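For sdb, firsterror is the LBA of the first failed read. Comparing it with the partition table shown further below (sdb3 starts at sector 8803620), the failure falls inside the raid5 member sdb3. A quick sketch of the arithmetic:

    # a sketch: offset of the first bad LBA inside /dev/sdb3
    echo $(( 67031516 - 8803620 ))    # 58227896 sectors into sdb3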

And the details for sdb:

    root@sysresccd /root % smartctl -A /dev/sdb
    smartctl 5.42 2011-10-20 r3458 [i686-linux-3.0.21-std250-i586] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   180   180   021    Pre-fail  Always       -       5975
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       33
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       19256
     10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
    193 Load_Cycle_Count        0x0032   183   183   000    Old_age   Always       -       51128
    194 Temperature_Celsius     0x0022   111   093   000    Old_age   Always       -       39
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       17
    198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

Current_Pending_Sector with a raw value of 17 is probably a problem: those are sectors the drive could not read and has queued for reallocation.
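A hedged aside (not part of the original run; the LBA is taken from the self-test log above): individual suspect sectors can be probed directly with hdparm, which either reads them successfully or confirms they are pending:

    # a sketch: try to read the LBA that the short self-test reported
    hdparm --read-sector 67031516 /dev/sdb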

The further steps:

  1. Buy three 2 TB HDDs.
  2. Boot from the memory stick.
  3. Copy the two old 1.5 TB disks onto two of the new ones (a progress tip follows this list):

        dd if=/dev/sda of=/dev/sdc bs=32M
        dd if=/dev/sdb of=/dev/sdd bs=32M

  4. Remove the two old disks (that cannot make things worse).
  5. Attach the three new disks and reboot.
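Since each of these copies moves 1.5 TB and plain dd prints nothing while it runs, a hedged aside (not part of the original procedure): GNU dd on Linux prints an I/O statistics line when it receives SIGUSR1, which can be sent from a second shell:

    # a sketch: ask the running dd for a progress/statistics line
    kill -USR1 $(pidof dd)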

After the reboot the output looked like this:

    Memory for crash kernel (0x0 to 0x0) notwithin permissible range
    PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
    PCI: Not using MMCONFIG.
    Red Hat nash version 5.1.19.6 starting
    insmod: error inserting '/lib/raid456.ko': -1 File exists
    md: invalid raid superblock magic on sdb3
    md: md2: raid array is not clean -- starting background reconstruction
    raid5: not enough operational devices for md2 (2/3 failed)
    raid5: failed to run raid set md2
    md: pers->run() failed ...
    md: md2: raid array is not clean -- starting background reconstruction
    raid5: not enough operational devices for md2 (2/3 failed)
    raid5: failed to run raid set md2
    md: pers->run() failed ...
    EXT3-fs: unable to read superblock
    mount: error mounting /dev/root on /sysroot as ext3: Invalid argument
    setuproot: moving /dev failed: No such file or directory
    setuproot: error mounting /proc: No such file or directory
    setuproot: error mounting /sys: No such file or directory
    setuproot: mount failed: No such file or directory
    Kernel panic - not syncing: Attempted to kill init!

So I booted sysresccd from the memory stick again, now with the new disks. Here is some information:

fdisk -l shows the two full disks exactly as the output looked on the old disks:

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           63      610469      305203+  fd  Linux raid autodetect
    /dev/sda2           610470     8803619     4096575   fd  Linux raid autodetect
    /dev/sda3          8803620  2930272064  1460734222+  fd  Linux raid autodetect
    /dev/sdb1   *           63      610469      305203+  fd  Linux raid autodetect
    /dev/sdb2           610470     8803619     4096575   fd  Linux raid autodetect
    /dev/sdb3          8803620  2930272064  1460734222+  fd  Linux raid autodetect

sdc contains no valid partition table (it is the empty third disk).
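A hedged aside (this step does not appear in the run above): before the empty third disk can rejoin the array later, it needs the same partition layout as the other two; with identically sized disks the table can be cloned, for example:

    # a sketch: replicate sda's partition table onto the empty third disk
    sfdisk -d /dev/sda | sfdisk /dev/sdc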

    smartctl -t short /dev/sda
    smartctl -X /dev/sda
    smartctl -l selftest /dev/sda

    sda: test=Short offline, status="Completed without error", remaining=00%, lifetime=19230, firsterror=-
    sdb: test=Short offline, status="Completed: read failure", remaining=90%, lifetime=19256, firsterror=67031516

    smartctl -A /dev/sdb
    offline_uncorrectable: 0

Then:

    root@sysresccd /root % cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md125 : inactive sda3[0](S)
          1460734144 blocks

    md126 : active raid1 sda1[0] sdb1[1]
          305088 blocks [2/2] [UU]

    md127 : active raid1 sda2[0] sdb2[1]
          4096448 blocks [2/2] [UU]

    unused devices: <none>

Note: the raid5 array shows up there as md125.

Details for md127:

    root@sysresccd /root % mdadm --detail /dev/md127
    /dev/md127:
            Version : 0.90
      Creation Time : Sun Dec 13 18:45:15 2009
         Raid Level : raid1
         Array Size : 4096448 (3.91 GiB 4.19 GB)
      Used Dev Size : 4096448 (3.91 GiB 4.19 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 127
        Persistence : Superblock is persistent

        Update Time : Thu Mar  8 00:40:45 2012
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0

               UUID : 939f1a92:590d4172:2414ef47:5e2b15cb
             Events : 0.236

        Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           1       8       18        1      active sync   /dev/sdb2

For md126:

    root@sysresccd /root % mdadm --detail /dev/md126
    /dev/md126:
            Version : 0.90
      Creation Time : Sun Dec 13 19:21:09 2009
         Raid Level : raid1
         Array Size : 305088 (297.99 MiB 312.41 MB)
      Used Dev Size : 305088 (297.99 MiB 312.41 MB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 126
        Persistence : Superblock is persistent

        Update Time : Wed Mar  7 23:34:02 2012
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0

               UUID : bde56644:86d3e3a4:1128f4fe:0f47f21f
             Events : 0.242

        Number   Major   Minor   RaidDevice State
           0       8        1        0      active sync   /dev/sda1
           1       8       17        1      active sync   /dev/sdb1

Details for md125:

    root@sysresccd /root % mdadm --detail /dev/md125
    mdadm: md device /dev/md125 does not appear to be active.

For sda3:

    root@sysresccd /root % mdadm --examine /dev/sda3
    /dev/sda3:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 062f3190:b9337fc1:0b38f5df:7ec7c53b
      Creation Time : Sun Dec 13 18:45:15 2009
         Raid Level : raid5
      Used Dev Size : 1460733952 (1393.06 GiB 1495.79 GB)
         Array Size : 2921467904 (2786.13 GiB 2991.58 GB)
       Raid Devices : 3
      Total Devices : 2
    Preferred Minor : 2

        Update Time : Sat Mar  3 22:48:34 2012
              State : active
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 1
      Spare Devices : 0
           Checksum : e5ac0d6c - correct
             Events : 26243911

             Layout : left-symmetric
         Chunk Size : 256K

          Number   Major   Minor   RaidDevice State
    this     0       8        3        0      active sync   /dev/sda3

       0     0       8        3        0      active sync   /dev/sda3
       1     1       8       19        1      active sync   /dev/sdb3
       2     2       0        0        2      faulty removed

For sdb3:

    root@sysresccd /root % mdadm --examine /dev/sdb3
    mdadm: No md superblock detected on /dev/sdb3.

Then:

    root@sysresccd /root % mdadm --examine /dev/sd[ab]3 | egrep 'dev|Update|Role|State|Chunk Size'
    mdadm: No md superblock detected on /dev/sdb3.
    /dev/sda3:
        Update Time : Sat Mar  3 22:48:34 2012
              State : active
         Chunk Size : 256K
          Number   Major   Minor   RaidDevice State
    this     0       8        3        0      active sync   /dev/sda3
       0     0       8        3        0      active sync   /dev/sda3
       1     1       8       19        1      active sync   /dev/sdb3

More:

    root@sysresccd /root % mdadm --verbose --examine --scan
    ARRAY /dev/md2 level=raid5 num-devices=3 UUID=062f3190:b9337fc1:0b38f5df:7ec7c53b
       devices=/dev/sda3
    ARRAY /dev/md126 level=raid1 num-devices=2 UUID=bde56644:86d3e3a4:1128f4fe:0f47f21f
       devices=/dev/sdb1,/dev/sda1
    ARRAY /dev/md127 level=raid1 num-devices=2 UUID=939f1a92:590d4172:2414ef47:5e2b15cb
       devices=/dev/sdb2,/dev/sda2

(Note: here the raid5 array is listed as md2, not md125 as in /proc/mdstat.)
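The naming difference comes from the rescue environment having no mdadm.conf: auto-assembly falls back to free minor numbers counting down from md127. A hedged sketch (not run here) of how to make the preferred names stick:

    # a sketch: record the detected arrays so a later assemble uses the
    # preferred name (md2) instead of an auto-assigned one (md125)
    mdadm --examine --scan > /etc/mdadm.conf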

    root@sysresccd /root % mdadm --verbose --create --assume-clean /dev/md2 --level=5 --raid-devices=3 /dev/sda3 /dev/sdb3 missing
    mdadm: layout defaults to left-symmetric
    mdadm: chunk size defaults to 512K
    mdadm: layout defaults to left-symmetric
    mdadm: layout defaults to left-symmetric
    mdadm: super1.x cannot open /dev/sda3: Device or resource busy
    mdadm: failed container membership check
    mdadm: device /dev/sda3 not suitable for any style of array
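The "Device or resource busy" comes from the inactive md125, which still claims sda3. Apart from that, the defaults mdadm announces here (1.x superblock, 512K chunks) do not match what --examine reported for the original array (0.90 superblock, 256K chunks, left-symmetric), so even a successful create along these lines would have produced garbage. A hedged sketch, with the geometry taken from the --examine output above, of what a matching attempt might look like (illustration only, still dangerous on a half-copied sdb3):

    # a sketch, untested: release sda3 from the stale inactive array
    mdadm --stop /dev/md125

    # then re-create with the exact geometry reported by --examine
    mdadm --verbose --create --assume-clean /dev/md2 \
        --metadata=0.90 --level=5 --raid-devices=3 \
        --chunk=256 --layout=left-symmetric \
        /dev/sda3 /dev/sdb3 missing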

Update: probably the dd copy of disk sdb was not successful. The copy of sdb looked suspicious, so I ran this:

    root@sysresccd /root % dd if=/dev/sda3 of=/dev/sdc3 bs=128M
    11144+1 records in
    11144+1 records out
    1495791843840 bytes (1.5 TB) copied, 42354.9 s, 35.3 MB/s
    root@sysresccd /root % dd if=/dev/sdb3 of=/dev/sdd3 bs=128M
    dd: reading `/dev/sdb3': Input/output error
    222+1 records in
    222+1 records out
    29813932032 bytes (30 GB) copied, 676.459 s, 44.1 MB/s
    root@sysresccd /root %

This time I copied only the sdb3 partition, because sdb1 and sdb2 were fine. You can see that it aborted. So I am now running:

    ddrescue -S -c 20480 -f /dev/sdb3 /dev/sdd3 /tmp/log3

This copies the partition again, this time with ddrescue. It will take much longer; so far it reports errsize=17928 kB and errors=3.
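Unlike dd, ddrescue skips unreadable areas and records them in the logfile, so the bad spots can be retried later without recopying all 1.5 TB. A hedged sketch of such a follow-up pass (not part of the run above):

    # a sketch: retry only the bad areas recorded in /tmp/log3,
    # using direct disc access and 3 retries per sector
    ddrescue -d -f -r3 /dev/sdb3 /dev/sdd3 /tmp/log3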

I will update this post when the copy finishes and I find out more.

(Answering my own question)

ddrescue solved the problem; after the rescue copy it was possible to reassemble the raid5 array.
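For completeness, a hedged sketch of what the reassembly typically looks like at this point; the device names are assumptions based on the dd/ddrescue commands above, since the post does not spell them out:

    # a sketch: force-assemble the array from the two copied members and
    # start it degraded with 2 of 3 devices present
    mdadm --assemble --force --run /dev/md2 /dev/sdc3 /dev/sdd3

    # once the filesystem checks out, add a third partition so the raid5
    # rebuilds parity onto it (partition name hypothetical)
    mdadm --add /dev/md2 /dev/sde3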