QNAP – HOW TO REPAIR RAID5 ARRAY WITH UNRECOVERED READ ERROR (URE) DURING REBUILD
Tonight I experienced first-hand a fear shared by most QNAP NAS owners: a failure while rebuilding a RAID 5 array after replacing a broken drive. You guessed it, the dreaded URE (unrecovered read error).
# dmesg
[23298.622891] md: md0: recovery done.
[23298.628401] md: Recovering done: md0, degraded=1
[23299.732589] ata6.00: exception Emask 0x0 SAct 0x7fe SErr 0x0 action 0x0
[23299.737963] ata6.00: irq_stat 0x40000008
[23299.743250] ata6.00: failed command: READ FPDMA QUEUED
[23299.748477] ata6.00: cmd 60/00:08:84:d9:44/04:00:57:00:00/40 tag 1 ncq 524288 in
[23299.748481] res 41/40:00:f0:dc:44/00:00:57:00:00/40 Emask 0x409 (media error) <F>
[23299.759028] ata6.00: status: { DRDY ERR }
[23299.764316] ata6.00: error: { UNC }
[23299.782380] ata6.00: configured for UDMA/133
[23299.787537] ata6: EH complete
[23301.855721] ata6.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
[23301.860887] ata6.00: irq_stat 0x40000008
[23301.866252] ata6.00: failed command: READ FPDMA QUEUED
[23301.871320] ata6.00: cmd 60/00:48:84:d9:44/04:00:57:00:00/40 tag 9 ncq 524288 in
[23301.871325] res 41/40:00:ef:dc:44/00:00:57:00:00/40 Emask 0x409 (media error) <F>
[23301.882098] ata6.00: status: { DRDY ERR }
[23301.887228] ata6.00: error: { UNC }
[23306.892039] ata6.00: qc timeout (cmd 0xec)
[23306.897030] ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[23306.902001] ata6.00: revalidation failed (errno=-5)
[23306.906919] ata6: hard resetting link
[23307.371054] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[23307.388929] ata6.00: configured for UDMA/133
[23307.393885] ata6: EH complete
[23308.431943] ata6: log page 10h reported inactive tag 0
[23308.436777] ata6.00: exception Emask 0x1 SAct 0x7fe SErr 0x0 action 0x0
[23308.441659] ata6.00: irq_stat 0x40000008
[23308.446492] ata6.00: failed command: READ FPDMA QUEUED
[23308.451684] ata6.00: cmd 60/00:08:84:d9:44/04:00:57:00:00/40 tag 1 ncq 524288 in
[23308.451689] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.461723] ata6.00: status: { DRDY }
[23308.466716] ata6.00: failed command: READ FPDMA QUEUED
[23308.471658] ata6.00: cmd 60/00:10:84:dd:44/03:00:57:00:00/40 tag 2 ncq 393216 in
[23308.471661] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.481539] ata6.00: status: { DRDY }
[23308.486502] ata6.00: failed command: READ FPDMA QUEUED
[23308.491486] ata6.00: cmd 60/08:18:7a:6c:10/00:00:00:00:00/40 tag 3 ncq 4096 in
[23308.491490] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.501645] ata6.00: status: { DRDY }
[23308.506405] ata6.00: failed command: READ FPDMA QUEUED
[23308.511115] ata6.00: cmd 60/08:20:72:6c:10/00:00:00:00:00/40 tag 4 ncq 4096 in
[23308.511119] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.520405] ata6.00: status: { DRDY }
[23308.524995] ata6.00: failed command: READ FPDMA QUEUED
[23308.529534] ata6.00: cmd 60/08:28:6a:6c:10/00:00:00:00:00/40 tag 5 ncq 4096 in
[23308.529538] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.538586] ata6.00: status: { DRDY }
[23308.543067] ata6.00: failed command: READ FPDMA QUEUED
[23308.547584] ata6.00: cmd 60/08:30:62:6c:10/00:00:00:00:00/40 tag 6 ncq 4096 in
[23308.547588] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.556706] ata6.00: status: { DRDY }
[23308.561263] ata6.00: failed command: READ FPDMA QUEUED
[23308.565820] ata6.00: cmd 60/08:38:5a:6c:10/00:00:00:00:00/40 tag 7 ncq 4096 in
[23308.565825] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.574937] ata6.00: status: { DRDY }
[23308.579488] ata6.00: failed command: READ FPDMA QUEUED
[23308.584079] ata6.00: cmd 60/08:40:52:6c:10/00:00:00:00:00/40 tag 8 ncq 4096 in
[23308.584082] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.593387] ata6.00: status: { DRDY }
[23308.597981] ata6.00: failed command: READ FPDMA QUEUED
[23308.602812] ata6.00: cmd 60/08:48:4a:6c:10/00:00:00:00:00/40 tag 9 ncq 4096 in
[23308.602817] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.612334] ata6.00: status: { DRDY }
[23308.616907] ata6.00: failed command: READ FPDMA QUEUED
[23308.621563] ata6.00: cmd 60/08:50:42:6c:10/00:00:00:00:00/40 tag 10 ncq 4096 in
[23308.621567] res 40/00:54:42:6c:10/00:00:00:00:00/40 Emask 0x1 (device error)
[23308.631025] ata6.00: status: { DRDY }
[23308.648602] ata6.00: configured for UDMA/133
[23308.653389] ata6: EH complete
[23310.704908] ata6.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[23310.709669] ata6.00: irq_stat 0x40000008
[23310.714362] ata6.00: failed command: READ FPDMA QUEUED
[23310.719089] ata6.00: cmd 60/00:48:84:d9:44/04:00:57:00:00/40 tag 9 ncq 524288 in
[23310.719092] res 41/40:00:ef:dc:44/00:00:57:00:00/40 Emask 0x409 (media error) <F>
[23310.728625] ata6.00: status: { DRDY ERR }
[23310.733749] ata6.00: error: { UNC }
[23310.751334] ata6.00: configured for UDMA/133
[23310.756049] ata6: EH complete
[23312.863828] ata6.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[23312.868659] ata6.00: irq_stat 0x40000008
[23312.873247] ata6.00: failed command: READ FPDMA QUEUED
[23312.877828] ata6.00: cmd 60/00:00:84:d9:44/04:00:57:00:00/40 tag 0 ncq 524288 in
[23312.877832] res 41/40:00:ef:dc:44/00:00:57:00:00/40 Emask 0x409 (media error) <F>
[23312.887045] ata6.00: status: { DRDY ERR }
[23312.891653] ata6.00: error: { UNC }
[23312.909053] ata6.00: configured for UDMA/133
[23312.913591] ata6: EH complete
[23314.959481] ata6.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[23314.964017] ata6.00: irq_stat 0x40000008
[23314.968487] ata6.00: failed command: READ FPDMA QUEUED
[23314.972937] ata6.00: cmd 60/00:00:84:d9:44/04:00:57:00:00/40 tag 0 ncq 524288 in
[23314.972941] res 41/40:00:ef:dc:44/00:00:57:00:00/40 Emask 0x409 (media error) <F>
[23314.982330] ata6.00: status: { DRDY ERR }
[23314.986831] ata6.00: error: { UNC }
[23315.004157] ata6.00: configured for UDMA/133
[23315.008655] sd 5:0:0:0: [sda] Unhandled sense code
[23315.013027] sd 5:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[23315.017468] sd 5:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
[23315.021882] Descriptor sense data with sense descriptors (in hex):
[23315.026341] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[23315.030854] 57 44 dc ef
[23315.035243] sd 5:0:0:0: [sda] Add. Sense: Unrecovered read error – auto reallocate failed
[23315.039729] sd 5:0:0:0: [sda] CDB: Read(10): 28 00 57 44 d9 84 00 04 00 00
[23315.044333] end_request: I/O error, dev sda, sector 1464130799
[23315.048820] md/raid:md0: read error not correctable (sector 1462010216 on sda3).
[23315.053422] raid5: some error occurred in a active device:0 of md0.
[23315.058027] raid5: Keep the raid device active in degraded mode but set readonly.
[23315.062687] md/raid:md0: read error not correctable (sector 1462010224 on sda3).
[23315.067315] raid5: some error occurred in a active device:0 of md0.
[25914.157523] md: ioctl lock interrupted, reason -4, cmd -2142762735
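The log above is long, but the lines that name the failing array member are the md/raid ones near the end. As a minimal sketch (the function name ure_partitions is my own, not a QNAP or mdadm tool, and it assumes the message format shown in this log), the following filters kernel log text down to the affected partitions:

```shell
#!/bin/sh
# Hypothetical helper: print each partition that md reported an
# uncorrectable read error on. Reads kernel log text on stdin.
# Assumes the "(sector N on sdXN)" message format seen in the log above.
ure_partitions() {
    grep 'read error not correctable' \
        | sed -n 's/.*(sector [0-9][0-9]* on \([a-z0-9]*\)).*/\1/p' \
        | sort -u
}
```

On the NAS it would be used as `dmesg | ure_partitions`. If your firmware's grep or sed behaves differently, a plain `dmesg | grep 'not correctable'` gives the same information with more noise.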
When you encounter a URE during a rebuild, most of the advice on the forums suggests that the only course of action is to create a new array and restore your data from a backup.
As someone who does not give up that easily, I went looking for an alternative and found my new best friend: ddrescue.
Using ddrescue, I was able to clone the disk that was having the URE (UNRECOVERED READ ERROR). To begin the process you must first identify the problem disk. Steps are:
- You need to ssh / putty into the NAS
- Run the command dmesg and locate a line similar to “read error not correctable (sector 1462010224 on sda3)“.
- Typically sda3 = disk 1, sdb3 = disk 2, sdc3 = disk 3, sdd3 = disk 4
- Eject the problem disk from the NAS
- Download and install SystemRescueCd on a USB stick – see http://www.system-rescue-cd.org/Installing-SystemRescueCd-on-a-USB-stick/
- On any computer, install the problem drive and the new replacement drive using SATA connectors.
- Configure the computer to boot from USB, plug in the SystemRescueCd on the USB stick and power-on
- At the command prompt use fdisk -l to identify your disks – typically /dev/sda and /dev/sdb
- To determine which is the source disk, you can use smartctl:
smartctl -a /dev/sda
smartctl -a /dev/sdb
- You can also determine drive serial numbers with hdparm:
hdparm -I /dev/sda
hdparm -I /dev/sdb
- Once the source and destination disks are determined, use ddrescue to copy the disk:
ddrescue -f /dev/sda /dev/sdb
Important Note: In this example /dev/sda is the source and /dev/sdb is the destination.
- Once the cloning of the disk completes, replace the problem disk with the new clone and start the RAID array rebuild again.
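A variation on the ddrescue step that may be worth considering: GNU ddrescue accepts an optional third argument, a mapfile, which records what has been copied so an interrupted run can be resumed and the bad areas retried separately. The sketch below is not what I ran that night. The function name clone_disk and the mapfile name rescue.map are placeholders of mine, and the mapfile should live on persistent storage that is neither the source nor the destination disk. Because -f overwrites the destination, the function only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: two-pass ddrescue clone with a mapfile.
# src, dst and map are placeholders; confirm the device names with
# smartctl or hdparm first. Commands are only printed unless RUN=1,
# because -f allows ddrescue to overwrite the destination device.
clone_disk() {
    src=$1 dst=$2 map=$3
    if [ "$src" = "$dst" ]; then
        echo "source and destination must differ" >&2
        return 1
    fi
    run=echo
    [ "${RUN:-0}" = 1 ] && run=
    # Pass 1: copy everything that reads cleanly, skipping the scraping phase (-n).
    $run ddrescue -f -n "$src" "$dst" "$map" || return 1
    # Pass 2: go back over the bad areas with direct access (-d), up to 3 retry passes (-r3).
    $run ddrescue -f -d -r3 "$src" "$dst" "$map"
}
```

Running `clone_disk /dev/sda /dev/sdb rescue.map` shows the two commands; prefix it with `RUN=1` once you are sure which disk is which. Copying the healthy areas first keeps the stress on the failing drive low until most of the data is already safe.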