I would like to overwrite a very large hard drive (18 TB) with random bytes, and then check SMART data for reallocated sectors or other errors.
Since badblocks has limitations on the number of blocks it will work with in a single run, I tried the "cryptsetup method" described on the Arch Linux wiki:
https://wiki.archlinux.org/title/Badblocks#Finding_bad_sectors
I set up an encrypted logical device eld on the whole drive and then used the shred command to write zeroes to the opened eld device:
cryptsetup open /dev/device eld --type plain --cipher aes-xts-plain64
shred -v -n 0 -z /dev/mapper/eld
It went on to print lines such as
shred: /dev/mapper/eld: pass 1/1 (000000)...870MiB/17TiB 0%
shred: /dev/mapper/eld: pass 1/1 (000000)...1.7GiB/17TiB 0%
...
shred: /dev/mapper/eld: pass 1/1 (000000)...4.1TiB/17TiB 24%
but then it stopped at 4.1 TiB of the 17 TiB written. I verified this with hexdump: zeroes were not written beyond byte address 0x428249b0000 (4570459340800 ≈ 4.156 TiB):
hexdump -C --skip 0x428249a0000 /dev/mapper/eld | head
428249a0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
428249b0000 b3 cd d0 34 72 15 f2 2c f6 32 90 fb 69 24 1f ec |...4r..,.2..i$..|
428249b0010 a0 f4 88 a5 56 e7 13 82 94 e5 e0 f5 37 da c3 59 |....V.......7..Y|
428249b0020 9b 55 9f d8 39 a1 41 dc 52 ca 7b 3a 95 f5 59 e2 |.U..9.A.R.{:..Y.|
Many standard commands seem to have problems with high-capacity disks because the numbers involved overflow 32-bit data types. Which read/write tools on Linux can reliably read and write beyond these imaginary 2 TiB and 4 TiB boundaries?
Edit: updated according to a comment.
Instead of cryptsetup + shred, I used cryptsetup + pv (cat should work instead of pv too, but it would not give any progress info) and pointed stdin to /dev/zero; here /dev/sdX is the device for the hard disk.
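A minimal sketch of that sequence, assuming the same plain-mode mapping as in the question:

# map the whole disk through a throwaway encryption layer
cryptsetup open /dev/sdX eld --type plain --cipher aes-xts-plain64
# stream zeroes through the mapping; pv reports rate and total bytes written
pv < /dev/zero > /dev/mapper/eld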
This has the advantage (as compared to dd) that no obscure arguments need to be specified, and performance over a SATA 3.3 6 Gb/s link is good (>200 MiB/s).
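For comparison, a roughly equivalent dd invocation needs the block size and progress reporting spelled out explicitly (the values here are illustrative, not from the original):

dd if=/dev/zero of=/dev/mapper/eld bs=1M status=progress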
pv still failed with an error when it reached the end of the device, but I checked that it nevertheless overwrote the whole logical device with zeroes, which means dm-crypt overwrote the whole hard drive with pseudo-random bytes.
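One quick hypothetical spot check that the far end of the device really was written, reusing the hexdump approach from the question (blockdev comes from util-linux):

# size of the mapped device in bytes
size=$(blockdev --getsize64 /dev/mapper/eld)
# dump the last MiB; an all-zero tail collapses to a single row followed by '*'
hexdump -C --skip $((size - 1048576)) /dev/mapper/eld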
Now hard drive errors can be checked in at least two ways:
1. Looking for degraded SMART data (like reallocated sectors) in the output of smartctl.
2. Reading data from /dev/mapper/eld and checking that every byte reads back as zero, using the cmp command from diffutils for the comparison. Both commands are sketched below.
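A sketch of both checks; smartctl here comes from smartmontools, and the exact flags are illustrative:

# 1. full SMART report for the raw disk, including reallocated-sector counts
smartctl -a /dev/sdX
# 2. compare the decrypted mapping against an endless stream of zeroes
cmp /dev/zero /dev/mapper/eld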
cmp will either print the byte address of the first mismatch and exit with an error, or it will find no mismatch, print "cmp: EOF on /dev/mapper/eld ..." and still exit with an error, because /dev/zero is effectively endless and the device is always the shorter input.
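Both outcomes exit non-zero, so a script has to distinguish them by the diagnostic text rather than the exit status; a minimal sketch:

# capture both stdout (mismatch report) and stderr (the EOF notice)
out=$(cmp /dev/zero /dev/mapper/eld 2>&1)
case $out in
    *"EOF on /dev/mapper/eld"*) echo "all zeroes: device read back clean" ;;
    *) echo "possible bad record: $out" ;;
esac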
A mismatch means either that the hard drive has a permanent defect at that position, or that it was a random error that will not repeat at exactly the same position.
On the first run of cmp, I did get a mismatch after only 8 seconds, which I was very surprised to see. SMART data did not show any degradation, and syslog revealed no error messages regarding the hard drive.
I then ran the cmp command again to check whether the record error was real, and the mismatch did not occur at that position again. It was some random error in the whole read-and-evaluate process. So don't rely on a single run of cmp; if a mismatch is found, run it again. If the error disappears, ignore the first mismatch or maybe try once again. If the error persists, return the hard drive to the seller, as it is most probably defective and may degrade faster over time than a healthy hard drive.
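To re-test a reported mismatch without rereading all 18 TB, cmp can skip ahead to the address from the first run; OFFSET below is a hypothetical placeholder for that byte address:

# skip OFFSET bytes in both inputs, then compare from there onward
cmp --ignore-initial=$OFFSET /dev/zero /dev/mapper/eld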