Before I tried to partition a 10TB HDD again, parted could see it:
# parted /dev/sdb
(parted) print list
Model: ATA ST10000NM0016-1T (scsi)
Disk /dev/sdb: 10.0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  10.0TB  10.0TB  xfs          primary
....
....
....
Then I tried to partition it again, but it failed:
[root@localhost ~]# parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? Yes
Error: end of file while reading /dev/sdb
Retry/Ignore/Cancel? Retry
Error: end of file while reading /dev/sdb
Retry/Ignore/Cancel? Cancel
(parted) q
Warning: Error fsyncing/closing /dev/sdb: Input/output error
Retry/Ignore? Retry
Warning: Error fsyncing/closing /dev/sdb: Input/output error
Retry/Ignore? Ignore
Then the drive disappeared. I rebooted, but the drive still didn't show up.
This post suggested using gdisk /dev/sdb. However, I think the disk is so corrupted that gdisk can't recognize it:
# gdisk -l /dev/sdb
GPT fdisk (gdisk) version 0.8.10
Problem opening /dev/sdb for reading! Error is 2.
The specified file does not exist!
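gdisk's "Error is 2" looks like a raw errno value (an assumption about how gdisk reports open failures). On Linux, errno 2 is ENOENT, which matches "The specified file does not exist!" and suggests the /dev/sdb node itself was missing, rather than the label being unreadable. A small sketch for decoding such numbers; describe_errno is a hypothetical helper name:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for a raw errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# gdisk reported "Error is 2" when opening /dev/sdb.
print(describe_errno(2))
```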
lsblk's output:
# lsblk
NAME                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                      8:0    1 447.1G  0 disk
├─sda1                   8:1    1     2G  0 part /boot
└─sda2                   8:2    1 445.1G  0 part
  ├─centos-root        253:0    0    30G  0 lvm  /
  ├─centos-swap        253:1    0     4G  0 lvm  [SWAP]
  ├─centos-var         253:2    0    30G  0 lvm  /var
  ├─centos-coredumps   253:3    0    30G  0 lvm  /coredumps
  └─centos-latest      253:4    0 351.1G  0 lvm  /latest
Output of ls -ltr /dev/sd*:
brw-rw---- 1 root disk 8, 0 Feb 10 16:00 /dev/sda
brw-rw---- 1 root disk 8, 2 Feb 10 16:00 /dev/sda2
brw-rw---- 1 root disk 8, 1 Feb 10 16:00 /dev/sda1
lshw -class disk, parted -l, and fdisk -l don't see the drive either.
I see something fishy in dmesg:
[Wed Feb 10 13:27:39 2021] ata13: softreset failed (1st FIS failed)
[Wed Feb 10 13:27:49 2021] ata13: softreset failed (device not ready)
[Wed Feb 10 13:28:06 2021] ata13: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Wed Feb 10 13:28:11 2021] ata13.00: qc timeout (cmd 0xec)
[Wed Feb 10 13:28:11 2021] ata13.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Wed Feb 10 13:28:17 2021] ata13: link is slow to respond, please be patient (ready=0)
[Wed Feb 10 13:28:21 2021] ata13: softreset failed (device not ready)
[Wed Feb 10 13:28:31 2021] ata13: softreset failed (1st FIS failed)
[Wed Feb 10 13:28:41 2021] ata13: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Wed Feb 10 13:28:51 2021] ata13.00: qc timeout (cmd 0xec)
[Wed Feb 10 13:28:51 2021] ata13.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Wed Feb 10 13:28:51 2021] ata13: limiting SATA link speed to 3.0 Gbps
[Wed Feb 10 13:28:52 2021] ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[Wed Feb 10 13:29:13 2021] ata13.00: qc timeout (cmd 0x47)
[Wed Feb 10 13:29:13 2021] ata13.00: READ LOG DMA EXT failed, trying unqueued
[Wed Feb 10 13:29:13 2021] ata13.00: failed to get NCQ Send/Recv Log Emask 0x40
[Wed Feb 10 13:29:13 2021] ata13.00: ATA-10: ST10000NM0016-1TT101, SNE0, max UDMA/133
[Wed Feb 10 13:29:13 2021] ata13.00: 19532873728 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[Wed Feb 10 13:29:13 2021] ata13.00: failed to set xfermode (err_mask=0x40)
[Wed Feb 10 13:29:13 2021] ata13.00: disabled
[Wed Feb 10 13:29:13 2021] ata13: hard resetting link
[Wed Feb 10 13:29:23 2021] ata13: softreset failed (1st FIS failed)
[Wed Feb 10 13:29:23 2021] ata13: hard resetting link
[Wed Feb 10 13:29:33 2021] ata13: softreset failed (device not ready)
[Wed Feb 10 13:29:33 2021] ata13: hard resetting link
[Wed Feb 10 13:29:39 2021] ata13: link is slow to respond, please be patient (ready=0)
[Wed Feb 10 13:29:49 2021] ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[Wed Feb 10 13:29:49 2021] ata13: EH complete
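The pattern in this log is one port (ata13) that repeatedly times out on command 0xec (IDENTIFY DEVICE), gets stepped down from 6.0 to 3.0 Gbps, and is eventually disabled. A minimal sketch that tallies these libata messages per port, so that repeated failures on one port stand out in a long dmesg; the pattern list is taken from this log and is an illustrative subset, not a complete catalogue of libata errors:

```python
import re
from collections import Counter, defaultdict

# Message fragments copied from the dmesg output above (illustrative subset).
PATTERNS = {
    "softreset_failed": re.compile(r"softreset failed"),
    "qc_timeout": re.compile(r"qc timeout"),
    "identify_failed": re.compile(r"failed to IDENTIFY"),
    "speed_limited": re.compile(r"limiting SATA link speed"),
    "device_disabled": re.compile(r": disabled$"),
}
# Port prefix such as "ata13:" or "ata13.00:"; group 1 is "ata13".
PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def summarize(lines):
    """Count libata error messages per ATA port."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip()
        port = PORT.search(line)
        if port is None:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[port.group(1)][name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[Wed Feb 10 13:28:11 2021] ata13.00: qc timeout (cmd 0xec)",
        "[Wed Feb 10 13:28:11 2021] ata13.00: failed to IDENTIFY (I/O error, err_mask=0x4)",
        "[Wed Feb 10 13:28:51 2021] ata13: limiting SATA link speed to 3.0 Gbps",
        "[Wed Feb 10 13:29:13 2021] ata13.00: disabled",
    ]
    for port, events in summarize(sample).items():
        print(port, dict(events))
```

On a live system the same function can be fed the lines of dmesg output instead of the sample (reading dmesg may require root).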
=================================
Update #1
I read this article and turned off ACPI, and another article suggested a power issue, so I turned off tuned-adm. Then the disk came back, and I ran parted /dev/sdb with mklabel gpt just like last time. This time there was no "Error: end of file while reading /dev/sdb", but when I continued with mkpart primary xfs 0% 1%, it gave me "Error: /dev/sdb: unrecognised disk label". I rebooted the machine and tried again:
(parted) mkpart primary xfs 0% 1%
(parted) mkpart primary xfs 1% 2%
(parted) mkpart primary ext4 2% 3%
(parted) mkpart primary ext4 3% 4%
(parted) mkpart primary btrfs 4% 5%
(parted) mkpart primary btrfs 5% 6%
(parted) mkpart primary xfs 6% 100%
(parted) print
Model: ATA ST10000NM0016-1T (scsi)
Disk /dev/sdb: 10.0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  100GB   100GB   xfs          primary
 2      100GB   200GB   100GB                primary
 3      200GB   300GB   100GB                primary
 4      300GB   400GB   100GB                primary
 5      400GB   500GB   100GB                primary
 6      500GB   600GB   100GB                primary
 7      600GB   10.0TB  9401GB               primary
(parted) q
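The sizes in this table can be cross-checked against the sector count the kernel reports later in this post (19532873728 512-byte logical blocks). A short worked calculation, assuming parted is printing decimal (SI) units here, which the 1049kB start (2048 sectors, i.e. 1 MiB) suggests:

```python
SECTOR_BYTES = 512             # logical sector size reported by parted and dmesg
TOTAL_SECTORS = 19532873728    # from the kernel's "[sdb] ... logical blocks" line

SI = {"kB": 1e3, "GB": 1e9, "TB": 1e12}


def si(nbytes: float, unit: str) -> float:
    """Convert a byte count to decimal (SI) units."""
    return nbytes / SI[unit]


total_bytes = TOTAL_SECTORS * SECTOR_BYTES
first_start = 2048 * SECTOR_BYTES    # 1 MiB alignment offset of partition 1
one_percent = total_bytes / 100      # width of each "N% (N+1)%" slice
last_part = total_bytes * 0.94       # "6% 100%", approximate (ignores backup GPT)

print(f"disk:     {si(total_bytes, 'TB'):.2f} TB ({total_bytes / 2**40:.3f} TiB)")
print(f"start:    {si(first_start, 'kB'):.0f} kB")
print(f"1% slice: {si(one_percent, 'GB'):.0f} GB")
print(f"6%-100%:  {si(last_part, 'GB'):.0f} GB")
```

Each 1% step comes out at about 100GB and the last partition at about 9401GB, in line with the print output; exact boundaries also depend on parted's alignment, which this arithmetic does not model.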
It works, but it seems very unstable. I checked dmesg again and found similar, but slightly different, failures:
[Thu Feb 11 00:58:31 2021] ata15.00: qc timeout (cmd 0xec)
[Thu Feb 11 00:58:31 2021] ata15.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Thu Feb 11 00:58:32 2021] ata15: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Thu Feb 11 00:58:42 2021] ata15.00: qc timeout (cmd 0xec)
[Thu Feb 11 00:58:42 2021] ata15.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Thu Feb 11 00:58:42 2021] ata15: limiting SATA link speed to 3.0 Gbps
[Thu Feb 11 00:58:44 2021] ata15: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[Thu Feb 11 00:59:12 2021] ata15.00: ATA-10: ST10000NM0016-1TT101, SNE0, max UDMA/133
[Thu Feb 11 00:59:12 2021] ata15.00: 19532873728 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[Thu Feb 11 00:59:12 2021] ata15.00: configured for UDMA/133
[Thu Feb 11 00:59:12 2021] scsi 14:0:0:0: Direct-Access ATA ST10000NM0016-1T SNE0 PQ: 0 ANSI: 5
[Thu Feb 11 00:59:12 2021] sd 14:0:0:0: [sdb] 19532873728 512-byte logical blocks: (10.0 TB/9.09 TiB)
[Thu Feb 11 00:59:12 2021] sd 14:0:0:0: [sdb] 4096-byte physical blocks
[Thu Feb 11 00:59:12 2021] sd 14:0:0:0: [sdb] Write Protect is off
[Thu Feb 11 00:59:12 2021] sd 14:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[Thu Feb 11 00:59:12 2021] sd 14:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Thu Feb 11 00:59:19 2021] sdb:
[Thu Feb 11 00:59:19 2021] sd 14:0:0:0: [sdb] Attached SCSI removable disk
[Thu Feb 11 00:59:37 2021] SGI XFS with ACLs, security attributes, no debug enabled
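In this log the kernel's "sdb:" line lists no partitions at that point. One independent, read-only way to check whether a GPT label is actually on the disk is to look for the "EFI PART" signature that the UEFI specification places at the start of the primary GPT header in LBA 1. A minimal sketch, assuming 512-byte logical sectors as reported above; has_gpt_header is a hypothetical helper name, and it does not validate the header CRC, so True only means the signature is present:

```python
GPT_SIGNATURE = b"EFI PART"    # first 8 bytes of a GPT header


def has_gpt_header(path: str, sector_size: int = 512) -> bool:
    """Return True if the primary GPT header signature is found at LBA 1."""
    with open(path, "rb") as dev:
        dev.seek(sector_size)  # LBA 1 starts one logical sector into the disk
        return dev.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE
```

Called as has_gpt_header("/dev/sdb") it only reads from the device, but opening a block device normally requires root.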
Any idea what's going on?
Thanks.
It turned out to be a faulty SATA controller.
I replaced the SATA cable and then the entire HDD: same issue. I reinstalled the entire OS: same issue.
Replacing the SATA controller solved it.
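To confirm which controller a disk is attached to before swapping hardware, the resolved sysfs path of the block device normally contains the libata port (the ataN seen in dmesg) and the PCI address chain leading to the controller. A sketch assuming the usual libata sysfs layout; controller_for is a hypothetical helper, and the last PCI address it returns is typically the SATA controller, which lspci -s can then identify:

```python
import os
import re

PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")
ATA_PORT = re.compile(r"ata\d+")


def controller_for(sysfs_path: str):
    """Return (ataN port, [PCI addresses from root to device]) for a sysfs path."""
    parts = sysfs_path.split("/")
    ata = next((p for p in parts if ATA_PORT.fullmatch(p)), None)
    pci = [p for p in parts if PCI_ADDR.fullmatch(p)]
    return ata, pci


if __name__ == "__main__":
    # /sys/block/sdb is a symlink into /sys/devices/...; resolve it first.
    link = "/sys/block/sdb"
    if os.path.exists(link):
        print(controller_for(os.path.realpath(link)))
```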
I had similar hardware problems with a fairly new disk, with similar errors in dmesg until it failed completely. I checked nearly everything and searched all over the Internet.
I noticed a scratching sound and, occasionally, the short sound of the disk spinning down and back up. In my case, an inadequate power cable was the cause: once the disk was connected to a separate power source, no new failures were logged.