Being a VSAN beta tester, I decided to move to the GA version, as VMware recommends for production sites. They say it is not possible/supported to upgrade from beta to GA, so I did not upgrade in place; I did a full wipe/reinstall of the ESX hosts. However, during the install I found that the system is extremely slow: the installer takes a couple of hours to boot, and then each system scan operation takes about 30-40 minutes. The installed system always gets stuck at the
usbarbitrator start
message.
I enabled logging to a serial console, and here is what I see:
2014-03-31T20:00:54.517Z cpu2:33262)LSOMCommon: LSOM_RegisterDiskAttrHandle:99: t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710 is a SATA disk
2014-03-31T20:00:54.532Z cpu2:33262)LSOMCommon: LSOM_RegisterDiskAttrHandle:103: DiskAttrHandle:0x4111c977b928 is added to disk:t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710 by module:plog
2014-03-31T20:00:54.551Z cpu2:33262)PLOG: PLOG_InitMDDevice:830: Registered diskAttrHandle:0x4111c977b928 on disk t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710
2014-03-31T20:00:54.568Z cpu2:33262)PLOG: PLOG_AllocOneRDT:539: You're wasting 524288 bytes by not requesting a length that is not a multiple of the allocation granularity 1048576
2014-03-31T20:00:54.583Z cpu2:33262)PLOG: PLOG_InitElevator:1782: Initializing PLOG Elevator UUID 5287745f-e1c5-269f-ce67-c8d8d4c03967
2014-03-31T20:00:54.595Z cpu2:33262)LSOMCommon: LSOMSetWCEnableSATA:1071: SATA disk t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710 disabling cache...
2014-03-31T20:00:54.611Z cpu2:33262)PLOG: PLOG_InitElevator:1845: Initializing PLOG Elevator UUID on device t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710:2 5287745f-e1c5-269f-ce67-c8d8d4c03967
2014-03-31T20:00:54.630Z cpu2:33262)PLOG: PLOG_InitMDDevice:843: PLOG device t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710:2 is initialized with device handles
2014-03-31T20:01:24.648Z cpu1:32798)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x4136804461c0, 0) to dev "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" on path "vmhba37:C0:T0:L0" Failed: H:0x5 D:0x0 P:0x0 Possible sense $
2014-03-31T20:01:24.670Z cpu1:32798)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" state in doubt; requested fast path state update...
2014-03-31T20:01:24.691Z cpu1:32798)ScsiDeviceIO: 2337: Cmd(0x4136804461c0) 0x28, CmdSN 0x1 from world 0 to dev "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2014-03-31T20:01:24.713Z cpu1:32798)LSOMCommon: IORETRYCompleteIO:389: Throttled: 0x4136c8c6af00 IO type 264 (READ) isOdered:NO since 30065 msec status Maximum kernel-level retries exceeded
2014-03-31T20:01:24.729Z cpu9:33541)WARNING: LSOM: LSOMEventNotify:4570: VSAN device 5287745f-e1c5-269f-ce67-c8d8d4c03967 is under permanent error.
2014-03-31T20:01:24.743Z cpu9:33541)WARNING: LSOM: LSOMPostDiskEvent:2114: Unable to post disk event for 5287745f-e1c5-269f-ce67-c8d8d4c03967: Not ready
2014-03-31T20:01:24.757Z cpu9:33541)LSOM: LSOMPublishDisk:1959: Throttled: Unable to post disk event for 5287745f-e1c5-269f-ce67-c8d8d4c03967: Not ready
2014-03-31T20:01:54.774Z cpu1:32798)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x413680441bc0, 0) to dev "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" on path "vmhba37:C0:T0:L0" Failed: H:0x5 D:0x0 P:0x0 Possible sense $
2014-03-31T20:01:54.797Z cpu1:32798)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" state in doubt; requested fast path state update...
2014-03-31T20:01:54.817Z cpu1:32798)ScsiDeviceIO: 2337: Cmd(0x413680441bc0) 0x28, CmdSN 0x2 from world 0 to dev "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-03-31T20:01:54.839Z cpu1:32798)LSOMCommon: IORETRYCompleteIO:389: Throttled: 0x4136c8c6ae40 IO type 264 (READ) isOdered:NO since 30063 msec status Maximum kernel-level retries exceeded
2014-03-31T20:02:05.014Z cpu15:32958)VMW_SATP_LOCAL: satp_local_updatePathStates:458: Failed to update path "vmhba37:C0:T0:L0" state. Status=Transient storage condition, suggest retry
2014-03-31T20:02:19.017Z cpu1:32798)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" state in doubt; requested fast path state update...
2014-03-31T20:02:19.038Z cpu1:32798)ScsiDeviceIO: 2337: Cmd(0x413680444b40) 0x12, CmdSN 0x318 from world 0 to dev "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.
2014-03-31T20:02:24.857Z cpu1:32798)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x4136804411c0, 0) to dev "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" on path "vmhba37:C0:T0:L0" Failed: H:0x5 D:0x0 P:0x0 Possible sense $
2014-03-31T20:02:24.879Z cpu1:32798)ScsiDeviceIO: 2337: Cmd(0x4136804411c0) 0x28, CmdSN 0x3 from world 0 to dev "t10.ATA_____WDC_WD2000FYYZ2D01UL1B0_______________________WD2DWCC1P0395710" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
These messages do not appear if I take out all of the disks. I verified that all disks are readable, with no errors and no bad blocks. I understand that my server may not be on the HCL, but the beta version worked fine; only GA has this issue.
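For anyone comparing this against their own console output, the failed commands in the log above can be tallied with a short script. This is only a rough sketch in Python, assuming the ScsiDeviceIO line format shown in the excerpt; the function name, the regex, and the log file name are my own choices and not anything provided by VMware. The opcode names (0x28 is READ(10), 0x12 is INQUIRY) are the standard SCSI ones.

```python
import re
from collections import Counter

# Matches the "ScsiDeviceIO ... failed H:.. D:.. P:.. Possible sense data: ..." lines
# as they appear in the excerpt above (the format is an assumption based on that excerpt).
FAIL_RE = re.compile(
    r'ScsiDeviceIO: \d+: Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+),.*?'
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) '
    r'Possible sense data: (?P<sense>0x[0-9a-f]+ 0x[0-9a-f]+ 0x[0-9a-f]+)'
)

# Standard SCSI opcodes seen in this log; anything else is reported as raw hex.
OPCODES = {"0x28": "READ(10)", "0x12": "INQUIRY"}


def summarize_failures(lines):
    """Count failed SCSI commands per (device, opcode, host status, sense data)."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            op = OPCODES.get(m["op"], m["op"])
            counts[(m["dev"], op, m["h"], m["sense"])] += 1
    return counts


if __name__ == "__main__":
    # "serial-console.log" is a hypothetical file holding the captured output.
    with open("serial-console.log") as fh:
        for key, n in summarize_failures(fh).items():
            print(n, key)
```

In my case every failure is on the same device and path with host status 0x5, which is what made me suspect the disks (or what was on them) rather than the installer itself.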
To mark this as answered, I am repeating my comment above as an answer here:
I found the issue and its solution. It is strange, but the following helped: before installing ESXi, I booted from a Linux live CD and checked all my disks. If I ran a full read/write test on a disk, that disk showed no errors during the install afterwards. So I went ahead and wiped all the drives, and the installation went well. It looks to me like VSAN started using a different mechanism or data labeling, and the old information left on the drives got in the way. I did not find anything about this error anywhere, so I am leaving this here for those who hit the same issue.
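To illustrate the wipe step, here is a minimal sketch in Python of overwriting a whole disk with zeros from a Linux live environment. It is not the exact tool I used, just one way to get the same result; it assumes Python 3 is available on the live CD and that it is run as root. The function name and the device path `/dev/sdX` are placeholders. It destroys everything on the target, so double-check the device name first.

```python
import os


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite every byte of `path` with zeros. DESTROYS ALL DATA on it.

    Works on a regular file or, on Linux, a block device such as /dev/sdX
    (placeholder name). Returns the number of bytes written.
    """
    zeros = bytes(chunk_size)
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)  # total size in bytes
        dev.seek(0)
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            dev.write(zeros[:n])
            remaining -= n
        dev.flush()
        os.fsync(dev.fileno())  # make sure the zeros actually reach the disk
    return size


# Example (placeholder device path, run as root from the live CD):
# zero_fill("/dev/sdX")
```

A full pass over a 2 TB disk takes hours. I do not know whether clearing only the partition table and the start of the disk would be enough, since I only tested the full wipe.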