First of all, we are not even sure this is a udev problem, but we need to start asking somewhere. We have a Hitachi Fibre Channel SAN serving volumes to a couple of machines running Ubuntu Server 12.04 amd64.
For mapping purposes we use the udev-generated /dev/disk/by-id identifiers:
...
/dev/disk/by-id/scsi-1HITACHI_750505270125
/dev/disk/by-id/scsi-1HITACHI_750505270125-part1
/dev/disk/by-id/scsi-1HITACHI_750505270126
/dev/disk/by-id/scsi-1HITACHI_750505270126-part1
...
where the last 4 digits (0125, 0126, 0127...) identify the LUNs created on the Hitachi, so we know which physical volume we're accessing.
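To make the naming scheme concrete, here is a minimal shell sketch that pulls the LUN number out of such an identifier. The function name lun_from_by_id is made up for illustration, and the sketch assumes the LUN is always the last 4 characters of the name once any -partN suffix is removed, as in the listing above.

```shell
# Hypothetical helper: extract the 4-digit LUN from a by-id path.
# Assumes the LUN is the last 4 characters of the name, after any -partN suffix.
lun_from_by_id() {
    name=${1##*/}                   # drop the directory part
    name=${name%-part*}             # drop a trailing -partN, if present
    echo "${name#"${name%????}"}"   # keep only the last 4 characters
}

lun_from_by_id /dev/disk/by-id/scsi-1HITACHI_750505270125-part1   # prints 0125
```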
We hit a weird problem: we had a 1.1 TB volume on LUN 0125 and split it into smaller volumes on the storage array side. After reassigning one of the new, smaller volumes to the server, the old volume size still seems to be cached (see the 1150.5 GB size):
root@server1:~# fdisk -l /dev/disk/by-id/scsi-1HITACHI_750505270125
Disk /dev/disk/by-id/scsi-1HITACHI_750505270125: 1150.5 GB, 1150514364416 bytes
255 heads, 63 sectors/track, 139875 cylinders, total 2247098368 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Device                                            Boot  Start  End         Blocks      Id  System
/dev/disk/by-id/scsi-1HITACHI_750505270125-part1        63     1048575999  524287968+  83  Linux
The weird part is that we have the same volumes connected to a different machine. They are not active there, but they are still visible. We saw the same behaviour on that machine, but after a reboot the drives show the size they should (see the 536.9 GB size):
root@server2:~# fdisk -l /dev/disk/by-id/scsi-1HITACHI_750505270125
Disk /dev/disk/by-id/scsi-1HITACHI_750505270125: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders, total 1048576000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Device                                            Boot  Start  End         Blocks      Id  System
/dev/disk/by-id/scsi-1HITACHI_750505270125-part1        63     1048575999  524287968+  83  Linux
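As a sanity check on the two listings, the byte sizes follow directly from the sector counts, and the -part1 partition ends exactly on the last sector of the smaller 536.9 GB volume. A small sketch of that arithmetic (the function name is made up for illustration, and it assumes the 512-byte logical sectors that fdisk reports):

```shell
# Hypothetical helper: convert a count of 512-byte sectors to bytes.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

sectors_to_bytes 2247098368   # size seen on server1: 1150514364416
sectors_to_bytes 1048576000   # size seen on server2: 536870912000

# Last addressable sector of the 536.9 GB volume (sectors are numbered from 0).
echo $(( 1048576000 - 1 ))    # 1048575999, the End value of -part1
```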
The funny part is that we partitioned the drive on the second server (server2), the one that sees the right size, and on the first server (server1) we can see that partition, even though the reported drive size there is still the old one. We even formatted it and mounted it on server2, wrote a text file, unmounted it, mounted it on server1 and, sure enough, we could see and read the text file.
It looks like something along the way is caching volume sizes?
Just in case: after detaching and reattaching the drives we re-scan the LUNs and run udevadm trigger to refresh the udev tree.
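For reference, here is a sketch of how the size the kernel currently holds for a disk can be read from sysfs, and how a per-device SCSI rescan can be requested. This is only one possible approach under stated assumptions, not a confirmed fix for the problem described here; the function names and the second "sysfs root" argument are made up for illustration. The size file in sysfs is always expressed in 512-byte units, and writing to rescan on a real system needs root.

```shell
# Size in bytes that the kernel currently reports for a block device.
# $1: kernel device name (placeholder); $2: sysfs root, /sys by default
# (the second argument exists only so the function can be tried on a test tree).
kernel_size_bytes() {
    sectors=$(cat "${2:-/sys}/block/$1/size") || return 1
    echo $(( sectors * 512 ))    # sysfs reports size in 512-byte units
}

# Ask the SCSI layer to re-read the device's capacity (needs root on a real system).
rescan_scsi_device() {
    echo 1 > "${2:-/sys}/block/$1/device/rescan"
}

# Example, not run here:
#   dev=$(basename "$(readlink -f /dev/disk/by-id/scsi-1HITACHI_750505270125)")
#   kernel_size_bytes "$dev"; rescan_scsi_device "$dev"; kernel_size_bytes "$dev"
```

If the two sizes printed before and after the rescan differ, the stale value was held by the kernel's SCSI layer rather than by udev.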
We are not really comfortable using the drives with this discrepancy, and if we need to reboot to get the system to show the real sizes we lose all the advantages of hotplugging. Any ideas on how this is happening, and is it safe to use those volumes without restarting?
As a side question: when we detach the drives from the storage array, we run udevadm trigger, and it looks like udev only adds new drives (devices) but doesn't remove devices that are gone. Is it supposed to work that way?