We have a CentOS 6.4-based server attached to Hitachi HNAS 3080 storage and observed the kernel remount the filesystem read-only:
May 16 07:31:03 GNS3-SRV-CMP-001 kernel: [1259725.675814] EXT3-fs (dm-1): error: remounting filesystem read-only
This happened after several I/O errors and after all paths to the device reportedly went down:
May 16 07:31:03 GNS3-SRV-CMP-001 multipathd: mpatha: remaining active paths: 0
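(As an aside, the read-only remount itself is just ext3's configured reaction to errors. A quick, hedged way to confirm that, using the dm-1 device named in the kernel message above, is:

tune2fs -l /dev/dm-1 | grep -i 'errors behavior'

or check for an errors=remount-ro mount option in /etc/fstab.)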
I have been looking at the sar logs and can see a few very large (around 2 seconds) await times:
              DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
07:40:00   dev8-0     17.91     112.04      98.03     11.73      0.00      0.20      0.07      0.12
07:40:00  dev8-16      0.23       1.85       0.00      8.00      0.00      3.71      3.71      0.09
07:40:00  dev8-32     91.50    8338.76    5292.93    148.98      8.38     91.60      9.76     89.35
07:40:00 dev252-0     91.27    8336.91    5292.93    149.34     17.79    194.88      9.79     89.38
07:40:00 dev252-1    674.80    8168.16    5292.93     19.95   1473.53   2183.60      1.32     88.98
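For reference, a sar invocation along these lines should reproduce the excerpt above from the daily data file (the file name sa16 is an assumption based on the May 16 date; add -p to resolve devM-N into sda/sdb-style names):

sar -d -f /var/log/sa/sa16 -s 07:30:00 -e 07:45:00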
The 07:30:00-07:40:00 interval does indeed cover the time when the filesystem was remounted read-only. However, even under normal conditions, one repeated observation is that the await time for the underlying devices is much lower than that of the multipath device. For instance:
00:00:00      DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:10:00   dev8-0     19.27     129.41      78.61     10.80      0.01      0.27      0.16      0.32
00:10:00  dev8-16      0.23       1.80       0.00      8.00      0.00      0.86      0.84      0.02
00:10:00  dev8-32     94.88   10285.16    3363.48    143.86      3.39     35.76      6.83     64.82
00:10:00 dev252-0     94.65   10283.34    3363.48    144.18      3.64     38.47      6.86     64.89
00:10:00 dev252-1    435.06   10087.12    3363.48     30.92    118.42    272.21      1.47     64.12
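The same discrepancy can also be watched live rather than from sar history; where your sysstat version supports it, the -N flag makes iostat print device-mapper names instead of dm-N:

iostat -xN 1 5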
dev8-0 happens to be the local disk, while dev8-16 (/dev/sdb) and dev8-32 (/dev/sdc) are the underlying devices for dev252-0 (/dev/mapper/mpatha). dev252-1 (/dev/mapper/mpathap1) is a single partition spanning the whole of the multipath device. Here is the output of multipath -ll:
mpatha (2521501cbffffffffe96773b50ec30020) dm-0 BlueArc,NAS Platform
size=10T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=enabled
| `- 9:0:0:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=1 status=active
  `- 8:0:0:0 sdb 8:16 active ready running
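If you want to double-check how sar's dev252-N entries map onto these device-mapper names, dmsetup lists each map with its major:minor pair (output format varies slightly between versions):

dmsetup ls
# mpatha    (252, 0)
# mpathap1  (252, 1)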
Why should the await time for /dev/mapper/mpathap1 be so much higher than that of /dev/mapper/mpatha or even /dev/sdb or /dev/sdc?
As user the-wabbit suggests, there is request merging going on. You can see that in the avgrq-sz column, the average request size, which shows a significant increase.
Now 'await' is the time spent in the queue plus the time spent servicing those requests. If a small request, let's call it 'x', is merged with a couple of other requests (y and z, issued after x), then x will

- wait in the queue to be merged with y
- wait in the queue to be merged with z
- wait for the merged request (x,y,z) to be completed
This will obviously have a negative impact on the await statistic, mostly because of the way await is calculated, without actually signifying a problem in itself.
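To make that concrete, here is a back-of-the-envelope example with made-up numbers. Suppose x arrives at t=0 ms, y at t=2 ms, z at t=4 ms, and the merged request (x,y,z) is dispatched at t=4 ms and completes at t=10 ms:

await(x) = 10 ms
await(y) =  8 ms
await(z) =  6 ms
average await reported = (10 + 8 + 6) / 3 = 8 ms

The device performed a single physical operation taking 6 ms, yet each merged sub-request is accounted with its own full queue-plus-service time. That is why dev252-1, which sees the many small requests before merging (674.80 tps at avgrq-sz 19.95), reports a far higher await than dev252-0 underneath it (91.27 tps at avgrq-sz 149.34), which only ever sees the already-merged large requests.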
Now let's take a look at /dev/sdb (dev8-16). Did you know that you are not using that path? You have two priority groups in your multipath config; one is

status=enabled

and one is

status=active

You probably have

path_grouping_policy failover

in your configuration (which is the default).
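As an aside, if the intent were to load-balance across both paths instead of keeping one as a standby, the sketch below shows the relevant multipath.conf fragment (hedged; whether multibus is appropriate depends on the storage array):

defaults {
    path_grouping_policy multibus
}

With that, all paths end up in a single priority group and the round-robin selector spreads I/O across them.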
If you want to prevent the I/O errors in case both paths are down, you could try:

features "1 queue_if_no_path"

in your multipath.conf.
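For what it's worth, queuing can also be toggled at runtime on an existing map, without editing the config, via the device-mapper message interface (assuming the map name mpatha from above):

dmsetup message mpatha 0 queue_if_no_path

Be aware that with queuing enabled, I/O hangs indefinitely while all paths are down rather than failing; a bounded no_path_retry setting in multipath.conf is the usual way to make the queuing give up eventually.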
Now the real question remains: why do both paths go down?