I run a Proxmox cluster with a few VMs on a private network and a (Proxmox-managed) Ceph storage backend for the VM disks.
One (KVM) VM running "Ubuntu 16.04 server minimal vm" is configured with a second "hard disk", set up as a single-disk ZFS pool named "storage", using
zpool create storage /dev/sdb1
which gets automounted to /storage. This VM also runs the nfs-kernel-server.
This directory is then exported through NFS with the following line in /etc/exports:
/storage 10.10.0.0/16(rw,sync)
I mount this export from two other machines (one VM running Ubuntu 14.04, one physical machine running Ubuntu 16.04 server) through
mount -t nfs4 10.10.3.1:/storage /mnt
This is my playground for testing a storage setup for two planned web servers hosting an old Perl app that writes to Berkeley DB files. To test the shared storage backend, I decided to try concurrent writes with a simple PHP script:
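<?php
// Build a 30-character payload from the writer id given on the command line.
$line = str_repeat($argv[1], 30) . "\n";
for ($i = 1; $i <= 10000; $i++)
{
    // Reopen in append mode for every line; no locking is done here.
    $of = fopen("test.txt", "a") or die("can't open output file\n");
    fwrite($of, sprintf("%04d-", $i) . $line);
    fclose($of);
}
?>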
I change to the shared storage directory (where the PHP script is also located), and run it using
php test.php 1
from the first remote machine, and with
php test.php 2
from the second machine.
My issue is that some writes don't seem to make it to the destination file, i.e. I get output like this:
9286-222222222222222222222222222222
9287-222222222222222222222222222222
9288-222222222222222222222222222222
9289-222222222222222222222222222222
7473-111111111111111111111111111111
7474-111111111111111111111111111111
7475-111111111111111111111111111111
7476-111111111111111111111111111111
7477-111111111111111111111111111111
7478-111111111111111111111111111111
7479-111111111111111111111111111111
9297-222222222222222222222222222222
9298-222222222222222222222222222222
7481-111111111111111111111111111111
9300-222222222222222222222222222222
7482-111111111111111111111111111111
9302-222222222222222222222222222222
7484-111111111111111111111111111111
A grep on the NFS server confirms that the missing line was not simply cached and written at a different position in the file:
nas:/storage# grep "9290-" test.txt
9290-111111111111111111111111111111
nas:/storage#
In other words, the file is missing (among others) the
9290-222222222222222222222222222222
line. At this point, I'm hoping that I'm simply missing some configuration parameters or a step or two during setup that would fix this problem.
Edit: I only just noticed that the writes seem to block each other out, i.e. the gaps between the line numbers always correspond to the number of interleaving writes from the other remote "writer". I'm still no closer to an explanation of why this happens or how to resolve it, though.
Also, I had "Discard" and "IO thread" enabled in Proxmox for the VM hard disk. I disabled both options (I didn't expect that to help, but checked anyway), and the behavior is the same.
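To put a number on the gaps instead of grepping for single lines, a small checker along these lines could be run on the NFS server. This is only a sketch: missing_lines is a hypothetical helper that is not part of the original test, and it assumes single-character writer ids (1 and 2, as in the runs above) and the 10000-line loop from the test script.

```php
<?php
// Hypothetical helper: return the sequence numbers that a given writer
// never got into the file. Assumes a single-character writer id.
function missing_lines($path, $writer, $count)
{
    $seen = array();
    foreach (file($path, FILE_IGNORE_NEW_LINES) as $row) {
        // Lines look like "9290-222...2": number, dash, 30 copies of the id.
        if (preg_match('/^(\d+)-(.)\2{29}$/', $row, $m) && $m[2] === $writer) {
            $seen[(int)$m[1]] = true;
        }
    }
    $missing = array();
    for ($i = 1; $i <= $count; $i++) {
        if (!isset($seen[$i])) {
            $missing[] = $i;
        }
    }
    return $missing;
}

// Only runs when a writer id is passed on the command line.
if (isset($argv[1])) {
    $missing = missing_lines("test.txt", $argv[1], 10000);
    echo count($missing), " lines missing for writer ", $argv[1], "\n";
}
```

Saved under a name of your choice, it takes the writer id as its only argument, the same way test.php does.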
Okay, apparently Berkeley DB offers locking mechanisms for concurrent access, so my "simple test scenario" is inadequate: locking has to happen at the application level, and my test script doesn't do anything of the kind, so the test doesn't match the use case.
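For reference, a variant of the test script that does take a lock could look roughly like the sketch below. As far as I understand it, NFS has no atomic append, so each client works out the end-of-file position from its own view of the file, which would explain two writers overwriting each other. The sketch wraps every append in PHP's flock(); append_locked is a hypothetical helper name, the lock is only advisory (every writer has to use it), and whether flock() is honored across NFS clients depends on the client and server setup, so I have not confirmed that it closes the gaps here.

```php
<?php
// Hypothetical helper (not part of the original test): append one line
// while holding an exclusive advisory lock on the file.
function append_locked($path, $text)
{
    $of = fopen($path, "a");
    if ($of === false) {
        return false;
    }
    // Block until no other cooperating writer holds the lock.
    if (!flock($of, LOCK_EX)) {
        fclose($of);
        return false;
    }
    $written = fwrite($of, $text);
    // Push the data out of PHP's buffer before the lock is released.
    fflush($of);
    flock($of, LOCK_UN);
    fclose($of);
    return $written === strlen($text);
}

// Same loop as the original test; only runs when a writer id is given.
if (isset($argv[1])) {
    $line = str_repeat($argv[1], 30) . "\n";
    for ($i = 1; $i <= 10000; $i++) {
        append_locked("test.txt", sprintf("%04d-", $i) . $line)
            or die("can't append to output file\n");
    }
}
```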
Consequently, I'm considering this question answered. Thanks for the replies!