Background: I'm planning to use ZFS and I need to find the correct ashift parameter for my hard drive, which should be log2(sector_size), e.g. 9 for 512-byte sectors.
My hard drive reports a physical and logical sector size of 512 bytes. I have read that some hard drives report wrong information to avoid compatibility problems with operating systems that assume 512-byte sectors. I'm not sure whether that's the case with my hard drive.
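(For completeness, this is roughly how the reported values can be read programmatically on Linux - a minimal, untested sketch using the BLKSSZGET/BLKPBSZGET ioctls; /dev/sdX is a placeholder:)
// Sketch: ask the Linux kernel which sector sizes the drive reports.
// /dev/sdX is a placeholder; requires read permission on the device.
#include <cmath>
#include <cstdio>
#include <fcntl.h>
#include <iostream>
#include <linux/fs.h>    // BLKSSZGET, BLKPBSZGET
#include <sys/ioctl.h>
#include <unistd.h>

int main()
{
    int fd = open("/dev/sdX", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int logicalSize = 0;
    unsigned int physicalSize = 0;
    ioctl(fd, BLKSSZGET, &logicalSize);     // logical sector size in bytes
    ioctl(fd, BLKPBSZGET, &physicalSize);   // physical sector size in bytes
    close(fd);

    std::cout << "logical: " << logicalSize << " bytes, physical: " << physicalSize
              << " bytes, ashift would be " << std::log2(physicalSize) << std::endl;
    return 0;
}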
So I wrote a small program to help me determine the true physical sector size. The program opens an empty partition on my hard drive and writes blocks of 4096 bytes at 1000 randomly selected locations spread within 1 GiB. The random locations are first aligned to 4096 bytes, then an offset is added. The program performs these 1000 random writes for different offsets and measures how long the writes take for each offset. The first offset is zero; it is then increased in steps of 256 bytes.
When opening the partition for writing, I use the O_WRONLY | O_SYNC | O_DIRECT flags to get as close to the hardware as I can, i.e. to circumvent as many caches as possible. I also make sure that my buffer is properly aligned in memory.
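(Related side note: as far as I understand, O_DIRECT wants the buffer address - and normally also the file offset and transfer size - aligned to the logical block size. Here's a minimal sketch of how such a buffer could also be allocated on the heap with posix_memalign; the 4096-byte alignment is an assumption that covers both 512 and 4096 byte sectors:)
// Sketch: heap-allocate a buffer whose address is aligned for O_DIRECT I/O.
// The alignment that actually matters is the device's logical block size;
// 4096 is a safe choice for 512-byte as well as 4096-byte-sector drives.
#include <cstring>    // std::memset
#include <stdlib.h>   // posix_memalign, free

char* allocateAlignedBuffer(size_t size, size_t alignment = 4096)
{
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0)
        return nullptr;                  // allocation failed
    std::memset(p, 0, size);             // touch the pages so they are backed
    return static_cast<char*>(p);        // caller releases with free()
}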
Here's what I would expect:
- For non-zero offsets, the addresses I'm writing to are not aligned to the hard drive's physical sectors (regardless of whether it has 512 or 4096 byte physical sectors). There is at least one sector that has to be modified only partially, so the drive has to read that sector, update parts of it and then write it back. That should be the slower case because a read is involved (read-modify-write).
- For zero offset, regardless of whether the hard drive has 512 or 4096 byte physical sectors, the write operations should not require reading any sectors. All sectors affected by the writes are simply overwritten. This should be the faster case.
But in fact, I cannot notice any difference. The 1000 writes always take around 8.5 seconds. The offset doesn't seem to have any influence:
Offset Time (ms) for 1000 random writes
------ --------------------------------
0 8459.11
256 8450.69
512 8633.82
768 8533.94
1024 8467.36
1280 8450.63
1536 8525.72
1792 8533.96
2048 8450.64
2304 8450.79
2560 8442.37
2816 8442.38
3072 8442.28
3328 8450.82
3584 8442.27
3840 8450.81
Additional observations/remarks:
- Writing units of 512 bytes results in similar numbers (i.e. no noticeable influence of the offset).
- Just in case my partition itself is not aligned to a physical sector boundary, I also tried increasing the offset in 1 byte steps. That way, the "ideal" offset would be found eventually - but still, I couldn't identify any difference.
Can anyone explain this?
For the sake of completeness, here's my program (in case anyone wants to run it, insert the path to an empty block device into the open call):
#include <chrono>
#include <cstdlib>   // exit
#include <fcntl.h>
#include <iostream>
#include <random>
#include <unistd.h>

int main()
{
    const int bufferSize = 4096;
    // O_DIRECT needs a buffer that is aligned in memory
    char buffer[bufferSize] __attribute__((aligned(4096)));

    // the first pass (offset -256) is only a warm-up; its timing is discarded
    for (int offset = -256; offset < 4096; offset += 256)
    {
        std::mt19937 generator;
        std::uniform_int_distribution<int> distribution(0, 1024 * 1024 * 1024 / 4096);
        if (offset >= 0) std::cout << offset << "\t";
        else std::cout << "Warming up ..." << std::endl;

        int f = open("PATH_TO_EMPTY_BLOCK_DEVICE", O_WRONLY | O_SYNC | O_DIRECT);
        if (f < 0) exit(1);

        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 1000; ++i)
        {
            // seek to a random 4096-byte-aligned position within 1 GiB, plus the offset
            lseek(f, 4096 * distribution(generator) + offset, SEEK_SET);
            if (write(f, buffer, bufferSize) != bufferSize) exit(1);
        }
        auto t1 = std::chrono::high_resolution_clock::now();
        close(f);

        if (offset >= 0) std::cout << (1000 * std::chrono::duration_cast<std::chrono::duration<double>>(t1 - t0).count()) << std::endl;
    }
    return 0;
}
4096 bytes x 1000 writes = 4 MB of data. Chances are that your hard drive has 64 MB of cache, if not more; 256 MB is not uncommon on modern drives.
Your methodology will work better if you increase the write size significantly, maybe 64 times, in order to actually see the physical drive's characteristics.
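An untested sketch of that change, keeping the structure of your program (64 x 4096 bytes = 256 KiB per write, so the 1000 writes move 256 MB in total):
// Sketch (untested): same measurement, but 256 KiB per write so the total
// volume (256 MB) exceeds typical on-drive cache sizes.
#include <chrono>
#include <cstdlib>
#include <fcntl.h>
#include <iostream>
#include <random>
#include <unistd.h>

int main()
{
    const int bufferSize = 64 * 4096;                        // 256 KiB per write
    static char buffer[bufferSize] __attribute__((aligned(4096)));

    for (int offset = 0; offset < 4096; offset += 256)
    {
        std::mt19937 generator;
        // keep every write inside the first 1 GiB of the device
        std::uniform_int_distribution<int> distribution(0, (1024 * 1024 * 1024 - bufferSize) / 4096);

        int f = open("PATH_TO_EMPTY_BLOCK_DEVICE", O_WRONLY | O_SYNC | O_DIRECT);
        if (f < 0) exit(1);

        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 1000; ++i)
        {
            lseek(f, 4096 * distribution(generator) + offset, SEEK_SET);
            if (write(f, buffer, bufferSize) != bufferSize) exit(1);
        }
        auto t1 = std::chrono::high_resolution_clock::now();
        close(f);

        std::cout << offset << "\t"
                  << 1000 * std::chrono::duration<double>(t1 - t0).count() << std::endl;
    }
    return 0;
}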
Which OS are you using? If it's Linux, how were you able to write at a starting offset that wasn't a multiple of 512 bytes while using O_DIRECT against an underlying block device?
Aligning to the "true" sector size should be less of a hit, but just how much better is highly device, data and pattern dependent (Toshiba claim the performance decrease due to misalignment can be as high as 20%). SSDs (which are not what you're asking about, but which may have to do large erases before laying down the data) are an excellent example, because bad write alignment can lead to unnecessary write amplification. Having said that, I'm told that modern devices internally have sectors much larger than 4kbytes but almost never expose this to higher levels.
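Coming back to the O_DIRECT point: I would expect a direct write at an offset that isn't a multiple of the logical sector size to be rejected with EINVAL rather than to complete and be timed. A minimal, untested sketch (the device path is a placeholder):
// Sketch: attempt an O_DIRECT write at a deliberately unaligned offset.
// On Linux this is expected to fail with EINVAL, because O_DIRECT requires
// the offset, length and buffer address to be aligned to the logical block size.
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <unistd.h>

int main()
{
    alignas(4096) static char buffer[4096] = {};

    int fd = open("/dev/sdX", O_WRONLY | O_SYNC | O_DIRECT);   // placeholder device
    if (fd < 0) { std::cerr << "open: " << std::strerror(errno) << "\n"; return 1; }

    ssize_t n = pwrite(fd, buffer, sizeof buffer, 4096 + 256);  // offset not sector-aligned
    if (n < 0)
        std::cerr << "pwrite failed: " << std::strerror(errno) << "\n";  // typically EINVAL
    else
        std::cout << "wrote " << n << " bytes\n";

    close(fd);
    return 0;
}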
Well, you're most likely to see the impact of a read-modify-write (RMW) when you are in the fastest possible situation that triggers it (as the difference will be largest). Because you're doing random writes that force the OS to wait for true completion, you are likely in a slower situation where the performance hit is simply lost in the noise. As others have stated, you also have to defeat any cache that may mask the problem - if you have somehow populated the cache with the sectors that are going to be used by the RMW process, then again the hit could be entirely masked. It could also be that your example program is flawed. Have you considered using fio?
If the disk wants to lie to that extent (not indicating a better physical size), trying to second-guess its behaviour beyond aligning partitions to 4kbytes is going to be challenging. OpenZFS does contain a list of drives whose fake block size it will try to compensate for, though.
The main reason I've read for people using a non-default ashift with ZFS is to be able to add disks which have a 4kbyte native block size into the mix at a later stage.