I am in the process of building my first RAID5 array. I've used mdadm to create the following set up:
root@bondigas:~# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90
Creation Time : Wed Oct 20 20:00:41 2010
Raid Level : raid5
Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Wed Oct 20 20:13:48 2010
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 1% complete
UUID : f6dc829e:aa29b476:edd1ef19:85032322 (local to host bondigas)
Events : 0.12
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
4 8 64 3 spare rebuilding /dev/sde
While that's going I decided to format the beast with the following command:
root@bondigas:~# mkfs.ext4 /dev/md1p1
mke2fs 1.41.11 (14-Mar-2010)
/dev/md1p1 alignment is offset by 63488 bytes.
This may result in very poor performance, (re)-partitioning suggested.
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=48 blocks
97853440 inodes, 391394047 blocks
19569702 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
11945 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Writing inode tables: ^C 27/11945
root@bondigas:~# ^C
I am unsure what to do about "/dev/md1p1 alignment is offset by 63488 bytes." and how to partition the disks so the alignment matches and I can format the array properly.
Since alignment pops up in a lot of places, I'll expand a bit on the question.
Aligning partitions
"Linux on 4kB-sector disks" (IBM developerWorks) walks through the steps with fdisk, parted and GPT fdisk.
With fdisk:
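A minimal sketch of what the fdisk step looks like, assuming a hypothetical disk /dev/sdX with 512-byte logical sectors (the device name is illustrative, not from the OP's setup):

```shell
# Hypothetical sketch: create a partition on /dev/sdX whose first sector
# is 1 MiB-aligned (sector 2048 with 512-byte logical sectors). Recent
# fdisk versions default to this; older ones need sector units (-u) and
# a manually chosen start sector:
#
#   fdisk -u /dev/sdX    # 'n' for new partition, start sector 2048
#
# The alignment arithmetic can be checked by hand:
start_sector=2048
logical_sector_bytes=512
offset=$((start_sector * logical_sector_bytes))
if [ $((offset % 1048576)) -eq 0 ]; then
    echo "start is 1 MiB-aligned"
else
    echo "misaligned by $((offset % 1048576)) bytes"
fi
```

A 1 MiB boundary is a multiple of every common chunk, stripe and physical-sector size, which is why it is the usual default.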
Aligning the file system
This is primarily relevant for RAID (levels 0, 5 and 6; not level 1); the file system performs better if it is created with knowledge of the stripe sizes.
It can also be used for SSDs if you wish to align the file system to the SSD erase block size (Theodore Tso, Linux kernel developer).
In the OP's post, mkfs apparently auto-detected the optimal settings, so no further action was required. If you wish to verify, for RAID the relevant parameters are:
stride = stripe size / block size (ex. 64 KiB / 4 KiB = 16)
stripe-width = stride * #-of-data-disks (ex. a 4-disk RAID 5 has 3 data disks; 16 * 3 = 48)
From the Linux Raid Wiki. See also this simple calculator for different RAID levels and numbers of disks.
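The arithmetic above can be sketched for the OP's array (64 KiB chunk, 4-disk RAID 5, 4 KiB filesystem blocks), matching the "Stride=16 blocks, Stripe width=48 blocks" that mke2fs auto-detected:

```shell
# stride/stripe-width arithmetic for a 4-disk RAID 5 with 64 KiB chunks
chunk_kib=64
block_kib=4
data_disks=3            # RAID 5 with 4 disks: one disk's worth is parity

stride=$((chunk_kib / block_kib))        # 64 / 4 = 16
stripe_width=$((stride * data_disks))    # 16 * 3 = 48
echo "stride=$stride stripe-width=$stripe_width"

# Had mkfs.ext4 not auto-detected these, they could be passed explicitly:
#   mkfs.ext4 -E stride=16,stripe-width=48 /dev/md1
```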
For SSD erase block alignment, the same parameters are used, computed from the erase block size instead of the RAID geometry; see Theodore's SSD post.
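For illustration, the same calculation against an assumed 128 KiB erase block (the real value is device-specific and often not reported by the drive, so treat this number as a placeholder):

```shell
# Illustrative only: derive a stripe-width that matches an assumed
# 128 KiB SSD erase block, with 4 KiB filesystem blocks.
erase_kib=128            # assumption; check your drive's documentation
block_kib=4
stripe_width=$((erase_kib / block_kib))    # 128 / 4 = 32
echo "stripe-width=$stripe_width"

#   mkfs.ext4 -E stripe-width=32 /dev/sdX1    # hypothetical device
```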
Aligning LVM extents
The potential issue is that LVM creates a 192k header. This is a multiple of 4k (so no issue with 4k-block disks) but may not be a multiple of RAID stripe size (if LVM runs on a RAID) or SSD erase block size (if LVM runs on SSD).
See Theodore's post for the workaround.
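In outline, the workaround is to grow the metadata area so the LVM data area starts on the desired boundary; pvcreate rounds --metadatasize up, so asking for slightly less than the target is the usual trick. The 256 KiB target below is an example, not the OP's geometry:

```shell
# Sketch: round the LVM metadata area up so the data area starts on a
# 256 KiB boundary (example stripe size). pvcreate rounds up, so 250k
# becomes 256 KiB in practice:
#
#   pvcreate --metadatasize 250k /dev/md1
#
# Why the default 192 KiB header can be a problem:
header_kib=192
[ $((header_kib % 4)) -eq 0 ]   && echo "fine for 4 KiB-sector disks"
[ $((header_kib % 256)) -ne 0 ] && echo "not a multiple of a 256 KiB stripe"
```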
A friend of mine pointed out that I can just run mkfs.ext4 directly on /dev/md1 without partitioning anything, so I deleted the partition and did that, and it appears to be formatting now. I find this way to be the easiest.
It seems like mkfs.ext4 wants the filesystem on your RAID to start on a 64 KiB boundary. If you use the whole disk, it starts at 0, which is of course also a multiple of 64 KiB...
Most partitioning tools nowadays use a 1 MiB boundary by default anyway (fdisk probably doesn't).
The reason for this is that most hard disks and SSDs use physical sectors that are much bigger than the logical sectors. As a result, if you read a 512-byte logical sector from disk, the hardware actually has to read a much larger amount of data.
In the case of your software RAID device, something similar happens: with the default mdadm settings, data on it is stored in "chunks" of 64 KiB.
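You can check whether a partition starts on a chunk boundary from sysfs. As a sketch (the partition in question no longer exists, and the start sector used here is an example value, though it happens to reproduce the 63488-byte offset mke2fs complained about):

```shell
# The start of a partition, in 512-byte sectors, is exposed in sysfs:
#   cat /sys/block/md1/md1p1/start
start_sector=124                    # example; 124 * 512 = 63488 bytes
chunk_bytes=$((64 * 1024))          # mdadm default chunk size
offset=$((start_sector * 512 % chunk_bytes))
if [ "$offset" -eq 0 ]; then
    echo "partition start is chunk-aligned"
else
    echo "misaligned by $offset bytes"
fi
```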