Spin-off from these previously asked questions:
How to get free space from mounted drive Redhat 7
Update crypttab asks for Passphrase for fstrim
We have an HP 3PAR StoreServ 7400 with 170 VMs spread out across 38 hosts.
Here is the problem as I understand it. (I have also been told some information that I'm not sure is true or not; I have read over the HP 3PAR StoreServ 7400 whitepaper and really cannot find anything that backs up what my storage guy is telling me. So if anyone notices anything below that is not true, please let me know.)
The 3PAR is broken up into 3 layers:
Layer 1: SSD, used for caching and quick access of commonly accessed files.
Layer 2 and Layer 3: some kind of spinning disk. What the two additional layers are for I am unsure of, but my assumption is that Layer 2 is used for data that is not the most commonly accessed but is still accessed a bit, and Layer 3 is used for storage of the rest.
Within the SSD portion, as I have read in many articles, when data is written to an SSD block and then deleted, that block is not zeroed until new data is written to it. When the data within the block is deleted, the table that stores the mapping info gets updated; then, when new data is written to that same block, the block first needs to be zeroed before it can be written to. This process within SSDs, if the drive is not trimmed periodically, can lead to lower read/write speeds.
The 3PAR LUN is thin provisioned, and the VMs are eager thick provisioned.
According to my storage guy, the 3PAR has a special feature built in that allows SSD storage not being used to be made available to the other VMs as needed, which makes no sense to me.
Fact Check:
A thick provisioned VM is a VMDK file; when the VM is created you specify its size, and this creates a VMDK file. In my mind that tells me that if the VM is being accessed regularly, the entire VMDK file is then moved to SSD. So what they are telling me is that even if the VMDK is set to use 40GB, some of that 40GB can be used by other VMs? That sounds more to me like a thin provisioned VM, not a thick one.
OK, getting to the problem.
On our Windows systems we use sdelete to find and zero out unused blocks.
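For reference, sdelete (from Sysinternals) can zero free space with its -z flag; the drive letter here is just an example:

sdelete.exe -z C: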
On our Linux Fedora systems, I have been all over the place trying to figure out how to get fstrim to work.
I did try the dd write-big-file, delete-big-file approach, and that sent the disk I/O through the roof, which was noticed, and I was told not to do that again.
Doing a little research, it looks to me like sdelete pretty much does the same thing as the dd write-big-file, delete-big-file approach, so why does the disk I/O not go through the roof on the Windows systems?
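For reference, the write-big-file, delete-big-file approach is typically something like the following (the file path is illustrative). It fills all of the free space with zeros, which is exactly why the I/O spikes; dd exits with a "no space left on device" error when the filesystem fills up, and that is expected:

dd if=/dev/zero of=/zerofile bs=1M
sync
rm -f /zerofile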
So I think I have whittled it down to two solutions, neither of which I know how to do.
- Somehow, without vMotioning the VMs to a different storage array, be able to run an fstrim-like function on the entire SSD portion of the SAN.
Side note: if I understand everything I have read, fstrim asks the filesystem which blocks are no longer in use and issues discard requests for them, whereas sdelete writes a huge file and then deletes it. That is why I am looking for an fstrim option across the entire SSD portion of the 3PAR.
- A long shot, but the error I get with fstrim is:
[root@rhtest ~]# fstrim -v /
fstrim: /: the discard operation is not supported
I have read that the discard option needs to be set on both the OS and the datastore, but I cannot figure out where or how to set a discard option on the 3PAR. I have both SSH and GUI access to the 3PAR.
I have been through countless walkthroughs on setting up discards within the OS, and no matter how many different ways I spin it I always get the same error.
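None of those walkthroughs can help if the virtual disk itself does not advertise discard support to the guest. A quick way to check is lsblk's discard columns and the sysfs discard attributes (the device name sda is taken from the lsblk output below):

lsblk -D
cat /sys/block/sda/queue/discard_granularity

If these report 0, discard/UNMAP is being refused below the filesystem, and neither the discard mount option nor fstrim can work no matter how the OS is configured.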
Yes, I have also looked into other options; zerofree was one, and a couple of others that do not come to mind, however they either worked like sdelete or I read that they were very dangerous. I also looked into hdparm, etc.
Below I will put some output about the OS in question (they are all the same).
[root@rhtest ~]# hostnamectl
Static hostname: rhtest.domain.com
Icon name: computer-vm
Chassis: vm
Machine ID: f52e8e75ae704c579e2fbdf8e7a1d5ac
Boot ID: 98ba6a02443d41cba9cf457acf5ed194
Virtualization: vmware
Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.2:GA:server
Kernel: Linux 3.10.0-327.el7.x86_64
Architecture: x86-64
[root@rhtest ~]# blkid
/dev/block/8:2: UUID="2OHGU8-ir1w-LLGB-6v72-zZqN-CIaX-FjGImJ" TYPE="LVM2_member"
/dev/block/253:1: UUID="ad872f09-5147-4252-af56-aa6244219515" TYPE="xfs"
/dev/block/8:1: UUID="83aac355-a443-4ff9-90fa-9f6da8e31cc2" TYPE="xfs"
/dev/block/253:0: UUID="dbe56f6a-2a4a-42da-82e2-bef9a73caafb" TYPE="swap"
[root@rhtest ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 49.5G 0 part
  ├─rhel_-rhtest-swap 253:0 0 2G 0 lvm [SWAP]
  └─rhel_-rhtest-root 253:1 0 47.5G 0 lvm /
sdb 8:16 0 50G 0 disk
sr0 11:0 1 1024M 0 rom
[root@rhtest ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel_-rhtest-root 48G 883M 47G 2% /
devtmpfs 991M 0 991M 0% /dev
tmpfs 1001M 0 1001M 0% /dev/shm
tmpfs 1001M 8.5M 993M 1% /run
tmpfs 1001M 0 1001M 0% /sys/fs/cgroup
/dev/sda1 497M 124M 374M 25% /boot
tmpfs 201M 0 201M 0% /run/user/0
Being able to run fstrim on the / partitions would be the best solution; however, with the way your ESXi is configured, it is not possible.
You need to be able to enable discards on both the VM and the storage device.
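As a sanity check from inside the guest, the SCSI logical block provisioning VPD page shows whether the presented disk claims UNMAP support at all (sg_vpd comes with the sg3_utils package; the device path is illustrative):

sg_vpd --page=lbpv /dev/sda

Look for "Unmap command supported (LBPU): 1" in the output; a 0 there means the ESXi/3PAR side is not presenting UNMAP to this VM.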
Reducing the size of a partition or logical volume with the XFS filesystem cannot be done; this is a known bug with Fedora. If you are interested in this functionality, please contact Red Hat support, reference Red Hat Bugzilla 1062667, and provide your use case for needing XFS reduction/shrinking.
As a possible workaround in some environments, thin provisioned LVM volumes can be considered as an additional layer below the XFS filesystem.
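A minimal sketch of that layering, assuming a volume group named rhel (names and sizes here are illustrative):

lvcreate -L 40G -T rhel/pool00             # thin pool carved out of the VG
lvcreate -V 20G -T rhel/pool00 -n thinvol  # thin volume backed by the pool
mkfs.xfs /dev/rhel/thinvol                 # XFS on top

With dm-thin underneath, an fstrim of the XFS filesystem returns the freed extents to the thin pool, so space can be reclaimed at the LVM layer even when discards cannot be passed further down the stack.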
If the VMs are eager thick provisioned VMDKs, that means there is nothing to reclaim when you attempt to trim (technically speaking, SCSI UNMAP) your volumes.
If the back-end storage is running thin provisioning, then you also need to use lazy zeroed VMDK files in order to reduce the storage and make it possible for the backend to cache/dedupe the warm data.
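If taking a VM offline briefly is an option, one way to get from an eager thick disk to a thin one is to clone the VMDK with vmkfstools on the ESXi host and repoint the VM at the clone (the paths here are illustrative):

vmkfstools -i /vmfs/volumes/datastore1/rhtest/rhtest.vmdk \
    -d thin /vmfs/volumes/datastore1/rhtest/rhtest-thin.vmdk

A Storage vMotion with the destination disk format set to thin accomplishes the same conversion without downtime, though you mentioned wanting to avoid moving the VMs.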
Two possible options:
1. Zero-filling the free space (the dd approach described above). From what I can tell this does the same thing as sdelete; however, it can cause a spike in disk I/O as well as take a while to run.
2. Something to try overnight (see the throttled sketch after this list).
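One overnight-friendly way to run the zero-fill, an assumption on my part rather than a specific recommendation, is to throttle it to the idle I/O scheduling class with ionice so it yields to production I/O:

ionice -c 3 dd if=/dev/zero of=/zerofile bs=1M   # idle class: runs only when the disk is otherwise idle
sync
rm -f /zerofile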
Neither option is ideal, but reformatting every VM to get ext3 or ext4 does not sound feasible.
What you might be able to do is set up an affinity rule for all Linux VMs and use option 1 from above.