Problem: I recently revamped one of my servers. It was tested prior to use and functions well; however, a few days ago I noticed approximately 4 times the usual volume of writes to the root volume. This is not a performance issue - the server runs fine.
My revamping was fairly extensive (a full rebuild), so I don't have a lot to go on in terms of cause. Briefly, my changes included:
- Upgrading Amazon's Linux (from 2011.02 to 2011.09) - which also resulted in a change from ext3 to ext4 for the root volume
- Moving from php-fcgi to php-fpm (currently using tcp)
- Moving from a reverse-proxy setup (nginx -> apache) to nginx only
- Replacing vsftpd with pure-ftpd
- Replacing dkim-proxy with opendkim
- Replacing webmin with ispconfig
- Adding varnish as a caching layer for dynamic files (overkill for the volume of hits these sites get, but it's an experiment)
- Adding a swap partition
Basic Setup:
- My swap space is mounted on its own EBS volume - the writes to the swap volume are negligible - I have essentially discounted this as the cause (there is ample free memory, and both free and iostat show minimal swap usage).
- My data - mysql database, user files (websites), all logs (from /var/log), mail, and varnish files - is on its own EBS volume (using mount --bind). The underlying EBS volume is mounted at /mnt/data.
- My remaining files - operating system and core server applications (e.g. nginx, postfix, dovecot, etc.) - are the only thing on the root volume - a total of 1.2GB.
The new setup runs 'smoother' (faster, less memory, etc.) than the old system, and has been stable for 20 days (mid-October) - as far as I can tell, the elevated writes have existed for all this time.
Contrary to what I would expect, I have a low read volume (my reads are about 1.5% of my writes, both in terms of blocks and bytes on my root volume). I have not changed anything on the root volume (e.g. new installations, etc) in the past few days, yet write volume continues to be much higher than expected.
Objective: determine the cause of the increased writes to the root volume - essentially, figure out whether it is a specific process (and which one), the different file system (ext4), or another issue (e.g. memory).
System Information:
- Platform: Amazon's EC2 (t1.micro)
- O/S: Amazon's Linux 2011.09 (CentOS/RHEL derived)
- Linux kernel: 2.6.35.14-97.44.amzn1.i686
- Architecture: 32-bit/i686
- Disks: 3 EBS volumes:
- xvdap1, root, ext4 filesystem (mounted with noatime)
- xvdf, data, xfs filesystem (mounted with noatime,usrquota,grpquota)
- xvdg, swap
The root and data volumes are snapshotted once a day - however, this should be a 'read' operation, not a write one. (Additionally, the same practice was used on the previous server - and the previous server was also a t1.micro.)
The data that caused me to look into the I/O was in the details of my last AWS bill (which showed above-normal I/O - not unexpected, since I was setting up this server and installing a lot of things at the start of the month) and, subsequently, in the CloudWatch metrics for the attached EBS volumes. I arrived at the '4 times normal' figure by extrapolating the I/O activity from November (when I hadn't altered the server) to estimate a monthly value and comparing that with the I/O from past months when I was not working on my previous server. (I do not have exact iostat data from my previous server.) The same quantity of writes has persisted through November: 170-330MB/hr.
Diagnostic information (uptime for the following outputs is 20.6 days):
Cloudwatch metrics:
- root volume (write): 5.5GB/day
- root volume (read): 60MB/day
- data volume (write): 400MB/day
- data volume (read): 85MB/day
- swap volume (write): 3MB/day
- swap volume (read): 2MB/day
Output of: df -h
(for root volume only)
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 4.0G 1.2G 2.8G 31% /
The used space has not increased noticeably since this system was launched (which to me suggests that files are being updated, not created/appended).
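(As a rough cross-check of that theory, a find along these lines should list everything on the root volume modified in the last day - the -xdev flag stops it from descending into the bind-mounted data volume and the virtual filesystems:)
find / -xdev -type f -mmin -1440 2>/dev/null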
Output of: iostat -x (with Blk_read, Blk_wrtn added in):
Linux 2.6.35.14-95.38.amzn1.i686 11/05/2011 _i686_
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s Blk_read Blk_wrtn avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 3.42 0.03 2.85 0.72 50.19 2534636 177222312 17.68 0.18 60.93 0.77 0.22
xvdf 0.00 0.03 0.04 0.35 1.09 8.48 3853710 29942167 24.55 0.01 24.28 2.95 0.12
xvdg 0.00 0.00 0.00 0.00 0.02 0.04 70808 138160 31.09 0.00 48.98 4.45 0.00
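(For reference, Blk_wrtn is measured in 512-byte sectors, so the iostat total is roughly consistent with the CloudWatch figure over the 20.6 day uptime:)
177222312 sectors x 512 bytes = ~90.7 GB written to xvdap1, or ~4.4 GB/day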
Output of: iotop -d 600 -a -o -b
Total DISK READ: 6.55 K/s | Total DISK WRITE: 117.07 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
852 be/4 root 0.00 B 26.04 M 0.00 % 0.42 % [flush-202:1]
539 be/3 root 0.00 B 528.00 K 0.00 % 0.08 % [jbd2/xvda1-8]
24881 be/4 nginx 56.00 K 120.00 K 0.00 % 0.01 % nginx: worker process
19754 be/4 mysql 180.00 K 24.00 K 0.00 % 0.01 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
3106 be/4 mysql 0.00 B 176.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19751 be/4 mysql 4.00 K 0.00 B 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
3194 be/4 mysql 8.00 K 40.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
3156 be/4 mysql 4.00 K 12.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
3099 be/4 mysql 0.00 B 4.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
24216 be/4 web14 8.00 K 10.43 M 0.00 % 0.00 % php-fpm: pool web14
24465 be/4 web19 0.00 B 7.08 M 0.00 % 0.00 % php-fpm: pool web19
3110 be/4 mysql 0.00 B 100.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
579 be/4 varnish 0.00 B 76.00 K 0.00 % 0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
582 be/4 varnish 0.00 B 144.00 K 0.00 % 0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
586 be/4 varnish 0.00 B 4.00 K 0.00 % 0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
587 be/4 varnish 0.00 B 40.00 K 0.00 % 0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
1648 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % in.imapproxyd
18072 be/4 varnish 128.00 K 128.00 K 0.00 % 0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
3101 be/4 mysql 0.00 B 176.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19749 be/4 mysql 0.00 B 32.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19750 be/4 mysql 0.00 B 0.00 B 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19752 be/4 mysql 0.00 B 108.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19788 be/4 mysql 0.00 B 12.00 K 0.00 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
853 be/4 root 4.00 K 0.00 B 0.00 % 0.00 % [flush-202:80]
22011 be/4 varnish 0.00 B 188.00 K 0.00 % 0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
To summarize the above (and extrapolate to daily values) it looks like, over the 10 minute period:
- [flush-202] wrote 26MB = 3.6GB/day
- php-fpm wrote 17.5MB = 2.4GB/day
- MySQL wrote 684KB = 96MB/day
- Varnish wrote 580KB = 82MB/day
- [jbd2] wrote 528KB = 74MB/day
- Nginx wrote 120KB = 17MB/day
- IMAP Proxy wrote 8KB = 1.1MB/day
Interestingly enough, it would appear that between [flush-202] and php-fpm it is possible to account for the daily volume of writes.
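(That is, taking just the two largest writers: 3.6 GB/day from [flush-202] plus 2.4 GB/day from php-fpm gives roughly 6 GB/day, which is in the same range as the ~5.5 GB/day that CloudWatch reports for the root volume.)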
Using ftop, I am unable to track down either the flush or php-fpm writes (e.g. ftop -p php-fpm).
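(One alternative I may try: since iotop works on this kernel, per-process I/O accounting is available, so /proc/<pid>/io should show the cumulative write_bytes for each php-fpm worker directly - the pgrep pattern below is just illustrative:)
for pid in $(pgrep -f 'php-fpm: pool'); do printf '%s ' $pid; grep '^write_bytes' /proc/$pid/io; done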
At least part of my problem stems from identifying which processes are writing to the root volume. Of those listed above, I would expect all of them to be writing to the data volume, since the relevant directories are symlinked there (e.g. the nginx, mysql, php-fpm, and varnish directories all point to a different EBS volume).
I believe jbd2 is the journaling block device for ext4, and flush-202 is the background flushing of dirty pages. The dirty_ratio is 20 and the dirty_background_ratio is 10. Dirty memory (from /proc/meminfo) was typically between 50-150kB. Page size (getconf PAGESIZE) is the system default (4096).
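(For anyone wanting to reproduce those checks, the usual interfaces are along these lines:)
sysctl vm.dirty_ratio vm.dirty_background_ratio
grep -E '^(Dirty|Writeback):' /proc/meminfo
getconf PAGESIZE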
Output of: vmstat -s | grep paged
3248858 pages paged in
104625313 pages paged out
Output of: sar -B | grep Average
pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
Average: 1.38 39.57 113.79 0.03 36.03 0.38 0.02 0.29 73.66
The above appears to suggest a high number of pages paged out - however, I would expect that pages would be written to my swap partition if necessary, not to my root volume. Of the total memory, the system typically has 35% in use, 10% in buffers, 40% cached, and 15% unused (i.e. 65% free).
Output of: vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
total merged sectors ms total merged sectors ms cur sec
xvda1 105376 14592 2548092 824418 10193989 12264020 179666824 626582671 0 7872
xvdf 126457 579 3882950 871785 1260822 91395 30081792 32634413 0 4101
xvdg 4827 4048 71000 21358 1897 15373 138160 307865 0 29
vmstat consistently displays si and so values of 0.
Output of: swapon -s
Filename Type Size Used Priority
/dev/xvdg partition 1048572 9252 -1
On the hunch that the I/O writes might be memory related, I disabled varnish and restarted the server. This changed my memory profile to 10% in use, 2% in buffers, 20% cached, and 68% unused (i.e. 90% free). However, over a 10 minute run, iotop gave similar results to before:
- [flush-202] wrote 19MB
- php-fpm wrote 10MB
In the hour since the restart, 330MB have already been written to the root volume, with 370K pages paged out.
Output of: inotifywatch -v -e modify -t 600 -r /[^mnt]*
Establishing watches...
Setting up watch(es) on /bin /boot /cgroup /dev /etc /home /lib /local /lost+found /opt /proc /root /sbin /selinux /src /sys /usr /var
OK, /bin /boot /cgroup /dev /etc /home /lib /local /lost+found /opt /proc /root /sbin /selinux /src /sys /usr /var is now being watched.
Total of 6753 watches.
Finished establishing watches, now collecting statistics.
Will listen for events for 600 seconds.
total modify filename
23 23 /var/log/
20 20 /usr/local/ispconfig/server/temp/
18 18 /dev/
15 15 /var/log/sa/
11 11 /var/spool/postfix/public/
5 5 /var/log/nginx/
2 2 /var/run/pure-ftpd/
1 1 /dev/pts/
Looking a bit further into the above, almost all of the writes can be attributed to an (unknown) process that runs every 5 minutes and checks the status of a variety of services (like chkservd on cPanel - but I don't use cPanel and didn't install this). It amounts to 4 log files (cron, maillog, ftp, imapproxy) being updated during the 10 minutes, plus a few related items (postfix sockets, pure-ftpd connections). The remaining items are primarily modified ispconfig sessions, system accounting updates, and invalid (non-existent server_name) web access attempts (logged to /var/log/nginx).
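(To track down the 5-minute process, my plan is to go through the cron entries - ISPConfig, as far as I know, installs its own monitoring jobs, so that is my first suspect:)
cat /etc/crontab; ls /etc/cron.d/
for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u $u 2>/dev/null; done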
Conclusions and Questions:
Let me start by saying I am a bit perplexed - I am usually fairly thorough, but I feel that I am missing something obvious on this one. Clearly, flush and php-fpm account for the bulk of the writes; however, I don't know why this might be the case. Firstly, let's take php-fpm - it shouldn't even be writing to the root volume. Its directories (both files and logs) are symlinked to another EBS volume. Moreover, the primary things that php-fpm should be writing are sessions and page caches - which are both few and small - certainly not on the order of 1MB/min (more like 1KB/min, if that). Most of the sites are read-only, with only occasional updates. The total size of all web files modified in the last day is 2.6MB.
Secondly, considering flush - the significant writes from it suggest to me that dirty pages are frequently being flushed to disk - but given that I typically have 65% free memory and a separate EBS volume for swap space, I can't explain why this would affect the writes on my root volume, especially to the extent that is occurring. I realize that some processes will write dirty pages to their own swap space (instead of using system swap space), but surely, immediately following a restart with the vast majority of my memory being free, I should not be running into any substantial amount of dirty pages. If you do believe this to be the cause, please let me know how I might identify which processes are writing to their own swap space.
It is entirely possible that the whole dirty pages idea is simply a red herring and is completely unrelated to my problem (I hope it is, actually). If that is the case, my only other idea is that there is something relating to ext4 journaling that wasn't present in ext3. Beyond that, I am currently out of ideas.
Update(s):
Nov 6, 2011:
Set dirty_ratio = 10 and dirty_background_ratio = 5; updated with sysctl -p (confirmed via /proc); reran the 10 minute iotop test with similar results (flush wrote 17MB, php-fpm wrote 16MB, MySQL wrote 1MB, and JBD2 wrote 0.7MB).
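(i.e. something along these lines in /etc/sysctl.conf, then applied with sysctl -p:)
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5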
I have changed all the symlinks I set up to use mount --bind instead. Re-enabled varnish and restarted the server; reran the 10 minute iotop test with similar results (flush wrote 12.5MB, php-fpm wrote 11.5MB, Varnish wrote 0.5MB, JBD2 wrote 0.5MB, and MySQL wrote 0.3MB).
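(The bind mounts themselves are nothing special - in /etc/fstab they look something like the following, with the source paths under /mnt/data being illustrative:)
/mnt/data/mysql   /var/lib/mysql   none   bind   0 0
/mnt/data/log     /var/log         none   bind   0 0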
As of the above run, my memory profile was 20% in use, 2% in buffers, 58% cached, and 20% unused (i.e. 80% free). Just in case my interpretation of free memory in this context is flawed, here is the output of free -m (this is a t1.micro).
total used free shared buffers cached
Mem: 602 478 124 0 14 347
-/+ buffers/cache: 116 486
Swap: 1023 0 1023
Some additional information:
Output of: dmesg | grep EXT4
[ 0.517070] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
[ 0.531043] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
[ 2.469810] EXT4-fs (xvda1): re-mounted. Opts: (null)
I also ran ftop and iotop simultaneously, and was surprised to notice that entries showing up in iotop did not appear in ftop. The ftop list was filtered to php-fpm, since I could trigger writes from that process fairly reliably. I noted about 2MB of writes per page view for php-fpm - and I have yet to figure out what it could possibly be writing - any ideas on ascertaining what is being written would be appreciated.
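(One thing I intend to try in order to see what php-fpm is actually writing: attach strace to a worker while requesting a page - the pool name below is just one of mine from the iotop output:)
strace -f -e trace=open,write -s 80 -p $(pgrep -f 'php-fpm: pool web14' | head -1)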
I will try to turn off journaling in the next few days and see if that improves things. For the moment, though, I find myself wondering if I have an I/O problem or a memory problem (or both) - but I am having a hard time seeing the memory problem, if there is one.
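(If I do go the journaling route, my understanding is that the journal can only be removed from an unmounted filesystem, so it would have to be done with the volume detached and attached to another instance - roughly:)
tune2fs -O ^has_journal /dev/xvdX   (xvdX being wherever the detached root volume shows up)
e2fsck -f /dev/xvdX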
Nov 13, 2011:
As the file system uses extents, it was not possible to mount it as ext3; additionally, attempts to mount it read-only simply resulted in it being remounted read-write.
The file-system does indeed have journaling enabled (128MB journal), as is evident from the following:
Output of: tune2fs -l /dev/sda1 | grep features
has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
As per the following, about 140GB has been written to this volume in a bit under a month - just about 5GB/day.
Output of: dumpe2fs -h /dev/sda1
Filesystem volume name: /
Last mounted on: /
Filesystem UUID: af5a3469-6c36-4491-87b1-xxxxxxxxxxxx
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 262144
Block count: 1048576
Reserved block count: 10478
Free blocks: 734563
Free inodes: 210677
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 511
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 32582
Flex block group size: 16
Filesystem created: Wed Sep 21 21:28:43 2011
Last mount time: Sun Nov 13 16:10:11 2011
Last write time: Sun Oct 16 16:12:35 2011
Mount count: 13
Maximum mount count: 28
Last checked: Mon Oct 10 03:04:13 2011
Check interval: 0 (<none>)
Lifetime writes: 139 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 18610
Default directory hash: half_md4
Directory Hash Seed: 6c36b2cc-b230-45e2-847e-xxxxxxxxxxx
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x0002d91c
Journal start: 1
Continuing to look for open files, I tried using fuser on the root volume:
Output of: fuser -vm / 2>&1 | awk '$3 ~ /f|F/'
root 1111 Frce. dhclient
root 1322 frce. mysqld_safe
mysql 1486 Fr.e. mysqld
root 1508 Frce. dovecot
root 1589 Frce. master
postfix 1600 Frce. qmgr
root 1616 Frce. crond
root 1626 Frce. atd
nobody 1648 Frce. in.imapproxyd
postfix 1935 Frce. tlsmgr
root 2808 Frce. varnishncsa
root 25818 frce. sudo
root 26346 Fr.e. varnishd
postfix 26925 Frce. pickup
postfix 28057 Frce. smtpd
postfix 28070 Frce. showq
Nothing unexpected, unfortunately. On the off-chance that it was due to the underlying hardware, I restored yesterday's snapshot of the root volume (nothing had changed in the last day) and replaced the instance's root volume with the new one. As expected, this had no effect on the problem.
My next step would have been to remove journalling; however, I stumbled across the solution before getting to that.
The problem lay in APC using file-backed mmap. Fixing this dropped disk I/O by about 35x - to (an estimated) 150MB/day instead of 5GB/day. I might still consider removing journalling, as it appears to be the major remaining contributor to this value; however, this number is quite acceptable for the time being. The steps taken to arrive at the APC conclusion are posted in an answer, below.
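(For context - the full steps are in the answer below - APC's shared memory can be file-backed or anonymous depending on apc.mmap_file_mask: a mktemp-style mask such as /tmp/apc.XXXXXX produces a file-backed mmap, while leaving the setting undefined or pointing it at /dev/zero gives an anonymous mmap. A setting along these lines, in whatever file holds the APC configuration (e.g. /etc/php.d/apc.ini), avoids the file-backed variant:)
apc.mmap_file_mask=/dev/zero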