[Sorry, I tried to stay short but it just was not possible]
I'm running Linux 2.6.22.19 with linux-vserver 2.3.0.34 on a Fujitsu Siemens Primergy RX200 S3 for development purposes. It has an Intel(R) Xeon(R) 5140 @ 2.33GHz with 4GB RAM (of which around 500MB are usually still free). The server has two hot-plug 250GB disks used in a mirrored RAID configuration:
dev:~# mpt-status --newstyle
ioc:0 vol_id:0 type:IM raidlevel:RAID-1 num_disks:2 size(GB):231 state: OPTIMAL flags: ENABLED
ioc:0 phys_id:0 scsi_id:1 vendor:ATA product_id:WDC WD2500JS-55N revision:2E01 size(GB):232 state: ONLINE flags: NONE sync_state: 100 ASC/ASCQ:0xff/0xff SMART ASC/ASCQ:0xff/0xff
ioc:0 phys_id:1 scsi_id:8 vendor:ATA product_id:WDC WD2500JS-55N revision:2E01 size(GB):232 state: ONLINE flags: NONE sync_state: 100 ASC/ASCQ:0xff/0xff SMART ASC/ASCQ:0xff/0xff
scsi_id:0 100%
scsi_id:1 100%
I'm running LVM and ext3.
We've been using this machine since around Jul/Aug 2007, and apart from a broken RAM module which was replaced the same day, there have been no problems. Basically, everything was better than the machine we had before.
However, performance problems were first noticed around Aug 2008, and it was not until recently that I gained at least some confidence about where the problem comes from. On average, seven vservers are running nowadays (three MySQL machines, two Tomcats, three Apaches, Hudson, CruiseControl, MediaWiki, Samba, etc.). But to avoid giving a false impression: we're a small company in terms of developers and other people accessing the server, so there's not that much going on (browsing MediaWiki, Hudson automation runs in the evening, most Apache/PHP apps serve mostly static content).
Once I installed munin I started to see interesting things, especially during the night. Since a find job was running within every vserver (at the same time, everywhere), the load shot up to unreal numbers like 15, 17 or 20.
But ultimately the problem didn't just occur during the night (I started disabling the find jobs, they weren't needed anyway) but also during the day, especially when we recently started on a project where we had to work with a database and a 500MB MySQL dump file.
The first time I imported the dump file during working hours (mysql < dump.sql; within one of our vservers, where the MySQL instance was running), the timed output was:
real 71m28.630s
user 0m15.477s
sys 0m10.185s
Since I wasn't paying attention and was in a meeting, it was my co-workers who asked me what was going on with the server, because it was terribly slow.
I repeated the test at night, installed a vanilla Debian MySQL on the host (not inside a guest; I shut all of them down) and got the following numbers:
real 48m33.067s
user 0m15.397s
sys 0m13.517s
and I was still like: yeah, fine, it's a 500MB dump file; dumping it into the InnoDB tablespace takes up around 1GB, that's quite some data. During such an import I did some tests like writing a single line to a file with vim and capturing it with strace:
0.000048 write(1, "\33[?25l\"foo\" \33[78;7H\33[K", 22) = 22
0.000033 stat64("foo", 0xbfda1f04) = -1 ENOENT (No such file or directory)
0.000033 open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
0.000035 write(4, "thios isthis ia a testa\n", 24) = 24
0.000144 fsync(4) = 0
7737.631089 stat64("foo", {st_mode=S_IFREG|0664, st_size=24, ...}) = 0
0.000084 close(4) = 0
0.000117 write(1, "\33[78;7H[New] 1L, 24C written", 28) = 28
0.000051 lseek(3, 0, SEEK_SET) = 0
0.000022 write(3, "b0VIM 7.0\0\0\0\0\20\0\0\0\0\0\0\0\0\0\0!\21\0\0mark"..., 4096) = 4096
0.000052 select(1, [0], NULL, [0], {0, 0}) = 1 (in [0], left {0, 0})
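(For reference, a trace like this can be captured with something along the lines of the command below; I didn't note the exact flags, so treat them as an assumption. -r prints the time relative to the previous syscall, which is what the first column shows.)
strace -r -o vim.trace vim foo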
That number was unbelievable, yet a fact, for me. It seems the stat64 syscall was forced to wait for the dump operation to finish. Not to mention that serving pages in MediaWiki during such a dump takes minutes, too.
Anyway, I arranged a test window with our hosting company to run the test on our production server during the night, and I got a completely different picture:
real 7m4.063s
user 0m2.690s
sys 0m30.500s
I was blown away. I also got an Amazon EC2 account approved for testing and got even lower numbers (around 5 minutes on an m1.large instance where the MySQL data was written to an EBS volume to stay persistent).
Additionally, while importing the data, every other operation gets so slow that things just become unusable, and the load quickly rises to 5 or 7 (although not much really seems to be going on; it looks like the processes start to block each other for no reason).
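A quick way to see whether such load really is processes blocked on I/O rather than CPU work is to watch the blocked and iowait columns, for example:
vmstat 1 5
# 'b' counts processes blocked waiting for I/O; a high 'wa' percentage
# means the CPUs are mostly sitting idle waiting for the disks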
I then started doing bonnie++ tests, which looked like this (I actually had a test run from last year which I had almost forgotten about). So here is Oct 2008:
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
vserver-host 8G 12332 21 12600 3 10290 2 48519 74 52431 6 235.8 0
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
vserver-host,8G,12332,21,12600,3,10290,2,48519,74,52431,6,235.8,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
and here's the current one from Apr 2009:
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
vserver-host 8G 9705 16 7268 1 4823 1 29620 45 41336 5 73.1 0
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 11678 15 +++++ +++ +++++ +++ +++++ +++ +++++ +++ 27739 31
vserver-host,8G,9705,16,7268,1,4823,1,29620,45,41336,5,73.1,0,16,11678,15,+++++,+++,+++++,+++,+++++,+++,+++++,+++,27739,31
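For reproducibility, the runs above were started with something along these lines; the target directory and user are assumptions, -s 8G uses a test file twice the size of RAM, and -m only sets the label that shows up in the results:
bonnie++ -d /tmp/bonnie -s 8G -u nobody -m vserver-host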
I don't know where or what to tune to get rid of the problems, as I'm not yet completely sure where or what the real problem is. I guess I've started failing to see the wood for the trees.
Update 1:
Thanks for the feedback so far. I've first picked the things I can easily test without scheduling a maintenance window.
First I tested the MySQL dump against /dev/null:
dev01:~$ time mysqldump -u root -h db01 database -p >/dev/null
Enter password:
real 0m18.871s
user 0m12.713s
sys 0m0.512s
The system load was barely noticeable:
10:27:18 up 77 days, 9:10, 7 users, load average: 0.16, 0.75, 0.67
[...]
10:27:33 up 77 days, 9:10, 7 users, load average: 0.52, 0.79, 0.69
Also, the sar output during this test didn't reveal anything special:
12:11:45 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:11:46 PM dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:11:46 PM dev254-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:11:46 PM dev254-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:11:46 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:11:47 PM dev8-0 5.00 0.00 200.00 40.00 0.18 36.00 20.00 10.00
12:11:47 PM dev254-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:11:47 PM dev254-1 25.00 0.00 200.00 8.00 0.74 29.60 4.00 10.00
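(These samples come from sar's per-device statistics, collected with something like the line below; the interval and count are assumptions. dev8-0 is the physical disk, while the dev254-* entries are the LVM device-mapper volumes.)
sar -d 1 10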
Then I ran the test again, this time dumping to a file:
dev01:~$ time mysqldump -u root -h db01 database -p >foo.sql
real 1m7.527s
user 0m13.497s
sys 0m2.724s
The time it took doesn't look that unusual to me; the dump ended up as a 570MB file.
However, the load was quite interesting ...
10:30:49 up 77 days, 9:14, 7 users, load average: 0.76, 0.89, 0.75
10:30:57 up 77 days, 9:14, 7 users, load average: 1.34, 1.01, 0.79
10:31:05 up 77 days, 9:14, 7 users, load average: 2.13, 1.19, 0.85
10:31:13 up 77 days, 9:14, 7 users, load average: 2.68, 1.32, 0.89
10:31:21 up 77 days, 9:14, 7 users, load average: 3.79, 1.60, 0.99
10:31:29 up 77 days, 9:14, 7 users, load average: 4.05, 1.69, 1.02
10:31:37 up 77 days, 9:14, 7 users, load average: 4.82, 1.93, 1.10
10:31:45 up 77 days, 9:15, 7 users, load average: 4.54, 1.97, 1.12
10:31:53 up 77 days, 9:15, 7 users, load average: 4.89, 2.08, 1.16
10:31:57 up 77 days, 9:15, 7 users, load average: 5.30, 2.22, 1.21
10:32:01 up 77 days, 9:15, 7 users, load average: 5.12, 2.23, 1.22
10:32:05 up 77 days, 9:15, 7 users, load average: 5.03, 2.26, 1.24
10:32:13 up 77 days, 9:15, 7 users, load average: 4.62, 2.22, 1.23
... as was the sar output ...
12:16:47 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:16:48 PM dev8-0 116.00 0.00 21712.00 187.17 134.04 559.31 8.62 100.00
12:16:48 PM dev254-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:16:48 PM dev254-1 3369.00 0.00 26952.00 8.00 3271.74 481.27 0.30 100.00
12:16:48 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:16:49 PM dev8-0 130.00 0.00 17544.00 134.95 143.78 752.89 7.69 100.00
12:16:49 PM dev254-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:16:49 PM dev254-1 1462.00 0.00 11696.00 8.00 2749.12 1407.78 0.68 100.00
12:16:49 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:16:50 PM dev8-0 128.00 0.00 18400.00 143.75 143.68 1593.91 7.84 100.40
12:16:50 PM dev254-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:16:50 PM dev254-1 810.00 0.00 6480.00 8.00 1598.99 4911.94 1.24 100.40
During these tests I quickly fired up vim to do a simple write test. Loading vim itself was sluggish too, but I wasn't able to distill that properly from the strace output. Only the pending stat64 call was clearly visible again:
0.000050 stat64("bla", 0xbf98caf4) = -1 ENOENT (No such file or directory)
0.000038 open("bla", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
0.000053 write(4, "fooo\n", 5) = 5
0.000051 fsync(4) = 0
5.385200 stat64("bla", {st_mode=S_IFREG|0664, st_size=5, ...}) = 0
0.000073 close(4) = 0
I'll append further file system tests ASAP.
Update 2 / Solution:
Based on the feedback from Mihai about MPT devices and the RAID configuration, I dug into that topic a bit more. Through lspci I retrieved the RAID controller information:
dev:~$ sudo lspci|grep -i lsi
05:05.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
Out of the blue I gave http://www.ask.com/web?q=linux+problem+performance+SAS1068 a shot and found a report of a write performance problem with the LSI Logic SAS1068. The bug description further linked to a blog post about a DELL PE860 performance problem, which also uses the SAS1068.
The article mentions the use of LSIUtil, which is available from lsi.com, and includes an extensive walk-through of the lsiutil menus describing how to turn on write caching.
Not surprisingly, it turns out this has a major effect on performance. Before activating the write cache:
dev:~# time (dd if=/dev/zero of=bigfile count=1024 bs=1M; sync)
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 180.902 seconds, 5.9 MB/s
real 3m8.472s
user 0m0.004s
sys 0m3.340s
After enabling the write cache:
dev:~# time (dd if=/dev/zero of=bigfile count=1024 bs=1M; sync)
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 43.7899 seconds, 24.5 MB/s
real 0m46.413s
user 0m0.000s
sys 0m3.124s
This is like day and night, black and white. Sometimes feeling stupid is OK. I probably should have known that the RAID controller ships with write caching disabled by default.
Update 3:
Rather by accident I found a munin plugin called diskstat which gives detailed IO, read/write/wait, etc. graphs for all block devices. There's also a very nice and detailed blog post from the author, titled Graphing Linux Disk I/O statistics with Munin.
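Enabling it follows the usual munin wildcard-plugin convention, roughly as below; the plugin path and the name suffix are assumptions based on that convention, so check the plugin's own documentation for the exact names:
ln -s /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_iops_sda
/etc/init.d/munin-node restart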
That number was unbelievable, yet a fact, for me. It seems the stat64 syscall was forced to wait for the dump operation to finish.
That's the clue you're looking for. My bet is that the culprit here is ext3. By default, under ext3, any fsync issued by any application under any user account forces the entire journal to disk. This is especially annoying if you're running databases, which often have to issue fsync() calls. It is even more annoying when running on top of RAID devices which don't support write barriers (I don't know if MPT devices support them, sorry).
If you're reasonably sure about your UPS and server stability, try mounting that file system using the data=writeback mount option instead of the default data=ordered. You will be sacrificing reliability in the face of a server crash, but if your performance gets hit this badly anyway, it's not a big loss.
The long-term solution might be switching to a better file system, though. Currently I would recommend XFS. I've used it for years without major problems. The only drawback is that it can't be shrunk the way ext2/3 can be shrunk, though it can be expanded.
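A sketch of what the corresponding /etc/fstab entry could look like; the device and mount point are placeholders and need to match your LVM layout. Note that ext3 usually refuses to switch the data journalling mode on a simple remount, so plan for an unmount/mount (or, for the root file system, passing rootflags=data=writeback on the kernel command line):
/dev/mapper/vg00-var   /var   ext3   defaults,data=writeback   0   2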
- Try doing the mysql dump to /dev/null. That will show you whether the writing or the reading part is responsible.
- Try writing the dump to a separate logical volume/file system or to a RAM disk.
- Check the journalling options on the file system.
- Check the noatime/atime status on the file system.
- Is the file system filling up? Typically performance decreases when the amount of data on it reaches a certain threshold.
(A few example commands for these checks are sketched below.)
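A rough sketch of how those checks could look; the device and mount-point names below are placeholders, not taken from the post:
mysqldump -u root -p database >/dev/null           # exercises only the reading side
mount -t tmpfs -o size=1g tmpfs /mnt/ramdump       # gives a RAM-backed dump target
cat /proc/mounts                                   # shows atime/noatime and any non-default data= mode
tune2fs -l /dev/mapper/vg00-var | grep -i journal  # journal-related features of the ext3 fs
df -h                                              # how full each file system is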
Some good suggestions here from other posters about ruling out the software, and tweaking your RAID performance. It's worth mentioning that if your workload is write-heavy, then going for hardware with a battery-backed write cache is likely to be the right thing to do if you're considering replacing your kit.
It smells like you're simply running out of I/O capacity. First check to make sure that one of your disks isn't dying (sector remapping can do bad things to I/O), and run sar -d 1 0 and look at the %util numbers for the physical disks. If that's up around 100%, then I'd just throw more spindles at it, putting them in a RAID-10. You won't get double the performance, but it'll give you some more oomph. 250GB disks aren't exactly mind-bendingly expensive these days.
Oh, and to pimp my work some more, and to put a wall of text into battle against your wall of text: a guide I wrote on tracking down performance problems, http://www.anchor.com.au/hosting/development/HuntingThePerformanceWumpus, covers everything you should need to know about tracking down performance problems.
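For the "disk isn't dying" check, something like smartctl is the usual tool; note that the SAS1068 presents only the RAID volume to the OS, so smartctl may need to be pointed at a different device node or given a -d option to reach the physical disks behind it:
smartctl -a /dev/sda | grep -i -e reallocated -e pending -e uncorrectable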