We’ve got a performance problem in production.
QA and DEV environments are 2 instances on the same physical server: Windows 2003 Enterprise SP2, 32 GB RAM, 1 Quad 3.5 GHz Intel Xeon X5270 (4 cores x64), SQL 2005 SP3 (9.0.4262), SAN Drives
Prod: Windows 2003 Datacenter SP2, 64 GB RAM, 4 Dual Core 1.6 GHz Intel Family 80000002, Model 6 Itanium (8 cores IA64), SQL 2005 SP3 (9.0.4262), SAN Drives, Veritas Cluster
I am seeing excessive Signal Wait Percentages (> 250%) and Page Reads /s (>50) and Page Writes /s (>25) are both high occasionally.
I did test this query on both QA and PROD and it has the same execution plan and even the same stats:
SELECT
top 40000000 *
INTO
dbo.tmp_tbl
FROM
dbo.tbl
GO
Scan count 1, logical reads 429564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
As you can see it’s just logical reads, however: QA: 0:48 Prod: 2:18
So It seems like a processor related issue, however I’m not sure where to go next, any ideas?
Thanks,
Aaron
This was caused by two issues - indexes different between prod and QA along with improperly configured maxdop.
Was there anything else happening on the Prod server? Looks like the QA server had only this query to run, while the Prod system had to content for CPU with other queries running at the same time. How do the elapsed_time and worker_time compare in QA and Prod?
Also, make sure the plans are exactly identical, including DOP.
I have a couple of suggestions for things that you might investigate on the SAN:
Are you seeing significant page I/O latch waits on the production SAN?
Are the DB logs on a busy shared volume?
In the former case, there could be a SAN configuration or other performance issue relating to the SAN controllers. I have seen this happen on IBM shark hardware; moving to a DS8000 substantially mitigated the problem.
In the latter case you could be getting problems with random seeking disrupting the log writing activity. Log writes are a mostly sequential process with large numbers of small sequential writes. On quiet disks this is fast as the disk access patterns are mostly sequential. On busy disks the other disk traffic turns the sequential log writes into random writes, which are much slower. This can turn log drives into a significant performance bottleneck.
Note that SQL Server requires log writes to be flushed to disk before a transaction will commit, and most SAN vendors have enrolled in a certification program where they guarantee that the controllers will honour this standard. This means that no amount of cache memory will mitigate this issue if your logs reside on a busy shared volume.
"So It seems like a processor related issue, however I’m not sure where to go next, any ideas?"
It seems not unreasonable to me that a 3.5GHz Core 2 Architecture CPU is perhaps 100% faster than a 1.6GHz Itanium (isn't Intel Family 80000002 the original Itanium 2 family, Fanwood or Madison ?). If you need more speed, look at a CPU upgrade, perhaps to a x5600 series Xeon.