What WQL queries would you use for monitoring typical Windows bottlenecks? Which would you use to obtain data similar to 'top' or 'netstat'? What interval would you poll at?
Here are a few that I find helpful.
SELECT PercentDiskTime, AvgDiskQueueLength, DiskReadBytesPerSec, DiskWriteBytesPerSec FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk
SELECT Caption, CommittedBytes, AvailableBytes, PercentCommittedBytesInUse, PagesPerSec, PageFaultsPerSec FROM Win32_PerfFormattedData_PerfOS_Memory
SELECT PercentProcessorTime FROM Win32_PerfFormattedData_PerfOS_Processor
SELECT Caption, WorkingSet, PageFaultsPerSec, IOReadBytesPerSec, IOWriteBytesPerSec, ThreadCount, HandleCount FROM Win32_PerfFormattedData_PerfProc_Process
SELECT Caption, BytesReceivedPerSec, BytesSentPerSec FROM Win32_PerfFormattedData_Tcpip_NetworkInterface
This is a truly great question, and it's a shame it has not gotten more love!
My basic theory of bottleneck analysis is to treat the system as a box with 4 sorts of finite resources: processor, memory, disk, and network. So I want to get basic numbers for each of these to determine the health of the box. I want numbers that are easy to interpret: high is bad, low is good. 0 is best, though never perfectly achievable (after all we bought the computer to do work, eh?). Once I see which of the four resources is the main bottleneck I can proceed to determining which program or process is eating all the resources, and make an educated decision as to whether I need to increase that resource - or tune the program/process to use less of the resource.
I will format the main performance counters I use, from this article, as WMIC queries, because no scripting is required (although it is certainly possible!). You can enter each of these queries directly into the cmd console:
Above is Processor Queue Length. This tells how many threads are waiting in queue to be handled by the CPU. High numbers bad, low numbers good. Generally I consider a value <10 to be a healthy system.
Above is Memory, Pages Input per Second, the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in physical memory and must be retrieved from disk. This counter works best in Perfmon's graph view, though. On a healthy (not bottlenecked) computer, you'll see occasional spikes as data is read from disk into RAM. The more spikes you see, and the higher they go, the more memory-constrained the system is. If the counter often stays at a nonzero value for periods longer than, say, five seconds, you probably have a memory-bottlenecked system.
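If you would rather script the "stays nonzero for longer than about five seconds" check than eyeball a graph, here is a minimal Python sketch. It assumes you have already collected Pages Input/sec samples at a fixed interval; the function names, the one-second interval, and the five-second limit are illustrative choices, not part of any Windows API.

```python
def longest_nonzero_run(samples, interval_s=1.0):
    """Return the longest stretch, in seconds, where every sample was nonzero.

    Assumes samples were taken at a fixed interval of interval_s seconds.
    """
    longest = 0
    current = 0
    for value in samples:
        # Extend the current run on a nonzero sample, reset it on zero.
        current = current + 1 if value > 0 else 0
        longest = max(longest, current)
    return longest * interval_s


def looks_memory_bound(samples, interval_s=1.0, limit_s=5.0):
    """Rule of thumb only: flag a run of nonzero paging longer than limit_s."""
    return longest_nonzero_run(samples, interval_s) > limit_s


# Occasional spikes: healthy by this rule of thumb.
print(looks_memory_bound([0, 12, 0, 0, 3, 0]))
# Six consecutive nonzero one-second samples: probably memory constrained.
print(looks_memory_bound([0, 4, 9, 7, 3, 8, 2, 0]))
```

The limit is deliberately a parameter, since five seconds is only a rough guide.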
Above is PhysicalDisk, Average Disk Queue Length. I consider this to be the key indicator of system health, since memory bottlenecks will also bog down the disk due to excessive pagefile swapping - and will often push up CPU utilization as well. It will show an item for each mounted disk as well as a total of all disks. A well-performing single disk will have this value at 2 or lower. For arrays, divide the queue length by the number of spindles (e.g. a queue length of 8 divided by 4 spindles in the array = 2, which means the array is performing well).
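The array arithmetic can be sketched in a few lines of Python. This matches the 8-and-4 example, which gives 2 per spindle; the function names and the default limit of 2 are illustrative assumptions based on the rule of thumb above.

```python
def queue_per_spindle(avg_queue_length, spindles=1):
    """Average disk queue length normalized by the number of spindles."""
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return avg_queue_length / spindles


def disk_is_healthy(avg_queue_length, spindles=1, limit=2.0):
    """Rule of thumb only: a per-spindle queue of `limit` or lower is fine."""
    return queue_per_spindle(avg_queue_length, spindles) <= limit


# Queue length of 8 on a 4-spindle array: 2 per spindle.
print(queue_per_spindle(8, 4))
# A single disk with a queue length of 3 is over the limit.
print(disk_is_healthy(3))
```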
And finally, above we have NIC performance, specifically Network Interface, Output Queue Length and Packets Received Errors. These two counters tell us how many packets are waiting to be sent, and how many inbound packets caused errors, which probably resulted in retransmits. We want both numbers to stay at zero. In this query I also get the current bandwidth of the NIC, which is useful information.
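The command lines themselves are not reproduced in this text, so the following Python sketch is only a reconstruction of what such WMIC queries could look like for the four counters named above. It assumes the standard formatted performance classes (Win32_PerfFormattedData_PerfOS_System, _PerfOS_Memory, _PerfDisk_PhysicalDisk, and _Tcpip_NetworkInterface) and the `wmic path <class> get <properties>` form; the `COUNTERS` table and `wmic_command` helper are hypothetical names for illustration. It only builds and prints the command strings, so you can paste them into a cmd console on the Windows box.

```python
# Counter -> (WMI class, properties to fetch). Assumed mapping, one entry
# per resource discussed above.
COUNTERS = {
    "processor": ("Win32_PerfFormattedData_PerfOS_System",
                  ["ProcessorQueueLength"]),
    "memory": ("Win32_PerfFormattedData_PerfOS_Memory",
               ["PagesInputPerSec"]),
    "disk": ("Win32_PerfFormattedData_PerfDisk_PhysicalDisk",
             ["Name", "AvgDiskQueueLength"]),
    "network": ("Win32_PerfFormattedData_Tcpip_NetworkInterface",
                ["Name", "CurrentBandwidth", "OutputQueueLength",
                 "PacketsReceivedErrors"]),
}


def wmic_command(wmi_class, properties):
    """Build a WMIC command line that fetches the given properties."""
    return f"wmic path {wmi_class} get {','.join(properties)}"


for resource, (wmi_class, properties) in COUNTERS.items():
    print(f"{resource}: {wmic_command(wmi_class, properties)}")
```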
Once I've determined which resource is overused, I usually depend on either Process Explorer or Perfmon's process object to discover which process is the resource hog.