Apart from easier deployment and administration, are there any performance benefits in virtualizing an application server?
For example, could I fit more operational users on 4 app server VMs on a single XenServer host than I could on 1 physical app server with the same hardware spec?
The vast majority of server workloads never push systems to a level that fully utilizes the resources of a modern server; in my experience the average physical system runs at 10% load or less. All virtualization solutions allow you to consolidate those loads in a way that enables better overall utilization of the physical hardware. Good virtualization solutions also provide administration tools that make managing multiple systems easier, and they enable fault tolerance/high availability, live migration, simpler workload provisioning and a host of other benefits.
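To put rough numbers on the consolidation argument, here is a back-of-the-envelope sketch in Python. The 10% average load is the estimate from my experience above; the 70% target host utilization, the 5% per-guest overhead and the function name are assumptions chosen purely for illustration, and the sketch assumes all machines have the same hardware spec.

```python
def consolidation_ratio(avg_load_pct: int, target_host_pct: int, overhead_pct: int) -> int:
    """Estimate how many lightly loaded servers fit on one host of the same spec.

    avg_load_pct:    average load of each physical server, in percent
    target_host_pct: utilization you are willing to run the host at (assumed)
    overhead_pct:    virtualization overhead added to each guest's load (assumed)
    """
    # Each guest costs its native load plus the virtualization overhead.
    # Integer arithmetic keeps the floor division free of floating-point surprises.
    return (target_host_pct * 100) // (avg_load_pct * (100 + overhead_pct))


if __name__ == "__main__":
    # Servers idling at 10%, host capped at 70%, 5% overhead per guest.
    print(consolidation_ratio(avg_load_pct=10, target_host_pct=70, overhead_pct=5))  # prints 6
```

Real capacity planning would also have to account for peak load, memory and I/O, which this estimate ignores.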
While there are generally performance trade-offs with virtualization, they are relatively low: around 5% for general-purpose workloads and 10-20% at worst. Given that server performance improves by between 20% and 50% per generation, the trade-off is rarely a show stopper. It can be, though, and those edge cases should be excluded from virtualization.
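The arithmetic behind that claim can be sketched as follows. This is a simplified model, assuming the overhead and the generational gain simply multiply; the function name is made up for the example and the percentages are the rough figures quoted above, not measurements.

```python
def net_throughput(generation_gain: float, virt_overhead: float) -> float:
    """Throughput of a virtualized new-generation server relative to the
    previous generation running natively (1.0 means parity).

    Assumes the two effects multiply, which is a simplification.
    """
    return (1.0 + generation_gain) * (1.0 - virt_overhead)


if __name__ == "__main__":
    for gain in (0.20, 0.50):          # per-generation improvement
        for overhead in (0.05, 0.20):  # typical and worst-case overhead
            ratio = net_throughput(gain, overhead)
            print(f"gain {gain:.0%}, overhead {overhead:.0%}: {ratio:.2f}x the old native server")
```

Under this model only the worst pairing (a 20% gain against a 20% overhead) lands slightly below parity; every other combination still comes out ahead of the old physical box.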
There are some scenarios where a virtual solution can outperform a physical implementation spread across multiple servers. Like the scenarios where the virtual performance hit is unacceptable, these situations are rare, but they do exist. For example, a multi-tier application with front-end services or an app layer that relies on network connectivity to one or more back-end databases can benefit from the efficiency of the virtual network stack within a single virtual host. @pjz gave some good examples of single-workload scenarios where virtualizing can improve overall performance because scale-out (into multiple parallel systems) is easier to do efficiently than scale-up (building a multi-threaded app that scales efficiently to current server core counts).
Given your example, the answer is that it depends. If the application server you are talking about is XenApp and the hypervisor is XenServer, then I think you will be able to support more users by taking the virtual route than by running natively, provided you are talking about a recent server with lots of cores. VMware have an old article about XenApp running on ESX 3.5 vs native performance, and their claim is that scaling is better with the servers running virtually. Whether these claims still hold on up-to-date XenApp and VMware hypervisors depends very much on how well current versions of XenApp handle NUMA architectures and multi-core scale-out on modern hardware. I suspect it is still true, although I would take their claims of a 30% improvement with vSphere with some level of scepticism.
Citrix claim that XenApp on XenServer outperforms XenApp on vSphere. I can't say if that is true, but the scale-out behaviour on virtual systems is clear enough: this Citrix whitepaper shows that with similar workload profiles a dual Xeon 5570 based XenServer can support 3x the number of concurrent users of a physical XenApp server running on two cores of an X7350 based server (which is a generation older). Scale-out with multiple VMs definitely works pretty well, but scale-up from dual-core to quad-core is not hugely efficient. This AMD presentation indicates a 70% improvement in the number of concurrent sessions that can be supported when moving from dual core to quad core. I suspect that scale-up for XenApp running natively on current-generation 12/16/48-core servers will degrade very rapidly, though I can't find anything definitive about this.
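To illustrate why I suspect scale-out wins on high core counts, here is a toy comparison. It is only a sketch under loud assumptions: it extrapolates the single dual-to-quad data point (1.7x sessions per doubling of cores) to every further doubling, it assumes 2-vCPU VMs scale linearly apart from a flat 10% overhead, and the baseline of 100 sessions and the function names are invented for the example. None of this is measured data.

```python
import math


def scale_up_sessions(base_sessions: float, cores: int,
                      base_cores: int = 2, gain_per_doubling: float = 1.7) -> float:
    """Sessions for one native instance spanning all cores.

    ASSUMPTION: every doubling of cores yields the same 1.7x gain seen
    when going from dual core to quad core.
    """
    doublings = math.log2(cores / base_cores)
    return base_sessions * gain_per_doubling ** doublings


def scale_out_sessions(base_sessions: float, cores: int,
                       cores_per_vm: int = 2, virt_overhead: float = 0.10) -> float:
    """Sessions for many small VMs, each as capable as the dual-core baseline.

    ASSUMPTION: VMs do not contend with each other; each just loses a flat overhead.
    """
    vms = cores // cores_per_vm
    return vms * base_sessions * (1.0 - virt_overhead)


if __name__ == "__main__":
    for cores in (4, 8, 16):
        up = scale_up_sessions(100, cores)
        out = scale_out_sessions(100, cores)
        print(f"{cores:>2} cores: scale-up ~{up:.0f} sessions, scale-out ~{out:.0f} sessions")
```

With these assumed parameters the gap widens as the core count grows, which is the shape of the argument; the actual numbers will depend entirely on the workload.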
The main advantages of virtualization are ease of administration, failover, and deployment, as you pointed out. A properly designed, modern application should theoretically do slightly better running straight on the hardware than in a VM under a hypervisor. The caveats I can think of are:
If you've got a multiprocessor machine and a single-threaded application that refuses to allow multiple copies of itself to coexist on the same machine, you can 'fool' it into doing so by running each copy in its own VM on that machine.
If you've got a multiprocessor, 64-bit machine with 16+ GB of RAM, then a 32-bit application that's running at the limits of its available memory (3-4GB, right?) may end up performing better if sharded into 4 VMs under a hypervisor on the same machine, since that will allow it to address more memory in total than it could otherwise.
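The memory arithmetic for that second caveat looks roughly like this. It is a sketch only: the 4 GB per-instance ceiling is the upper end of my guess above (the real limit depends on the OS and may be lower), the function name is made up, and memory reserved by the hypervisor and guest operating systems is ignored.

```python
def total_addressable_gb(instances: int, per_instance_limit_gb: int = 4,
                         host_ram_gb: int = 16) -> int:
    """Memory the application can use in total when split across VMs.

    Each 32-bit instance is capped at per_instance_limit_gb (ASSUMED to be 4),
    and the total can never exceed the physical RAM in the host.
    """
    return min(instances * per_instance_limit_gb, host_ram_gb)


if __name__ == "__main__":
    print(total_addressable_gb(1))  # one native instance: 4
    print(total_addressable_gb(4))  # four VMs, one instance each: 16
```

Past the point where the instances together cover the host's RAM, adding more VMs buys no further memory.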
Performance isn't virtualization's strong point; the strong point is that you can manage 4 different applications, all with conflicting software infrastructure requirements, on one machine.