Howdy cowboys and cowgirls,
If I have a VM (either KVM or ESXi) serving static content from Apache and a video streaming webapp on Tomcat, is there any logic in running multiple instances of that VM on the same piece of kit and load balancing them? To me it seems conceptually pointless: since they take the same incoming web requests, they will merely share the resources that would otherwise have been dedicated to a single instance. That said, I can imagine scenarios where higher capacity might be leveraged by using 2, 3 or more identical VMs, perhaps along the lines of threading performance within Tomcat or suchlike, but every reason I come up with tends to imply bad coding and workaround territory rather than best-practice design. Example hardware here is a fairly capable box like an HP DL380 with 8 or 12 cores and 64GB of RAM, serving around 4,000 concurrent media connections one way or another.
Update: In terms of other benefits like redundancy and patching, these are not issues, as this scenario is likely to be replicated across up to 100 physical machines, all load balanced.
Update 2: I also have concerns about the ability to load balance multiple identical services from an external LB. If the LB is monitoring connection latency and the like, then two VMs (one with 5 connections, one with 500) should actually appear to be performing identically, as they are both pulling resource from the same pool (without VM CPU pinning etc.). Hammering one box would also cripple the other, quiet one, making the distribution of connections really abnormal and confusing.
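To make the concern concrete, here is the sort of external LB setup I have in mind. This is only a rough sketch in HAProxy syntax with made-up names and addresses; the point is that a leastconn policy trusts per-backend connection counts, which say little about real headroom when both guests draw from one physical host:

```
# Hypothetical HAProxy fragment (names/addresses are placeholders).
# "balance leastconn" steers new requests to the server with the
# fewest open connections, but vm1 and vm2 share the same hardware,
# so their counts don't reflect independent capacity.
frontend media_in
    bind *:80
    default_backend media_vms

backend media_vms
    balance leastconn
    server vm1 10.0.0.11:80 check
    server vm2 10.0.0.12:80 check
```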
Thanks
Chris
Given that a virtual machine can only be allocated a limited number of resources (typically 4-8 vCPUs, depending on the platform), if you want the web servers to be able to use all the resources of the host hardware then yes, you will want to run multiple guests.
Also, if you have multiple guests, you can take them offline for patching one at a time without any interruption of service to the end users.
The only reasons to do this would be if (a) you get operational benefits around patching and the like, or (b) you can find some sort of inability of your hypervisor to map vCPUs to real CPUs in a linear fashion (i.e. 2 x 4-vCPU guests get better throughput than 1 x 8-vCPU guest). You'd only ever prove that through stress testing or real production load, though.
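If you did want to test (b), a crude dry-run sketch of the comparison might look like the following. The hostnames and the ApacheBench workload are placeholders, not a recommendation; the idea is just to fire the identical load at each layout and compare the requests/sec figures:

```shell
#!/bin/sh
# Dry-run sketch: print the identical ab (ApacheBench) workload to run
# against each layout, then compare the reported requests/sec.
# Hostnames below are hypothetical placeholders.
for target in one-big-guest.example.com lb-over-two-guests.example.com; do
    # -n total requests, -c concurrency, -k keep-alive
    echo "ab -n 100000 -c 400 -k http://${target}/static/test.html"
done
```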