I am building an infrastructure on EC2 where I have nginx as a load balancer in front of apache+php servers. Each apache server has some virtual hosts.
I am trying to figure out the best way to scale when a virtual host gets more requests than it can handle.
I think I could cap the connections per virtual host at, say, 100; when connections exceed that limit, I would spin up a new instance (using Ruby and the fog gem), configure the virtual host on it, and add the new instance's IP as another backend in the nginx balancer.
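To make the manual approach concrete, here is a hedged sketch of what the nginx side could look like (hostnames and IPs are placeholders, not from the original setup): scaling out a virtual host amounts to appending another `server` line to its upstream block and reloading nginx.

```nginx
# Hypothetical upstream for one virtual host; a newly provisioned
# instance is added by appending another "server" line and reloading.
upstream example_com_backend {
    server 10.0.1.10:80;   # existing Apache+PHP instance
    server 10.0.1.11:80;   # new instance added after scaling out
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://example_com_backend;
        proxy_set_header Host $host;
    }
}
```

The catch is that something (your Ruby script, in this plan) has to rewrite this file and reload nginx every time an instance comes or goes.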
Is this the way to go, or what do you recommend?
It's as much a balancing act as scaling out physical hardware. Fixating on a single metric to decide when to spawn new instances likely won't help.
An arbitrary limit of 100 connections might be far below what the instance can really handle, and you'll overspend on compute as a result. On the other hand, the instance might choke on RAM, CPU, or I/O long before it reaches that point.
You should take a look at AWS Auto Scaling. It lets you set up policies that automatically expand and contract your compute cluster within specified hard limits, driven by a whole raft of CloudWatch metrics. You'll need to use the API to set up appropriate policies, then observe and tune them as your load and budget dictate.
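As a rough illustration of what "setting up policies" involves, here is a sketch using the AWS CLI (one of several ways to drive the API; all names, the AMI ID, and the thresholds below are placeholder assumptions you would replace with your own):

```shell
# Launch configuration: how new instances are built.
aws autoscaling create-launch-configuration \
    --launch-configuration-name web-lc \
    --image-id ami-12345678 \
    --instance-type m1.small

# Auto Scaling group with hard limits on cluster size.
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name web-asg \
    --launch-configuration-name web-lc \
    --min-size 2 --max-size 10 \
    --availability-zones us-east-1a

# Scale-out policy: add one instance when triggered.
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name web-asg \
    --policy-name scale-out \
    --scaling-adjustment 1 \
    --adjustment-type ChangeInCapacity

# CloudWatch alarm that fires the policy on sustained high CPU
# (average above 70% for two consecutive 5-minute periods).
aws cloudwatch put-metric-alarm \
    --alarm-name web-high-cpu \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --statistic Average \
    --period 300 --threshold 70 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2 \
    --alarm-actions <scale-out-policy-ARN>
```

You would pair this with a mirror-image scale-in policy and a low-CPU alarm, and tune the thresholds against what your instances actually saturate on first.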