I want to deploy an interactive rendering service that uses the WebSocket protocol. The service is containerized and can serve up to N clients (let's say 1 <= N <= 10). The reason N is small is that each new connection spawns a new process (within the container) with significant resource requirements. I have read some tutorials about application load balancers, service load balancers, AWS Fargate, etc., but these load balancing and scaling strategies mainly focus on CPU load or memory usage. I couldn't find any documentation on "connection-based" routing.
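To make the "seats" idea concrete, here is a rough, simplified sketch of the kind of per-container limit I mean (not my actual service; it assumes a Python `websockets` server and a made-up limit of 10):

```python
# Hypothetical sketch: a container-local "seat" limit, not the real rendering service.
import asyncio
import websockets

MAX_SEATS = 10          # N: maximum concurrent clients this container can serve
active_connections = 0  # number of currently occupied "seats"

async def handler(websocket):
    global active_connections
    if active_connections >= MAX_SEATS:
        # Refuse the connection; ideally the routing layer would never send it here.
        await websocket.close(code=1013, reason="container full")
        return
    active_connections += 1
    try:
        # Placeholder for the real per-connection rendering process.
        async for message in websocket:
            await websocket.send(message)
    finally:
        active_connections -= 1

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8080):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```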
Is there an AWS technique/subsystem that allows me to fire up a new container (task?) when the "current" container (task?) is (almost) "out of seats"? The routing must not send any further connections to the "full" container, but the existing connections on that container must not be dropped. Once a container (task?) has no connections left, it should be stopped (unless it is the last one) so it doesn't keep consuming unneeded resources.
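For what it's worth, the only idea I've come up with so far is to publish the per-task connection count as a custom CloudWatch metric and attach some scaling policy to it, roughly like the sketch below (the namespace, metric name, and `TASK_ID` environment variable are all made up). But that still doesn't cover the "don't route new connections to a full task" part, or the "drain and stop an empty task without dropping live connections" part, which is why I'm asking.

```python
# Hypothetical sketch: each task periodically reports how many "seats" are taken.
import os
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_connection_count(active_connections: int) -> None:
    """Push this task's current connection count as a custom CloudWatch metric."""
    cloudwatch.put_metric_data(
        Namespace="RenderingService",  # made-up namespace
        MetricData=[
            {
                "MetricName": "ActiveConnections",
                "Dimensions": [
                    # Identify the task so metrics can be aggregated per service/task.
                    {"Name": "TaskId", "Value": os.environ.get("TASK_ID", "unknown")},
                ],
                "Value": float(active_connections),
                "Unit": "Count",
            }
        ],
    )
```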