I want to deploy an interactive rendering service that uses the WebSocket protocol. The service is containerized and can serve up to N (let's say 1 <= N <= 10) clients. The reason for the small N is that each new connection spawns a new process (within the container) with significant resource requirements. I have read some tutorials about application load balancers, service load balancers, AWS Fargate, etc. These load balancing strategies mainly focus on CPU load or memory usage, but I couldn't find any documentation on connection-based routing.
Is there an AWS technique/subsystem that allows me to fire up a new container (task?) when one of the "current" containers is (almost) "out of seats"? The routing must not send any further connections to the "full" container, but the existing connections of that full container must not be dropped. Once a container has no connections anymore, it should be stopped (unless it is the last one) to avoid consuming unneeded resources.
Are you the author/developer of this service? If so, I'd suggest redesigning it: a process-per-connection model with a hard cap of ~10 clients per container is not a scalable architecture.
The closest you can get to your stated requirements is AWS Lambda behind an API Gateway WebSocket API. API Gateway holds the long-lived WebSocket connections for you and invokes a Lambda function per event (connect, disconnect, and each incoming message), scaling concurrency automatically: 1 active client means roughly 1 concurrent execution, 100 clients can mean 100. You never have to count "seats" yourself.
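To make that concrete, here is a minimal sketch of such a Lambda handler, assuming an API Gateway WebSocket API with the standard `$connect`, `$disconnect`, and `$default` routes. The function and event shapes follow the API Gateway WebSocket integration; the comments mark what a real service would add.

```python
import json

# Minimal sketch of a Lambda handler behind an API Gateway WebSocket API.
# API Gateway keeps the long-lived connections; this function is invoked
# once per event ($connect, $disconnect, or a message on $default).
def handler(event, context):
    route = event["requestContext"]["routeKey"]
    connection_id = event["requestContext"]["connectionId"]

    if route == "$connect":
        # A real service would persist connection_id (e.g. in DynamoDB)
        # so it can push frames to the client later via the
        # @connections management API.
        return {"statusCode": 200}
    if route == "$disconnect":
        # Clean up any state stored for this connection.
        return {"statusCode": 200}

    # $default: echo the message body back to the caller.
    body = event.get("body") or ""
    return {"statusCode": 200, "body": json.dumps({"echo": body})}
```

Note the rendering work itself would have to fit Lambda's execution model (stateless, time-limited invocations), which is exactly why this may force a redesign of your process-per-connection service.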
Doing the same with ALB, ECS, Fargate, etc. would require custom orchestration that you'd have to develop yourself: ALB has a "least outstanding requests" routing algorithm, but to my knowledge no built-in per-target connection cap, so you'd have to track seats, deregister full targets (connection draining keeps existing connections alive), and scale the ECS service's desired count on your own. I wouldn't go that way.
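Just to illustrate how much logic such an orchestrator would carry, here is a sketch of only its scaling decision, under the assumptions above. All names are hypothetical; a real implementation would read live connection counts from the tasks and apply the plan via the ECS and ELB APIs.

```python
# Hypothetical sketch of the scaling decision a custom orchestrator would
# make: given live connection counts per task and the per-task seat limit,
# decide whether to start a task and which idle tasks can be stopped.
def plan(connections_per_task: dict[str, int], seats: int) -> dict:
    # Total free seats across all running tasks.
    free = sum(seats - n for n in connections_per_task.values())
    # Start a new task as soon as no seats are left anywhere.
    scale_out = free == 0
    # Idle tasks can be drained and stopped, but always keep one task
    # running so new connections have somewhere to land.
    idle = [t for t, n in connections_per_task.items() if n == 0]
    busy = len(connections_per_task) - len(idle)
    stoppable = idle if busy >= 1 else idle[1:]
    return {"start_new_task": scale_out, "stop_tasks": stoppable}
```

For example, `plan({"a": 10}, 10)` says to start a task, and `plan({"a": 3, "b": 0}, 10)` says task `b` can be stopped. Note this is only the easy part: the orchestrator would still need health checks, target (de)registration, and race-free seat accounting.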
Use Lambda, or rearchitect the service for batch processing or something similar.
Hope that helps :)