I have a question about connection reliability for a TCP server running on an EC2 instance.
We currently serve mobile customers around the world from the Oregon region using a c3.4xlarge EC2 instance. Our product is a live game server written in Python with the gevent framework. Right now we handle about 200 to 300 concurrent customers.
The issue is that many of our customers on the other side of the world have trouble connecting to the server and staying connected. On the server side, these clients consistently time out without the socket ever being closed: we see gaps of more than 30 seconds without a heartbeat response.
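To make the symptom concrete, here is a minimal sketch of server-side heartbeat timeout detection. It uses only the standard library `socket` module. With gevent, `gevent.monkey.patch_all()` makes these same calls cooperative, but the sketch itself does not depend on gevent. The function name `read_until_silent` and the 30-second threshold are hypothetical, chosen to match the gaps described above. Because `settimeout` applies to each `recv` call, the timeout measures silence since the last bytes arrived, which is what "no heartbeat for 30 s" means.

```python
import socket

# Hypothetical threshold, matching the >30 s gaps reported above.
HEARTBEAT_TIMEOUT_S = 30.0


def read_until_silent(conn: socket.socket, timeout_s: float = HEARTBEAT_TIMEOUT_S) -> str:
    """Read from conn until the peer closes it or goes silent for too long.

    Returns "closed" if the peer closed the connection cleanly (recv
    returned b""), or "timeout" if no bytes arrived within timeout_s of
    the previous chunk. The socket is closed either way, so the server
    does not hold on to a half-dead connection.
    """
    # The timeout applies to each recv() call, i.e. time since last data.
    conn.settimeout(timeout_s)
    try:
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                # No heartbeat (or any other data) inside the window.
                return "timeout"
            if not data:
                # Orderly shutdown by the client (FIN received).
                return "closed"
            # A real server would parse game/heartbeat messages here;
            # in this sketch any received bytes count as proof of life.
    finally:
        conn.close()
```

Logging which of the two outcomes occurs for each client (together with its region) can help separate clean disconnects from connections that silently die in transit.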
Is it wrong of us to assume that a mobile client anywhere in the world can establish a long-lived TCP connection and keep it open without interruption?
If so, what would be the best way to mitigate this problem?
If not, does anyone have any strategies for debugging the lost connections?
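One commonly used debugging aid is to enable TCP keepalive on each accepted socket, so that the kernel also probes idle peers independently of the application-level heartbeat. This is a sketch, not a fix: keepalive only detects dead peers sooner (and may help keep idle NAT mappings open); it cannot keep a connection alive across a mobile network change. The helper name `enable_keepalive` and its default timer values are hypothetical. `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux socket options that are not exposed on every platform, so they are guarded with `hasattr`.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 10,
                     interval_s: int = 5, count: int = 3) -> None:
    """Turn on TCP keepalive with (hypothetical) aggressive timers.

    After idle_s seconds of silence the kernel sends a probe every
    interval_s seconds; after count unanswered probes the connection
    is reset, and pending reads on the socket fail with an error.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timer overrides; available on Linux, not everywhere.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

Comparing when keepalive resets a connection with when the application heartbeat times out can show whether the client's network path is gone entirely or only the game traffic has stalled.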
Thanks in advance :)