We have a device that sends POST requests or TCP messages with a JSON payload at fixed intervals over the internet to our server, which runs a Node application at another location.
The JSON payload contains a timestamp and some other values. We compare that timestamp to the server's timestamp to calculate the time difference, which we call lag.
The interval is 100 ms. Initially, the lag is under 200 ms. When we let the system run for a few days, the lag keeps increasing: after 2-3 days it is 2000-3000 ms, and after 6 days it is around 6000 ms.
Once we restart the server process, the lag is back to normal, so I assume the sender is OK. This happens with both the POST request implementation and the TCP message implementation.
Does anyone have any idea why this may happen, or how to narrow down the problem?
Are the clocks on the server and the sender synchronized? A clock drift of 1 second per day isn't unusual.
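To put a number on that, you could compute how fast the lag grows and express it as a drift rate. This is only a rough sketch: it assumes you log pairs of server time and measured lag, and the helper names `lagGrowthPerDay` and `toPpm` are made up for illustration.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: average lag growth in ms per day between two samples.
// Each sample is { serverTimeMs, lagMs }, both in milliseconds.
function lagGrowthPerDay(first, last) {
  const elapsedMs = last.serverTimeMs - first.serverTimeMs;
  if (elapsedMs <= 0) {
    throw new RangeError('last sample must be later than first sample');
  }
  return ((last.lagMs - first.lagMs) * MS_PER_DAY) / elapsedMs;
}

// Hypothetical helper: express a growth rate (ms per day) in parts per million,
// the unit usually used for clock drift.
function toPpm(msPerDay) {
  return (msPerDay / MS_PER_DAY) * 1e6;
}
```

With the numbers from the question (roughly 6000 ms of extra lag over 6 days), this gives about 1000 ms per day, or about 11.6 ppm.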
This would indicate what's known as a resource leak. Something in your application is slowing down processing over time as a queue or memory or some other resource builds up.
I'd recommend adding instrumentation to your Node.js code to track any internal queues, memory allocated, database lookups, etc. Log all the things so you can start to identify where the bottleneck is.
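As a starting point, something like the following could log memory use, event loop delay, and a queue length once a minute. It is a minimal sketch, not a drop-in solution: `buildSnapshot`, `startInstrumentation`, and the `getQueueLength` callback are hypothetical names, and what counts as a "queue" depends on your application. `monitorEventLoopDelay` (from `perf_hooks`) and `process.memoryUsage()` are built into Node.js.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

const BYTES_PER_MB = 1024 * 1024;
const NS_PER_MS = 1e6;

// Hypothetical helper: turn raw readings into one flat log record.
// `mem` is shaped like process.memoryUsage(); `loopDelayNs` holds
// mean and max event loop delay in nanoseconds.
function buildSnapshot(mem, loopDelayNs, queueLength, nowMs) {
  return {
    time: new Date(nowMs).toISOString(),
    rssMb: Math.round((mem.rss / BYTES_PER_MB) * 10) / 10,
    heapUsedMb: Math.round((mem.heapUsed / BYTES_PER_MB) * 10) / 10,
    loopDelayMeanMs: loopDelayNs.mean / NS_PER_MS,
    loopDelayMaxMs: loopDelayNs.max / NS_PER_MS,
    queueLength,
  };
}

// Hypothetical entry point: logs one snapshot per interval.
// `getQueueLength` is an assumption; pass a function that returns the size
// of whatever backlog your application keeps (pending messages, open sockets, ...).
function startInstrumentation(getQueueLength = () => 0, intervalMs = 60000, log = console.log) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    const snapshot = buildSnapshot(
      process.memoryUsage(),
      { mean: histogram.mean, max: histogram.max },
      getQueueLength(),
      Date.now()
    );
    log(JSON.stringify(snapshot));
    histogram.reset(); // start a fresh measurement window
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging

  // Return a function that stops the logging again.
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Call `startInstrumentation(() => myQueue.length)` once at startup (with `myQueue` standing in for your own data structure). If the heap, the event loop delay, or the queue length climbs steadily over the same days that the lag grows, that points to where the leak is.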
If you're running in the cloud there are good tools to help instrument code, such as AWS X-Ray.