I've been banging my head against the wall for three days now and counting. I hope you can tell me what I am doing wrong.
I have a Redis cluster with 3 masters and 3 slaves. A Node.js application uses Redis to store and serve data quickly. All seemed fine; however, we experienced occasional drops in the application. It turns out it frequently restarted itself. At the time of a restart, the following messages are logged by the application (not by Redis):
2020-08-05 08:48 +02:00: events.js:288
2020-08-05 08:48 +02:00: throw er; // Unhandled 'error' event
2020-08-05 08:48 +02:00: ^
2020-08-05 08:48 +02:00:
2020-08-05 08:48 +02:00: Error: connect EMFILE 10.0.0.201:7101 - Local (undefined:undefined)
2020-08-05 08:48 +02:00: at internalConnect (net.js:917:16)
2020-08-05 08:48 +02:00: at defaultTriggerAsyncIdScope (internal/async_hooks.js:311:12)
2020-08-05 08:48 +02:00: at net.js:1007:9
2020-08-05 08:48 +02:00: at processTicksAndRejections (internal/process/task_queues.js:79:11)
2020-08-05 08:48 +02:00: at runNextTicks (internal/process/task_queues.js:66:3)
2020-08-05 08:48 +02:00: at listOnTimeout (internal/timers.js:518:9)
2020-08-05 08:48 +02:00: at processTimers (internal/timers.js:492:7)
2020-08-05 08:48 +02:00: Emitted 'error' event on Socket instance at:
2020-08-05 08:48 +02:00: at emitErrorNT (internal/streams/destroy.js:92:8)
2020-08-05 08:48 +02:00: at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
2020-08-05 08:48 +02:00: at processTicksAndRejections (internal/process/task_queues.js:84:21)
2020-08-05 08:48 +02:00: at runNextTicks (internal/process/task_queues.js:66:3)
2020-08-05 08:48 +02:00: at listOnTimeout (internal/timers.js:518:9)
2020-08-05 08:48 +02:00: at processTimers (internal/timers.js:492:7) {
2020-08-05 08:48 +02:00: errno: 'EMFILE',
2020-08-05 08:48 +02:00: code: 'EMFILE',
2020-08-05 08:48 +02:00: syscall: 'connect',
2020-08-05 08:48 +02:00: address: '10.0.0.201',
2020-08-05 08:48 +02:00: port: 7101
2020-08-05 08:48 +02:00: }
2020-08-05 08:48 +02:00: events.js:288
2020-08-05 08:48 +02:00: throw er; // Unhandled 'error' event
2020-08-05 08:48 +02:00: ^
2020-08-05 08:48 +02:00:
2020-08-05 08:48 +02:00: Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
2020-08-05 08:48 +02:00: at ChildProcess.target.send (internal/child_process.js:679:16)
2020-08-05 08:48 +02:00: at Worker.send (internal/cluster/worker.js:45:28)
2020-08-05 08:48 +02:00: at Server.<anonymous> (/totally/secret/folder/structure/app.js:136:16)
2020-08-05 08:48 +02:00: at Server.emit (events.js:311:20)
2020-08-05 08:48 +02:00: at TCP.onconnection (net.js:1554:8)
2020-08-05 08:48 +02:00: Emitted 'error' event on Worker instance at:
2020-08-05 08:48 +02:00: at ChildProcess.<anonymous> (internal/cluster/worker.js:29:12)
2020-08-05 08:48 +02:00: at ChildProcess.emit (events.js:311:20)
2020-08-05 08:48 +02:00: at internal/child_process.js:683:35
2020-08-05 08:48 +02:00: at processTicksAndRejections (internal/process/task_queues.js:79:11) {
2020-08-05 08:48 +02:00: code: 'ERR_IPC_CHANNEL_CLOSED'
2020-08-05 08:48 +02:00: }
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: (node:28699) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'emit' of undefined
2020-08-05 08:48 +02:00: at /totally/secret/folder/structure/app.js:323:34
2020-08-05 08:48 +02:00: at runMicrotasks (<anonymous>)
2020-08-05 08:48 +02:00: at processTicksAndRejections (internal/process/task_queues.js:97:5)
2020-08-05 08:48 +02:00: (node:28699) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 530)
2020-08-05 08:48 +02:00: (node:28699) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
2020-08-05 08:48 +02:00: (node:28699) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 201 error listeners added to [Cluster]. Use emitter.setMaxListeners() to increase limit
What I've tried so far is disabling either AOF or RDB, but with no success; the occasional restarts still happen. What else could be the cause of this issue? The developers say the application simply cannot connect to Redis and therefore fails. But that can't be right: pinging the servers and watching the uptime of the service confirms that they are always reachable and available to the application. All 6 of them.
The error EMFILE means "too many open files". Opening a network connection requires a file descriptor, so since your application is opening thousands of simultaneous connections, it is running into the open-files ulimit, which by default is 1024.
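To confirm this is what's happening, you can compare the number of descriptors the process holds against its limit. A sketch, assuming Linux with a `/proc` filesystem; it inspects the shell's own PID via `$$` for illustration, so substitute your Node process's PID (e.g. the `28699` shown in your log):

```shell
# PID to inspect; replace $$ with your Node app's PID (e.g. 28699 from the log)
PID=$$

# Soft limit on open files for the current shell (often 1024 by default)
ulimit -n

# The limit as the kernel applies it to that specific process
grep 'open files' /proc/$PID/limits

# Number of file descriptors the process currently holds --
# if this number hovers near the limit, EMFILE errors are expected
ls /proc/$PID/fd | wc -l
```

If the descriptor count keeps climbing toward the limit, you likely also have a connection leak (new Redis connections being opened without old ones being closed), in which case raising the limit only delays the crash.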
You can raise this limit by setting

MAX_OPEN_FILES=

in /etc/default/pm2, and restarting pm2 and your app.
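If the app is started from a shell rather than (or in addition to) pm2, the limit can also be raised there. A sketch; the `youruser`/`65536` values in the comment are arbitrary examples, not values from your setup:

```shell
# Raise the soft open-files limit for this shell and its children
# up to the current hard limit (shown by `ulimit -Hn`)
ulimit -n "$(ulimit -Hn)"
ulimit -n   # verify the new soft limit

# For a persistent, system-wide change, the usual place is
# /etc/security/limits.conf, e.g. (example values):
#   youruser  soft  nofile  65536
#   youruser  hard  nofile  65536
```

Note that `ulimit` only affects the current shell and its children, so it must run in the same session that launches the app (or in its init/systemd unit), not in a separate terminal.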