I've been trying to get to the bottom of a memory issue for some time now and I simply can't work out what the problem is. Any help is greatly appreciated.
The error is:
    OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000005662c0000, 10632822784, 0) failed; error='Cannot allocate memory' (errno=12)
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (malloc) failed to allocate 10632822784 bytes for committing reserved memory.
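If I'm reading it right, that failed commit is roughly 9.9 GiB, i.e. the new JVM is asking the OS to commit a ~10 GB heap and the OS has nothing left to give. Alongside Ganglia and htop, I've been checking the master node between steps with nothing more sophisticated than this:

    # free physical memory and swap on the node
    free -h
    # top resident processes, sorted by RSS
    ps aux --sort=-rss | head -n 15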
I have a very small Spark job that I'm running on a cluster. These are my findings from the various permutations I've tried (a new cluster in each case, all identical in configuration):
CLI ONLY - On one cluster I start and run all of the steps via the CLI. Each step results in a slight increase in memory that persists: Ganglia shows the cluster's cache memory increasing with every step, and although it drops again after completion, it never returns to the base level. Eventually there isn't enough memory left to allocate a new JVM to run any additional steps. Running htop on this cluster shows the Spark history server as the main memory-intensive process - could the history server be retaining too much information? (What I was planning to try is sketched after this list.)
CONSOLE ONLY - This cluster was created in very much the same way as the others; the difference is that I add the steps via the console. (I came to try this as I was simply out of ideas.) This cluster has only run one step so far, and htop shows Oozie as the highest memory-consuming process.
Others - All other clusters ran and failed in the same manner. An interesting case was a new cluster where a single step ran and completed, but memory was gradually consumed until the exception occurred again. For all of these other clusters, Hadoop was always the task at the top of the process tree for memory consumption.
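If the history server really is the culprit, my next experiment is to trim how much it keeps around and restart it. Something along these lines is what I had in mind; the property names are from the Spark docs, but the values are guesses and I haven't tested this yet:

    # Untested: cut the history server's retention, then restart it (on the master node)
    sudo tee -a /etc/spark/conf/spark-defaults.conf <<'EOF'
    spark.history.retainedApplications  5
    spark.history.fs.cleaner.enabled    true
    spark.history.fs.cleaner.maxAge     1d
    EOF
    # the service name may differ depending on the EMR release
    sudo systemctl restart spark-history-server

Even if that helps, I'd still like to understand whether it's just masking the real leak.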
Any help or suggestions on how to solve this would be fantastic; thank you in advance.
I've attached a few images that may help explain the above.