The National Institute for Computational Sciences

Darter: I get the error message "OOM killer terminated this process". What is OOM?

This error message indicates that the node is running Out Of Memory. This could be the result of a bug in the code, or memory requirements for the given input. Note that due to optimistic memory allocation, you probably will not get a null pointer, even if you are out of memory. The program should be killed at the point the memory is used.

One quick solution might be to run with only four MPI processes per socket so each process gets a larger share of the memory on the node:

aprun -n  -S 4 ./a.out

Where is the total number of MPI processes. The above solution uses 4 out of 8 cores on each socket, so naively, each MPI task should get twice as much memory. If this is not enough memory, it is possible to reduce the number of tasks per core further (-S 2). The best solution may be to identify the memory requirements in the code and make any necessary changes there, in terms of memory parameters, domain decomposition, etc.