- Description: Consistently high or steadily increasing memory usage is dangerous if the user isn't aware of it, because it can easily lead to the job being cancelled for running out of memory (OOM).
- Criterion:
  - high memory usage: the memory used by a process is very high compared to the memory requested by the job, e.g., used >= 0.9 * requested. This should be seen as a warning because of the danger of the job failing by running out of memory.
  - systematic increase of memory usage: the memory used by the job grows roughly linearly with time over large stretches of the job. This could be seen as a warning because of the danger of the job failing by running out of memory.
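
The two criteria above can be sketched in code. This is a minimal illustration, not an existing implementation: it assumes per-job memory samples are available as plain lists of timestamps and usage values, and the function names and the `min_r2` and `min_growth` thresholds are hypothetical choices. Only the 0.9 factor comes from the criterion itself. For simplicity the fit covers the whole series rather than "large stretches" of it.

```python
from statistics import mean


def is_high_usage(used, requested, threshold=0.9):
    """True if peak usage reaches threshold * requested (the 0.9 example above)."""
    return max(used) >= threshold * requested


def linear_fit(times, values):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    t_mean, v_mean = mean(times), mean(values)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    syy = sum((v - v_mean) ** 2 for v in values)
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    # A constant series has no variance to explain; report r^2 = 0 for it.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def is_systematic_increase(times, values, min_r2=0.9, min_growth=0.1):
    """True if usage grows roughly linearly with time.

    min_r2 and min_growth are assumed thresholds: the fit must explain most
    of the variance, and the fitted growth over the sampled period must be
    at least min_growth times the peak usage.
    """
    slope, _, r_squared = linear_fit(times, values)
    if slope <= 0:
        return False
    growth = slope * (times[-1] - times[0])
    return r_squared >= min_r2 and growth >= min_growth * max(values)


# Example samples: time in seconds, memory in GB.
times = [0, 60, 120, 180, 240]
leaking = [1.0, 2.0, 3.1, 3.9, 5.0]
steady = [4.0, 4.1, 3.9, 4.0, 4.1]
```

With these samples, `is_systematic_increase(times, leaking)` flags the job while `is_systematic_increase(times, steady)` does not. A workload that legitimately allocates more memory over time would also be flagged, which is the false-positive case listed below.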
- Works for shared jobs with existing metrics:
  - no, because no job-specific memory usage is recorded right now
- Possible false positives:
  - workloads whose memory usage is naturally high or naturally grows over time
- Possible cures/workarounds:
  - increasing memory usage:
    - possibly a memory leak:
      - check the code
      - often: try other MPI implementations
  - high memory usage:
    - request more memory