-
Description: An overly high CPU load, i.e., the number of processes and threads running or waiting for CPU time, can indicate one of the following:
- wrong parallelization settings, defaults, or automatic choices in applications for the number of MPI ranks or threads
- unintentional use of additional parallelization layers, e.g., in underlying libraries such as BLAS
- processes blocked by IO or network activity
-
Criterion:
- the one-minute average of the CPU load is significantly higher than the number of CPU cores or hardware threads allocated to the job on the node
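
The criterion above can be sketched as a simple comparison. This is a minimal illustration, not the monitoring system's actual check: the function name `load_exceeds_allocation` and the tolerance `factor` (how much higher counts as "significantly higher") are assumptions made for this example.

```python
import os


def load_exceeds_allocation(load_one, allocated_cores, factor=1.5):
    """Return True if the one-minute load is well above the allocation.

    `factor` is a hypothetical tolerance; the pattern description only
    says "significantly higher" and does not fix a threshold.
    """
    if allocated_cores <= 0:
        raise ValueError("allocated_cores must be positive")
    return load_one > factor * allocated_cores


if __name__ == "__main__":
    # Node-wide one-minute load average (Unix only).
    load_one = os.getloadavg()[0]
    # Cores this process may run on (Linux); fall back to all cores.
    if hasattr(os, "sched_getaffinity"):
        cores = len(os.sched_getaffinity(0))
    else:
        cores = os.cpu_count() or 1
    print(load_exceeds_allocation(load_one, cores))
```

Note that `os.getloadavg()` reports the load of the whole node, so this comparison is only meaningful when the job has the node to itself.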
-
Works for shared jobs with existing metrics:
- yes, if the CPU-core-specific metric cpu_load_core is used instead of load_one.
-
Possible Cures:
- check the parallelization settings (number of MPI ranks and number of threads per rank)
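
One way to check the settings is to derive the thread count per MPI rank from the allocation and set it explicitly for the common threading layers. The sketch below is an illustration under assumptions: the helper name `thread_env` is hypothetical, and it assumes the application and its BLAS library honor the environment variables `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and `MKL_NUM_THREADS`.

```python
from typing import Dict


def thread_env(allocated_cores: int, ranks_on_node: int) -> Dict[str, str]:
    """Build thread-count variables so that ranks x threads <= cores.

    Hypothetical helper: assumes every rank on the node gets the same
    number of threads and that the variables below are honored.
    """
    if allocated_cores <= 0 or ranks_on_node <= 0:
        raise ValueError("core and rank counts must be positive")
    # Integer division; never go below one thread per rank.
    threads = max(1, allocated_cores // ranks_on_node)
    names = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
    return {name: str(threads) for name in names}


if __name__ == "__main__":
    # Example: 16 allocated cores shared by 4 MPI ranks on the node.
    for name, value in thread_env(16, 4).items():
        print(f"export {name}={value}")
```

If more ranks than cores are requested, the helper still returns one thread per rank, so the node remains oversubscribed; in that case the number of ranks itself has to be reduced.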