Low CPU load

Description: A significantly low CPU load, i.e., a low number of processes and threads running or waiting for CPU time, can indicate one of the following:
- wrong parallelization settings, defaults, or automatic choices in the application for the number of MPI ranks or threads
- inefficient parallelization of the code or load imbalance between MPI ranks or threads
Criterion:
- the one-minute average of the CPU load is significantly lower than the number of CPU cores or hardware threads allocated to the job on the node
Works for shared jobs with existing metrics:
- yes, if the CPU-core-specific cpu_load_core is used instead of load_one.
Possible false positives:
- sections of the application with less parallelism, such as I/O phases or serial stages of the computation
Possible cures/workarounds:
- check the parallelization settings (number of MPI ranks and threads)

Input:
* Metric cpu_load in node scope
* Metadata field numHwthreads
* Parameter threshold for a single core

Rule:
load_mean      = cpu_load.mean('all')
load_threshold = job.numHwthreads * threshold
lowload        = load_mean < load_threshold

Output: lowload is True
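
The rule above can be sketched in plain Python. This is only a minimal illustration, not the actual PathoJobs implementation: the function name `is_low_load`, the list-based input and the sample values are assumptions made for this example. It assumes the cpu_load samples of one node are available as a list of numbers and that threshold is given as the expected load per hardware thread.

```python
from statistics import fmean


def is_low_load(cpu_load_samples, num_hwthreads, threshold):
    """Return True if the mean load is below num_hwthreads * threshold.

    cpu_load_samples: node-scope cpu_load values over the job runtime
                      (hypothetical input format: a list of floats)
    num_hwthreads:    hardware threads allocated to the job on the node
    threshold:        expected load per hardware thread, e.g. 0.9
    """
    if not cpu_load_samples:
        raise ValueError("cpu_load_samples must not be empty")
    load_mean = fmean(cpu_load_samples)         # cpu_load.mean('all')
    load_threshold = num_hwthreads * threshold  # job.numHwthreads * threshold
    return load_mean < load_threshold           # lowload


# Made-up values: a job that allocated 64 hardware threads.
underused = is_low_load([3.9, 4.2, 4.0, 3.8], num_hwthreads=64, threshold=0.9)
well_used = is_low_load([63.5, 64.2, 64.0], num_hwthreads=64, threshold=0.9)
print(underused, well_used)
```

In the first call the mean load is about 4 against a threshold of 57.6, so the rule triggers. In the second call the mean load is close to 64 and the rule does not trigger.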
