Low GPU utilization

Description: low GPU utilization usually indicates that the job uses its allocated GPUs inefficiently
Criterion:
- nv_util << 1
- (or nv_mem_util << 1)
Works for shared jobs with existing metrics:
- yes, if a GPU is allocated to at most one compute job
Possible false positives: *
Possible cures/workarounds:
- check MPI-rank-GPU affinity/visibility
- check the suitability of the GPU setup/configuration for the given workload
- check the application's internal GPU distribution settings
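
For the first check, a minimal Python sketch such as the following could be run once per process to report which GPUs each rank sees. It assumes the launcher exports the node-local rank as `SLURM_LOCALID` (Slurm) or `OMPI_COMM_WORLD_LOCAL_RANK` (Open MPI) and that visibility is restricted through `CUDA_VISIBLE_DEVICES`. The helper name `describe_gpu_visibility` is hypothetical and not part of PathoJobs.

```python
import os


def describe_gpu_visibility(env=None):
    """Return (local_rank, visible_gpus) for the current process.

    local_rank is None if no known launcher variable is set.
    visible_gpus is None if CUDA_VISIBLE_DEVICES is unset, which
    normally means that no restriction is applied.
    """
    env = os.environ if env is None else env

    # Assumption: the launcher is Slurm or Open MPI; other launchers
    # use different variable names.
    local_rank = None
    for var in ("SLURM_LOCALID", "OMPI_COMM_WORLD_LOCAL_RANK"):
        if var in env:
            local_rank = int(env[var])
            break

    raw = env.get("CUDA_VISIBLE_DEVICES")
    visible = None if raw is None else [d for d in raw.split(",") if d]
    return local_rank, visible


if __name__ == "__main__":
    rank, gpus = describe_gpu_visibility()
    print(f"local rank {rank}: CUDA_VISIBLE_DEVICES = {gpus}")
```

If several ranks on one node report the same device while other allocated GPUs are not listed by any rank, the affinity settings are a likely cause of the low utilization.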

Input:
* Metric acc_utilization in the accelerator scope
* Metadata item numAcc
* Parameter threshold for a single accelerator

Rule:
load_mean      = acc_utilization.mean('all')
load_threshold = job.numAcc * threshold
lowload        = load_mean < load_threshold

Output: lowload is True
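
The rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PathoJobs implementation. The page does not define `mean('all')`, so the sketch assumes it is the time average per accelerator, summed over all accelerators, which makes the comparison against `numAcc * threshold` consistent. The function name `is_low_gpu_load` and the dictionary layout of the samples are hypothetical.

```python
from statistics import fmean


def is_low_gpu_load(acc_utilization, num_acc, threshold):
    """Evaluate the low-GPU-load rule for one job.

    acc_utilization: dict mapping an accelerator id to its list of
                     utilization samples (fractions between 0 and 1).
    num_acc:         number of accelerators allocated to the job.
    threshold:       minimum acceptable mean utilization per accelerator.
    """
    # Assumption: mean('all') is the per-accelerator time average,
    # summed over all accelerators of the job.
    load_mean = sum(fmean(samples) for samples in acc_utilization.values())
    load_threshold = num_acc * threshold
    return load_mean < load_threshold


if __name__ == "__main__":
    samples = {"gpu0": [0.9, 0.8], "gpu1": [0.0, 0.1]}
    print(is_low_gpu_load(samples, num_acc=2, threshold=0.5))
```

In this example one GPU is busy and the other is nearly idle, so the aggregate load (about 0.9) stays below the job threshold of 1.0 and the job is flagged.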
