-
Description: When multiple processes share the same GPU without coordinating their use of it, GPU efficiency can drop dramatically.
-
Criterion:
- nv_compute_processes>1
-
Works for shared jobs with existing metrics:
- yes, if a GPU is allocated to at most one compute job
-
Possible false positives: *
-
Possible cures/workarounds:
- check the MPI rank-to-GPU affinity/visibility (e.g. via CUDA_VISIBLE_DEVICES)
- deploy the NVIDIA Multi-Process Service (MPS) (https://docs.nvidia.com/deploy/mps/index.html)
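A minimal sketch of round-robin rank-to-GPU binding, assuming one process per MPI rank and a launcher that exports the node-local rank. OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI and SLURM_LOCALID by Slurm's srun; other launchers use different variables. The function names are hypothetical. The sketch also shows how the nv_compute_processes>1 criterion arises: with more ranks per node than GPUs, some GPU necessarily hosts several compute processes.

```python
import os


def gpu_for_local_rank(local_rank: int, num_gpus: int) -> int:
    """Round-robin binding: node-local rank r goes to GPU r mod num_gpus."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return local_rank % num_gpus


def ranks_per_gpu(ranks_on_node: int, num_gpus: int) -> dict[int, list[int]]:
    """List which node-local ranks end up on which GPU under round-robin binding."""
    mapping: dict[int, list[int]] = {gpu: [] for gpu in range(num_gpus)}
    for rank in range(ranks_on_node):
        mapping[gpu_for_local_rank(rank, num_gpus)].append(rank)
    return mapping


def bind_this_rank(num_gpus: int) -> int:
    """Restrict this process to a single GPU; must run before CUDA is initialized."""
    # Assumption: Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK, Slurm's srun exports SLURM_LOCALID.
    local = (
        os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK")
        or os.environ.get("SLURM_LOCALID")
        or "0"
    )
    gpu = gpu_for_local_rank(int(local), num_gpus)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

For example, ranks_per_gpu(8, 4) places two ranks on every GPU, so nv_compute_processes would reach 2 on each of them unless the sharing is coordinated (e.g. via MPS), whereas ranks_per_gpu(4, 4) places exactly one rank per GPU.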
Input:
* Metric nv_compute_processes in the accelerator scope
Rule:
max_process = nv_compute_processes.max('all')
multiprocess = max_process > 1
Output: multiprocess is True
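A minimal Python sketch of the rule, assuming the metric is available as a mapping from a GPU identifier to the list of sampled nv_compute_processes values, and assuming max('all') means the maximum over all GPUs and all samples of the job. The data layout and function names are hypothetical; the actual rule engine may evaluate the expression differently.

```python
from typing import Mapping, Sequence


def max_compute_processes(samples: Mapping[str, Sequence[int]]) -> int:
    """Maximum of nv_compute_processes over all GPUs and all samples (assumed 'all' reduction)."""
    return max((value for series in samples.values() for value in series), default=0)


def is_multiprocess(samples: Mapping[str, Sequence[int]]) -> bool:
    """The rule: flag the job if any GPU ever ran more than one compute process."""
    return max_compute_processes(samples) > 1


def offending_gpus(samples: Mapping[str, Sequence[int]]) -> list[str]:
    """GPUs that individually exceeded one compute process (useful for diagnosis)."""
    return sorted(gpu for gpu, series in samples.items() if series and max(series) > 1)
```

Beyond the boolean result of the rule, offending_gpus names the affected GPUs, which can help when checking the rank-to-GPU affinity of the job.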