Uncoordinated multi-process GPU usage

Description: When multiple processes share the same GPU without coordination (for example, several MPI ranks all defaulting to the same device), efficiency can drop dramatically.
Criterion:
- nv_compute_processes>1
Works for shared jobs with existing metrics:
- yes, if a GPU is allocated to at most one compute job
Possible false positives: *
Possible cures/workarounds:
- check MPI-rank-GPU affinity/visibility
- deploy MPS (https://docs.nvidia.com/deploy/mps/index.html)
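A minimal sketch of the first workaround, assuming the job is started by a launcher that exports a per-node local rank (SLURM_LOCALID for Slurm, OMPI_COMM_WORLD_LOCAL_RANK for Open MPI, MV2_COMM_WORLD_LOCAL_RANK for MVAPICH2) and that the GPUs visible to the job are numbered 0 to num_gpus - 1. The helper names local_rank, gpu_for_rank and pin_gpu are hypothetical and not part of any library.

```python
import os
from typing import Mapping, MutableMapping, Optional

# Local-rank variables exported by common launchers (Slurm, Open MPI, MVAPICH2).
LOCAL_RANK_VARS = (
    "SLURM_LOCALID",
    "OMPI_COMM_WORLD_LOCAL_RANK",
    "MV2_COMM_WORLD_LOCAL_RANK",
)


def local_rank(env: Mapping[str, str]) -> Optional[int]:
    """Return the node-local rank, or None if no known variable is set."""
    for name in LOCAL_RANK_VARS:
        value = env.get(name)
        if value is not None:
            return int(value)
    return None


def gpu_for_rank(rank: int, num_gpus: int) -> int:
    """Map a local rank to a GPU index round-robin."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return rank % num_gpus


def pin_gpu(env: MutableMapping[str, str], num_gpus: int) -> Optional[int]:
    """Restrict this process to one GPU via CUDA_VISIBLE_DEVICES.

    Assumption: GPU indices 0..num_gpus-1 are valid for this job.
    Returns the chosen index, or None if the local rank is unknown
    (in which case env is left untouched).
    """
    rank = local_rank(env)
    if rank is None:
        return None
    gpu = gpu_for_rank(rank, num_gpus)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

A process would call pin_gpu(os.environ, num_gpus) before any CUDA library is initialized, since CUDA_VISIBLE_DEVICES is read at initialization. If there are more ranks per node than GPUs, the round-robin mapping still places several processes on one GPU; that is the case where MPS is the relevant workaround.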

Input:
* Metric nv_compute_processes in the accelerator scope

Rule:
max_process    = nv_compute_processes.max('all')
multiprocess   = max_process > 1

Output: multiprocess is True
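The rule can be sketched in plain Python. This is an illustration only, not the rule engine's own code: it assumes that max('all') reduces over every sample of every accelerator in the job, and that the metric arrives as a mapping from a GPU identifier to its time series. The function name multiprocess_detected and the input layout are hypothetical.

```python
from typing import Mapping, Sequence


def multiprocess_detected(samples: Mapping[str, Sequence[int]]) -> bool:
    """Evaluate the rule on nv_compute_processes samples.

    samples maps a GPU identifier to the compute-process counts
    measured on that GPU over the job's runtime (assumed layout).
    """
    # Assumed meaning of max('all'): maximum over all GPUs and all time steps.
    max_process = max(
        (count for series in samples.values() for count in series),
        default=0,
    )
    # The rule triggers as soon as any GPU ever hosted more than one process.
    return max_process > 1
```

For example, multiprocess_detected({"gpu0": [1, 1, 1], "gpu1": [1, 2, 1]}) evaluates to True because gpu1 briefly hosted two compute processes, while a job with exactly one process per GPU throughout evaluates to False.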
