-
Description: The job distributes its tasks over multiple nodes even though they would fit on a single node. This can increase communication latency and cause redundant network traffic when reading from file systems.
-
Criterion: The number of allocated cores fits on one node. Additional criterion: High network or Lustre bandwidth indicates high communication overhead (this metric may be unreliable on shared nodes). It may also help to check the job's SLURM arguments.
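The core-count criterion above can be sketched as a small check. This is only an illustration: the function name `is_needlessly_spread` and its parameters are hypothetical, and it assumes that the number of cores per node is known and the same on every node of the partition.

```python
def is_needlessly_spread(allocated_cores, allocated_nodes, cores_per_node):
    """Return True if a job spans several nodes although its cores fit on one.

    All three arguments are assumed to come from the job's accounting or
    monitoring data; cores_per_node is assumed uniform across nodes.
    """
    if allocated_cores < 1 or allocated_nodes < 1 or cores_per_node < 1:
        raise ValueError("all arguments must be positive integers")
    # The pattern applies only when more than one node is used
    # and a single node could have held every allocated core.
    return allocated_nodes > 1 and allocated_cores <= cores_per_node


# 16 cores spread over 2 nodes of 32 cores each: matches the pattern.
print(is_needlessly_spread(16, 2, 32))
```

A match is only a hint. The bandwidth criterion and the false positives listed for this pattern still need to be checked by hand.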
-
Possible false positives: An MPI-distributed program uses multiple accelerators and the scheduler allocates accelerators on different nodes. The program actually benefits from multiple nodes because it uses a lot of memory bandwidth.
-
Possible cures/workarounds: Use -N (--nodes) to specify the number of nodes. Use --contiguous to request contiguous resources.
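As a rough sketch of how these options might be combined, the following helper assembles an `sbatch` command line that requests the smallest number of nodes able to hold all tasks. The helper name `build_sbatch_args`, the script name `job.sh`, and the one-core-per-task assumption are illustrative and not part of the original description. Only the `--ntasks`, `--nodes`, and `--contiguous` options of `sbatch` are used.

```python
def build_sbatch_args(ntasks, cores_per_node, script):
    """Build an sbatch argument list that packs tasks onto as few nodes as possible.

    Assumes one core per task and a uniform cores_per_node (both hypothetical).
    """
    if ntasks < 1 or cores_per_node < 1:
        raise ValueError("ntasks and cores_per_node must be positive integers")
    # Integer ceiling division: smallest node count that can hold all tasks.
    nodes = -(-ntasks // cores_per_node)
    args = ["sbatch", f"--ntasks={ntasks}", f"--nodes={nodes}"]
    # --contiguous only matters when more than one node is requested.
    if nodes > 1:
        args.append("--contiguous")
    args.append(script)
    return args


print(" ".join(build_sbatch_args(16, 32, "job.sh")))
```

For 16 tasks on 32-core nodes this requests a single node. For 40 tasks it requests two nodes and adds `--contiguous`.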