@@ -9,7 +9,7 @@ A TensorFlow native version that is constraint to a single compute node with mul
...
A TensorFlow native version that utilizes multiple processes (1 process per GPU) that work together using a `tf.distribute.MultiWorkerMirroredStrategy`. Although this version is not constrained to a single node, it requires a bit more preparation to set up the distributed environment (via the `TF_CONFIG` environment variable).
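The following is a minimal sketch of how a `TF_CONFIG` value could be assembled for this 1-process-per-GPU layout, assuming each process knows its global rank. The helper names `worker_addresses` and `build_tf_config`, the node names, the port numbers and the use of `SLURM_PROCID` as the rank source are illustrative assumptions, not taken from the repository's scripts. Processes sharing a node get consecutive ports so they do not collide.

```python
import json
import os


def worker_addresses(nodes, gpus_per_node, base_port):
    """Hypothetical helper: one "host:port" entry per GPU process.

    Processes on the same node listen on consecutive ports starting at base_port
    (assumed to be free on every node).
    """
    return [f"{node}:{base_port + g}" for node in nodes for g in range(gpus_per_node)]


def build_tf_config(addresses, task_index):
    """Hypothetical helper: TF_CONFIG JSON string for worker `task_index`.

    Every process receives the same "cluster" section; only "task.index" differs.
    """
    if not 0 <= task_index < len(addresses):
        raise ValueError("task_index must index into addresses")
    config = {
        "cluster": {"worker": list(addresses)},
        "task": {"type": "worker", "index": task_index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Assumption: the batch system exposes the global rank as SLURM_PROCID and
    # ranks are assigned in the same order as the address list (node by node).
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    addresses = worker_addresses(["node01", "node02"], gpus_per_node=2, base_port=20000)
    os.environ["TF_CONFIG"] = build_tf_config(addresses, rank)
    # TF_CONFIG must be set before the strategy is created, e.g.:
    #   import tensorflow as tf
    #   strategy = tf.distribute.MultiWorkerMirroredStrategy()
    print(os.environ["TF_CONFIG"])
```

Because the `TF_CONFIG` variable is read when the strategy is constructed, setting it at the very top of the training script (or exporting it in the job script) is the safer choice.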
## Version 3: (`submit_job_container_horovod.sh`)
A version that uses Horovod on top of TensorFlow to perform the distributed training and the communication of, e.g., model weights/updates. Typically, these setups also use 1 process per GPU.
More information and examples concerning Horovod can be found at: