# Example Repository
Welcome to the example repository! This guide will help you understand the structure of the repository, set up your environment, and run the code locally, on a high-performance cluster (HPC), or on a GPU server using Docker. Additionally, it covers how to manage data with Rclone.

## Structure of the Repository
- `modules/`: Core code for the project, e.g. for training and testing models.
- `data/`: Directory to store datasets and related files.
- `config/`: Configuration files for Hydra.
  - `train_model.yaml`: Default configuration for training (example).
- `env_setup/`: Environment setup files.
  - `Dockerfile`: Docker configuration for the containerized environment.
  - `requirements.txt`: Python dependencies.
- `runs/`: Scripts for experimental runs: creating datasets, training models, analyzing results, etc.
  - `01_train_model.py`: Main script for training the model.
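The run scripts read their settings from the Hydra config files under `config/`. As a sketch only (not the repository's actual file), `train_model.yaml` might look like the following, assuming the `training.epochs`, `training.seed`, and `model.num_layers` keys used in the run examples:

```yaml
# Hypothetical sketch of config/train_model.yaml.
# Only training.epochs, training.seed, and model.num_layers appear in the
# run examples; any other key names here are assumptions.
training:
  epochs: 10
  seed: 0
model:
  num_layers: 2
```

Hydra lets any of these values be overridden from the command line, e.g. `training.epochs=2`.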
## Setting Up the Virtual Environment
### Using Conda
```bash
# Create a new conda environment with Python 3.11 and place it in the ./.venv directory
conda create --prefix ./.venv python=3.11
# Activate the newly created conda environment
conda activate ./.venv
# Install all required Python packages as listed in the requirements.txt file
pip install -r ./env_setup/requirements.txt
# Export the list of installed packages to a new requirements file
pip freeze > ./env_setup/requirements2.txt
# Deactivate the conda environment
conda deactivate
```
### Using Virtualenv
A virtual environment pins the libraries this project requires, and the version of each. These are listed in `env_setup/requirements.txt`.
```bash
# Create a new virtual environment named .venv
python -m venv .venv
# Activate the newly created virtual environment
source .venv/bin/activate
# Install all required Python packages as listed in the requirements.txt file
pip install -r ./env_setup/requirements.txt
# Export the list of installed packages to a new requirements file
pip freeze > ./env_setup/requirements2.txt
# Deactivate the virtual environment
deactivate
# If you are using a high-performance computing cluster (HPC), consider loading a Python module first
module load Python/3.10.4
```
## Running Code Locally
With the environment set up, you can run the code from the base directory. Arguments are managed through Hydra configuration files rather than hand-written argument parsers, which keeps the scripts cleaner.

### Single Run
```bash
# Display a help message with all available options and arguments
python 01_train_model.py --help
# Execute the script with the default configuration settings
python 01_train_model.py
```
### Manual Run
```bash
# Execute the script with specific arguments, setting the number of epochs to 2 and the seed to 7
python 01_train_model.py training.epochs=2 training.seed=7
```
### Sweep with Hydra
```bash
# Execute multiple runs with different model sizes using Hydra's multirun feature
# This command runs the script once for each combination of the specified values
python 01_train_model.py --multirun training.epochs=2 model.num_layers=1,2,3
# Execute multiple runs as defined in an experiment configuration file
python 01_train_model.py +experiment=sweep_models_lr
```
### Launchers
```bash
# Execute multiple runs with Hydra's joblib launcher
# This runs the script for each combination of the specified values, using joblib for parallel execution
python 01_train_model.py --multirun training.epochs=2 model.num_layers=1,2,3 +launcher=joblib
# Or use Hydra's slurm launcher for running on a Slurm-based cluster
python 01_train_model.py --multirun training.epochs=2 model.num_layers=1,2,3 +launcher=slurm
# Or use Slurm with GPU support, running the script with multiple seed values
python 01_train_model.py --multirun training.epochs=2 training.seed=0,1,2,3,4 +launcher=slurmgpu
```
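The `+launcher=...` overrides above imply launcher configs in a `launcher` config group. As a rough sketch only (the file name and contents are assumptions, and the joblib launcher needs the `hydra-joblib-launcher` plugin installed), a `config/launcher/joblib.yaml` could look like:

```yaml
# @package _global_
# Hypothetical config/launcher/joblib.yaml — a sketch, not the repository's file.
# Requires: pip install hydra-joblib-launcher
defaults:
  - override /hydra/launcher: joblib

hydra:
  launcher:
    n_jobs: 4  # number of runs to execute in parallel
```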
## Run Code with Docker (GPU Server)
Docker allows you to execute your code on different machines with the same environment, ensuring consistent results. This is particularly useful for avoiding stochastic issues and behavioral differences between Windows and Linux machines, and it gives you control over the full environment, including GPU drivers.
### Build and Launch Docker Container
```bash
# Build a Docker image from the Dockerfile located in the env_setup directory
docker build -t andresfp14/xaicu118 ./env_setup
# (Optional) Push the built image to a Docker repository for public access
docker push andresfp14/xaicu118
# Examples of how to launch the Docker container in Windows
# Run the container interactively, remove it after exiting, name it xaicu118, use all GPUs, map ports, and mount the current directory
docker run -it --rm --name xaicu118 --gpus all -p 8888:8888 -p 6007:6007 -v %cd%:/home/example andresfp14/xaicu118 bash
# Run the container in detached mode, remove it after exiting, name it xaicu118, use all GPUs, map ports, and mount the current directory
docker run -d --rm --name xaicu118 --gpus all -p 8888:8888 -p 6007:6007 -v %cd%:/home/example andresfp14/xaicu118 bash
# Examples of how to launch the Docker container in Linux
# Run the container interactively, remove it after exiting, name it xaicu118, allocate 100G of shared memory, use all GPUs, map ports, and mount the current directory
docker run -it --rm --name xaicu118 --shm-size 100G --gpus all -p 8888:8888 -p 6007:6007 -v $(pwd):/home/example andresfp14/xaicu118 bash
# Run the container in detached mode, remove it after exiting, name it xaicu118, allocate 50G of shared memory, use all GPUs, map ports, and mount the current directory
docker run -d --rm --name xaicu118 --shm-size 50G --gpus all -p 8888:8888 -p 6007:6007 -v $(pwd):/home/example andresfp14/xaicu118 bash
# Run the container in detached and interactive mode, remove it after exiting, name it xai_1, allocate 50G of shared memory, use the first GPU device, and mount specified directories
docker run -idt --rm --name xai_1 --shm-size 50G --gpus '"device=0:0"' -v ~/data/datasets:/home/example/data/datasets -v $(pwd):/home/example andresfp14/xaicu118 bash
# Run the container in detached and interactive mode, remove it after exiting, name it xai_2, allocate 50G of shared memory, use the first GPU device, and mount the current directory
docker run -idt --rm --name xai_2 --shm-size 50G --gpus '"device=0:0"' -v $(pwd):/home/example andresfp14/xaicu118 bash
```
## Moving Data Around with Rclone
Rclone is a command-line program to manage files on cloud storage. It is useful for transferring large datasets to and from remote servers.
### Installing Rclone
Follow the instructions on the [Rclone website](https://rclone.org/install/) to install Rclone on your system.
### Configuring Rclone
Run the following command to configure Rclone with your cloud storage provider:
```bash
# Configure Rclone with your cloud storage credentials and settings
rclone config
```
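`rclone config` walks you through an interactive setup and writes the result to an `rclone.conf` file. As an illustration only (the remote name and all values are placeholders, assuming an S3-compatible provider), the resulting entry might look like:

```ini
# Hypothetical rclone.conf entry — "remote" and all values are placeholders.
[remote]
type = s3
provider = AWS
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
region = eu-central-1
```

The remote name in brackets (`remote` here) is what you reference in commands such as `rclone copy ./data remote:bucket/path`.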
### Using Rclone
#### Copying Data to Remote Storage
```bash
# Copy data from a local directory to a remote storage bucket
rclone copy ./data remote:bucket/path
```
#### Copying Data from Remote Storage
```bash
# Copy data from a remote storage bucket to a local directory
rclone copy remote:bucket/path ./data
```
This setup ensures that you can efficiently manage your project environment, run your code in different scenarios, and handle data transfers seamlessly. For more details, refer to the [repository](https://github.com/andresfp14/example).