Welcome to the example repository! This guide will help you understand the structure of the repository, set up your environment, and run the code locally, on a high-performance computing (HPC) cluster, or on a GPU server using Docker. It also covers how to manage data with Rclone.
## Structure of the Repository
- `modules/`: Core code for what you want to do, e.g. training and testing models.
- `data/`: Directory to store datasets and related files.
- `config/`: Configuration files for Hydra.
  - `train_model.yaml`: Default configuration for training (example).
- `env_setup/`: Environment setup files.
  - `Dockerfile`: Docker configuration for a containerized environment.
  - `requirements.txt`: Python dependencies.
- `runs/`: Scripts for experimental runs: creating datasets, training models, analyzing them, etc.
  - `01_train_model.py`: Main script for training the model.
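As an illustration, a minimal `config/train_model.yaml` for Hydra could look like the following; the field names and values here are hypothetical, not taken from the repository:

```yaml
# Hypothetical default training configuration for Hydra
epochs: 10
seed: 7
lr: 0.001
data_dir: ./data
```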
## 1) Build and launch your Docker container (optional)
Docker allows you to execute your code on different machines and get the same behavior. This is especially important if you run into stochastic issues or get different results between Windows and Linux machines. To avoid these issues, and to have control over the full environment (including GPU drivers) in which our code runs, we use Docker.
The Docker image we are going to use is defined in `env_setup/Dockerfile`.
```bash
# build the image from the Dockerfile in env_setup
docker build -t andresfp14/xaicu118 ./env_setup

# push the image to a Docker repository (if you want to make it available in general)
docker push andresfp14/xaicu118

# Examples of how to launch it in Windows
# interactive run: remove on exit, use all GPUs, map ports, mount the current directory
docker run -it --rm --name xaicu118 --gpus all -p 8888:8888 -p 6007:6007 -v %cd%:/home/example andresfp14/xaicu118
# detached run
docker run -d --rm --name xaicu118 --gpus all -p 8888:8888 -p 6007:6007 -v %cd%:/home/example andresfp14/xaicu118 bash

# Examples of how to launch it in Linux
# interactive detached run with 5G of shared memory
docker run -itd --rm --name xaicu118 --shm-size 5G --gpus all -p 8888:8888 -p 6007:6007 -v $(pwd):/home/example andresfp14/xaicu118 bash
# run on a specific GPU device and mount a datasets directory
docker run -idt --rm --name xai_1 --shm-size 5G --gpus '"device=0:0"' -v ~/data/datasets:/home/example/data/datasets -v $(pwd):/home/example andresfp14/xaicu118
docker run -idt --rm --name xai_2 --shm-size 5G --gpus '"device=0:0"' -v $(pwd):/home/example andresfp14/xaicu118
```
## 2) Build and activate your virtual environment
Our virtual environment is the collection of libraries that this project requires, pinned to the versions it needs. In general, this is defined in the file `env_setup/requirements.txt`.
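For reference, a `requirements.txt` pins one package per line; the packages and versions below are purely illustrative, not the project's actual dependencies:

```
numpy==1.26.4
torch==2.1.2
hydra-core==1.3.2
```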
```bash
###############################
# with conda
###############################
# create environment
conda create --prefix ./.venv python=3.11
# activate environment
conda activate ./.venv
# install requirements
pip install -r ./env_setup/requirements.txt
# export environment (if you want to update it)
pip freeze > ./env_setup/requirements2.txt
# deactivate virtual environment
conda deactivate
###############################
# with virtualenv
###############################
# create a virtualenv
python -m venv .venv
# activate the virtualenv
source .venv/bin/activate
# install requirements
pip install -r ./env_setup/requirements.txt
# export environment (if you want to update it)
pip freeze > ./env_setup/requirements2.txt
# deactivate virtual environment
deactivate

###############################
# on the HPC
###############################
# if you are using a high-performance computing cluster (HPC),
# consider loading a specific Python module from the beginning
module load Python/3.10.4
```
## 3) Run code
Now, with the environment set up, we can run the needed code from the base directory. We recommend using the `fire` library to avoid argparsers and keep the code cleaner.
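To make the `fire` recommendation concrete, here is a minimal sketch of how a script like `01_train_model.py` could expose its arguments on the command line; the function and parameter names are hypothetical, not taken from the repository:

```python
# Hypothetical training entry point, exposed as a CLI via fire.
def train(epochs: int = 10, seed: int = 7, lr: float = 1e-3) -> dict:
    """Run training and return the configuration that was used."""
    config = {"epochs": epochs, "seed": seed, "lr": lr}
    # ... the actual training loop would go here ...
    return config

if __name__ == "__main__":
    import fire  # pip install fire
    # fire turns the function signature into command-line flags:
    #   python 01_train_model.py --epochs=2 --seed=7
    fire.Fire(train)
```

Any keyword argument of `train` automatically becomes a `--flag`, so no argparser boilerplate is needed.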
### Single Run
```bash
###############################
# Getting help
###############################
# Display a help message with all available options and arguments
python 01_train_model.py --help

###############################
# Executing with default arguments
###############################
python 01_train_model.py
```
### Manual run
```bash
###############################
# Executing and changing an argument
###############################
# Execute the script with specific arguments, e.g. changing the number
# of epochs to 2 and the seed to 7 (the exact argument names depend on
# the script's configuration)
python 01_train_model.py --epochs=2 --seed=7
```
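Changing a `--seed` argument only has an effect if the script actually applies it to every random number generator it uses. A minimal seeding helper might look like this (a sketch, not code from this repository):

```python
import random

def set_seed(seed: int) -> None:
    """Seed the random number generators the code relies on."""
    random.seed(seed)
    # numpy is seeded only if it is installed
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

# Calling set_seed with the same value makes subsequent draws reproducible.
set_seed(7)
```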
## Moving Data Around with Rclone
Rclone is a command-line program to manage files on cloud storage. It is useful for transferring large datasets to and from remote servers.
### Installing Rclone
Follow the instructions on the [Rclone website](https://rclone.org/install/) to install Rclone on your system.
### Configuring Rclone
Run the following command to configure Rclone with your cloud storage provider:
```bash
# Configure Rclone with your cloud storage credentials and settings
rclone config
```
### Using Rclone
#### Copying Data to Remote Storage
```bash
# Copy data from a local directory to a remote storage bucket
rclone copy ./data remote:bucket/path
```
#### Copying Data from Remote Storage
```bash
# Copy data from a remote storage bucket to a local directory
rclone copy remote:bucket/path ./data
```
This setup ensures that you can efficiently manage your project environment, run your code in different scenarios, and handle data transfers seamlessly. For more details, refer to the [repository](https://github.com/andresfp14/example).