diff --git a/README.md b/README.md
index 1a895c98f49c92fc54da772aadd0e13820689cbc..74aba94b08348165086d9fe200ee9a3f3bc90e2b 100644
--- a/README.md
+++ b/README.md
@@ -11,9 +11,11 @@ For general help, documentation, and trainings please refer to the following pag
 
 ## What can you find here?
 
-| Folder | Description |
-|--------|-------------|
-| [generic-job-scripts](generic-job-scripts) | General-purpose job submission scripts (our current workload is Slurm) for various workloads, including CPU and GPU-based computations. |
-| [pytorch](pytorch) | Example scripts and best practices for running PyTorch workloads on an HPC cluster, including distributed training and GPU utilization. |
-| [scikit-learn](scikit-learn) | HPC-friendly examples of using Scikit-Learn, including job submission scripts for machine learning model training. |
-| [tensorflow](tensorflow) | TensorFlow job scripts and performance optimization techniques for running deep learning models on CPUs and GPUs in an HPC environment. |
\ No newline at end of file
+| Folder | Sub-Folder | Description |
+|--------|------------|-------------|
+| [generic-job-scripts](generic-job-scripts) | - | General-purpose job submission scripts (our current workload manager is Slurm) for various workloads, including CPU and GPU-based computations. |
+| [machine-and-deep-learning](machine-and-deep-learning) | - | A collection of examples in the context of Machine Learning (ML), Deep Learning (DL), and Large Language Models (LLMs). |
+| | [ollama](machine-and-deep-learning/ollama) | Examples of how to run and use LLMs with Ollama. |
+| | [pytorch](machine-and-deep-learning/pytorch) | Example scripts and best practices for running PyTorch workloads on an HPC cluster, including distributed training and GPU utilization. |
+| | [scikit-learn](machine-and-deep-learning/scikit-learn) | HPC-friendly examples of using Scikit-Learn, including job submission scripts for machine learning model training. |
+| | [tensorflow](machine-and-deep-learning/tensorflow) | TensorFlow job scripts and performance optimization techniques for running deep learning models on CPUs and GPUs in an HPC environment. |
\ No newline at end of file
diff --git a/machine-and-deep-learning/ollama/README.md b/machine-and-deep-learning/ollama/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b0d1fe73ce19d2bc75824109face6d69e37997e0
--- /dev/null
+++ b/machine-and-deep-learning/ollama/README.md
@@ -0,0 +1,42 @@
+# Ollama - Temporarily Running Large Language Models (LLMs)
+
+This directory outlines two distinct approaches, which differ in how the base Ollama server and the LLM are run:
+1. Using the official Ollama container image, which ships the entire software stack and the binaries needed to operate Ollama.
+2. Setting up Ollama manually in your user directories, which requires you to download the binaries and adjust paths accordingly.
+
+Furthermore, this directory includes two examples:
+- Prompting the LLM via a standard REST API request (see `submit_job_venv.sh`)
+- Engaging with the LLM via the `ollama-python` library (see `ollama-example.py`)
+
+Please find more information about Ollama at the following links:
+- https://github.com/ollama/ollama
+- https://github.com/ollama/ollama-python
+
+## 1. Running Ollama with the official Container (recommended)
+
+... follows soon ...
+
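+Until then, the following minimal sketch illustrates the idea. It assumes that Apptainer is available as the container runtime on the cluster and uses the official `ollama/ollama` image from Docker Hub; image name, flags, and the in-container `ollama` binary location may need to be adapted to your site:
+
+```bash
+# pull the official Ollama image once and convert it into a local SIF file
+apptainer pull ollama.sif docker://ollama/ollama
+
+# start the Ollama server from the container in the background (--nv enables GPU access)
+apptainer exec --nv ollama.sif ollama serve &> log_ollama_serve.log &
+
+# pull and prompt a model, analogous to the manual setup described below
+apptainer exec --nv ollama.sif ollama run llama3.2 "Why is the sky blue?"
+```
+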
+## 2. Downloading and Running Ollama manually
+
+Before you can execute Ollama and run the examples, you need to download Ollama and make it available to the subsequent workflow steps. Additionally, we use a Python virtual environment to demonstrate how Ollama can be used via the `ollama-python` library.
+
+Execute the following instructions **ONCE** to download Ollama and create the virtual environment:
+```bash
+# specify the Ollama root directory where the binaries should be placed and the venv should be created, e.g.:
+export OLLAMA_ROOT_DIR=${HOME}/ollama
+
+# initialize environment variables that refer to the installation and the virtual environment
+source set_paths.sh
+
+# download the Ollama binaries and create the venv
+zsh download_and_create_venv.sh
+```
+
+Now you can execute the examples, either in the currently active shell or by submitting a batch job that runs the examples on a backend node:
+```bash
+# run in the currently active shell
+zsh submit_job_venv.sh
+
+# submit as batch job
+sbatch submit_job_venv.sh
+```
\ No newline at end of file
diff --git a/machine-and-deep-learning/ollama/download_and_create_venv.sh b/machine-and-deep-learning/ollama/download_and_create_venv.sh
new file mode 100644
index 0000000000000000000000000000000000000000..9e61f7340850c6b86aa65c18d5544c06317b377f
--- /dev/null
+++ b/machine-and-deep-learning/ollama/download_and_create_venv.sh
@@ -0,0 +1,18 @@
+#!/usr/bin/zsh
+
+# create required directories
+mkdir -p ${OLLAMA_ROOT_DIR}
+mkdir -p ${OLLAMA_INSTALL_DIR}
+
+# download the Ollama binaries
+cd ${OLLAMA_INSTALL_DIR}
+curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
+tar -xzf ollama-linux-amd64.tgz
+
+# create a Python virtual environment
+module load Python
+python -m venv ${OLLAMA_VENV_DIR}
+# activate the environment
+source ${OLLAMA_VENV_DIR}/bin/activate
+# install the ollama-python library
+pip install ollama
\ No newline at end of file
diff --git a/machine-and-deep-learning/ollama/ollama-example.py b/machine-and-deep-learning/ollama/ollama-example.py
new file mode 100644
index 0000000000000000000000000000000000000000..679032582b2b7d02af37c93115bd91a4d43d6d6e
--- /dev/null
+++ b/machine-and-deep-learning/ollama/ollama-example.py
@@ -0,0 +1,13 @@
+from ollama import chat
+from ollama import ChatResponse
+
+response: ChatResponse = chat(model='llama3.2', messages=[
+  {
+    'role': 'user',
+    'content': 'Why is the sky blue?',
+  },
+])
+
+print(response['message']['content'])
+# or access fields directly from the response object
+# print(response.message.content)
\ No newline at end of file
diff --git a/machine-and-deep-learning/ollama/set_paths.sh b/machine-and-deep-learning/ollama/set_paths.sh
new file mode 100644
index 0000000000000000000000000000000000000000..9b23d6bbbf378f62e1288921f61a286030a05278
--- /dev/null
+++ b/machine-and-deep-learning/ollama/set_paths.sh
@@ -0,0 +1,8 @@
+#!/usr/bin/zsh
+
+# path where the Ollama binaries will be placed after download and extraction
+export OLLAMA_INSTALL_DIR=${OLLAMA_ROOT_DIR}/install
+# path to the Python virtual environment
+export OLLAMA_VENV_DIR=${OLLAMA_ROOT_DIR}/venv_ollama
+# extend PATH so that the ollama binary can be executed from the shell
+export PATH="${OLLAMA_INSTALL_DIR}/bin:${PATH}"
\ No newline at end of file
diff --git a/machine-and-deep-learning/ollama/submit_job_venv.sh b/machine-and-deep-learning/ollama/submit_job_venv.sh
new file mode 100644
index 0000000000000000000000000000000000000000..f9ca578cad9e3c97ba9974d9d084fddc84a5a019
--- /dev/null
+++ b/machine-and-deep-learning/ollama/submit_job_venv.sh
@@ -0,0 +1,67 @@
+#!/usr/bin/zsh
+############################################################
+### Slurm flags
+############################################################
+
+#SBATCH --time=00:15:00
+#SBATCH --partition=c23g
+#SBATCH --nodes=1
+#SBATCH --ntasks-per-node=1
+#SBATCH --cpus-per-task=24
+#SBATCH --gres=gpu:1
+
+############################################################
+### Load modules or software
+############################################################
+
+# specify your Ollama root directory
+export OLLAMA_ROOT_DIR=${HOME}/ollama
+
+# set dependent paths
+source set_paths.sh
+
+# load Python and activate the venv
+module load Python
+source ${OLLAMA_VENV_DIR}/bin/activate
+
+############################################################
+### Parameters and Settings
+############################################################
+
+# print some information about the current system
+echo "Job nodes: ${SLURM_JOB_NODELIST}"
+echo "Current machine: $(hostname)"
+nvidia-smi
+
+############################################################
+### Execution (LLM Serving and Prompting)
+############################################################
+
+# run the server in the background and redirect its output
+ollama serve &> log_ollama_serve.log &
+# remember the ID of the process that has just been started in the background
+export proc_id_serve=$!
+
+# wait until the Ollama server is up
+sleep 5
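+
+# NOTE: the fixed sleep above is a simple heuristic and may not be long enough on a
+# busy node; a more robust (hypothetical) alternative is to poll the server until
+# its REST endpoint responds, for example:
+#   until curl -s http://localhost:11434 > /dev/null; do sleep 1; done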
+
+# run the desired model instance in the background
+ollama run llama3.2 &
+
+# wait until the model is up
+sleep 5
+
+# Example: prompt the LLM via the REST API (Note: streaming is typically only useful when using a chat frontend)
+echo "========== Example REST API =========="
+curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt":"Why is the sky blue?", "stream": false}'
+echo "\n"
+
+# Example: prompt the LLM through ollama-python
+echo "========== Example Python via ollama-python =========="
+python3 ollama-example.py
+
+# cleanup: stop the model and kill the serve and run processes
+ollama stop llama3.2
+kill -9 ${proc_id_serve}
+# kill remaining ollama processes if not already done
+ps aux | grep '[o]llama' | awk '{print $2}' | xargs -r kill -9
diff --git a/pytorch/cifar10/submit_job_container.sh b/machine-and-deep-learning/pytorch/cifar10/submit_job_container.sh
similarity index 100%
rename from pytorch/cifar10/submit_job_container.sh
rename to machine-and-deep-learning/pytorch/cifar10/submit_job_container.sh
diff --git a/pytorch/cifar10/submit_job_utilization_monitoring.sh b/machine-and-deep-learning/pytorch/cifar10/submit_job_utilization_monitoring.sh
similarity index 100%
rename from pytorch/cifar10/submit_job_utilization_monitoring.sh
rename to machine-and-deep-learning/pytorch/cifar10/submit_job_utilization_monitoring.sh
diff --git a/pytorch/cifar10/submit_job_venv.sh b/machine-and-deep-learning/pytorch/cifar10/submit_job_venv.sh
similarity index 100%
rename from pytorch/cifar10/submit_job_venv.sh
rename to machine-and-deep-learning/pytorch/cifar10/submit_job_venv.sh
diff --git a/pytorch/cifar10/train_model.py b/machine-and-deep-learning/pytorch/cifar10/train_model.py
similarity index 100%
rename from pytorch/cifar10/train_model.py
rename to machine-and-deep-learning/pytorch/cifar10/train_model.py
diff --git a/pytorch/cifar10_distributed/set_vars.sh b/machine-and-deep-learning/pytorch/cifar10_distributed/set_vars.sh
similarity index 100%
rename from pytorch/cifar10_distributed/set_vars.sh
rename to machine-and-deep-learning/pytorch/cifar10_distributed/set_vars.sh
diff --git a/pytorch/cifar10_distributed/submit_job_container.sh b/machine-and-deep-learning/pytorch/cifar10_distributed/submit_job_container.sh
similarity index 100%
rename from pytorch/cifar10_distributed/submit_job_container.sh
rename to machine-and-deep-learning/pytorch/cifar10_distributed/submit_job_container.sh
diff --git a/pytorch/cifar10_distributed/submit_job_venv.sh b/machine-and-deep-learning/pytorch/cifar10_distributed/submit_job_venv.sh
similarity index 100%
rename from pytorch/cifar10_distributed/submit_job_venv.sh
rename to machine-and-deep-learning/pytorch/cifar10_distributed/submit_job_venv.sh
diff --git a/pytorch/cifar10_distributed/train_model.py b/machine-and-deep-learning/pytorch/cifar10_distributed/train_model.py
similarity index 100%
rename from pytorch/cifar10_distributed/train_model.py
rename to machine-and-deep-learning/pytorch/cifar10_distributed/train_model.py
diff --git a/pytorch/mnist/submit_job_container.sh b/machine-and-deep-learning/pytorch/mnist/submit_job_container.sh
similarity index 100%
rename from pytorch/mnist/submit_job_container.sh
rename to machine-and-deep-learning/pytorch/mnist/submit_job_container.sh
diff --git a/pytorch/mnist/submit_job_utilization_monitoring.sh b/machine-and-deep-learning/pytorch/mnist/submit_job_utilization_monitoring.sh
similarity index 100%
rename from pytorch/mnist/submit_job_utilization_monitoring.sh
rename to machine-and-deep-learning/pytorch/mnist/submit_job_utilization_monitoring.sh
diff --git a/pytorch/mnist/submit_job_venv.sh b/machine-and-deep-learning/pytorch/mnist/submit_job_venv.sh
similarity index 100%
rename from pytorch/mnist/submit_job_venv.sh
rename to machine-and-deep-learning/pytorch/mnist/submit_job_venv.sh
diff --git a/pytorch/mnist/train_model.py b/machine-and-deep-learning/pytorch/mnist/train_model.py
similarity index 100%
rename from pytorch/mnist/train_model.py
rename to machine-and-deep-learning/pytorch/mnist/train_model.py
diff --git a/pytorch/mnist_distributed/set_vars.sh b/machine-and-deep-learning/pytorch/mnist_distributed/set_vars.sh
similarity index 100%
rename from pytorch/mnist_distributed/set_vars.sh
rename to machine-and-deep-learning/pytorch/mnist_distributed/set_vars.sh
diff --git a/pytorch/mnist_distributed/submit_job_container.sh b/machine-and-deep-learning/pytorch/mnist_distributed/submit_job_container.sh
similarity index 100%
rename from pytorch/mnist_distributed/submit_job_container.sh
rename to machine-and-deep-learning/pytorch/mnist_distributed/submit_job_container.sh
diff --git a/pytorch/mnist_distributed/submit_job_venv.sh b/machine-and-deep-learning/pytorch/mnist_distributed/submit_job_venv.sh
similarity index 100%
rename from pytorch/mnist_distributed/submit_job_venv.sh
rename to machine-and-deep-learning/pytorch/mnist_distributed/submit_job_venv.sh
diff --git a/pytorch/mnist_distributed/train_model.py b/machine-and-deep-learning/pytorch/mnist_distributed/train_model.py
similarity index 100%
rename from pytorch/mnist_distributed/train_model.py
rename to machine-and-deep-learning/pytorch/mnist_distributed/train_model.py
diff --git a/scikit-learn/clustering/README.md b/machine-and-deep-learning/scikit-learn/clustering/README.md
similarity index 100%
rename from scikit-learn/clustering/README.md
rename to machine-and-deep-learning/scikit-learn/clustering/README.md
diff --git a/scikit-learn/clustering/scikit-learn_clustering.py b/machine-and-deep-learning/scikit-learn/clustering/scikit-learn_clustering.py
similarity index 100%
rename from scikit-learn/clustering/scikit-learn_clustering.py
rename to machine-and-deep-learning/scikit-learn/clustering/scikit-learn_clustering.py
diff --git a/scikit-learn/clustering/submit_job_container.sh b/machine-and-deep-learning/scikit-learn/clustering/submit_job_container.sh
similarity index 100%
rename from scikit-learn/clustering/submit_job_container.sh
rename to machine-and-deep-learning/scikit-learn/clustering/submit_job_container.sh
diff --git a/scikit-learn/regression/README.md b/machine-and-deep-learning/scikit-learn/regression/README.md
similarity index 100%
rename from scikit-learn/regression/README.md
rename to machine-and-deep-learning/scikit-learn/regression/README.md
diff --git a/scikit-learn/regression/scikit-learn_regression.py b/machine-and-deep-learning/scikit-learn/regression/scikit-learn_regression.py
similarity index 100%
rename from scikit-learn/regression/scikit-learn_regression.py
rename to machine-and-deep-learning/scikit-learn/regression/scikit-learn_regression.py
diff --git a/scikit-learn/regression/submit_job_container.sh b/machine-and-deep-learning/scikit-learn/regression/submit_job_container.sh
similarity index 100%
rename from scikit-learn/regression/submit_job_container.sh
rename to machine-and-deep-learning/scikit-learn/regression/submit_job_container.sh
diff --git a/tensorflow/cifar10/set_vars.sh b/machine-and-deep-learning/tensorflow/cifar10/set_vars.sh
similarity index 100%
rename from tensorflow/cifar10/set_vars.sh
rename to machine-and-deep-learning/tensorflow/cifar10/set_vars.sh
diff --git a/tensorflow/cifar10/submit_job_container.sh b/machine-and-deep-learning/tensorflow/cifar10/submit_job_container.sh
similarity index 100%
rename from tensorflow/cifar10/submit_job_container.sh
rename to machine-and-deep-learning/tensorflow/cifar10/submit_job_container.sh
diff --git a/tensorflow/cifar10/submit_job_venv.sh b/machine-and-deep-learning/tensorflow/cifar10/submit_job_venv.sh
similarity index 100%
rename from tensorflow/cifar10/submit_job_venv.sh
rename to machine-and-deep-learning/tensorflow/cifar10/submit_job_venv.sh
diff --git a/tensorflow/cifar10/train_model.py b/machine-and-deep-learning/tensorflow/cifar10/train_model.py
similarity index 100%
rename from tensorflow/cifar10/train_model.py
rename to machine-and-deep-learning/tensorflow/cifar10/train_model.py
diff --git a/tensorflow/cifar10_distributed/README.md b/machine-and-deep-learning/tensorflow/cifar10_distributed/README.md
similarity index 100%
rename from tensorflow/cifar10_distributed/README.md
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/README.md
diff --git a/tensorflow/cifar10_distributed/create_tfconfig.py b/machine-and-deep-learning/tensorflow/cifar10_distributed/create_tfconfig.py
similarity index 100%
rename from tensorflow/cifar10_distributed/create_tfconfig.py
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/create_tfconfig.py
diff --git a/tensorflow/cifar10_distributed/execution_wrapper.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/execution_wrapper.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/execution_wrapper.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/execution_wrapper.sh
diff --git a/tensorflow/cifar10_distributed/limit_gpu_visibility.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/limit_gpu_visibility.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/limit_gpu_visibility.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/limit_gpu_visibility.sh
diff --git a/tensorflow/cifar10_distributed/set_vars.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/set_vars.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/set_vars.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/set_vars.sh
diff --git a/tensorflow/cifar10_distributed/submit_job_container.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_container.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/submit_job_container.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_container.sh
diff --git a/tensorflow/cifar10_distributed/submit_job_container_horovod.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_container_horovod.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/submit_job_container_horovod.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_container_horovod.sh
diff --git a/tensorflow/cifar10_distributed/submit_job_container_single-node.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_container_single-node.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/submit_job_container_single-node.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_container_single-node.sh
diff --git a/tensorflow/cifar10_distributed/submit_job_venv.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_venv.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/submit_job_venv.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_venv.sh
diff --git a/tensorflow/cifar10_distributed/submit_job_venv_horovod.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_venv_horovod.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/submit_job_venv_horovod.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_venv_horovod.sh
diff --git a/tensorflow/cifar10_distributed/submit_job_venv_single-node.sh b/machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_venv_single-node.sh
similarity index 100%
rename from tensorflow/cifar10_distributed/submit_job_venv_single-node.sh
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/submit_job_venv_single-node.sh
diff --git a/tensorflow/cifar10_distributed/train_model.py b/machine-and-deep-learning/tensorflow/cifar10_distributed/train_model.py
similarity index 100%
rename from tensorflow/cifar10_distributed/train_model.py
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/train_model.py
diff --git a/tensorflow/cifar10_distributed/train_model_horovod.py b/machine-and-deep-learning/tensorflow/cifar10_distributed/train_model_horovod.py
similarity index 100%
rename from tensorflow/cifar10_distributed/train_model_horovod.py
rename to machine-and-deep-learning/tensorflow/cifar10_distributed/train_model_horovod.py