clean wiki, authored by David Gilbert
**Table of Contents**
[[_TOC_]]
This is WIP.
## Accounts and storage
### The `vrdemo` account
The `vrdemo` account is meant as a **maintenance account** and provides a shared filesystem accessible by all other users of the Cave cluster. File permissions are unfortunately a bit tricky, and a few quirks still need to be worked out. It is not a replacement for `av006de`; use your own user account to interact with the system. Additionally, it fulfills the following functions:
- Runs the aixcave-weblauncher as an apptainer service, which you can interact with through the web browser.
- Runs a Switchboard Listener apptainer instance on the whole cluster which you can connect to via Switchboard.
- Updates engine sources, apptainer images and various other utilities.
### User accounts
You log in with your regular cluster user accounts and can do the following:
- Access the weblauncher via your browser to start demos
- Use Switchboard to interact with `vrdemo`'s Switchboard Listeners
- Use Apptainer to launch UnrealEditor directly
- Use Apptainer to launch your own fleet of SwitchboardListeners
### Storage / File Organisation
The current file system structure looks as follows:
- `/home/vrdemo`
  - `aixcave-weblauncher`
  - `configs` _contains configs for switchboard, nDisplay, dtrack and vrpn_
  - `scripts` _contains scripts for apptainer, maintenance, launching_
  - `tmp` _contains temporary data, mainly for nDisplay config copies_
- `/work/vrdemo`
  - `apptainer_images` _contains weblauncher, runtime and switchboard listener images_
  - `projects` _contains CI projects and finished demos_
  - `unreal_engines` _contains binary and source versions of various unreal engines_
  - `users` _contains shared user data, such as projects in development_
## Configurations and Scripts
- nDisplay config
- launchscripts
- switchboard
## Apptainer
### Why Apptainer
We use a container solution because Unreal Engine needs glibc `v2.35+` for productive use, but the cluster operating systems only provide older versions (CentOS 7: `v2.17`, Rocky 8: `v2.28`, Rocky 9: `v2.34`).
In a containerized environment, we can use a newer glibc than the system provides.
Using an older glibc than recommended triggers a sorting bug that results in excessively long startup times of Unreal Engine.
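To see which glibc an environment actually provides, a small check like the following can help. It is only a sketch: the function names are made up for illustration, and it assumes a GNU userland (`getconf GNU_LIBC_VERSION`, `sort -V`) is available both on the host and inside the container.

```shell
#!/usr/bin/env bash
# Minimum glibc version named in the text above.
REQUIRED_GLIBC="2.35"

# Succeeds (exit code 0) if version $1 is at least version $2.
# Relies on GNU `sort -V` for version-aware ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Print the glibc version of the current environment, e.g. "2.28".
current_glibc() {
    getconf GNU_LIBC_VERSION | awk '{print $2}'
}

# Report whether the current environment meets the requirement.
check_glibc() {
    local found
    found="$(current_glibc)"
    if version_at_least "$found" "$REQUIRED_GLIBC"; then
        echo "glibc $found: OK"
    else
        echo "glibc $found: older than $REQUIRED_GLIBC, run inside the container"
    fi
}
```

Sourcing this file and calling `check_glibc` once on the host and once inside a container shell should make the difference between the two environments visible.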
We use Apptainer as it is the chosen container system of the RWTH (for which help articles are provided [here](https://help.itc.rwth-aachen.de/service/rhr4fjjutttf/article/e6f146d0d9c04d35aeb98da8d261e38b/)).
### Usage
Apptainer is currently used for three things:
#### 1. Run the editor directly
The Unreal Editor can easily be run by using the [right container image](https://github.com/adamrehn/ue4-runtime). Based on the docker image, an apptainer .sif image is created at `/work/vrdemo/apptainer_images/ue4-runtime:22.04-cudagl12-x11.sif`.
To actually run a container, simply execute:
`apptainer shell --nv --bind /work/vrdemo,/home/vrdemo /work/vrdemo/apptainer_images/ue4-runtime:22.04-cudagl12-x11.sif`
Then, just use the same commands as always to [generate, build and start your project](/Unreal/Unreal-on-Linux). Engines are generally located at `/work/vrdemo/unreal_engines`.
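For one-off commands, `apptainer exec` can be used instead of an interactive shell; it accepts the same `--nv` and `--bind` options. The wrapper below is a minimal sketch: the image and bind paths are the ones given above, while the variable and function names are hypothetical and not part of the existing scripts.

```shell
#!/usr/bin/env bash
# Image and bind paths as given in the text above.
UE_IMAGE="/work/vrdemo/apptainer_images/ue4-runtime:22.04-cudagl12-x11.sif"
UE_BINDS="/work/vrdemo,/home/vrdemo"

# Hypothetical helper: run the given command inside the runtime container
# with GPU support (--nv) and the shared directories bound.
in_ue_container() {
    apptainer exec --nv --bind "$UE_BINDS" "$UE_IMAGE" "$@"
}
```

After sourcing the file, `in_ue_container ls /work/vrdemo/unreal_engines` would, for example, list the available engines from inside the container.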
#### 2. Run Switchboard Listener on the cluster and launch the Cave
Running the Switchboard Listener on the cluster serves three main purposes. All of those will be executed **within** the container the Listener runs in, as it simply spawns sub-processes. You can interact with them via Switchboard.
- **Launch a project on the Cave:** This is the main purpose. Once the Listener is deployed, open Switchboard and connect to the instances on the cluster. Starting them will send a command to the Listener on each node, including the primary, and start the respective Unreal Instances with the correct nDisplay arguments.
- **Start the Unreal Editor locally via the Switchboard Listener:** While the editor can also be started from within a container shell, using Switchboard to start it is more convenient. You can additionally build your project directly from Switchboard (see [here](Unreal/Unreal-on-Linux#switchboard)). This will build and start your project on the primary node.
- **Deploy a dedicated server via the Switchboard Listener:** For multiplayer applications on the cave, a dedicated server can easily be started within the Listener's container instance via Switchboard.
To actually use and launch switchboard, head over to [starting switchboard and switchboard listener](Unreal/Launching-Projects-on-CAVE#starting-switchboard-and-switchboard-listener).
#### 3. Run aixcave-weblauncher
The aixcave-weblauncher can also be run as a container instance. Ideally, it is already running. Make sure you run this from the `vrdemo` account.
`apptainer instance start --nv --writable-tmpfs --bind /vr,/run/user/30356,/etc/machine-id /work/vrdemo/apptainer_images/aixcave-weblauncher.sif instance-weblauncher`
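Before starting the instance, it may be worth checking whether it is already running. The sketch below parses the table printed by `apptainer instance list`; it assumes the instance name is in the first column after a single header line, and the function names are hypothetical. Since instances are listed per user, it would have to be run from the `vrdemo` account.

```shell
#!/usr/bin/env bash
# Instance name used in the start command above.
WEBLAUNCHER_INSTANCE="instance-weblauncher"

# Succeeds if the instance name $1 appears in the first column of an
# `apptainer instance list` table read from stdin (header line skipped).
instance_listed() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Report whether the weblauncher instance is running for the current user.
weblauncher_status() {
    if apptainer instance list | instance_listed "$WEBLAUNCHER_INSTANCE"; then
        echo "$WEBLAUNCHER_INSTANCE is running"
    else
        echo "$WEBLAUNCHER_INSTANCE is not running"
    fi
}
```

If the instance needs to be restarted, `apptainer instance stop instance-weblauncher` stops it before the start command is issued again.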
#### 4. Stop SwitchboardListener
It can happen that another user is running SwitchboardListener while you are trying to deploy your own listeners. They then get assigned the wrong port, and you automatically connect to the wrong ones.
If you notice that this happens, you can kill the connected listeners via the Tools Menu in Switchboard:
![KillListener](uploads/22642d7726a3039a4258c982c0bfba09/KillListener.png)
### Maintenance
See [Maintenance](Unreal/Unreal-Maintenance#apptainer) for Apptainer-specific maintenance.
### Open Questions / Todos
- How do we permanently keep the switchboard listener images running on vrdemo?
- How can we kill Unreal processes within those images if we're not authenticated as vrdemo?
- Can we somehow start/stop those instances as a different user without su vrdemo?
- Look into overlay/underlay usage
- Access to av006de for vrdemo and legacy demos?
## [Unreal Maintenance](Unreal/Unreal-Maintenance)
See [Unreal Maintenance](Unreal/Unreal-Maintenance) for more information on how to maintain Apptainer, Unreal and Switchboard.
## Troubleshooting and FAQ
Sometimes things don't work out. Below are some countermeasures (sorted from least to most extreme).
#### My application crashes after executing `launch_aixcave.sh`
- Run `clean_aixcave.sh`, wait some time and execute the launch script again.
#### Error: `Cannot find a compatible vulkan device that supports surface presentation`
- Scenario: Happens on master when executing the application via `launch_aixcave.sh`
  - ~~Open `launch_aixcave.sh` and change `dc_gpu` **for master node** to the other value (0/1). I.e., look for the line containing `dc_gpu` below the comment `#Launching master`~~
  - This should never happen on the master node again, as there is only one GPU now. Make sure the index is set to **0**.
  - If it also happens on the slave nodes, also swap the GPU numbers under `# set display based on name of node` for both eyes.
- Scenario: Happens on master when executing the application locally (not on cluster via script)
  - You need to select another GPU for presentation. Add `-graphicsadapter=0` or `-graphicsadapter=1` to the command.
  - **This should also never happen on the master anymore, as it has only one GPU. Make sure it is set to 0.**
#### Error: Problems with SDL initialize on the slave nodes
- Restart XServer on slaves
```
su vrsw
/home/vrsw/gpucluster/bin/gpucluster-execute.sh -p -m -s /home/vrsw/gpucluster/bin/restart_xserver.sh;
/home/vrsw/gpucluster/bin/setBackGround2RedBlue_VCI_ITC.sh
```