diff --git a/iclr2024_conference.pdf b/iclr2024_conference.pdf
index d5ca8098e692ec577e00661db708f80fe6313212..dc3d8d92834c80df030d73b4ab62257cd93b49fd 100644
Binary files a/iclr2024_conference.pdf and b/iclr2024_conference.pdf differ
diff --git a/iclr2024_conference.tex b/iclr2024_conference.tex
index b78794fea3019146acd285dc6e077c8666a7eeb0..2edca24151e6212fbf7c3527c269a4ab66df1037 100644
--- a/iclr2024_conference.tex
+++ b/iclr2024_conference.tex
@@ -221,11 +221,10 @@
 	% Machine description
 	All experiments are run on a cluster of 30 machines equipped with single RTX3070Ti GPUs.
 	% General experimental setup -> ML domain
-	Replay is investigated in a supervised CIL-scenario, assuming known task-boundaries and disjoint classes.
+	Replay is investigated in a supervised CIL scenario, assuming known task boundaries and disjoint classes. The following details apply to all investigated CL algorithms, namely AR, ER, and DGR with VAEs.
 	% Balancing of Tasks/Classes
 	Tasks $T_{i}$ contain all samples of the corresponding classes defining them, see \cref{tab:slts} for details. 
-	% TODO: OK ???
-	It is assumed that data from all tasks occurs with equal probability, however, it is not ensured that the amount/variability of samples per class is balanced, see e.g., SVHN classes 1 \& 2, which may render certain sub-task settings as more difficult.
+	It is assumed that data from all tasks occurs with equal probability. Some datasets are slightly unbalanced, e.g., Fruits, or SVHN classes 1 and 2, which may render certain sub-task settings more difficult.
 	% Initial/Replay
 	Training consists of an (initial) run on $T_1$, followed by a sequence of independent (replay) runs on $T_{i>1}$.
 	% Averaged over runs & baseline experiments
@@ -265,14 +264,17 @@
 		\label{tab:slts}
 		}
 	\end{table}
+	We set the training mini-batch size to $\beta=100$ ($\beta=50$ for the Fruits dataset). Selective replay of $D_i$ samples is performed before training on task $T_i$, $i>1$, using the current scholar $S_{i-1}$, where $D_i$ denotes the number of training samples contained in $T_i$.
+	This strategy keeps the number of generated samples constant w.r.t.\ the number of tasks, and thus comes with modest temporary storage requirements instead of growing linearly with the number of incoming tasks.
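The constant replay budget described above can be illustrated with a minimal Python sketch; `DummyScholar` and its `generate` method are hypothetical stand-ins for the scholar $S_{i-1}$, not the paper's implementation:

```python
class DummyScholar:
    """Hypothetical stand-in for the previous scholar S_{i-1}."""

    def generate(self, n):
        # A real scholar would selectively generate n artificial samples;
        # here we just return n placeholder values.
        return [0.0] * n


def build_replay_set(scholar, current_task_data):
    # Replay budget D_i = |T_i|: the number of generated samples stays
    # constant w.r.t. the number of tasks instead of growing linearly.
    d_i = len(current_task_data)
    return scholar.generate(d_i)
```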
 	
-	We set the training and sampling mini-batch size to $\beta=100$ ($\beta=50$ for the Fruits dataset). Sample generation is performed before training on $T_{i>1}$ using the current scholar $S_{i-1}$. When replaying, a mini-batch is constituted of real samples $\beta_R$ from task $T_i$, and generated/artificial ones $\beta_R$ from $S_{i-1}$, mixed in a 1:1 proportion. For training, mini-batches are randomly drawn from this resulting merged subset $\mathcal{D}_{T_i}$.
+	When replaying, mini-batches of $\beta$ samples are randomly drawn, in equal proportions, from the real samples of task $T_i$ and the generated samples representing previous tasks.
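The 1:1 mixing of real and generated samples within a mini-batch can be sketched as follows (a toy illustration, not the paper's training loop):

```python
import random


def mixed_minibatch(real_data, generated_data, beta=100):
    # Draw beta samples in equal proportions: half from the current
    # task's real data, half from the generated replay samples.
    half = beta // 2
    batch = random.sample(real_data, half) + random.sample(generated_data, beta - half)
    random.shuffle(batch)
    return batch
```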
 	%
-	Furthermore, we limit the number of generated samples to $D_i$ for replay, where $D_i$ is equal to the amount of samples contained in the current training set of $T_i$. This strategy keeps the number of generated samples constant w.r.t the task size, and thus comes with modest temporary storage requirements instead of growing linearly with an increasing amount of incoming tasks. 
-	%
-	Additionally, this trivial approach dismisses all assumptions regarding task/class balancing for the resulting merged data. Please also note that we do not assume an equal distribution of past task classes, nor is it required to preserve any information about previously encountered data instances/labels.
+	
+	It is worth noting that classes will, in general, \textit{not} be balanced in the merged generated/real data at $T_i$, and that no statistics of previously encountered class instances/labels need to be stored.
 	%-------------------------------------------------------------------------
-	\subsection{Variant generation with GMMs}
+	\subsection{Selective replay functionality}
 	%
 	\begin{figure}[h!]
 		\centering
@@ -285,7 +287,7 @@
 		\caption{\label{fig:vargen} An example for variant generation in AR, see \cref{sec:approach} and \cref{fig:var} for details. Left: centroids of the current GMM scholar trained on MNIST classes 0, 4 and 6. Middle: query samples of MNIST class 9. Right: variants generated in response to the query. Component weights and variances are not shown.
 		}
 	\end{figure}
-	First, we demonstrate the ability of a GMM layer $L_{(G)}$ to query its internal representation through data samples and selectively generate artificial data that \enquote{best match} those that define the query. To illustrate this, we train a GMM layer of $K=25$ components on MNIST classes 0,4 and 6 for 50 epochs using the best-practice rules described in \cref{app:ar}. Then, we query the trained GMM with samples from class 9 uniquely, as described in \cref{sec:gmm}. The resulting samples are all from class 4, since it is the class that is \enquote{most similar} to the query class. These results are visualized in \cref{fig:var}. Variant generation results for deep convolutional extensions of GMMs can be found in \cite{gepperth2021new}, emphasizing that the AR approach can be scaled to more complex problems.
+	First, we demonstrate the ability of a trained GMM to query its internal representation through data samples and selectively generate artificial data that \enquote{best match} those that define the query. To illustrate this, we train a GMM layer of $K=25$ components on MNIST classes 0, 4 and 6 for 50 epochs using the best-practice rules described in \cref{app:ar}. Then, we query the trained GMM exclusively with samples from class 9, as described in \cref{sec:gmm}. The resulting samples are all from class 4, since it is the class \enquote{most similar} to the query class. These results are visualized in \cref{fig:vargen}. Variant generation results for deep convolutional extensions of GMMs can be found in \cite{gepperth2021new}, emphasizing that the AR approach can be scaled to more complex problems.
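The query mechanism can be illustrated with a toy NumPy sketch. Assuming an isotropic GMM with equal component weights (a simplification of the paper's GMM layer), the component with the highest responsibility for a query reduces to the nearest centroid, and variants are drawn from that component:

```python
import numpy as np


def variant_generation(means, sigma, queries, rng):
    # Squared distances between each query and each component centroid;
    # for equal-weight isotropic components, the highest-responsibility
    # component is simply the nearest centroid.
    d2 = ((queries[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    best = d2.argmin(axis=1)
    # Sample variants from the best-matching components.
    return means[best] + sigma * rng.standard_normal(queries.shape)
```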
 	%-------------------------------------------------------------------------
 	\subsection{Comparison: AR, ER and DGR-VAE}
 	% BASELINE FOR RAW PIXEL/DATA INPUT