Commit f59dc19c authored by fdai0234

Merge branch 'master' of gitlab.cs.hs-fulda.de:fdai0114/iclr24-ar-foundation

parents 716735c6 d4ad274f
@@ -221,11 +221,10 @@
% Machine description
All experiments are run on a cluster of 30 machines, each equipped with a single RTX3070Ti GPU.
% General experimental setup -> ML domain
Replay is investigated in a supervised CIL scenario, assuming known task boundaries and disjoint classes. All of the following details apply to all investigated CL algorithms, namely AR, ER and DGR with VAEs.
% Balancing of Tasks/Classes
Tasks $T_{i}$ contain all samples of the classes that define them, see \cref{tab:slts} for details.
It is assumed that data from all tasks occurs with equal probability. Some datasets are slightly unbalanced, for example Fruits and SVHN classes 1 and 2, which may render certain sub-task settings more difficult.
% Initial/Replay
Training consists of an (initial) run on $T_1$, followed by a sequence of independent (replay) runs on $T_{i>1}$.
% Averaged over runs & baseline experiments
@@ -265,14 +264,17 @@
\label{tab:slts}
}
\end{table}
We set the training and sampling mini-batch size to $\beta=100$ ($\beta=50$ for the Fruits dataset). Sample generation is performed before training on $T_{i>1}$ using the current scholar $S_{i-1}$. When replaying, each mini-batch of $\beta$ samples is drawn randomly and consists of $\beta_R=\beta/2$ real samples from task $T_i$ and $\beta_G=\beta/2$ generated samples from $S_{i-1}$, i.e., real and generated data are mixed in a 1:1 proportion.
%
Furthermore, we limit the number of generated samples to $D_i$, where $D_i$ is the number of samples in the current training set of $T_i$. This strategy keeps the number of generated samples constant w.r.t.\ the task size, and thus comes with modest temporary storage requirements instead of growing linearly with the number of incoming tasks.
Please note that classes will, in general, \textit{not} be balanced in the merged generated/real data at $T_i$: we neither assume an equal distribution of past classes, nor is it required to store any information about previously encountered data instances/labels.
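As an illustration, the following minimal sketch shows how replay mini-batches could be assembled under the 1:1 mixing strategy described above. It is a simplified Python sketch, not our actual implementation; in particular, the \texttt{scholar.generate} interface and all variable names are illustrative assumptions.
\begin{verbatim}
# Sketch: replay mini-batch construction (illustrative, not the
# actual implementation). `scholar` stands for S_{i-1}; its
# `generate` method is an assumed interface.
import numpy as np

def replay_batches(real_x, real_y, scholar, beta=100, seed=0):
    rng = np.random.default_rng(seed)
    D_i = len(real_x)                     # size of current task T_i
    gen_x, gen_y = scholar.generate(D_i)  # generated data capped at D_i
    for _ in range(D_i // (beta // 2)):
        r = rng.choice(D_i, beta // 2, replace=False)  # beta_R real
        g = rng.choice(D_i, beta // 2, replace=False)  # beta_G generated
        x = np.concatenate([real_x[r], gen_x[g]])      # 1:1 mixture
        y = np.concatenate([real_y[r], gen_y[g]])
        yield x, y
\end{verbatim}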
%-------------------------------------------------------------------------
\subsection{Selective replay functionality}
%
\begin{figure}[h!]
\centering
@@ -285,7 +287,7 @@
\caption{\label{fig:vargen} An example for variant generation in AR, see \cref{sec:approach} and \cref{fig:var} for details. Left: centroids of the current GMM scholar trained on MNIST classes 0, 4 and 6. Middle: query samples of MNIST class 9. Right: variants generated in response to the query. Component weights and variances are not shown.
}
\end{figure}
First, we demonstrate the ability of a trained GMM to query its internal representation through data samples and to selectively generate artificial data that \enquote{best match} those defining the query. To illustrate this, we train a GMM layer of $K=25$ components on MNIST classes 0, 4 and 6 for 50 epochs using the best-practice rules described in \cref{app:ar}. Then, we query the trained GMM exclusively with samples from class 9, as described in \cref{sec:gmm}. The resulting samples are all from class 4, since this is the class \enquote{most similar} to the query class. These results are visualized in \cref{fig:var}. Variant generation results for deep convolutional extensions of GMMs can be found in \cite{gepperth2021new}, emphasizing that the AR approach can be scaled to more complex problems.
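The selective sampling mechanism itself can be sketched as follows: the responsibilities of the trained components for the query samples are aggregated and then used to bias sampling towards the best-matching components. Note that this sketch uses \texttt{scikit-learn} as a stand-in for the SGD-trained GMM layer of AR, and the function signature is an illustrative assumption.
\begin{verbatim}
# Sketch: selective replay from a trained GMM (illustrative only;
# scikit-learn stands in for the SGD-trained GMM layer).
import numpy as np
from sklearn.mixture import GaussianMixture

def selective_replay(gmm, queries, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    # Average responsibilities p(k|x) over the query samples: these
    # identify the components "most similar" to the query.
    resp = gmm.predict_proba(queries).mean(axis=0)
    comps = rng.choice(len(resp), size=n_samples, p=resp)
    # Draw variants from the selected Gaussian components.
    return np.stack([rng.multivariate_normal(gmm.means_[k],
                                             gmm.covariances_[k])
                     for k in comps])
\end{verbatim}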
%-------------------------------------------------------------------------
\subsection{Comparison: AR, ER and DGR-VAE}
% BASELINE FOR RAW PIXEL/DATA INPUT