On the other hand, ER has the disadvantage that memory usage grows with each added task, which is an unrealistic premise in practice. A fixed memory budget mitigates this problem, but has the negative effect that samples from previous sub-tasks are overwritten as soon as the global budget limit is reached.
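To illustrate the overwriting effect (this sketch is ours and purely illustrative, not the buffer implementation used in our experiments): a class-balanced buffer with a fixed global budget must shrink the per-class quota whenever a new class arrives, discarding exemplars stored for earlier sub-tasks.
\begin{verbatim}
import numpy as np

class FixedBudgetBuffer:
    # Class-balanced ER buffer with a fixed global budget: adding a new
    # class shrinks the per-class quota, so exemplars stored for earlier
    # sub-tasks are discarded once the budget limit is reached.
    def __init__(self, budget):
        self.budget = budget       # total number of stored samples
        self.per_class = {}        # class label -> list of stored samples

    def add_class(self, label, samples):
        self.per_class[label] = list(samples)
        quota = self.budget // len(self.per_class)  # equal share per class
        for lbl in self.per_class:
            # exemplars beyond the new quota are overwritten/discarded
            self.per_class[lbl] = self.per_class[lbl][:quota]

    def sample(self, n, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        pool = [x for stored in self.per_class.values() for x in stored]
        idx = rng.choice(len(pool), size=min(n, len(pool)), replace=False)
        return [pool[i] for i in idx]
\end{verbatim}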
%
% AR only suffers from significant forgetting on CIFAR-10; T1 (classes 4-9) drops strongly after T5 (class 2)
For latent replay (SVHN and CIFAR10), the results show that DGR suffers from catastrophic forgetting, despite having the same baseline as latent ER and AR. Forgetting for AR is only significant for CIFAR D5-$1^5$B after task $T_5$, due to a high overlap with classes from the initial task $T_1$.
% Argument: ER only performs poorly because the budget is too small
Moreover, it is surprising that AR achieves better classification results than latent ER in the CL experiments. One could argue that the per-class budget is rather small for complex datasets like SVHN and CIFAR10, and that increasing it would raise CL performance. However, we stress again that this is not trivially applicable in scenarios with a constrained memory budget.
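To make the budget argument concrete (the numbers are purely illustrative and do not correspond to the settings of our experiments): a class-balanced buffer with a fixed global budget of $M$ stored samples retains only
\begin{equation*}
    m_c = \lfloor M / C \rfloor
\end{equation*}
exemplars per class once $C$ classes have been encountered; for example, $M=1000$ and $C=10$ leave just $100$ exemplars per CIFAR10 class, and raising $m_c$ is only possible by increasing $M$, which is precisely what a constrained memory budget forbids.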
%
\par\noindent\textbf{CF and selective replay:}
AR shows promising results in terms of knowledge retention, i.e., the prevention of forgetting, for sequentially learned classes, as reflected by a generally lower average forgetting measure. We observe very little loss of knowledge on the first task $T_1$ after full training, suggesting that the ability to absorb small incremental additions to the internal knowledge base over a sequence of tasks is an intrinsic property of AR, owed to its selective replay mechanism.
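As an illustration of the selective replay idea (this sketch reflects our reading only and is not the AR implementation; the scikit-learn \texttt{GaussianMixture} backend, the activation criterion and the budget split are assumptions made for the example), replay samples are drawn exclusively from mixture components that the new-task data activates:
\begin{verbatim}
import numpy as np
from sklearn.mixture import GaussianMixture

def selective_replay(gmm, new_data, n_replay, rng=None):
    # Generate replay samples only from mixture components that the
    # new-task data activates; components unrelated to the new task,
    # and the knowledge they encode, are left untouched.
    # Assumes a fitted GaussianMixture with covariance_type='full'.
    rng = np.random.default_rng() if rng is None else rng
    resp = gmm.predict_proba(new_data)        # responsibilities, shape (N, K)
    activation = resp.mean(axis=0)            # how strongly each component is queried
    active = np.flatnonzero(activation >= 1.0 / gmm.n_components)
    probs = activation[active] / activation[active].sum()
    counts = rng.multinomial(n_replay, probs) # split replay budget by activation
    chunks = [rng.multivariate_normal(gmm.means_[k], gmm.covariances_[k], size=c)
              for k, c in zip(active, counts) if c > 0]
    return np.concatenate(chunks, axis=0)
\end{verbatim}
In such a scheme, e.g., \texttt{selective\_replay(GaussianMixture(25).fit(old\_latents), new\_latents, 1000)} would produce samples to be merged with the new-task data for the next training step.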
...
...
Detailed information about the evaluation and experimental setup can be found in \cref{sec:exppeval}.
\label{tab:short_results}}
\end{table}
%
\begin{figure}[h!]
\centering
\begin{subfigure}{.4\textwidth}
...
...
%Right: Successive tasks have no significant overlap. This is shown using the negative GMM log-likelihood for AR after training on task $T_i$ and then keeping the GMM fixed. As we can observe, log-likelihood universally drops, indicating a poor match.
\label{fig:gen_samples_loglik_plot}
\end{figure}
%
\section{Discussion}
In summary, we can state that our AR approach clearly surpasses VAE-based DGR in the evaluated CIL-P when constraining replay to a constant-time strategy. This is remarkable because the AR scholar performs the roles of both solver and generator while having fewer parameters. The advantage of AR becomes even more pronounced when considering forgetting prevention rather than classification accuracy alone.
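To make explicit what constant-time means in this context (the notation is ours and purely illustrative): with a fixed replay budget of $R$ generated samples per task, the training effort for task $T_t$ scales as
\begin{equation*}
    \mathcal{O}\left(|D_t| + R\right),
\end{equation*}
independently of the task index $t$, whereas replaying previously acquired knowledge in proportion to the number of past tasks results in a cost of $\mathcal{O}\left(|D_t| + (t-1)R\right)$, which grows linearly over the course of CL.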