% Optional math commands from https://github.com/goodfeli/dlbook_notation.
\input{math_commands.tex}
...
A new sample will cause adaptation of the scholar in a localized region of data space. Variants generated by that sample will, due to similarity, cause adaptation in the same region. Knowledge in the overlap region will therefore be adapted to represent both, while dissimilar regions stay unaffected (see \cref{fig:var} for a visual impression).
None of these requirements are fulfilled by DNNs, which is why we implement the scholar as a \enquote{flat} GMM layer (generator/feature encoder) followed by a linear classifier (solver). Both are trained independently via SGD as described in \cite{gepperth2021gradient}. Extensions to deep convolutional GMMs (DCGMMs) \cite{gepperth2021new}, which offer higher sampling capacity, can be incorporated as drop-in replacements for the generator.
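For concreteness, the following sketch (in PyTorch) illustrates such a scholar under simplifying assumptions: diagonal covariance matrices, flattened inputs, and a softmax parametrization of the component weights. All names (\texttt{GMMScholar}, \texttt{sample\_variants}, the sizes \texttt{K}, \texttt{D}, \texttt{C}) are illustrative and do not refer to the implementation of \cite{gepperth2021gradient}. The GMM layer exposes component responsibilities as features for the linear solver and produces variants of a query batch by sampling from the components that respond to it (cf.\ \cref{alg:two}).
{\small
\begin{verbatim}
# Illustrative sketch only, not the reference implementation:
# a "flat" diagonal-covariance GMM layer plus a linear solver.
import torch

class GMMScholar(torch.nn.Module):
    def __init__(self, K, D, C):
        super().__init__()
        self.log_pi  = torch.nn.Parameter(torch.zeros(K))     # component weights (log-space)
        self.mu      = torch.nn.Parameter(torch.randn(K, D))  # centroids
        self.log_var = torch.nn.Parameter(torch.zeros(K, D))  # diagonal covariances (log-space)
        self.solver  = torch.nn.Linear(K, C)                  # linear classifier (solver)

    def responsibilities(self, x):         # (N, D) -> (N, K) component responses
        diff = x[:, None, :] - self.mu[None, :, :]
        log_gauss = -0.5 * ((diff ** 2 / self.log_var.exp()).sum(-1)
                            + self.log_var.sum(-1))  # constant terms cancel in the softmax
        return torch.softmax(torch.log_softmax(self.log_pi, 0) + log_gauss, dim=1)

    def classify(self, x):                 # solver operates on (detached) GMM responses
        return self.solver(self.responsibilities(x).detach())

    @torch.no_grad()
    def sample_variants(self, x):          # "backwards" pass: one variant per input, drawn
        r = self.responsibilities(x)       # from a component selected by its response to x
        k = torch.multinomial(r, 1).squeeze(1)
        return self.mu[k] + (0.5 * self.log_var[k]).exp() * torch.randn_like(x)

scholar = GMMScholar(K=25, D=784, C=10)
# generator and solver are trained independently via SGD
opt_gen = torch.optim.SGD([scholar.log_pi, scholar.mu, scholar.log_var], lr=1e-3)
opt_sol = torch.optim.SGD(scholar.solver.parameters(), lr=1e-2)
\end{verbatim}}
The \texttt{detach()} call reflects the independence of the two training procedures: gradients of the classification loss do not propagate into the GMM parameters.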
% ------
\begin{figure}[ht]
\centering
\begin{minipage}{.6\linewidth}
\begin{algorithm}[H]
\small
\SetAlgoLined
\caption{Adiabatic Replay}\label{alg:two}
\KwData{AR scholar $\Phi$, real data $\mathcal{D}_{R}$}
\For{$t \in 2 \ldots T$}{% from T2...TN
  \For{$\mathcal{B}_{N}\sim\mathcal{D}_{R_t}$}{% iterate over batches of new task data
    % selective replay: sample from the probability density described by the GMM by
    % traversing its layers in a backwards direction; the returned batch of variants is
    % conditioned on the prototype responses of the forward call on $\mathcal{B}_{N}$
    $\mathcal{B}_{G} \gets \mathrm{variants}_{\Phi}(\mathcal{B}_{N})$\;
    % selective updating: adapt generator and solver of $\Phi$ via SGD on the merged batch
    train $\Phi$ on $\mathcal{B}_{N} \cup \mathcal{B}_{G}$\;
  }
}
\end{algorithm}
\end{minipage}
\end{figure}
Selective updating is an intrinsic property of GMMs. They describe data distributions by a set of $K$ \textit{components}, consisting of component weights $\pi_k$, centroids $\vmu_k$ and covariance matrices $\mSigma_k$. A data sample $\vx$ is assigned a probability $p(\vx)=\sum_k \pi_k \mathcal N(\vx ; \vmu_k, \mSigma_k)$ as a weighted sum of normal distributions $\mathcal N(\vx; \vmu_k, \mSigma_k)$. Training of GMMs is performed as detailed in \cite{gepperth2021gradient} by adapting centroids, covariance matrices and component weights through the SGD-based minimization of the negative log-likelihood $\mathcal L = -\sum_n \log\sum_k \pi_k \mathcal N(\vx_n; \vmu_k,\mSigma_k)$.
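As an illustration of this training scheme (not the reference implementation of \cite{gepperth2021gradient}), the following sketch minimizes $\mathcal L$ by plain SGD, assuming diagonal covariance matrices, a softmax parametrization of the $\pi_k$, and a log-sum-exp formulation for numerical stability; component count, batch and learning rate are placeholders.
{\small
\begin{verbatim}
# Sketch: SGD on the GMM negative log-likelihood (diagonal covariances assumed).
import math, torch

K, D = 25, 784                                    # illustrative sizes
log_pi  = torch.zeros(K, requires_grad=True)      # -> pi_k via softmax
mu      = torch.randn(K, D, requires_grad=True)   # centroids mu_k
log_var = torch.zeros(K, D, requires_grad=True)   # diagonal Sigma_k (log-space)

def neg_log_likelihood(x):                        # x: (N, D)
    diff = x[:, None, :] - mu[None, :, :]         # (N, K, D)
    log_gauss = -0.5 * ((diff ** 2 / log_var.exp()).sum(-1)
                        + log_var.sum(-1) + D * math.log(2 * math.pi))
    log_mix = torch.log_softmax(log_pi, 0) + log_gauss   # log(pi_k N(x; mu_k, Sigma_k))
    return -torch.logsumexp(log_mix, dim=1).sum()        # L = -sum_n log sum_k ...

opt = torch.optim.SGD([log_pi, mu, log_var], lr=1e-3)
x = torch.rand(128, D)                            # stand-in batch of flattened inputs
for step in range(100):
    opt.zero_grad()
    neg_log_likelihood(x).backward()
    opt.step()
\end{verbatim}}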