RobustGaSP
RobustGaSP, standing for Robust Gaussian Stochastic Process Emulation, is an R package developed by Gu et al. [2019]. It implements robust methods to estimate the unknown parameters, which can greatly improve the predictive performance of the built emulator. In addition, it implements the PP GaSP (parallel partial Gaussian stochastic process) emulator for simulators that return multiple values as output; see Gu and Berger [2016] for more detail.
In psimpy.emulator.robustgasp, we have implemented the classes ScalarGaSP and PPGaSP based on RobustGaSP and rpy2, which allows us to use RobustGaSP directly from within Python.
Theory recap of GP emulation
A simulator represents a mapping from an input space to an output space. Gaussian process emulation treats a computationally expensive simulator as an unknown function from a Bayesian perspective: the prior belief of the simulator, namely a Gaussian process, is updated based on a modest number of simulation runs, leading to a posterior which can be evaluated much faster than the simulator and can then be used for computationally demanding analyses [Zhao et al., 2021]. A recap of Gaussian process emulation is provided below, which is adapted from Chapter 3 of Zhao [2021].
GP prior
Let \(f(\cdot)\) denote a simulator that one wants to approximate. It defines a mapping from a \(p\)-dimensional input \(\mathbf{x}=(x_1,\ldots,x_p)^T \in \mathcal{X} \subset{\mathbb{R}^p}\) to a scalar output \(y \in \mathbb{R}\).
From the Bayesian perspective, the prior belief of the simulator is modeled by a Gaussian process, namely

\[ f(\cdot) \sim \mathcal{GP}\left(m(\cdot), C(\cdot,\cdot)\right). \tag{1} \]
The Gaussian process is fully specified by its mean function \(m(\cdot)\) and covariance function \(C(\cdot,\cdot)\). The mean function is usually modeled by parametric regression

\[ m(\mathbf{x}) = \mathbf{h}(\mathbf{x})^T \boldsymbol{\beta} = \sum_{i=1}^{q} h_i(\mathbf{x})\,\beta_i, \tag{2} \]
with \(q\)-dimensional vectors \(\mathbf{h}(\mathbf{x})=\left(h_{1}(\mathbf{x}), h_{2}(\mathbf{x}), \ldots, h_{q}(\mathbf{x})\right)^T\) and \(\boldsymbol{\beta}=\left(\beta_{1}, \beta_{2}, \ldots, \beta_{q}\right)^T\) denoting the chosen basis functions and unknown regression parameters respectively. The basis functions are commonly chosen as constant \(\mathbf{h}(\mathbf{x})=1\) or some prescribed polynomials like \(\mathbf{h}(\mathbf{x})=(1,x_1,\ldots,x_p)^T\).
The covariance function is often assumed to be stationary and typically has a separable form of

\[ C\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) = \sigma^{2} c\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) = \sigma^{2} \prod_{l=1}^{p} c_{l}\left(x_{il}, x_{jl}\right), \tag{3} \]
where \(c\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)\) is the correlation function and \(c_{l}\left(x_{il}, x_{jl}\right)\) corresponds to the correlation between the \(l\)-th components of \(\mathbf{x}_i\) and \(\mathbf{x}_j\). Various correlation functions can be chosen for \(c_{l}\left(x_{il}, x_{jl}\right)\). In what follows, the Matern correlation with roughness parameter \(5/2\) is chosen to illustrate the theory; it can be replaced by any other correlation function without loss of generality:

\[ c_{l}\left(x_{il}, x_{jl}\right) = \left(1 + \frac{\sqrt{5}\left|x_{il}-x_{jl}\right|}{\psi_{l}} + \frac{5\left(x_{il}-x_{jl}\right)^{2}}{3\psi_{l}^{2}}\right) \exp\left(-\frac{\sqrt{5}\left|x_{il}-x_{jl}\right|}{\psi_{l}}\right), \tag{4} \]
where \(\psi_l\) is the correlation length parameter in the \(l\)-th dimension. The unknowns in the covariance function \(C\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)\) are the variance \(\sigma^2\) and the \(p\) correlation length parameters \(\boldsymbol{\psi}=(\psi_1,\ldots,\psi_p)^T\).
GP posterior
Equations (1) to (4) encode prior knowledge of the simulator \(f(\cdot)\). From the Bayesian viewpoint, this prior knowledge can be updated to a posterior based on a limited number of training data. Training data here refer to \(n^{tr}\) pairs of input-output data of \(f(\cdot)\).
Let \(\mathbf{x}^{tr}:=\{\mathbf{x}_i\}_{i=1}^{n^{tr}}\) and \(\mathbf{y}^{tr}:=\{f(\mathbf{x}_i)\}_{i=1}^{n^{tr}}\) denote the training inputs and outputs respectively. Since any finite number of random variables from a Gaussian process are jointly Gaussian distributed, the joint distribution of the \(n^{tr}\) outputs \(\mathbf{y}^{tr}\) follows an \(n^{tr}\)-dimensional Gaussian distribution

\[ \mathbf{y}^{tr} \mid \boldsymbol{\beta}, \sigma^{2}, \boldsymbol{\psi} \sim \mathcal{N}_{n^{tr}}\left(\mathbf{H}\boldsymbol{\beta},\, \sigma^{2}\mathbf{R}\right), \tag{5} \]
where \(\mathbf{H}=\left[\mathbf{h}(\mathbf{x}_{1}), \ldots, \mathbf{h}(\mathbf{x}_{n^{tr}}) \right]^T\) is the \(n^{tr} \times q\) basis design matrix and \(\mathbf{R}\) is the \(n^{tr} \times n^{tr}\) correlation matrix with \((i,j)\) element \(c(\mathbf{x}_{i}, \mathbf{x}_{j})\). Equation (5) is known as the likelihood function.
Similarly, the joint distribution of the function output \(y^*\) at any untried input \(\mathbf{x}^*\) and \(\mathbf{y}^{tr}\) follows an \((n^{tr}+1)\)-dimensional Gaussian distribution

\[ \begin{pmatrix} y^{*} \\ \mathbf{y}^{tr} \end{pmatrix} \,\Bigg|\, \boldsymbol{\beta}, \sigma^{2}, \boldsymbol{\psi} \sim \mathcal{N}_{n^{tr}+1}\left( \begin{pmatrix} \mathbf{h}(\mathbf{x}^{*})^{T} \\ \mathbf{H} \end{pmatrix} \boldsymbol{\beta},\; \sigma^{2} \begin{pmatrix} 1 & \mathbf{r}(\mathbf{x}^{*})^{T} \\ \mathbf{r}(\mathbf{x}^{*}) & \mathbf{R} \end{pmatrix} \right), \tag{6} \]
where \(\mathbf{r}(\mathbf{x}^*)=\left(c(\mathbf{x}^*, \mathbf{x}_{1}), \ldots, c(\mathbf{x}^*, \mathbf{x}_{n^{tr}}) \right)^T\). According to the properties of the multivariate Gaussian distribution, the conditional distribution of \(y^*\) given \(\mathbf{y}^{tr}\) is again Gaussian, namely

\[ y^{*} \mid \mathbf{y}^{tr}, \boldsymbol{\beta}, \sigma^{2}, \boldsymbol{\psi} \sim \mathcal{N}\left(m',\, \sigma^{2} c'\right), \tag{7} \]

with

\[ m' = \mathbf{h}(\mathbf{x}^{*})^{T} \boldsymbol{\beta} + \mathbf{r}(\mathbf{x}^{*})^{T} \mathbf{R}^{-1}\left(\mathbf{y}^{tr} - \mathbf{H}\boldsymbol{\beta}\right), \tag{8} \]

\[ c' = 1 - \mathbf{r}(\mathbf{x}^{*})^{T} \mathbf{R}^{-1} \mathbf{r}(\mathbf{x}^{*}). \tag{9} \]
Equations (7) to (9) can be easily extended to any dimensionality since the joint distribution of the function outputs at any number of untried inputs and \(\mathbf{y}^{tr}\) also follows a multivariate Gaussian distribution. Equations (7) to (9) therefore essentially define the posterior which is an updated Gaussian process, given the unknown regression parameters \(\boldsymbol{\beta}\), variance \(\sigma^2\), and correlation length parameters \(\boldsymbol{\psi}\).
For any new input \(\mathbf{x}^*\), the output \(y^*\) can be predicted almost instantaneously by the mean \(m'\), namely equation (8). Moreover, the uncertainty associated with the prediction can be estimated by, for example, the variance \(\sigma^2c'\). Depending on how the unknown parameters (\(\boldsymbol{\beta}\), \(\sigma^2\), and \(\boldsymbol{\psi}\)) are learned from the training data, the posterior can take various forms; see Section 3.3 of Zhao [2021].
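To make equations (7) to (9) concrete, below is a minimal NumPy sketch that computes \(m'\) and \(c'\) for a toy one-dimensional example, assuming the parameters \(\boldsymbol{\beta}\), \(\sigma^2\), and \(\boldsymbol{\psi}\) are known rather than estimated; the function names and data are illustrative, not part of PSimPy's API:

import numpy as np

def matern_5_2(d, psi=1.0):
    # Matern correlation with roughness parameter 5/2, equation (4)
    a = np.sqrt(5.0) * np.abs(d) / psi
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

# Toy training data (illustrative only): 4 points of f(x) = sin(x)
x_tr = np.array([0.0, 0.4, 0.9, 1.5])
y_tr = np.sin(x_tr)
beta, sigma2 = 0.0, 1.0                               # assumed known; constant basis h(x) = 1

R = matern_5_2(x_tr[:, None] - x_tr[None, :])         # n_tr x n_tr correlation matrix
x_new = 0.7
r = matern_5_2(x_new - x_tr)                          # r(x*), correlations with training inputs

m_post = beta + r @ np.linalg.solve(R, y_tr - beta)   # conditional mean m', equation (8)
c_post = 1.0 - r @ np.linalg.solve(R, r)              # conditional correlation c', equation (9)
print(m_post, sigma2 * c_post)                        # predictive mean and variance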
Sample from GP
A Gaussian process, either the prior or the posterior, defines a distribution over functions. Once its mean function \(m(\cdot)\) and covariance function \(C(\cdot,\cdot)\) are specified, the Gaussian process can be evaluated at a finite number \(s\) of input points \(\{\mathbf{x}_i\}_{i=1}^{s}\). The joint distribution of \(\{y_i=f(\mathbf{x}_i)\}_{i=1}^{s}\) follows an \(s\)-variate Gaussian distribution. A sample of this \(s\)-variate Gaussian distribution, denoted as \(\tilde{\mathbf{y}}:=(\tilde{y}_1,\ldots,\tilde{y}_s)^T\), can be viewed as a sample of the Gaussian process (evaluated at a finite number of points). To generate \(\tilde{\mathbf{y}}\), the \(s \times s\) covariance matrix \(\boldsymbol{\Sigma}\) is decomposed into the product of a lower triangular matrix \(\mathbf{L}\) and its transpose \(\mathbf{L}^T\) using the Cholesky decomposition

\[ \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T. \tag{10} \]
Then a sample \(\tilde{\mathbf{y}}\) can be obtained as follows:

\[ \tilde{\mathbf{y}} = \mathbf{m} + \mathbf{L}\mathbf{w}, \tag{11} \]

where \(\mathbf{m}:=\left(m(\mathbf{x}_1),\ldots,m(\mathbf{x}_s)\right)^T\) is the mean vector and \(\mathbf{w}:=(w_1,\dots,w_s)^T\) denotes an \(s\)-dimensional random vector consisting of \(s\) independent standard normal random variables \(w_i, i \in \{1,\ldots,s\}\).
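As a minimal sketch of equations (10) and (11), assuming the mean vector \(\mathbf{m}\) and covariance matrix \(\boldsymbol{\Sigma}\) have already been evaluated at the \(s\) input points (the small diagonal jitter is a common numerical safeguard, not part of the theory above):

import numpy as np

rng = np.random.default_rng(0)

def sample_gp(m, Sigma, jitter=1e-10):
    # Draw one sample from N(m, Sigma) via Cholesky decomposition
    s = len(m)
    L = np.linalg.cholesky(Sigma + jitter * np.eye(s))  # Sigma = L L^T, equation (10)
    w = rng.standard_normal(s)                          # s independent N(0, 1) variables
    return m + L @ w                                    # y_tilde = m + L w, equation (11)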
ScalarGaSP Class
Class ScalarGaSP provides Gaussian process emulation for simulators which return a scalar value as output, namely \(y \in \mathbb{R}\). Please refer to Gu et al. [2018] for the detailed theory. It is imported by:
from psimpy.emulator.robustgasp import ScalarGaSP
Methods
- class ScalarGaSP(ndim, zero_mean='No', nugget=0.0, nugget_est=False, range_par=NA, method='post_mode', prior_choice='ref_approx', a=0.2, b=None, kernel_type='matern_5_2', isotropic=False, optimization='lbfgs', alpha=None, lower_bound=True, max_eval=None, num_initial_values=2)
Set up basic parameters of the emulator.
- Parameters
ndim (int) – Input parameter dimension.
zero_mean (str, optional) – If ‘Yes’, the mean is set to zero. If ‘No’, the mean is specified by trend. Default is ‘No’.
nugget (float, optional) – Nugget variance ratio. If \(0\), there is no nugget or the nugget is estimated. If not \(0\), it means a fixed nugget. Default is \(0\).
nugget_est (bool, optional) – True means nugget should be estimated. False means nugget is fixed or not estimated. Default is False.
range_par (numpy array, optional) – Range parameters. One-dimensional numpy.ndarray of length ndim. If given, range parameters are set to the given values rather than estimated based on training data. Default is NA, meaning that range parameters are estimated based on training data.
method (str, optional) – Method used for parameter estimation. ‘post_mode’: marginal posterior mode estimation; the resulting emulator is a Student’s t-process. ‘mmle’: maximum marginal likelihood estimation; the resulting emulator is a Student’s t-process. ‘mle’: maximum likelihood estimation; the resulting emulator is a Gaussian process. Default is ‘post_mode’.
prior_choice (str, optional) – ‘ref_xi’, ‘ref_gamma’, or ‘ref_approx’. Default is ‘ref_approx’.
a (float, optional) – Prior parameter in the jointly robust prior. Default is \(0.2\).
b (float, optional) – Prior parameter in the jointly robust prior. Default is \(ntrain^{-1/ndim}(a+ndim)\), where ntrain is the number of training input points and ndim is the input parameter dimension.
kernel_type (str, optional) – ‘matern_3_2’: Matern correlation with roughness parameter \(3/2\). ‘matern_5_2’: Matern correlation with roughness parameter \(5/2\). ‘pow_exp’: power exponential correlation with roughness parameter alpha. Default is ‘matern_5_2’.
isotropic (bool, optional) – True: isotropic kernel. False: separable kernel. Default is False.
optimization (str, optional) – Optimization method for kernel parameters. ‘lbfgs’: low-storage version of the Broyden-Fletcher-Goldfarb-Shanno method. ‘nelder-mead’: Nelder-Mead method. ‘brent’: Brent method for one-dimensional problems. Default is ‘lbfgs’.
alpha (numpy array, optional) – Roughness parameters in the ‘pow_exp’ correlation function. One-dimensional numpy.ndarray of length ndim. Default is a one-dimensional array with each entry being \(1.9\).
lower_bound (bool, optional) – True: the default lower bounds of the inverse range parameters are used to constrain the optimization. False: the optimization is unconstrained. Default is True.
max_eval (int, optional) – Maximum number of steps to estimate the range and nugget parameters. Default is \(\max(30, 20+5 \cdot ndim)\).
num_initial_values (int, optional) – Number of initial values of the kernel parameters in optimization. Default is \(2\).
- train(design, response, trend=None)
Train a scalar GP given training data.
- Parameters
design (numpy array) – Training input points. Shape (ntrain, ndim). ntrain is the number of training points and ndim is the input dimension. If \(ndim=1\), both (ntrain, 1) and (ntrain,) numpy.ndarray are valid.
response (numpy array) – Training output points. Shape (ntrain, 1) or (ntrain,).
trend (numpy array, optional) – Basis function matrix. Shape (ntrain, q). If None, a (ntrain, 1) matrix of ones is used. If \(q=1\), both (ntrain, 1) and (ntrain,) numpy.ndarray are valid.
- Return type
None
- predict(testing_input, testing_trend=None)
Make prediction using the trained scalar GP emulator.
- Parameters
testing_input (numpy array) – Input points at which to make predictions. Shape (ntest, ndim). ntest is the number of input points and ndim is the input dimension. If \(ndim=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
testing_trend (numpy array, optional) – Basis function matrix corresponding to testing_input. Shape (ntest, q). By default, a (ntest, 1) matrix of ones. If \(q=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
- Returns
predictions – Emulator predictions at testing_input. Shape (ntest, 4). predictions[:, 0] – mean, predictions[:, 1] – lower bound of the 95% interval (low95), predictions[:, 2] – upper bound of the 95% interval (upper95), predictions[:, 3] – standard deviation (sd).
- Return type
numpy array
- sample(testing_input, nsamples=1, testing_trend=None)
Draw samples using the trained scalar GP emulator.
- Parameters
testing_input (numpy array) – Input points at which to draw samples. Shape (ntest, ndim). ntest is the number of input points and ndim is the input dimension. If \(ndim=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
nsamples (int, optional) – Number of samples to be drawn. Default is \(1\).
testing_trend (numpy array, optional) – Basis function matrix corresponding to testing_input. Shape (ntest, q). By default, a (ntest, 1) matrix of ones. If \(q=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
- Returns
samples – Shape (ntest, nsamples). Each column contains one realization of the trained emulator at testing_input.
- Return type
numpy array
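Putting the above methods together, a minimal usage sketch of ScalarGaSP follows; the toy simulator \(f(x)=x\sin(x)\) and the chosen numbers of training and testing points are illustrative assumptions:

import numpy as np
from psimpy.emulator.robustgasp import ScalarGaSP

def f(x):
    return x * np.sin(x)                        # toy scalar-output simulator

x_train = np.linspace(0, 10, 8)                 # ntrain = 8, ndim = 1
y_train = f(x_train)

emulator = ScalarGaSP(ndim=1)
emulator.train(design=x_train, response=y_train)

x_test = np.linspace(0, 10, 100)
predictions = emulator.predict(x_test)          # shape (100, 4)
mean, low95, upper95, sd = predictions.T

samples = emulator.sample(x_test, nsamples=5)   # shape (100, 5)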
PPGaSP Class
Class PPGaSP provides Gaussian process emulation for simulators which return multiple values as output, namely \(y \in \mathbb{R}^k, k>1\). Please refer to Gu and Berger [2016] for the detailed theory. It is imported by:
from psimpy.emulator.robustgasp import PPGaSP
Methods
- class PPGaSP(ndim, zero_mean='No', nugget=0.0, nugget_est=False, range_par=NA, method='post_mode', prior_choice='ref_approx', a=0.2, b=None, kernel_type='matern_5_2', isotropic=False, optimization='lbfgs', alpha=None, lower_bound=True, max_eval=None, num_initial_values=2)
Set up basic parameters of the emulator.
- Parameters
ndim (int) – Input parameter dimension.
zero_mean (str, optional) – If ‘Yes’, the mean is set to zero. If ‘No’, the mean is specified by trend. Default is ‘No’.
nugget (float, optional) – Nugget variance ratio. If \(0\), there is no nugget or the nugget is estimated. If not \(0\), it means a fixed nugget. Default is \(0\).
nugget_est (bool, optional) – True means nugget should be estimated. False means nugget is fixed or not estimated. Default is False.
range_par (numpy array, optional) – Range parameters. One-dimensional numpy.ndarray of length ndim. If given, range parameters are set to the given values rather than estimated based on training data. Default is NA, meaning that range parameters are estimated based on training data.
method (str, optional) – Method used for parameter estimation. ‘post_mode’: marginal posterior mode estimation; the resulting emulator is a Student’s t-process. ‘mmle’: maximum marginal likelihood estimation; the resulting emulator is a Student’s t-process. ‘mle’: maximum likelihood estimation; the resulting emulator is a Gaussian process. Default is ‘post_mode’.
prior_choice (str, optional) – ‘ref_xi’, ‘ref_gamma’, or ‘ref_approx’. Default is ‘ref_approx’.
a (float, optional) – Prior parameter in the jointly robust prior. Default is \(0.2\).
b (float, optional) – Prior parameter in the jointly robust prior. Default is \(ntrain^{-1/ndim}(a+ndim)\), where ntrain is the number of training input points and ndim is the input parameter dimension.
kernel_type (str, optional) – ‘matern_3_2’: Matern correlation with roughness parameter \(3/2\). ‘matern_5_2’: Matern correlation with roughness parameter \(5/2\). ‘pow_exp’: power exponential correlation with roughness parameter alpha. Default is ‘matern_5_2’.
isotropic (bool, optional) – True: isotropic kernel. False: separable kernel. Default is False.
optimization (str, optional) – Optimization method for kernel parameters. ‘lbfgs’: low-storage version of the Broyden-Fletcher-Goldfarb-Shanno method. ‘nelder-mead’: Nelder-Mead method. ‘brent’: Brent method for one-dimensional problems. Default is ‘lbfgs’.
alpha (numpy array, optional) – Roughness parameters in the ‘pow_exp’ correlation function. One-dimensional numpy.ndarray of length ndim. Default is a one-dimensional array with each entry being \(1.9\).
lower_bound (bool, optional) – True: the default lower bounds of the inverse range parameters are used to constrain the optimization. False: the optimization is unconstrained. Default is True.
max_eval (int, optional) – Maximum number of steps to estimate the range and nugget parameters. Default is \(\max(30, 20+5 \cdot ndim)\).
num_initial_values (int, optional) – Number of initial values of the kernel parameters in optimization. Default is \(2\).
- train(design, response, trend=None)
Train a parallel partial GP given training data.
- Parameters
design (numpy array) – Training input points. Shape (ntrain, ndim). ntrain is the number of training points and ndim is the input dimension. If \(ndim=1\), both (ntrain, 1) and (ntrain,) numpy.ndarray are valid.
response (numpy array) – Training output points. Shape (ntrain, nresponse), \(nresponse>1\).
trend (numpy array, optional) – Basis function matrix. Shape (ntrain, q). If None, a (ntrain, 1) matrix of ones is used. If \(q=1\), both (ntrain, 1) and (ntrain,) numpy.ndarray are valid.
- Return type
None
- predict(testing_input, testing_trend=None)
Make prediction using the trained parallel partial GP emulator.
- Parameters
testing_input (numpy array) – Input points at which to make predictions. Shape (ntest, ndim). ntest is the number of input points and ndim is the input dimension. If \(ndim=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
testing_trend (numpy array, optional) – Basis function matrix corresponding to testing_input. Shape (ntest, q). By default, a (ntest, 1) matrix of ones. If \(q=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
- Returns
predictions – Emulator predictions at testing_input. Shape (ntest, nresponse, 4). predictions[:, :, 0] – mean, predictions[:, :, 1] – lower bound of the 95% interval (low95), predictions[:, :, 2] – upper bound of the 95% interval (upper95), predictions[:, :, 3] – standard deviation (sd).
- Return type
numpy array
- sample(testing_input, nsamples=1, testing_trend=None)
Draw samples using the trained parallel partial GaSP emulator.
- Parameters
testing_input (numpy array) – Input points at which to draw samples. Shape (ntest, ndim). ntest is the number of input points and ndim is the input dimension. If \(ndim=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
nsamples (int, optional) – Number of samples to be drawn. Default is \(1\).
testing_trend (numpy array, optional) – Basis function matrix corresponding to testing_input. Shape (ntest, q). By default, a (ntest, 1) matrix of ones. If \(q=1\), both (ntest, 1) and (ntest,) numpy.ndarray are valid.
- Returns
samples – Shape (ntest, nresponse, nsamples). samples[:, :, i] corresponds to the i-th realization of the trained emulator at testing_input, \(i=1,\ldots,nsamples\).
- Return type
numpy array
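Analogously, a minimal usage sketch of PPGaSP follows; the toy simulator returning three outputs per input point is an illustrative assumption:

import numpy as np
from psimpy.emulator.robustgasp import PPGaSP

def f(x):
    # toy simulator with nresponse = 3 outputs per input point
    return np.column_stack((np.sin(x), np.cos(x), x**2))

x_train = np.linspace(0, 5, 10)                 # ntrain = 10, ndim = 1
y_train = f(x_train)                            # shape (10, 3)

emulator = PPGaSP(ndim=1)
emulator.train(design=x_train, response=y_train)

x_test = np.linspace(0, 5, 50)
predictions = emulator.predict(x_test)          # shape (50, 3, 4)
mean = predictions[:, :, 0]                     # predictive mean of each output

samples = emulator.sample(x_test, nsamples=5)   # shape (50, 3, 5)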