PPGaSP: GP emulation for multi-output functions

This example shows how to apply Gaussian process emulation to a multi-output function using class PPGaSP.

The multi-output function that we are going to look at is the DIAMOND (diplomatic and military operations in a non-warfighting domain) computer model. It is used as a testbed to illustrate the PP GaSP emulator in the R package RobustGaSP, see Gu et al. [2019] for more detail.

The simulator has \(13\) input parameters and \(5\) outputs. Namely, \(\mathbf{y}=f(\mathbf{x})\) where \(\mathbf{x}=(x_1,\ldots,x_{13})^T\) and \(\mathbf{y}=(y_1,\ldots,y_5)^T\).

The training and testing data are provided in the folder ‘…/tests/data/’. We first load the training and testing data.

import numpy as np
import os

dir_data = os.path.abspath('../../../tests/data/')

humanityX = np.genfromtxt(os.path.join(dir_data, 'humanityX.csv'), delimiter=',')
humanityY = np.genfromtxt(os.path.join(dir_data, 'humanityY.csv'), delimiter=',')
print(f"Number of training data points: ", humanityX.shape[0])
print(f"Input dimension: ", humanityX.shape[1])
print(f"Output dimension: ", humanityY.shape[1])

humanityXt = np.genfromtxt(os.path.join(dir_data, 'humanityXt.csv'), delimiter=',')
humanityYt = np.genfromtxt(os.path.join(dir_data, 'humanityYt.csv'), delimiter=',')
print(f"Number of testing data points: ", humanityXt.shape[0])
Number of training data points:  120
Input dimension:  13
Output dimension:  5
Number of testing data points:  120

Note

You may need to modify dir_data according to where you save them on your local machine.

humanityX and humanitY are the training data, corresponding to design and response respectively. humanityXt are testing input data, at which we are going to make predictions once the emulator is trained. humanityYt are the true outputs at humanityXt, which is then used to validate the performance of the trained emulator.

To build a PP GaSP emulator for the above simulator, first import class PPGaSP by

from psimpy.emulator import PPGaSP

Then, create an instance of PPGaSP. The parameter ndim (dimension of function input x) must be specified. Optional parameters, such as method, kernel_type, etc., can be set up if desired. Here, we leave all the optional parameters to their default values.

emulator = PPGaSP(ndim=humanityX.shape[1])

Next, we train the PP GaSP emulator based on the training data using PPGaSP.train() method.

emulator.train(design=humanityX, response=humanityY)
The upper bounds of the range parameters are 296,974 297,1814 294,7672 295,9345 296,563 295,935 297,3198 296,4444 296,9323 298,2373 298,0619 298,7249 298,7249
The initial values of range parameters are 5,93948 5,943628 5,895344 5,918691 5,93126 5,918701 5,946395 5,928888 5,938645 5,964745 5,961238 5,974498 5,974498
Start of the optimization  1  :
The number of iterations is  44
 The value of the  marginal posterior  function is  -5279,534
 Optimized range parameters are 25,46132 2,891428 5,30812 26,95519 291,1828 48,17274 84,56133 2,874385 39,12121 51,81304 0,5407651 1,728696 1,020605
 Optimized nugget parameter is 0
 Convergence:  TRUE
The initial values of range parameters are 12,37503 12,38367 12,28307 12,33172 12,3579 12,33174 12,38944 12,35296 12,37329 12,42767 12,42037 12,44799 12,44799
Start of the optimization  2  :
The number of iterations is  45
 The value of the  marginal posterior  function is  -5279,534
 Optimized range parameters are 25,46132 2,891428 5,308121 26,95519 291,183 48,17274 84,56133 2,874385 39,12121 51,81304 0,5407651 1,728696 1,020605
 Optimized nugget parameter is 0
 Convergence:  TRUE

With the trained emulator, we can make predictions for any arbitrary set of input points using PPGaSP.predict() method. Here, we make predictions at testing input points humanityXt.

predictions = emulator.predict(humanityXt)

We can validate the performance of the trained emulator based on the true outputs humanityYt.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(5, 1, figsize=(6,15))

for i in range(humanityY.shape[1]):

    ax[i].set_xlabel(f'Actual $y_{i+1}$')
    ax[i].set_ylabel(f'Emulator-predicted $y_{i+1}$')
    ax[i].set_xlim(np.min(humanityYt[:,i]),np.max(humanityYt[:,i]))
    ax[i].set_ylim(np.min(humanityYt[:,i]),np.max(humanityYt[:,i]))

    _ = ax[i].plot([np.min(humanityYt[:,i]),np.max(humanityYt[:,i])], [np.min(humanityYt[:,i]),np.max(humanityYt[:,i])])
    _ = ax[i].errorbar(humanityYt[:,i], predictions[:,i,0], predictions[:,i,3], fmt='.', linestyle='', label='prediction and std')
    _ = ax[i].legend()

plt.tight_layout()
plot ppgasp

We can also draw any number of samples at testing input humanityXt using PPGaSPsample() method.

samples = emulator.sample(humanityXt, nsamples=10)
print("Shape of samples: ", samples.shape)
Shape of samples:  (120, 5, 10)

Total running time of the script: ( 0 minutes 5.333 seconds)

Gallery generated by Sphinx-Gallery