Active Learning
Active learning is a machine learning technique that selects the most informative data points for training an emulator. The aim is to reduce the amount of training data a model needs to reach a given level of accuracy, which is achieved by iteratively choosing the new data point that is expected to be most informative.
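The loop below is a minimal, self-contained sketch of this idea. It is purely illustrative: the simulator is a cheap stand-in for an expensive model, and the distance-to-nearest-sample rule is a crude proxy for the uncertainty indicators (entropy, variance) used by the ActiveLearning class.

import numpy as np

rng = np.random.default_rng(0)

def simulator(x):
    # cheap stand-in for a computationally expensive simulation
    return np.sin(3 * x) + 0.1 * x**2

# small initial design on [0, 2]
x_train = list(rng.uniform(0.0, 2.0, size=4))
y_train = [simulator(x) for x in x_train]

for _ in range(5):
    # crude uncertainty proxy: distance to the nearest training point;
    # a real emulator would supply predictive variance or entropy here
    candidates = np.linspace(0.0, 2.0, 201)
    dists = np.min(np.abs(candidates[:, None] - np.asarray(x_train)[None, :]), axis=1)
    x_new = candidates[np.argmax(dists)]   # "most informative" candidate
    x_train.append(x_new)
    y_train.append(simulator(x_new))       # run the simulator only there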
In this module, the ActiveLearning class is implemented to actively build a
Gaussian process emulator for the natural logarithm of the unnormalized posterior in
Bayesian inference. It is intended to facilitate efficient parameter calibration of
computationally expensive simulators. For the underlying theory, please refer to
Wang and Li [2018], Kandasamy et al. [2017], and Zhao and Kowalski [2022].
ActiveLearning Class
The ActiveLearning class is imported by:
from psimpy.inference.active_learning import ActiveLearning
Methods
- class ActiveLearning(ndim, bounds, data, run_sim_obj, prior, likelihood, lhs_sampler, scalar_gasp, scalar_gasp_trend='constant', indicator='entropy', optimizer=<function brute>, args_prior=None, kwgs_prior=None, args_likelihood=None, kwgs_likelihood=None, args_optimizer=None, kwgs_optimizer=None)
- Construct a scalar GP emulator for the natural logarithm of the product of prior and likelihood (i.e. the unnormalized posterior), via active learning. A construction sketch follows the parameter list below.
- Parameters
- ndim (int) – Dimension of parameter x.
- bounds (numpy array) – Upper and lower boundaries of each parameter. Shape (ndim, 2). bounds[:, 0] corresponds to lower boundaries of each parameter and bounds[:, 1] to upper boundaries of each parameter.
- data (numpy array) – Observed data for parameter calibration. 
- run_sim_obj (instance of class RunSimulator) – It has an attribute simulator and two methods to run simulator, namely serial_run() and parallel_run(). For each simulation, simulator must return outputs y as a numpy array.
- prior (Callable) – Prior probability density function. Call with prior(x, *args_prior, **kwgs_prior) and return the value of the prior probability density at x.
- likelihood (Callable) – Likelihood function constructed based on data and simulation outputs y evaluated at x. Call with likelihood(y, data, *args_likelihood, **kwgs_likelihood) and return the likelihood value at x.
- lhs_sampler (instance of class LHS) – Latin hypercube sampler used to draw initial samples of x. These initial samples are used to run initial simulations and build the initial emulator.
- scalar_gasp (instance of class ScalarGaSP) – An object which sets up the emulator structure. Given training data, the emulator can be trained and used to make predictions.
- scalar_gasp_trend (str or Callable, optional) – Mean function of the scalar_gasp emulator, used to determine the trend or testing_trend at given design or testing_input. ‘zero’ - trend is set to zero. ‘constant’ - trend is set to a constant. ‘linear’ - trend is linear in design or testing_input. Callable - a function that takes design or testing_input as parameter and returns the trend. Default is ‘constant’.
- indicator (str, optional) – Indicator of uncertainty. ‘entropy’ or ‘variance’. Default is ‘entropy’. 
- optimizer (Callable, optional) – A function which finds the input point x that minimizes the uncertainty indicator at each iteration step. Call with optimizer(func, *args_optimizer, **kwgs_optimizer). The objective function func is defined by the class method _uncertainty_indicator(), which has only one argument x. The optimizer should return either the solution array, or a scipy.optimize.OptimizeResult object which has the attribute x denoting the solution array. Default is scipy.optimize.brute().
- args_prior (list, optional) – Positional arguments for prior.
- kwgs_prior (dict, optional) – Keyword arguments for prior.
- args_likelihood (list, optional) – Positional arguments for likelihood.
- kwgs_likelihood (dict, optional) – Keyword arguments for likelihood.
- args_optimizer (list, optional) – Positional arguments for optimizer.
- kwgs_optimizer (dict, optional) – Keyword arguments for optimizer.
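Putting these arguments together, a minimal construction might look as follows. The simulator, prior, and likelihood are toy stand-ins, and the import paths and constructor signatures of RunSimulator, LHS, and ScalarGaSP are assumed from PSimPy's documentation; consult the respective class references of your installed version before use.

import numpy as np
from psimpy.simulator import RunSimulator
from psimpy.sampler import LHS
from psimpy.emulator import ScalarGaSP
from psimpy.inference.active_learning import ActiveLearning

ndim = 2
bounds = np.array([[0.0, 1.0], [0.0, 1.0]])  # one (lower, upper) row per parameter
data = np.array([0.25])                      # toy "observed" data

def simulator(x1, x2):
    # toy stand-in for a computationally expensive simulator
    return np.array([x1 * x2])

def prior(x):
    # uniform prior on the unit square
    return 1.0

def likelihood(y, data, sigma=0.1):
    # toy Gaussian likelihood comparing simulation outputs to data
    return float(np.exp(-0.5 * np.sum((y - data) ** 2) / sigma ** 2))

run_sim_obj = RunSimulator(simulator, var_inp_parameter=['x1', 'x2'])
lhs_sampler = LHS(ndim=ndim, bounds=bounds, seed=42)
scalar_gasp = ScalarGaSP(ndim=ndim)

active_learner = ActiveLearning(
    ndim=ndim, bounds=bounds, data=data, run_sim_obj=run_sim_obj,
    prior=prior, likelihood=likelihood, lhs_sampler=lhs_sampler,
    scalar_gasp=scalar_gasp)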
 
- initial_simulation(n0, prefixes=None, mode='serial', max_workers=None)
- Run n0 initial simulations.
- Parameters
- n0 (int) – Number of initial simulation runs. 
- prefixes (list of str, optional) – Consists of n0 strings, each used to name the corresponding simulation output file(s). If None, ‘sim0’, ‘sim1’, … are used.
- mode (str, optional) – ‘parallel’ or ‘serial’. Run n0 simulations in parallel or in serial. Default is ‘serial’.
- max_workers (int, optional) – Controls the maximum number of tasks running in parallel. Default is the number of CPUs on the host. 
 
- Return type
- tuple[ndarray, ndarray]
- Returns
- init_var_samples (numpy array) – Variable input samples for n0 initial simulations. Shape (n0, ndim).
- init_sim_outputs (numpy array) – Outputs of n0 initial simulations. init_sim_outputs.shape[0] is n0.
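Continuing the construction sketch above, an initial design could be generated as follows (the choice n0=20 is arbitrary):

init_var_samples, init_sim_outputs = active_learner.initial_simulation(
    n0=20, mode='serial')
print(init_var_samples.shape)  # (20, 2), i.e. (n0, ndim)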
 
 
- iterative_emulation(ninit, init_var_samples, init_sim_outputs, niter, iter_prefixes=None)
- Sequentially pick niter new input points based on ninit initial simulations.
- Parameters
- niter (int) – Number of iterative simulations.
- ninit (int) – Number of initial simulations. 
- init_var_samples (numpy array) – Variable input samples for ninit simulations. Shape (ninit, ndim).
- init_sim_outputs (numpy array) – Outputs of ninit simulations. init_sim_outputs.shape[0] is ninit.
- iter_prefixes (list of str, optional) – Consists of niter strings, each used to name the corresponding iterative simulation output file(s). If None, ‘iter_sim0’, ‘iter_sim1’, … are used.
 
- Return type
- tuple[ndarray, ndarray, ndarray]
- Returns
- var_samples (numpy array) – Variable input samples of ninit initial simulations and niter iterative simulations. Shape (ninit + niter, ndim).
- sim_outputs (numpy array) – Outputs of ninit + niter simulations. sim_outputs.shape[0] is ninit + niter.
- ln_pxl_values (numpy array) – Natural logarithm values of the product of prior and likelihood at the ninit + niter input points. Shape (ninit + niter,).
 
- Notes
- If a duplicated iteration point is returned by the optimizer, the iteration is stopped right away. In that case, the first dimension of the returned var_samples, sim_outputs, and ln_pxl_values is smaller than ninit + niter.
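Chaining the two methods completes an active-learning run; the example below carries over the names from the earlier sketches and is likewise illustrative. Afterwards, the trained scalar_gasp emulator approximates the natural logarithm of the unnormalized posterior and can be used for further inference.

var_samples, sim_outputs, ln_pxl_values = active_learner.iterative_emulation(
    ninit=20, init_var_samples=init_var_samples,
    init_sim_outputs=init_sim_outputs, niter=30)

# Early stopping on a duplicated iteration point may shorten the results,
# so the first dimension is at most ninit + niter.
assert var_samples.shape[0] <= 20 + 30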