diff --git a/docs/source/inference/bayes_inference.rst b/docs/source/inference/bayes_inference.rst
index 9ccb48886a103e0db8e89042bc7559fe6827afdb..554ec0234f43ff328a90aebc1f7a5f51ff505b77 100644
--- a/docs/source/inference/bayes_inference.rst
+++ b/docs/source/inference/bayes_inference.rst
@@ -1,6 +1,42 @@
 Bayes Inference
 ===============
 
+Bayesian inference is a statistical method for making probabilistic inference about
+unknown parameters based on observed data and prior knowledge. The update from
+prior knowledge to posterior in light of observed data is based on Bayes' theorem:
+
+.. math:: p(\mathbf{x} \mid \mathbf{d}) = \frac{L(\mathbf{x} \mid \mathbf{d}) p(\mathbf{x})}
+    {\int L(\mathbf{x} \mid \mathbf{d}) p(\mathbf{x}) d \mathbf{x}}
+    :label: bayes
+
+where :math:`\mathbf{x}` represents the collection of unknown parameters and
+:math:`\mathbf{d}` represents the collection of observed data. The prior probability
+distribution, :math:`p(\mathbf{x})`, represents the degree of belief in the parameters before
+any data is observed. The likelihood function, :math:`L(\mathbf{x} \mid \mathbf{d})`, represents
+the probability of observing the data given the parameters. The posterior distribution,
+:math:`p(\mathbf{x} \mid \mathbf{d})`, is obtained by multiplying the prior probability
+distribution by the likelihood function and then normalizing the result. Bayesian inference
+allows for incorporating subjective prior beliefs, which can be updated as new data becomes available.
+
+For many real-world problems, the posterior cannot be computed analytically because the denominator of
+equation :eq:`bayes`, the normalizing constant, is an integral that usually has no closed-form solution.
+This module implements two numerical approximations: grid estimation and Metropolis-Hastings
+estimation.
+
+In grid estimation, the denominator in equation :eq:`bayes` is approximated by numerical integration
+on a regular grid, and the posterior value at each grid point is computed as shown in equation
+:eq:`grid_estimation`. The number of grid points, :math:`N`, grows exponentially with the number of unknown parameters
+(for example, :math:`N = n^k` with :math:`n` points per parameter and :math:`k` parameters), so grid estimation is limited to low-dimensional problems.
+
+.. math:: p(\mathbf{x} \mid \mathbf{d}) \approx \frac{L(\mathbf{x} \mid \mathbf{d}) p(\mathbf{x})}
+    {\sum_{i=1}^N L\left(\mathbf{x}_i \mid \mathbf{d}\right) p\left(\mathbf{x}_i\right) \Delta \mathbf{x}_i}
+    :label: grid_estimation
+
+Metropolis-Hastings estimation draws samples from the posterior distribution using only the unnormalized
+posterior, namely the numerator of equation :eq:`bayes`. The samples are then used to estimate properties
+of the posterior distribution, such as the mean and variance, or to approximate the distribution itself.
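+
+As a sketch of the standard algorithm (not necessarily the exact variant implemented in this module),
+assume a proposal distribution :math:`q(\mathbf{x}' \mid \mathbf{x})`, a symbol introduced here only for
+illustration. A candidate :math:`\mathbf{x}'` drawn from it replaces the current sample :math:`\mathbf{x}`
+with probability
+
+.. math:: \alpha = \min\left(1, \frac{L(\mathbf{x}' \mid \mathbf{d}) p(\mathbf{x}') q(\mathbf{x} \mid \mathbf{x}')}
+    {L(\mathbf{x} \mid \mathbf{d}) p(\mathbf{x}) q(\mathbf{x}' \mid \mathbf{x})}\right)
+
+Only ratios of the numerator of equation :eq:`bayes` appear, so the normalizing constant cancels and never
+needs to be computed.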
+
+
 GridEstimation Class
 --------------------