Inference Methods: EM, VBEM, and Gibbs
Introduction
sppcax implements three inference methods for the Dynamic Factor Analysis model
(see Bayesian Dynamic Factor Analysis): Expectation-Maximization (EM), Variational Bayes EM
(VBEM), and Blocked Gibbs Sampling. All three methods share the same M-step — full
Bayesian posterior updates via conjugate computations — and differ only in how the E-step
is computed and how parameters are extracted from the posteriors.
Method |
E-step |
Parameter extraction |
Code |
|---|---|---|---|
EM |
Standard Kalman smoother |
Mode of posterior |
|
VBEM |
VB-corrected Kalman smoother |
Moments of posterior |
|
Gibbs |
Forward filtering, backward sampling |
Sample from posterior |
|
EM (Maximum A Posteriori)
The EM algorithm finds parameters that maximize the marginal posterior:
E-Step
The E-step uses a standard Kalman smoother with point-estimate parameters to compute the posterior over latent states \(q(\mathbf{Z})\). No parameter uncertainty is propagated — the filter treats current parameter values as exact.
The smoother produces:
from which the sufficient statistics for the M-step are computed:
M-Step
The M-step computes the full Bayesian posterior from the sufficient statistics, then extracts the mode as the point estimate for the next iteration. The mode of the MVNIG emission posterior and the MVN dynamics posterior provides the MAP parameter values.
VBEM (Variational Bayes EM)
VBEM approximates the full posterior over both latent states and model parameters using a factorized variational distribution. The key difference from EM is that the E-step accounts for parameter uncertainty.
E-Step
The E-step uses a VB-corrected Kalman filter/smoother that incorporates correction terms from the posterior uncertainty in \(\mathbf{H}\) and \(\mathbf{F}\).
Specifically, the predicted state covariance is corrected as:
where the correction matrix \(\mathbf{C}^{xx}\) captures the expected second moments of model parameters beyond their means:
This correction inflates the state uncertainty to account for the fact that the model parameters are not known exactly. Without it, the E-step would underestimate the posterior variance of the latent states.
The corrected predicted mean is:
and a log-likelihood correction term is accumulated for the ELBO.
M-Step
The M-step computes the full Bayesian posterior from the sufficient statistics (identical conjugate updates as EM), then extracts the moments — the posterior mean and covariance. These moments are used both as the parameters for the next E-step and to compute the correction matrices \(\mathbf{C}^{xx}\).
For detailed static-case update equations, see Factor Analysis and PCA.
Blocked Gibbs Sampling
Gibbs sampling provides an alternative to variational inference by drawing samples from the exact posterior distribution. In contrast to VB, which provides a locally optimal deterministic approximation, Gibbs sampling generates a Markov chain whose stationary distribution is the true posterior. The blocked variant groups correlated variables and samples them jointly, reducing autocorrelation and improving mixing.
E-Step: Forward Filtering Backward Sampling (FFBS)
Instead of computing posterior expectations, the E-step samples latent states from their conditional distribution given the current parameter samples.
Forward pass (Kalman filter):
using standard Kalman filter recursions with the current parameter samples.
Backward pass (sampling):
where:
The sufficient statistics are then computed directly from the sampled states (not from expectations), e.g., \(\mathbf{S}_{zz} = \sum_t \mathbf{z}_{t-1} \mathbf{z}_{t-1}^\top\).
M-Step: Sampling from Conditional Posteriors
The M-step draws samples from the full conditional posteriors of each parameter block.
Sampling \((\mathbf{h}_d, \psi_d)\) (row-wise):
For each row \(d\) of the loading matrix, the conditional distribution of \((\mathbf{h}_d, \psi_d)\) is a Normal-Gamma:
where:
Sampling proceeds by first drawing \(\psi_d \sim \text{Gamma}(\alpha_d, \beta_d)\), then \(\mathbf{h}_d \sim \mathcal{N}(\boldsymbol{\mu}_d, \psi_d^{-1} \boldsymbol{\Sigma}_d)\).
Sampling \(\mathbf{F}\):
With the MVN prior used in sppcax (where \(\mathbf{Q} = \mathbf{I}\) is fixed):
where:
Sampling \(\boldsymbol{\tau}\) (ARD):
The column-wise ARD precisions have Gamma conditionals:
Sampling \(\boldsymbol{\Lambda}\) (sparsity structure):
When using Bayesian Model Reduction for sparsity (see Bayesian Model Reduction), the sparsity indicators \(\lambda_{dk} \in \{0, 1\}\) are sampled from Bernoulli conditionals:
where \(\Delta F_{dk}\) is the change in variational free energy when pruning element \((d, k)\), and \(\sigma(\cdot)\) is the logistic function.
Sampling \(\pi\) (sparsity level):
The inclusion probability \(\pi\) has a Beta conditional:
Algorithm Summaries
Algorithm 1: EM for DFA
Algorithm 2: VBEM for DFA
Algorithm 3: Blocked Gibbs Sampler for DFA
See Also
Bayesian Dynamic Factor Analysis — model definition and priors
Factor Analysis and PCA — static-case VB-EM update equations
Parameter Expansion for Dynamic Factor Analysis — PX-VB rotation details
Bayesian Model Reduction — BMR pruning
References
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Luttinen, J., Raiko, T., & Ilin, A. (2014). Linear state-space model with time-varying dynamics. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2014), LNCS 8725, pp. 338-353.