Bayesian Dynamic Factor Analysis

Introduction

The primary model implemented in sppcax is the Bayesian Dynamic Factor Analysis (DFA) model, a Linear Gaussian State Space Model (LGSSM) with hierarchical priors for automatic sparsification. The model places Automatic Relevance Determination (ARD) priors on both the emission matrix \(\mathbf{H}\) and the dynamics matrix \(\mathbf{F}\), enabling data-driven discovery of sparse latent representations and temporal dependencies.

Factor Analysis (FA) and Probabilistic PCA (PPCA) are recovered as special cases of DFA when there are no temporal dynamics — see Factor Analysis and PCA as Special Cases below and Factor Analysis and PCA for detailed update equations.

Model Definition

The Dynamic Factor Analysis model is defined by:

State equation:

(1)\[\mathbf{z}_t = \mathbf{F}\mathbf{z}_{t-1} + \mathbf{B}\mathbf{u}_t + \mathbf{b} + \boldsymbol{\eta}_t, \quad \boldsymbol{\eta}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})\]

Observation equation:

(2)\[\mathbf{x}_t = \mathbf{H}\mathbf{z}_t + \mathbf{D}\mathbf{u}_t + \mathbf{d} + \boldsymbol{\epsilon}_t, \quad \boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}^{-1})\]

where:

\(\mathbf{z}_t \in \mathbb{R}^K\) is the latent state
\(\mathbf{x}_t \in \mathbb{R}^D\) is the observation
\(\mathbf{u}_t \in \mathbb{R}^U\) is an optional control input
\(\mathbf{F} \in \mathbb{R}^{K \times K}\) is the state dynamics matrix
\(\mathbf{H} \in \mathbb{R}^{D \times K}\) is the loading (emission) matrix
\(\mathbf{B} \in \mathbb{R}^{K \times U}\) and \(\mathbf{D} \in \mathbb{R}^{D \times U}\) are input weight matrices
\(\mathbf{b} \in \mathbb{R}^K\) and \(\mathbf{d} \in \mathbb{R}^D\) are bias terms
\(\boldsymbol{\Psi}\) is the observation noise precision (diagonal)
\(\mathbf{Q}\) is the state noise covariance (fixed to \(\mathbf{I}\) by default)

Note

The bias terms \(\mathbf{b}\) and \(\mathbf{d}\) are absorbed into the augmented weight matrices by appending a constant input of 1. The augmented emission matrix is \(\tilde{\mathbf{H}} = [\mathbf{H}, \mathbf{D}, \mathbf{d}]\) and the augmented dynamics matrix is \(\tilde{\mathbf{F}} = [\mathbf{F}, \mathbf{B}, \mathbf{b}]\). The biases are learned jointly with the weight matrices under their respective priors, not as separate parameters.

Initial state:

\[\mathbf{z}_0 \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\]

Prior Distributions

Emission Matrix and Noise

The emission parameters \((\mathbf{H}, \boldsymbol{\Psi})\) have a Multivariate Normal-Inverse Gamma (MVNIG) prior. For each row \(d\) of the augmented emission matrix \(\tilde{\mathbf{h}}_d = [\mathbf{h}_d, \mathbf{d}^u_d, d_d]\) (loadings, input weights, bias):

\[p(\tilde{\mathbf{h}}_d, \psi_d) = \mathcal{N}(\tilde{\mathbf{h}}_d \mid \mathbf{0}, \psi_d^{-1} \tilde{\boldsymbol{\Sigma}}_d^0) \, \text{Gamma}(\psi_d \mid \alpha_0^\psi, \beta_0^\psi)\]

See Factor Analysis and PCA for the detailed static-case prior specification.

Noise precision has two variants:

Diagonal noise (FA): \(\boldsymbol{\Psi} = \text{diag}(\psi_1, \ldots, \psi_D)\) with independent \(\psi_d \sim \text{Gamma}(\alpha_0^\psi, \beta_0^\psi)\)
Isotropic noise (PCA): \(\boldsymbol{\Psi} = \psi \mathbf{I}\) with a shared \(\psi \sim \text{Gamma}(\alpha_0^\psi, \beta_0^\psi)\)

Dynamics Matrix

The dynamics parameters \((\mathbf{F}, \mathbf{Q})\) have a Multivariate Normal (MVN) prior with \(\mathbf{Q} = \mathbf{I}\) fixed. For each row \(k\) of the augmented dynamics matrix \(\tilde{\mathbf{f}}_k = [\mathbf{f}_k, \mathbf{b}^u_k, b_k]\) (dynamics weights, input weights, bias):

\[p(\tilde{\mathbf{f}}_k) = \mathcal{N}(\tilde{\mathbf{f}}_k \mid \mathbf{0}, \tilde{\boldsymbol{\Sigma}}_k^{0,F})\]

ARD Priors

Automatic Relevance Determination (ARD) places column-wise Gamma priors on both the emission and dynamics matrices, encouraging removal of unnecessary latent dimensions:

\[\begin{split}p(\tau_k^H) &= \text{Gamma}(\tau_k^H \mid \alpha_0^\tau, \beta_0^\tau) \\ p(\tau_k^F) &= \text{Gamma}(\tau_k^F \mid \alpha_0^\tau, \beta_0^\tau)\end{split}\]

with defaults \(\alpha_0^\tau = \beta_0^\tau = 0.5\). The expected value \(\mathbb{E}[\tau_k]\) is incorporated into the prior precision of the \(k\)-th column of \(\mathbf{H}\) (or \(\mathbf{F}\)), such that large \(\tau_k\) shrinks the entire column towards zero.

For the emission matrix, the ARD-augmented prior precision for column \(k\) becomes:

\[\tilde{\boldsymbol{\Sigma}}_d^{0,-1} \leftarrow \tilde{\boldsymbol{\Sigma}}_d^{0,-1} + \text{diag}(\mathbb{E}[\tau_1^H], \ldots, \mathbb{E}[\tau_K^H], 0, \ldots)\]

and analogously for the dynamics matrix with \(\tau_k^F\).

Initial State

The initial state distribution has a Normal-Inverse Wishart (NIW) prior:

\[p(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) = \text{NIW}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0 \mid \mathbf{m}_0, \kappa_0, \nu_0, \mathbf{S}_0)\]

Variational Inference

We use variational Bayes EM (VB-EM) to approximate the posterior. The factorized approximate posterior is:

\[p(\mathbf{Z}, \mathbf{H}, \boldsymbol{\Psi}, \mathbf{F}, \boldsymbol{\tau}^H, \boldsymbol{\tau}^F \mid \mathbf{X}) \approx q(\mathbf{Z})\,q(\mathbf{H}, \boldsymbol{\Psi})\,q(\mathbf{F})\,q(\boldsymbol{\tau}^H)\,q(\boldsymbol{\tau}^F)\]

VBE-Step

The variational E-step updates \(q(\mathbf{Z})\) via Kalman smoothing using the expected parameters from the other variational factors. In the VB setting, the Kalman filter incorporates correction terms that account for the posterior uncertainty in model parameters \(\mathbf{H}\) and \(\mathbf{F}\).

Specifically, the predicted state covariance is corrected as:

\[\tilde{\boldsymbol{\Sigma}}_{t|t-1} = (\mathbf{I} + \boldsymbol{\Sigma}_{t|t-1} \mathbf{C}^{xx})^{-1} \boldsymbol{\Sigma}_{t|t-1}\]

where \(\mathbf{C}^{xx} = \sum_d \text{correction from } q(\mathbf{H}, \boldsymbol{\Psi})\) captures the parameter uncertainty. See Factor Analysis and PCA for the detailed static-case update equations.

VBM-Step

The variational M-step updates each factor in turn:

\(q(\mathbf{H}, \boldsymbol{\Psi})\): MVNIG posterior update using emission sufficient statistics
\(q(\mathbf{F})\): MVN posterior update using dynamics sufficient statistics \(\mathbf{S}_{zz} = \sum_t \langle \mathbf{z}_{t-1} \mathbf{z}_{t-1}^\top \rangle\) and \(\mathbf{S}_{z'z} = \sum_t \langle \mathbf{z}_t \mathbf{z}_{t-1}^\top \rangle\)
\(q(\boldsymbol{\tau}^H)\): Gamma posterior update with natural parameter increments

\[\begin{split}\Delta \eta_1^{\tau_k} &= \frac{1}{2} \sum_{d: \lambda_{dk}=1} 1 \\ \Delta \eta_2^{\tau_k} &= -\frac{1}{2} \sum_d \mathbb{E}_q[\psi_d] \left(\sigma_{dk}^2 + \mu_{dk}^2\right)\end{split}\]
\(q(\boldsymbol{\tau}^F)\): analogous Gamma update using dynamics posterior statistics

ELBO

The Evidence Lower Bound for the full DFA model is:

\[\begin{split}\mathcal{L} &= \mathbb{E}_q[\log p(\mathbf{X} \mid \mathbf{Z}, \mathbf{H}, \boldsymbol{\Psi})] + \mathbb{E}_q[\log p(\mathbf{Z} \mid \mathbf{F})] \\ &- \text{KL}(q(\mathbf{H}, \boldsymbol{\Psi}) \| p(\mathbf{H}, \boldsymbol{\Psi} \mid \boldsymbol{\tau}^H)) - \text{KL}(q(\mathbf{F}) \| p(\mathbf{F} \mid \boldsymbol{\tau}^F)) \\ &- \text{KL}(q(\boldsymbol{\tau}^H) \| p(\boldsymbol{\tau}^H)) - \text{KL}(q(\boldsymbol{\tau}^F) \| p(\boldsymbol{\tau}^F)) \\ &- \text{KL}(q(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) \| p(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0))\end{split}\]

The expected log-likelihood terms include VB correction terms from the parameter uncertainty.

Factor Analysis and PCA as Special Cases

Factor Analysis (FA) and Probabilistic PCA are obtained from DFA by removing the temporal dynamics:

Set \(\mathbf{F} = \mathbf{0}\), \(\mathbf{Q} = \mathbf{I}\), \(\mathbf{b} = \mathbf{0}\)
Each latent variable \(\mathbf{z}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\) independently
Data is reshaped from \((\text{N}, \text{D})\) to \((\text{N}, 1, \text{D})\) — each observation becomes an independent single-timestep sequence, so the Kalman smoother reduces to a single filter step

The generative model simplifies to:

\[\mathbf{x}_n = \tilde{\mathbf{H}}\tilde{\mathbf{z}}_n + \boldsymbol{\epsilon}_n, \quad \boldsymbol{\epsilon}_n \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}^{-1})\]

where \(\tilde{\mathbf{z}}_n = [\mathbf{z}_n^\top, 1]^\top\) and \(\tilde{\mathbf{H}} = [\mathbf{H}, \mathbf{d}]\) is the augmented loading matrix with the bias absorbed as its last column.

PCA additionally constrains the noise to be isotropic: \(\boldsymbol{\Psi} = \psi \mathbf{I}\).

Note

In sppcax, the initial distribution \(\mathbf{z}_0 \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\) is not updated by default for FA and PCA. The prior is set so that \(\mathbf{z}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\) independently for each observation. This can be changed by providing a custom initial_prior and setting the initial distribution to be trainable.

For the detailed VB-EM update equations specific to FA, see Factor Analysis and PCA.

Training Pipeline

sppcax provides three inference methods — EM, VBEM, and Blocked Gibbs Sampling — all sharing the same M-step pipeline. Each M-step proceeds as:

Compute posteriors: Apply ARD to priors, then update posteriors from sufficient statistics
PX-VB rotation (if enabled): Find rotation \(\mathbf{R}\) that minimises the expected negative log-prior, and remap posteriors (see Parameter Expansion for Dynamic Factor Analysis)
BMR pruning (if enabled, after a burn-in period): Prune loading matrix elements via Bayesian Model Reduction (see Bayesian Model Reduction)
KL divergence: Compute KL terms for ELBO
ARD updates: Update \(q(\boldsymbol{\tau}^H)\) and \(q(\boldsymbol{\tau}^F)\) from posteriors
Extract parameters: Mode (EM), moments (VB-EM), or samples (Gibbs) from posteriors

The methods differ only in the E-step: EM uses a standard Kalman smoother, VBEM uses a VB-corrected Kalman smoother that accounts for parameter uncertainty, and Gibbs sampling uses forward filtering backward sampling (FFBS). See Inference Methods: EM, VBEM, and Gibbs for details.

For practical examples comparing EM, VB-EM, PX-VB, and BMR settings, see the Examples section.

References

Luttinen, J., Raiko, T., & Ilin, A. (2014). Linear state-space model with time-varying dynamics. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2014), LNCS 8725, pp. 338-353.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Attias, H. (1999). Inferring parameters and structure of latent variable models by variational Bayes. In Proceedings of the Fifteenth conference on Uncertainty in Artificial Intelligence.
Zhao, J. H., and Philip, L. H. (2009). A note on variational Bayesian factor analysis. Neural Networks, 22(7), 988-997.