===================================================== Parameter Expansion for Dynamic Factor Analysis ===================================================== Introduction ============ A persistent difficulty with variational Bayesian inference for factor models is slow convergence caused by strong posterior coupling between the loading matrix and the latent variables. This coupling arises from the rotational invariance of the factor model: for any non-singular matrix :math:`\mathbf{R}`, the transformation :math:`\mathbf{H} \to \mathbf{H}\mathbf{R}^{-1}`, :math:`\mathbf{z}_t \to \mathbf{R}\mathbf{z}_t` leaves the likelihood unchanged. As a consequence, sequential VB updates zigzag slowly along ridge-lines of equal likelihood. Parameter expansion (PX) alleviates this problem by introducing auxiliary parameters that decouple the variational factors, enabling faster convergence. The **Parameter Expanded Variational Bayes (PX-VB)** framework [1]_ optimizes auxiliary parameters within the variational framework, and the result is remapped to the original parameterization via a reduction step. This document derives the PX-VB approach for the Dynamic Factor Analysis (DFA) model implemented in ``sppcax``. For the DFA model definition, prior distributions, and ARD priors, see :doc:`dynamic_factor_analysis`. For the static-case VB-EM update equations, see :doc:`factor_analysis`. Parameter Expanded VBEM (PX-VBEM) ================================== The PX-VB framework [1]_ introduces auxiliary parameters :math:`\boldsymbol{\alpha}` that expand the original model :math:`p(\tilde{\mathbf{x}}, D)` into :math:`p_{\boldsymbol{\alpha}}(\mathbf{x}, D)`, where the original model is recovered at :math:`\boldsymbol{\alpha} = \boldsymbol{\alpha}_0`. Each PX-VB iteration consists of three steps: 1. Sequential VB updates with :math:`\boldsymbol{\alpha} = \boldsymbol{\alpha}_0` 2. Minimize :math:`\text{KL}(q(\mathbf{x}) \| p_{\boldsymbol{\alpha}}(\mathbf{x}, D))` over :math:`\boldsymbol{\alpha}` 3. Reduce the expanded model back to the original via reparameterization .. _sec-rotation-expansion: Rotation-Based Parameter Expansion for LGSSM --------------------------------------------- For state-space models, a natural parameter expansion is the rotation of the latent subspace [3]_. For any non-singular :math:`\mathbf{R} \in \mathbb{R}^{K \times K}`: .. math:: :label: rotation-obs \mathbf{x}_t = \mathbf{H}\mathbf{z}_t = (\mathbf{H}\mathbf{R}^{-1})(\mathbf{R}\mathbf{z}_t) The rotation must be compensated in the dynamics: .. math:: :label: rotation-dyn \mathbf{R}\mathbf{z}_t = \mathbf{R}\mathbf{F}\mathbf{R}^{-1}(\mathbf{R}\mathbf{z}_{t-1}) + \text{noise} Thus the expanded model uses transformed variables :math:`\tilde{\mathbf{z}}_t = \mathbf{R}\mathbf{z}_t`, :math:`\tilde{\mathbf{H}} = \mathbf{H}\mathbf{R}^{-1}`, and :math:`\tilde{\mathbf{F}} = \mathbf{R}\mathbf{F}\mathbf{R}^{-1}`. Standard VB-EM Updates ----------------------- The factorized approximate posterior is: .. math:: p(\mathbf{Z}, \mathbf{H}, \boldsymbol{\Psi}, \mathbf{F}, \mathbf{Q}, \boldsymbol{\tau} | \mathbf{X}) \approx q(\mathbf{Z})\,q(\mathbf{H}, \boldsymbol{\Psi})\,q(\mathbf{F}, \mathbf{Q})\,q(\boldsymbol{\tau}) The **VBE-step** updates :math:`q(\mathbf{Z})` via Kalman smoothing using the expected parameters from the other factors. The **VBM-step** updates each factor in turn: - :math:`q(\mathbf{H}, \boldsymbol{\Psi})`: Normal-Gamma (MVNIG) posterior update using sufficient statistics from the E-step (see :doc:`factor_analysis` for the static update equations) - :math:`q(\mathbf{F}, \mathbf{Q})`: MNIW or MVN posterior update using state transition sufficient statistics - :math:`q(\boldsymbol{\tau})`: Gamma posterior update (ARD hyperparameters) PX-VB Rotation Step --------------------- After the standard VB-EM updates, we find the optimal rotation :math:`\mathbf{R}` that minimizes the KL divergence. The rotation is applied to the posterior distributions as follows [3]_: **Rotating** :math:`q(\mathbf{Z})`: .. math:: q(\tilde{\mathbf{Z}}) = q(\mathbf{R}\mathbf{Z}) where the sufficient statistics transform as: .. math:: \langle \tilde{\mathbf{z}}_t \rangle &= \mathbf{R} \langle \mathbf{z}_t \rangle \\ \langle \tilde{\mathbf{z}}_t \tilde{\mathbf{z}}_t^\top \rangle &= \mathbf{R} \langle \mathbf{z}_t \mathbf{z}_t^\top \rangle \mathbf{R}^\top **Rotating** :math:`q(\mathbf{H})`: .. math:: :label: rotate-h \tilde{\boldsymbol{\mu}}_d^H &= \mathbf{R}^{-\top} \boldsymbol{\mu}_d^H \\ \tilde{\boldsymbol{\Sigma}}_d^H &= \mathbf{R}^{-\top} \boldsymbol{\Sigma}_d^H \mathbf{R}^{-1} **Rotating** :math:`q(\mathbf{F})`: .. math:: :label: rotate-f \tilde{\boldsymbol{\mu}}^F &= \mathbf{R} \boldsymbol{\mu}^F \mathbf{R}^{-1} \\ \tilde{\boldsymbol{\Sigma}}^F &\text{ transforms accordingly} Finding the Optimal Rotation ----------------------------- The optimal rotation :math:`\mathbf{R}` is found by minimizing the expected negative log-prior of the rotated model parameters: .. math:: :label: px-loss \mathcal{L}(\mathbf{R}) = \mathbb{E}_q\left[-\ln p(\tilde{\mathbf{H}}, \tilde{\mathbf{F}}, \tilde{\mathbf{z}}_0)\right] This objective decomposes into four terms: .. math:: \mathcal{L}(\mathbf{R}) = \mathcal{L}_{\text{init}} + \mathcal{L}_{\text{emission}} + \mathcal{L}_{\text{dyn-prior}} + \mathcal{L}_{\text{dyn-lik}} where: - :math:`\mathcal{L}_{\text{init}}`: expected negative log-prior of the rotated initial state :math:`\tilde{\mathbf{z}}_0 = \mathbf{R}^{-1}\mathbf{z}_0` - :math:`\mathcal{L}_{\text{emission}}`: expected negative log-prior of the rotated emission matrix. The rotation uses a block-diagonal structure :math:`\mathbf{R}_{\text{block}} = \text{blkdiag}(\mathbf{R}, \mathbf{I})` so that only the first :math:`K` columns of the augmented emission matrix are rotated, preserving the input weights and bias columns: .. math:: \mathcal{L}_{\text{emission}} = \frac{1}{2} \sum_d \text{tr}\left(\boldsymbol{\Lambda}_d \, \mathbf{R}_{\text{block}}^\top \mathbb{E}[\tilde{\mathbf{h}}_d \tilde{\mathbf{h}}_d^\top] \mathbf{R}_{\text{block}}\right) - D \ln|\mathbf{R}| - :math:`\mathcal{L}_{\text{dyn-prior}}`: expected negative log-prior of the rotated dynamics :math:`\tilde{\mathbf{F}} = \mathbf{R}^{-1}\mathbf{F}\mathbf{R}_{\text{block}}` - :math:`\mathcal{L}_{\text{dyn-lik}}`: expected negative log-likelihood of the dynamics residuals under the rotated state noise For the **static case** (FA/PCA), only :math:`\mathcal{L}_{\text{init}}` and :math:`\mathcal{L}_{\text{emission}}` are needed since there are no dynamics. **Numerical optimization:** In ``sppcax``, the rotation :math:`\mathbf{R}` is found by minimizing Eq. :eq:`px-loss` using gradient descent with Anderson acceleration [7]_. Starting from :math:`\mathbf{R} = \mathbf{I}`, the optimizer runs for a fixed number of steps (default: 32, learning rate: 0.001). Anderson acceleration with memory :math:`m=1` is applied after the first gradient step to improve convergence. If the final loss exceeds the initial loss, the optimizer falls back to the identity rotation. Static Case Simplification ---------------------------- For static Factor Analysis (:math:`\mathbf{F} = \mathbf{0}`, :math:`\mathbf{Q} = \mathbf{I}`), the rotation only affects :math:`\mathbf{H}` and :math:`\mathbf{Z}`. The dynamics terms vanish, and the loss simplifies to: .. math:: \mathcal{L}(\mathbf{R}) = \mathcal{L}_{\text{init}} + \mathcal{L}_{\text{emission}} PX-VBEM Algorithm Summary --------------------------- .. rst-class:: algorithm **Algorithm 1: PX-VBEM for Dynamic Factor Analysis** | **Input:** observations :math:`\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_T\}`, prior hyperparameters | **Output:** approximate posterior :math:`q(\mathbf{Z})q(\mathbf{H}, \boldsymbol{\Psi})q(\mathbf{F}, \mathbf{Q})q(\boldsymbol{\tau})` | | 1. Initialize all approximate posteriors | 2. **repeat** until convergence: | a. **VBE-step:** update :math:`q(\mathbf{Z})` via Kalman smoother using expected parameters | b. **VBM-step:** | - Update :math:`q(\mathbf{H}, \boldsymbol{\Psi})` using emission sufficient statistics | - Update :math:`q(\mathbf{F}, \mathbf{Q})` using dynamics sufficient statistics | - Update :math:`q(\boldsymbol{\tau})` using ARD update equations | c. **PX-step (rotation):** | - Minimize :math:`\mathcal{L}(\mathbf{R})` (Eq. :eq:`px-loss`) via gradient descent with Anderson acceleration | - Remap :math:`q(\mathbf{Z}) \to q(\mathbf{R}\mathbf{Z})` (sufficient statistics) | - Remap :math:`q(\mathbf{H}) \to q(\mathbf{H}\mathbf{R}^{-1})` via Eq. :eq:`rotate-h` | - Remap :math:`q(\mathbf{F}) \to q(\mathbf{R}\mathbf{F}\mathbf{R}^{-1})` via Eq. :eq:`rotate-f` | d. Compute ELBO for convergence monitoring | 3. **return** approximate posterior See Also ======== For practical examples comparing EM, VB-EM, and PX-VB: - :doc:`examples/test_px_em_fa` — Parameter expansion for Factor Analysis - :doc:`examples/test_px_em_dfa` — Parameter expansion for Dynamic Factor Analysis References ========== .. [1] Qi, Y., & Jaakkola, T. S. (2006). Parameter expanded variational Bayesian methods. In *Advances in Neural Information Processing Systems 19*, pp. 1097-1104. .. [3] Luttinen, J., Raiko, T., & Ilin, A. (2014). Linear state-space model with time-varying dynamics. In *Machine Learning and Knowledge Discovery in Databases* (ECML PKDD 2014), LNCS 8725, pp. 338-353. .. [7] Luttinen, J., & Ilin, A. (2010). Transformations in variational Bayesian factor analysis to speed up learning. *Neurocomputing*, 73, 1093-1102.