Variance Transformation Matrix (MLLRVAR, MLLRCOV)

Estimation of the first form of variance transformation matrix (MLLRVAR) is only available for diagonal covariance Gaussian systems in the current implementation, though full transforms can in theory be estimated. The mean is left unchanged and the Gaussian covariance is transformed using

$\displaystyle \hat{{\mbox{\boldmath$\mu$}}}_{m_r} = {\mbox{\boldmath$\mu$}}_{m_r}, \qquad \hat{{\mbox{\boldmath$\Sigma$}}}_{m_r} = {\mbox{\boldmath$B$}}_{m_r}^{\scriptstyle\sf T}{\mbox{\boldmath$H$}}_r{\mbox{\boldmath$B$}}_{m_r}$

where $ {\mbox{\boldmath $H$}}_r$ is the linear transformation to be estimated and $ {\mbox{\boldmath $B$}}_{m_r}$ is the inverse of the Choleski factor of $ {\mbox{\boldmath $\Sigma$}}_{m_r}^{-1}$, so

$\displaystyle {\mbox{\boldmath$\Sigma$}}_{m_r}^{-1} = {\mbox{\boldmath$C$}}_{m_r}{\mbox{\boldmath$C$}}_{m_r}^{\scriptstyle\sf T}
$

and

$\displaystyle {\mbox{\boldmath$B$}}_{m_r} = {\mbox{\boldmath$C$}}_{m_r}^{-1}
$

After rewriting the auxiliary function, the transform matrix $ {\mbox{\boldmath $H$}}_r$ is estimated from

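$\displaystyle {\mbox{\boldmath$H$}}_r = \frac{ \sum_{m_r=1}^{M_r}{\mbox{\boldmath$C$}}_{m_r}^{\scriptstyle\sf T}\left[\sum_{t=1}^T L_{m_r}(t)({\mbox{\boldmath$o$}}(t)-\hat{{\mbox{\boldmath$\mu$}}}_{m_r})({\mbox{\boldmath$o$}}(t)-\hat{{\mbox{\boldmath$\mu$}}}_{m_r})^{\scriptstyle\sf T}\right]{\mbox{\boldmath$C$}}_{m_r} } { \sum_{m_r=1}^{M_r}\sum_{t=1}^T L_{m_r}(t)}$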

Here, $ {\mbox{\boldmath $H$}}_r$ is forced to be a diagonal transformation by setting the off-diagonal terms to zero, which ensures that $ \hat{{\mbox{\boldmath $\Sigma$}}}_{m_r}$ is also diagonal.
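
For a diagonal covariance system the matrices above reduce to element-wise operations, which the following minimal Python sketch illustrates. It assumes that the occupation probabilities $ L_{m_r}(t)$ are already available and that the statistics are summed over all frames and all Gaussians in the regression class. The function names and the argument layout are hypothetical and are not taken from the HTK source.

```python
def estimate_mllrvar_diag(means, variances, obs, occ):
    """Diagonal variance transform for one regression class (sketch).

    Assumed layout (hypothetical, for illustration only):
      means[m][i], variances[m][i]  parameters of Gaussian m
      obs[t][i]                     observation vector at frame t
      occ[m][t]                     occupation probability L_m(t)
    Returns the diagonal of H_r as a list.
    """
    dim = len(obs[0])
    num = [0.0] * dim
    den = 0.0
    for mu, var, gamma in zip(means, variances, occ):
        for o_t, l_t in zip(obs, gamma):
            den += l_t
            for i in range(dim):
                # With diagonal Sigma, C = diag(1/sigma), so the i-th diagonal
                # entry of C^T (o-mu)(o-mu)^T C is (o_i - mu_i)^2 / sigma_i^2.
                num[i] += l_t * (o_t[i] - mu[i]) ** 2 / var[i]
    return [n / den for n in num]


def apply_mllrvar_diag(variances, h):
    """Adapted variances: B^T H B with B = diag(sigma) scales each variance by h_i."""
    return [[v * h_i for v, h_i in zip(var, h)] for var in variances]
```

As a usage example, a single Gaussian with zero mean and variances (1, 4), observed on the four frames (2, 0), (-2, 0), (0, 4) and (0, -4) with unit occupancy, gives a diagonal of (2, 2) for $ {\mbox{\boldmath $H$}}_r$ and adapted variances (2, 8).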

The alternative form of variance adaptation (MLLRCOV) is supported for full, block and diagonal transforms. Substituting the expressions for variance adaptation

$\displaystyle \hat{{\mbox{\boldmath$\mu$}}}_{m_r} = {\mbox{\boldmath$\mu$}}_{m_r}, \qquad \hat{{\mbox{\boldmath$\Sigma$}}}_{m_r} = {{\mbox{\boldmath$H$}}}_r{{\mbox{\boldmath$\Sigma$}}}_{m_r}{{\mbox{\boldmath$H$}}}_r^{\scriptstyle\sf T}$     (9.13)

into the auxiliary function, and using the fact that the covariance matrices are diagonal yields
$\displaystyle {\cal Q}({\cal M},{\hat{\cal M}}) = K + \sum_{r=1}^R \left[ \beta_r\log({\bf c}_{ri}{\bf a}_{ri}^{\scriptstyle\sf T}) - \frac{1}{2}\sum_{j=1}^d{\left({\bf a}_{rj}{\bf G}^{(j)}_r{\bf a}^{\scriptstyle\sf T}_{rj}\right)} \right]$

where
$\displaystyle \beta_r = \sum_{m_r=1}^{M_r}\sum_{t=1}^TL_{m_r}(t)$     (9.14)
$\displaystyle {{\mbox{\boldmath$A$}}}_r = {{\mbox{\boldmath$H$}}}_r^{-1}$     (9.15)

Here $ {\bf a}_{ri}$ is the $ i^{th}$ row of $ {{\mbox{\boldmath $A$}}}_r$, the $ 1\times n$ row vector $ {\bf c}_{ri}$ holds the cofactors of the elements in that row, $ c_{rij}={\mbox{cof}}({\bf A}_{rij})$, and $ {\bf G}^{(i)}_r$ is defined as
$\displaystyle {\bf G}^{(i)}_r=\sum_{m_r=1}^{M_r}\frac{1}{\sigma_{m_ri}^{2}}\sum_{t=1}^TL_{m_r}(t)({\mbox{\boldmath$o$}}(t)-\hat{{\mbox{\boldmath$\mu$}}}_{m_r})({\mbox{\boldmath$o$}}(t)-\hat{{\mbox{\boldmath$\mu$}}}_{m_r})^{\scriptstyle\sf T}$     (9.16)

Differentiating the auxiliary function with respect to the transform $ {{\mbox{\boldmath $A$}}}_r$, and then maximising it with respect to the transformed mean, yields the following update for each row
$\displaystyle {\bf a}_{ri} ={\bf c}_{ri}{\bf G}^{(i)-1}_r
\sqrt{\left(\frac{\beta_r}{{\bf c}_{ri}{\bf G}_r^{(i)-1}{\bf c}^{\scriptstyle\sf T}_{ri}}\right)}$     (9.17)
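
The intermediate step can be sketched as follows, under the assumption that the cofactor row $ {\bf c}_{ri}$ is held fixed while row $ i$ is updated (it does not depend on the elements of $ {\bf a}_{ri}$ itself). Expanding the determinant along row $ i$ gives $ \vert{{\mbox{\boldmath $A$}}}_r\vert = {\bf c}_{ri}{\bf a}_{ri}^{\scriptstyle\sf T}$, which is the origin of the log term, and setting the derivative with respect to $ {\bf a}_{ri}$ to zero gives
$\displaystyle \beta_r\frac{{\bf c}_{ri}}{{\bf c}_{ri}{\bf a}_{ri}^{\scriptstyle\sf T}} - {\bf a}_{ri}{\bf G}^{(i)}_r = 0$
so that $ {\bf a}_{ri} = \alpha\,{\bf c}_{ri}{\bf G}^{(i)-1}_r$ for some scalar $ \alpha$. Substituting this back gives
$\displaystyle \alpha^2 = \frac{\beta_r}{{\bf c}_{ri}{\bf G}_r^{(i)-1}{\bf c}^{\scriptstyle\sf T}_{ri}}$
and taking the positive root recovers (9.17).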

This is an iterative optimisation scheme as the cofactors mean the estimate of row $ i$ is dependent on all the other rows (in that block). For the diagonal transform case it is of course non-iterative and simplifies to the same form as the MLLRVAR transform.
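
The row-by-row scheme can be sketched in Python with NumPy as below. This is an illustration under stated assumptions rather than the HTK implementation: it handles a single full transform (no block structure), starts from the identity, and runs a fixed number of sweeps instead of testing for convergence. The function names, array layout and `n_iter` parameter are hypothetical.

```python
import numpy as np


def accumulate_g(means, variances, obs, occ):
    """Accumulate beta_r (9.14) and the matrices G_r^(i) (9.16).

    Assumed shapes (hypothetical): means (M, n), variances (M, n),
    obs (T, n), occ (M, T) holding L_m(t).
    Returns (beta, G) with G[i] the n x n matrix for row i.
    """
    n = obs.shape[1]
    G = np.zeros((n, n, n))
    for mu, var, gamma in zip(means, variances, occ):
        d = obs - mu                        # (T, n) deviations from the mean
        W = (d * gamma[:, None]).T @ d      # occupancy-weighted scatter matrix
        G += W[None, :, :] / var[:, None, None]   # scale by 1/sigma_i^2 per row i
    return occ.sum(), G


def estimate_mllrcov(beta, G, n_iter=20):
    """Iterate the row update (9.17); returns A_r = H_r^{-1}."""
    n = G.shape[0]
    G_inv = np.linalg.inv(G)                # inverts each G[i] separately
    A = np.eye(n)                           # assumed starting point
    for _ in range(n_iter):
        for i in range(n):
            # Cofactors of row i: det(A) times column i of A^{-1}.
            c = np.linalg.det(A) * np.linalg.inv(A)[:, i]
            cg = c @ G_inv[i]
            A[i] = cg * np.sqrt(beta / (cg @ c))
    return A
```

The adapted covariance of a Gaussian then follows from (9.13) with $ {{\mbox{\boldmath $H$}}}_r = {{\mbox{\boldmath $A$}}}_r^{-1}$, for example `H = np.linalg.inv(A)` followed by `H @ np.diag(var) @ H.T`. When every $ {\bf G}^{(i)}_r$ is diagonal the first sweep already gives the final answer, in line with the non-iterative diagonal case.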

