Output Probability Specification

Before the problem of parameter estimation can be discussed in more detail, the form of the output distributions $\{b_{j}({\mbox{\boldmath $o$}}_t)\}$ needs to be made explicit. HTK is designed primarily for modelling continuous parameters using continuous density multivariate output distributions. It can also handle observation sequences consisting of discrete symbols, in which case the output distributions are discrete probabilities. For simplicity, however, the presentation in this chapter will assume that continuous density distributions are being used. The minor differences that the use of discrete probabilities entails are noted in chapter 7 and discussed in more detail in chapter 11.

In common with most other continuous density HMM systems, HTK represents output distributions by Gaussian Mixture Densities. In HTK, however, a further generalisation is made. HTK allows each observation vector at time $t$ to be split into a number $S$ of independent data streams ${\mbox{\boldmath $o$}}_{st}$. The formula for computing $b_{j}({\mbox{\boldmath $o$}}_t)$ is then

$\displaystyle b_{j}({\mbox{\boldmath$o$}}_t) = \prod_{s=1}^S \left[ \sum_{m=1}^{M_s} c_{jsm} {\cal N}({\mbox{\boldmath$o$}}_{st}; {\mbox{\boldmath$\mu$}}_{jsm}, {\mbox{\boldmath$\Sigma$}}_{jsm}) \right]^{\gamma_s}$

(1.8)

where $M_s$ is the number of mixture components in stream $s$, $c_{jsm}$ is the weight of the $m$'th component and ${\cal N}(\cdot; {\mbox{\boldmath $\mu$}}, {\mbox{\boldmath $\Sigma$}})$ is a multivariate Gaussian with mean vector ${\mbox{\boldmath $\mu$}}$ and covariance matrix ${\mbox{\boldmath $\Sigma$}}$, that is

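$\displaystyle {\cal N}({\mbox{\boldmath$o$}}; {\mbox{\boldmath$\mu$}}, {\mbox{\boldmath$\Sigma$}}) = \frac{1}{\sqrt{(2 \pi)^n \vert{\mbox{\boldmath$\Sigma$}}\vert}} e^{-\frac{1}{2}({\mbox{\boldmath$o$}}-{\mbox{\boldmath$\mu$}})'{\mbox{\boldmath$\Sigma$}}^{-1}({\mbox{\boldmath$o$}}-{\mbox{\boldmath$\mu$}})}$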

(1.9)

where $n$ is the dimensionality of ${\mbox{\boldmath $o$}}$.
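As a rough illustration of equations (1.8) and (1.9), the following Python sketch evaluates the output probability of one state for a single observation. It is not HTK code. It assumes diagonal covariance matrices so that only the standard library is needed, and the names `gaussian_diag` and `output_prob`, the data layout and the numeric values are all hypothetical.

```python
import math

def gaussian_diag(o, mean, var):
    """Equation (1.9), restricted to a diagonal covariance matrix.

    ASSUMPTION: `var` holds only the diagonal of the covariance matrix,
    so the determinant is the product of the variances and the quadratic
    form reduces to a sum of scaled squared differences.
    """
    n = len(o)                      # dimensionality of the observation
    det = math.prod(var)            # determinant of a diagonal matrix
    quad = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var))
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** n * det)

def output_prob(streams, state, stream_weights):
    """Equation (1.8): product over streams of weighted mixture sums.

    streams        -- list of per-stream observation vectors for time t
    state          -- per stream, a list of (weight, mean, var) components
    stream_weights -- the exponents gamma_s, one per stream
    """
    b = 1.0
    for o_s, mixtures, gamma in zip(streams, state, stream_weights):
        mix_sum = sum(c * gaussian_diag(o_s, mu, var) for c, mu, var in mixtures)
        b *= mix_sum ** gamma
    return b

# Hypothetical two-stream state: one component in stream 1, two in stream 2.
streams = [[0.0], [1.0, -1.0]]
state = [
    [(1.0, [0.0], [1.0])],
    [(0.5, [1.0, -1.0], [1.0, 1.0]), (0.5, [0.0, 0.0], [1.0, 1.0])],
]
print(output_prob(streams, state, [1.0, 1.0]))
```

A practical implementation would normally work with log probabilities to avoid numerical underflow; the direct form is kept here only because it mirrors the equations term by term.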

The exponent $\gamma_s$ is a stream weight$^{1.1}$. It can be used to give a particular stream more emphasis. However, it can only be set manually: no current HTK training tool can estimate a value for it.
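The effect of the stream weight is perhaps easiest to see in the log domain, where the exponent in equation (1.8) becomes a multiplier on each stream's log likelihood. The sketch below is only an illustration under that reading: the function name `log_output_prob` and the per-stream likelihood values are invented, and the per-stream mixture sums are taken as already computed.

```python
import math

def log_output_prob(stream_likelihoods, stream_weights):
    """Log of equation (1.8), given the bracketed mixture sum of each stream.

    Taking logs turns the product over streams into a sum, and each
    exponent gamma_s into a scale factor on that stream's log likelihood.
    """
    return sum(g * math.log(p) for p, g in zip(stream_likelihoods, stream_weights))

# Hypothetical per-stream mixture sums for two competing states.
state_a = [0.5, 0.1]   # strong on stream 1, weak on stream 2
state_b = [0.2, 0.2]   # moderate on both streams

for weights in ([1.0, 1.0], [1.0, 2.0]):
    score_a = log_output_prob(state_a, weights)
    score_b = log_output_prob(state_b, weights)
    winner = "A" if score_a > score_b else "B"
    print(weights, "-> state", winner)
```

With equal weights the first state scores higher; doubling the weight of the second stream reverses the ranking, which is the sense in which a larger $\gamma_s$ gives a stream more emphasis.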

Multiple data streams are used to enable separate modelling of multiple information sources. In HTK, the processing of streams is completely general. However, the speech input modules assume that the source data is split into at most 4 streams. Chapter 5 discusses this in more detail but for now it is sufficient to remark that the default streams are the basic parameter vector, first (delta) and second (acceleration) difference coefficients and log energy.
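To make the idea of splitting an observation into streams concrete, here is a minimal Python sketch that cuts one vector into consecutive sub-vectors. The function name `split_streams`, the stream widths and the ordering of components are invented for illustration; they are not HTK's actual layout, which is the subject of Chapter 5.

```python
def split_streams(vector, widths):
    """Split one observation vector into consecutive streams.

    ASSUMPTION: streams are stored back to back in the vector and
    `widths` gives the number of components in each stream.
    """
    if sum(widths) != len(vector):
        raise ValueError("stream widths must add up to the vector length")
    streams, start = [], 0
    for w in widths:
        streams.append(vector[start:start + w])
        start += w
    return streams

# Hypothetical 10-component vector: 3 basic parameters, 3 delta,
# 3 acceleration coefficients and 1 log energy value.
observation = list(range(10))
print(split_streams(observation, [3, 3, 3, 1]))
```

Each resulting sub-vector plays the role of ${\mbox{\boldmath $o$}}_{st}$ for one value of $s$ and is scored by its own set of mixture components.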
