Linear Prediction Analysis

In linear prediction (LP) analysis, the vocal tract transfer function is modelled by an all-pole filter with transfer function

$\displaystyle H(z) = \frac{1}{\sum_{i=0}^p a_i z^{-i}}$

(5.4)

where $p$ is the number of poles and $a_0 \equiv 1$. The filter coefficients $\{a_i \}$ are chosen to minimise the mean square filter prediction error summed over the analysis window. The HTK module HSIGP uses the autocorrelation method to perform this optimisation as follows.

Given a window of speech samples $\{s_n, n=1,N \}$, the first $p+1$ terms of the autocorrelation sequence are calculated from

$\displaystyle r_i = \sum_{j=1}^{N-i} s_j s_{j+i}$

(5.5)

where $i=0,p$. The filter coefficients are then computed recursively using a set of auxiliary coefficients $\{k_i\}$, which can be interpreted as the reflection coefficients of an equivalent acoustic tube, and the prediction error $E$, which is initially equal to $r_0$. Let $\{k_j^{(i-1)} \}$ and $\{a_j^{(i-1)} \}$ be the reflection and filter coefficients for a filter of order $i-1$; then a filter of order $i$ can be calculated in three steps. Firstly, a new set of reflection coefficients is calculated.

$\displaystyle k_j^{(i)} = k_j^{(i-1)}$

(5.6)

for $j=1,i-1$ and

$\displaystyle k_i^{(i)} = \left\{ r_i + \sum_{j=1}^{i-1} a_j^{(i-1)} r_{i-j} \right\} / E^{(i-1)}$

(5.7)

Secondly, the prediction energy is updated.

$\displaystyle E^{(i)} = (1 - k_i^{(i)} k_i^{(i)} ) E^{(i-1)}$

(5.8)

Finally, new filter coefficients are computed

$\displaystyle a_j^{(i)} = a_j^{(i-1)} - k_i^{(i)} a_{i-j}^{(i-1)}$

(5.9)

for $j=1,i-1$ and

$\displaystyle a_i^{(i)} = - k_i^{(i)}$

(5.10)

This process is repeated from $i=1$ through to the required filter order $i=p$.
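The recursion above can be sketched in a few lines of Python. This is only an illustrative transcription of equations 5.5 to 5.10, not the HSIGP source code; the function names `autocorrelation` and `levinson_durbin` are hypothetical, the window is indexed from 0 rather than 1, and the sketch assumes a non-silent window so that $r_0 > 0$.

```python
def autocorrelation(s, p):
    """Return [r_0, ..., r_p] for the window s, following equation 5.5.

    The window is 0-indexed here, so s[j] * s[j + i] is summed over
    j = 0 .. N-i-1 instead of j = 1 .. N-i.
    """
    n = len(s)
    return [sum(s[j] * s[j + i] for j in range(n - i)) for i in range(p + 1)]


def levinson_durbin(r, p):
    """Run equations 5.6 to 5.10 for i = 1 .. p.

    Returns (a, k, e): a[0] == 1 and a[1..p] are the filter coefficients,
    k[1..p] are the reflection coefficients (k[0] is unused), and e is the
    final prediction error energy. Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * p
    k = [0.0] * (p + 1)
    e = r[0]                              # E^(0) = r_0
    for i in range(1, p + 1):
        # Equation 5.7: the new reflection coefficient. Earlier entries of k
        # are left unchanged, which is equation 5.6.
        k[i] = (r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / e
        # Equation 5.8: update the prediction error energy.
        e = (1.0 - k[i] * k[i]) * e
        # Equations 5.9 and 5.10: build the order-i filter from order i-1.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]
        a[i] = -k[i]
    return a, k, e


r = autocorrelation([1.0, 2.0, 3.0], 2)   # [14.0, 8.0, 3.0]
a, k, e = levinson_durbin(r, 2)
```

Copying the previous coefficients into `prev` before applying equation 5.9 matters, because each $a_j^{(i)}$ depends on $a_{i-j}^{(i-1)}$ and updating in place would overwrite values that are still needed.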

To effect the above transformation, the target parameter kind must be set to either LPC to obtain the LP filter parameters $\{a_i \}$ or LPREFC to obtain the reflection coefficients $\{k_i\}$. The required filter order must also be set using the configuration parameter LPCORDER. Thus, for example, the following configuration settings would produce a target parameterisation consisting of 12 reflection coefficients per vector.

    TARGETKIND = LPREFC
    LPCORDER = 12

An alternative LPC-based parameterisation is obtained by setting the target kind to LPCEPSTRA to generate linear prediction cepstra. The cepstrum of a signal is computed by taking a Fourier (or similar) transform of the log spectrum. In the case of linear prediction cepstra, the required spectrum is the linear prediction spectrum which can be obtained from the Fourier transform of the filter coefficients. However, it can be shown that the required cepstra can be more efficiently computed using a simple recursion

$\displaystyle c_n = -a_n - \frac{1}{n} \sum_{i=1}^{n-1} (n-i) a_i c_{n-i}$

(5.11)

The number of cepstra generated need not be the same as the number of filter coefficients, hence it is set by a separate configuration parameter called NUMCEPS.
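As an illustration, equation 5.11 might be coded as follows. This is a sketch rather than HTK's own routine: the name `lpc_to_cepstra` is hypothetical, and it assumes that $a_n$ is taken as zero for $n > p$, which is what allows the number of cepstra to differ from the filter order.

```python
def lpc_to_cepstra(a, num_ceps):
    """Apply equation 5.11 for n = 1 .. num_ceps.

    a is [1, a_1, ..., a_p]. Coefficients beyond order p are treated as
    zero (an assumption of this sketch). Returns [c_1, ..., c_num_ceps].
    """
    p = len(a) - 1
    c = [0.0] * (num_ceps + 1)            # c[0] is unused
    for n in range(1, num_ceps + 1):
        a_n = a[n] if n <= p else 0.0
        # Terms with i > p vanish because a_i is zero there.
        acc = sum((n - i) * a[i] * c[n - i]
                  for i in range(1, min(n - 1, p) + 1))
        c[n] = -a_n - acc / n
    return c[1:]


# Single-pole filter H(z) = 1 / (1 - 0.5 z^-1): c_n should equal 0.5**n / n.
cepstra = lpc_to_cepstra([1.0, -0.5], 3)
```

For this single-pole case the log of the transfer function has the known series $\sum_n (0.5^n / n) z^{-n}$, which gives a convenient check on the recursion.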

The principal advantage of cepstral coefficients is that they are generally decorrelated and this allows diagonal covariances to be used in the HMMs. However, one minor problem with them is that the higher order cepstra are numerically quite small and this results in a very wide range of variances when going from the low to high cepstral coefficients. HTK does not have a problem with this but for pragmatic reasons such as displaying model parameters, flooring variances, etc., it is convenient to re-scale the cepstral coefficients to have similar magnitudes. This is done by setting the configuration parameter CEPLIFTER to some value to lifter the cepstra according to the following formula

$\displaystyle {c^{\prime}}_n = \left( 1 + \frac{L}{2} \sin \frac{\pi n}{L} \right) c_n$

(5.12)
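To see the effect of equation 5.12, the weighting can be sketched as below. The function name `lifter` is hypothetical and the sketch assumes $L > 0$; it is not taken from the HTK source.

```python
import math


def lifter(c, L):
    """Re-scale cepstra [c_1, ..., c_N] according to equation 5.12.

    L plays the role of the CEPLIFTER value and must be positive here.
    """
    return [(1.0 + (L / 2.0) * math.sin(math.pi * n / L)) * c_n
            for n, c_n in enumerate(c, start=1)]


# With L = 22 the weight rises from about 2.6 at n = 1 to 12 at n = 11,
# so the numerically small higher-order cepstra are boosted the most.
weights = lifter([1.0] * 12, 22)
```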

As an example, the following configuration parameters would use a 14th-order linear prediction analysis to generate 12 liftered LP cepstra per target vector.

    TARGETKIND = LPCEPSTRA
    LPCORDER = 14
    NUMCEPS = 12
    CEPLIFTER = 22

These are typical of the values needed to generate a good front-end parameterisation for a speech recogniser based on linear prediction.

Finally, note that the conversions supported by HTK are not limited to the case where the source is a waveform. HTK can convert any LP-based parameter into any other LP-based parameter.
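One such conversion, from reflection coefficients to filter coefficients, follows directly from equations 5.9 and 5.10, since those two steps need only the $\{k_i\}$ and not the autocorrelation sequence. The sketch below illustrates the idea; the name `refc_to_lpc` is hypothetical and this is not a description of how HTK implements the conversion internally.

```python
def refc_to_lpc(k):
    """Convert reflection coefficients [k_1, ..., k_p] to [1, a_1, ..., a_p].

    Each pass applies equation 5.9 to the existing coefficients and then
    appends a_i = -k_i (equation 5.10).
    """
    a = [1.0]
    for i, k_i in enumerate(k, start=1):
        prev = a
        a = ([1.0]
             + [prev[j] - k_i * prev[i - j] for j in range(1, i)]
             + [-k_i])
    return a


lpc = refc_to_lpc([0.5, 0.2])
```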
