HDECODE is a decoder specifically written for large vocabulary speech recognition. In contrast to the other tools in HTK, there are a number of limitations to its use; see the warnings above.
As with HVITE, the best transcription hypothesis is generated in the Master Label File (MLF) format. Optionally, multiple hypotheses can also be generated as a word lattice in the HTK Standard Lattice File (SLF) format. The HDECODE tutorial section gives detailed examples of the various ways in which HDECODE can be used.
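As a rough sketch (all file names here are illustrative placeholders, not taken from this page), a full-decoding run that produces both an MLF and SLF lattices might look like:

```shell
# Hypothetical file names throughout; flag usage follows normal
# HTK conventions: -H model definitions, -S list of speech files.
HDecode -H hmmdefs -S test.scp \
   -i rec.mlf \        # best transcription hypothesis, written as an MLF
   -l latdir -z lat \  # optionally also write SLF word lattices
   -w lm.trigram \     # word-based n-gram language model
   dict hmmlist        # pronunciation dictionary and HMM list
```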
There are two main modes in which HDECODE can be run. The first is
full decoding, where an n-gram language model is used for recognition.
The current version of HDECODE supports word based uni-gram,
bi-gram or tri-gram language models. Higher order
n-gram models, or very large language models, should be applied by
rescoring lattices generated using simpler language models. See the HDECODE
tutorial section and the HLRESCORE reference page for examples of lattice
expansion by applying additional language models. The second mode that
HDECODE can be run in is lattice rescoring. Here, for example,
a different acoustic model can be used to rescore lattices.
The acoustic and language model scores can be adjusted using the -a and -s options respectively. If the supplied dictionary contains pronunciation probability information, the corresponding scale factor can be adjusted using the -r option. The -q option controls the type of information to be included in the generated lattices.
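These scale factors can be set together on the command line. The values and the -q field string below are purely illustrative, and the file names are placeholders:

```shell
# Illustrative values only; see the option descriptions for details.
HDecode -H hmmdefs -S test.scp -i rec.mlf -w lm.trigram \
   -s 12.0 \   # language model scale factor
   -a 1.0 \    # acoustic model scale factor
   -r 1.0 \    # pronunciation probability scale factor
   -q tval \   # hypothetical field-selection string for lattice content
   dict hmmlist
```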
HDECODE, when compiled with the MODALIGN setting, can also be used to align the HMM models to a given word level lattice (also known as model marking the lattice). When using the default Makefile supplied with HDECODE, this binary will be made and stored in HDECODE.MOD.
When using HDECODE, the run-time can be adjusted by changing the main and relative token pruning beam widths (see the -t option), the word end beam width (see the -v option) and the maximum model pruning (see the -u option). The number of tokens used per state (see the -n option) can also significantly affect both the decoding time and the size of the lattices generated.
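The speed/accuracy trade-off is typically tuned through these pruning options. A sketch with illustrative values (the numbers are not recommendations, and the file names are placeholders):

```shell
HDecode -H hmmdefs -S test.scp -i rec.mlf -w lm.trigram \
   -t 220.0 200.0 \  # main and relative token pruning beam widths
   -v 200.0 \        # word end beam width
   -u 8000 \         # maximum model pruning
   -n 32 \           # tokens per state; affects time and lattice size
   dict hmmlist
```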
Decoding with adapted acoustic models is supported by HDECODE. The use of an adaptation transformation is enabled using the -m option. The path, name and extension of the transformation matrices are specified using the -J option, and the file names are derived from the name of the speech file using a mask (see the -h option). Incremental adaptation and transform estimation are not currently supported by HDECODE.
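Putting these adaptation options together might look as follows; the transform directory, extension and mask shown are hypothetical examples, not values from this page:

```shell
# "xforms", "mllr" and the mask string are placeholder names.
HDecode -H hmmdefs -S test.scp -i rec.mlf -w lm.trigram \
   -m \              # enable use of an adaptation transformation
   -J xforms mllr \  # directory and extension of transform files
   -h '*.%%%' \      # mask mapping speech file names to transform names
   dict hmmlist
```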
HDECODE also allows output probability calculation to be carried out in blocks. The block size (in frames) can be specified using the -k option. However, when CMLLR adaptation is used, probabilities have to be calculated one frame at a time (i.e. using -k 1).
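For example, when decoding with a CMLLR transform the block size would be forced back to a single frame (file names again hypothetical):

```shell
# -k 1: compute output probabilities one frame at a time,
# as required when CMLLR adaptation is in use.
HDecode -k 1 -m -J xforms cmllr -H hmmdefs -S test.scp \
   -i rec.mlf -w lm.trigram dict hmmlist
```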
HDECODE performs recognition by expanding a phone model network with language model and pronunciation model information dynamically applied. The lattices generated are word lattices, though generated using triphone acoustic models. This is similar to a projection operation of a phone level finite state network on to word level, but identical word paths that correspond to different phone alignments are kept distinct. Note that these duplicated word paths are not permitted when using either HDECODE or HVITE for acoustic model lattice rescoring or alignment. Input word lattices are expected to be deterministic in both cases. The impact of using non-deterministic lattices on the two HTK decoders is different in nature due to internal design differences, but in both cases the merging step to remove the duplicates is very important prior to lattice rescoring or alignment. See the HDECODE tutorial section and the HLRESCORE reference page for information on how to produce deterministic word lattices.