Using the HTK Large Vocabulary Decoder `HDecode`

WARNING: The HTK Large Vocabulary Decoder HDECODE has been specifically written for speech recognition tasks using cross-word triphone models. Known restrictions are:

only works for cross-word triphones;
supports N-gram language models up to tri-grams;
sil and sp models are reserved as silence models and are, by default, automatically added to the end of all pronunciation variants of each word in the recognition dictionary;
sil must be used as the pronunciation for the sentence start and sentence end tokens;
sil and sp models cannot occur in the dictionary, except for the dictionary entry of the sentence start and sentence end tokens;
word lattices generated with HDECODE must be made deterministic using HLRESCORE to remove duplicate paths prior to being used for acoustic model rescoring with HDECODE or HVITE.
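
The dictionary-related restrictions above can be checked mechanically before decoding. The following is a minimal sketch and is not part of HTK. The function name check_hdecode_dictionary is hypothetical, the sentence start and end tokens are assumed to be <s> and </s>, and the parser only handles the simple entry layout "WORD [OUTSYM] PROB PHONE ..." without quoting or escape handling.

```python
def _is_number(text):
    """Return True if text parses as a float (used to spot a pronunciation probability)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_hdecode_dictionary(lines, start_token="<s>", end_token="</s>"):
    """Return (line_number, message) pairs for entries that break the restrictions.

    Assumptions (not taken from HTK itself): the token names above, and a
    simplified dictionary syntax with an optional [OUTSYM] field and an
    optional probability field before the phone sequence.
    """
    boundary = {start_token, end_token}
    silence = {"sil", "sp"}
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        word, rest = fields[0], fields[1:]
        # Skip an optional output symbol such as [] or [HELLO].
        if rest and rest[0].startswith("[") and rest[0].endswith("]"):
            rest = rest[1:]
        # Skip an optional pronunciation probability.
        if rest and _is_number(rest[0]):
            rest = rest[1:]
        phones = rest
        if word in boundary:
            # Sentence start and end tokens must be pronounced as sil.
            if phones != ["sil"]:
                problems.append((number, f"{word} must have the pronunciation 'sil'"))
        else:
            # Ordinary words must not contain sil or sp.
            used = sorted(silence.intersection(phones))
            if used:
                problems.append((number, f"{word} uses reserved model(s): {', '.join(used)}"))
    return problems


if __name__ == "__main__":
    sample = ["<s> [] sil", "</s> [] sil", "HELLO hh ax l ow", "BAD b ae d sp"]
    for number, message in check_hdecode_dictionary(sample):
        print(f"line {number}: {message}")
```

Such a check only covers the dictionary restrictions. The cross-word triphone and language model restrictions depend on the model set and the language model file and are not examined here.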

The decoder distributed with HTK, HVITE, is only suitable for small and medium vocabulary systems and for systems using bigrams. For larger vocabulary systems, or those requiring trigram language models to be used directly in the search, HDECODE is available as an extension to HTK. HDECODE has been specifically written for large vocabulary speech recognition using cross-word triphone models; its known restrictions are listed above. For detailed usage, see the HDECODE reference page 17.6. HDECODE is also used to generate lattices for the discriminative training described in the next section.

In this section, examples are given for using HDECODE for large vocabulary speech recognition. Due to the limitations described above, the word-internal triphone systems generated in the previous stages cannot be used with HDECODE. For this section, it is assumed that there is a cross-word triphone system in the directory hmm20, along with a model list in xwrdtiedlist. In contrast to the previous sections, both the macros and the HMM definitions are stored in the same file, hmm20/models. For an example of how to build a cross-word state-clustered triphone system, see step 9 of the Resource Management (RM) example script in the RM samples tar-ball.

Note: the grammar scale factors used in this section, and in the next section on discriminative training, are consistent with the values used in the previous tutorial sections. However, for large vocabulary speech recognition systems, grammar scale factors in the range 12-15 are commonly used.
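
To illustrate why the choice of grammar scale factor matters, the sketch below combines an acoustic log likelihood with a scaled language model log probability and a per-word insertion penalty, which is the usual way these scores are combined during decoding. The function name combined_log_score and all numbers are hypothetical and chosen purely for illustration; they are not taken from any HTK run, and the value 5 is just a small example scale.

```python
def combined_log_score(acoustic_logprob, lm_logprob, num_words,
                       grammar_scale, insertion_penalty=0.0):
    """Total log score: acoustic + scale * language model + penalty per word."""
    return acoustic_logprob + grammar_scale * lm_logprob + insertion_penalty * num_words


# Two hypothetical competing hypotheses for the same utterance.
# A fits the acoustics better, B is preferred by the language model.
hyp_a = {"acoustic_logprob": -1000.0, "lm_logprob": -30.0, "num_words": 5}
hyp_b = {"acoustic_logprob": -1040.0, "lm_logprob": -25.0, "num_words": 5}


def best_hypothesis(grammar_scale):
    """Return 'A' or 'B', whichever has the higher combined score."""
    score_a = combined_log_score(grammar_scale=grammar_scale, **hyp_a)
    score_b = combined_log_score(grammar_scale=grammar_scale, **hyp_b)
    return "A" if score_a >= score_b else "B"


if __name__ == "__main__":
    for scale in (5.0, 15.0):
        print(scale, best_hypothesis(scale))
```

With these made-up numbers, a small scale lets the acoustic score dominate, while a scale in the 12-15 range gives the language model enough weight to change which hypothesis wins.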
