Assuming that test.scp holds a list of the coded test files, each test file will be recognised and its transcription output to an MLF called recout.mlf by executing the following:
HVite -H hmm15/macros -H hmm15/hmmdefs -S test.scp \
-l '*' -i recout.mlf -w wdnet \
-p 0.0 -s 5.0 dict tiedlist
The options -p and -s set the word insertion penalty
and the grammar scale factor,
respectively. The word insertion penalty
is a fixed value added to each token when it transits from the end of one word
to the start of the next. The grammar scale factor is the amount by which
the language model probability is scaled before being
added to each token as it transits from the end of one word
to the start of the next. These parameters can have a significant effect
on recognition performance, so some tuning on development test data
is well worthwhile.
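The effect of the two options on a token's score can be sketched as follows. This is a simplified illustration of the description above, not HVITE's actual implementation: the function name and the numeric values are hypothetical, and all scores are assumed to be log probabilities.

```python
def word_transition_score(token_logprob, lm_logprob, penalty=0.0, scale=5.0):
    """Return a token's log score after it crosses a word boundary.

    penalty corresponds to -p (word insertion penalty) and scale to -s
    (grammar scale factor). Illustrative sketch only; the names are
    assumptions, not HTK identifiers.
    """
    # The language model log probability is scaled, then the fixed
    # penalty is added, as the token moves from one word to the next.
    return token_logprob + scale * lm_logprob + penalty


# Hypothetical token score and language model log probability.
before = -120.0
after = word_transition_score(before, lm_logprob=-2.0, penalty=0.0, scale=5.0)
print(after)  # -130.0
```

Under this sketch, making the penalty more negative lowers the score of every word transition and so tends to discourage insertions, while a larger scale factor gives the language model more weight relative to the acoustic scores.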
The dictionary contains monophone transcriptions whereas the supplied HMM list contains word-internal triphones. HVITE will make the necessary conversions when loading the word network wdnet. However, if the HMM list contained both monophones and context-dependent phones then HVITE would become confused. The required form of word-internal network expansion can be forced by setting the configuration variable FORCECXTEXP to true and ALLOWXWRDEXP to false (see chapter 12 for details).
Assuming that the MLF testref.mlf contains word level transcriptions for each test file, the actual performance can be determined by running HRESULTS as follows:
HResults -I testref.mlf tiedlist recout.mlf
The result would be a print-out of the form
====================== HTK Results Analysis ==============
Date: Sun Oct 22 16:14:45 1995
Ref : testref.mlf
Rec : recout.mlf
------------------------ Overall Results -----------------
SENT: %Correct=98.50 [H=197, S=3, N=200]
WORD: %Corr=99.77, Acc=99.65 [H=853, D=1, S=1, I=1, N=855]
==========================================================
The line starting with SENT: indicates that of the 200 test utterances,
197 (98.50%) were correctly recognised. The following line starting with WORD:
gives the word level statistics and indicates that of the 855 words in total,
853 (99.77%) were recognised correctly. There was 1 deletion error (D),
1 substitution error (S) and 1 insertion error (I). The accuracy figure (Acc)
of 99.65% is lower than the percentage correct (Corr) because it also takes
into account the insertion errors, which the latter ignores.
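The word level figures can be reproduced from the counts in the print-out. The sketch below assumes the standard definitions, percentage correct = H/N and accuracy = (H - I)/N, with N = H + D + S; the function name is hypothetical and is not part of HTK.

```python
def hresults_word_stats(h, d, s, i):
    """Recompute the WORD line figures from the raw counts.

    h: correct words, d: deletions, s: substitutions, i: insertions.
    Returns (N, %Corr, Acc) with percentages rounded to two decimals.
    """
    n = h + d + s                        # total words in the reference
    percent_correct = 100.0 * h / n      # ignores insertion errors
    accuracy = 100.0 * (h - i) / n       # insertions count against accuracy
    return n, round(percent_correct, 2), round(accuracy, 2)


# Counts taken from the example print-out above.
print(hresults_word_stats(h=853, d=1, s=1, i=1))  # (855, 99.77, 99.65)
```

The single insertion is the only difference between the two figures: it leaves H unchanged but is subtracted in the accuracy numerator.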