Now that word networks and dictionaries have been explained,
the conversion of word-level networks
to model-based recognition networks will be described. Referring
again to Fig
, this expansion
is performed automatically by the module HNET. By default,
HNET attempts to infer the required expansion from the
contents of the dictionary and the associated list of HMMs.
However, five configuration parameters are supplied to apply
more precise control where required:
ALLOWCXTEXP,
ALLOWXWRDEXP,
FORCECXTEXP,
FORCELEFTBI and
FORCERIGHTBI.
The expansion proceeds in four stages.
The determination of the network type can be modified by using the configuration parameters mentioned earlier. By default, ALLOWCXTEXP is set true. If ALLOWCXTEXP is set false, then no expansion of phone names is performed and each phone corresponds to the model of the same name. The default value of ALLOWXWRDEXP is false, which prevents context expansion across word boundaries and so limits the expansion of the phone labels in the dictionary to word-internal contexts only.
If FORCECXTEXP is set true, then context expansion will be performed even when the dictionary is closed. For example, if the HMM set contained all monophones, all biphones and all triphones, then given a monophone dictionary, the default behaviour of HNET would be to generate a monophone recognition network, since the dictionary would be closed. However, if FORCECXTEXP is set true and ALLOWXWRDEXP is set false, then word-internal context expansion will be performed. If FORCECXTEXP is set true and ALLOWXWRDEXP is set true, then full cross-word context expansion will be performed.
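The interaction of these flags can be summarised in a small decision function. This is only a sketch of the rules stated in this section, not HNET's own code: the function name, the dictionary_closed argument and the returned strings are hypothetical, and the precedence of a false ALLOWCXTEXP over FORCECXTEXP is an assumption. Cases that the text does not spell out are reported as left to HNET's own inference.

```python
def network_type(dictionary_closed, allow_cxt_exp=True,
                 allow_xwrd_exp=False, force_cxt_exp=False):
    """Return the kind of expansion implied by the flags (illustrative only)."""
    if not allow_cxt_exp:
        # ALLOWCXTEXP false: each phone maps to the model of the same name.
        # Assumption: this takes precedence over FORCECXTEXP.
        return "none"
    if force_cxt_exp:
        # FORCECXTEXP true: expansion happens even for a closed dictionary;
        # ALLOWXWRDEXP decides whether it crosses word boundaries.
        return "cross-word" if allow_xwrd_exp else "word-internal"
    if dictionary_closed:
        # Default behaviour for a closed dictionary: no expansion.
        return "none"
    # Not covered by the flag rules above; HNET infers the expansion
    # from the dictionary and the HMM list.
    return "inferred by HNET"


print(network_type(dictionary_closed=True))
print(network_type(dictionary_closed=True, force_cxt_exp=True))
print(network_type(dictionary_closed=True, force_cxt_exp=True,
                   allow_xwrd_exp=True))
```

The defaults of the keyword arguments mirror the defaults quoted in the text (ALLOWCXTEXP true, ALLOWXWRDEXP false); the default for FORCECXTEXP is assumed to be false.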
Under cross-word context expansion, the phone sequence
sil aa r sp y uw sp sil
would be expanded as
sil sil-aa+r aa-r+y sp r-y+uw y-uw+sil sp sil
assuming that sil is context-independent and sp is
context-free.
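The naming rule behind this example can be sketched in a few lines of Python. The sketch assumes, as the example suggests, that a context-free phone keeps its plain name and is skipped when neighbouring contexts are looked up, and that a context-independent phone keeps its plain name but still serves as context for its neighbours. The function name and its arguments are hypothetical and are not part of HNET.

```python
def expand_cross_word(phones, context_free=("sp",), context_independent=("sil",)):
    """Rename each context-dependent phone as left-phone+right."""
    cf = set(context_free)
    ci = set(context_independent)
    # Positions of phones that are visible as context (everything but
    # the context-free phones).
    visible = [i for i, p in enumerate(phones) if p not in cf]
    out = list(phones)
    for k, i in enumerate(visible):
        if phones[i] in ci:
            continue  # context-independent: name is left unchanged
        name = phones[i]
        if k > 0:
            name = phones[visible[k - 1]] + "-" + name
        if k + 1 < len(visible):
            name = name + "+" + phones[visible[k + 1]]
        out[i] = name
    return out


print(" ".join(expand_cross_word("sil aa r sp y uw sp sil".split())))
# prints: sil sil-aa+r aa-r+y sp r-y+uw y-uw+sil sp sil
```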
For word-internal systems,
the context expansion can be further controlled via the configuration variable
CFWORDBOUNDARY. When set true (the default setting), context-free phones
are treated as word boundaries, so
aa r sp y uw sp
would be expanded to
aa+r aa-r sp y+uw y-uw sp
Setting CFWORDBOUNDARY false would produce
aa+r aa-r+y sp r-y+uw y-uw sp
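The effect of CFWORDBOUNDARY on a single pronunciation can be illustrated with the following sketch. It assumes that the start and end of the phone sequence are always word boundaries and that context-free phones keep their plain names; the function and argument names are hypothetical and do not correspond to HNET's internals.

```python
def expand_word_internal(phones, context_free=("sp",), cf_word_boundary=True):
    """Word-internal context expansion of one phone sequence."""
    cf = set(context_free)
    out = list(phones)
    # Group the positions of the ordinary phones into segments. A
    # context-free phone closes the current segment only when it is
    # treated as a word boundary; otherwise it is simply skipped.
    segments, current = [], []
    for i, p in enumerate(phones):
        if p in cf:
            if cf_word_boundary and current:
                segments.append(current)
                current = []
        else:
            current.append(i)
    if current:
        segments.append(current)
    # Inside each segment, attach the left and right neighbours.
    for seg in segments:
        for k, i in enumerate(seg):
            name = phones[i]
            if k > 0:
                name = phones[seg[k - 1]] + "-" + name
            if k + 1 < len(seg):
                name = name + "+" + phones[seg[k + 1]]
            out[i] = name
    return out


seq = "aa r sp y uw sp".split()
print(" ".join(expand_word_internal(seq, cf_word_boundary=True)))
# prints: aa+r aa-r sp y+uw y-uw sp
print(" ".join(expand_word_internal(seq, cf_word_boundary=False)))
# prints: aa+r aa-r+y sp r-y+uw y-uw sp
```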
Now that the expansion process has been described in some detail, some simple
examples will help to clarify it. All of these are based
on the Bit-But word network illustrated in Fig.
Firstly, assume that the dictionary contains simple monophone
pronunciations, that is
bit b i t
but b u t
start sil
end sil
and the HMM set consists of just monophones
b i t u sil
In this case, HNET will find a closed dictionary. There will
be no expansion and it will directly generate the network
shown in Fig.
Similarly, if the dictionary contained word-internal triphone pronunciations such as
bit b+i b-i+t i-t
but b+u b-u+t u-t
start sil
end sil
and the HMM set contains all the required models
b+i b-i+t i-t b+u b-u+t u-t sil
then again HNET will find a closed dictionary
and generate the network shown in Fig.
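The two closed-dictionary cases above can be checked mechanically. The sketch below assumes that "closed" means every symbol used in a pronunciation names a model in the HMM list, which is the reading these examples suggest; the helper name is_closed and the dict-of-lists representation of the dictionary are hypothetical.

```python
def is_closed(dictionary, hmm_list):
    """True if every pronunciation symbol names a model in the HMM list."""
    models = set(hmm_list)
    return all(p in models for pron in dictionary.values() for p in pron)


mono_dict = {
    "bit": ["b", "i", "t"],
    "but": ["b", "u", "t"],
    "start": ["sil"],
    "end": ["sil"],
}
tri_dict = {
    "bit": ["b+i", "b-i+t", "i-t"],
    "but": ["b+u", "b-u+t", "u-t"],
    "start": ["sil"],
    "end": ["sil"],
}
mono_hmms = "b i t u sil".split()
tri_hmms = "b+i b-i+t i-t b+u b-u+t u-t sil".split()

print(is_closed(mono_dict, mono_hmms))  # prints: True
print(is_closed(tri_dict, tri_hmms))    # prints: True
print(is_closed(mono_dict, tri_hmms))   # prints: False
```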
If, however, the dictionary contained just the simple monophone pronunciations, as in the first case above, but the HMM set contained just triphones, that is
sil-b+i t-b+i b-i+t i-t+sil i-t+b
sil-b+u t-b+u b-u+t u-t+sil u-t+b sil
then HNET would perform full cross-word expansion and
generate the network shown in Fig.
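To see why exactly these models are needed, the set of cross-word triphones can be enumerated from the monophone pronunciations. The sketch below assumes that the Bit-But network is a loop in which either word may follow the start node or any word, and may be followed by any word or the end node; it also assumes that sil stays context-independent. The function name and data layout are hypothetical.

```python
def required_models(words, start, end):
    """Enumerate the model names needed for cross-word expansion."""
    # Phones that can appear immediately before or after any word.
    left_ctx = {start[-1]} | {pron[-1] for pron in words.values()}
    right_ctx = {end[0]} | {pron[0] for pron in words.values()}
    # The start and end pronunciations (sil) keep their plain names.
    models = set(start) | set(end)
    for pron in words.values():
        for i, p in enumerate(pron):
            # Word-internal neighbours are fixed; at a word edge every
            # possible cross-word neighbour yields a separate model.
            lefts = [pron[i - 1]] if i > 0 else left_ctx
            rights = [pron[i + 1]] if i + 1 < len(pron) else right_ctx
            for left in lefts:
                for right in rights:
                    models.add(left + "-" + p + "+" + right)
    return models


words = {"bit": ["b", "i", "t"], "but": ["b", "u", "t"]}
print(sorted(required_models(words, start=["sil"], end=["sil"])))
```

Under these assumptions the result is the ten triphones listed above plus sil.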
Now suppose that, still using the simple monophone pronunciations,
the HMM set contained all monophones, biphones and triphones. In this
case, the default would be to generate the monophone network of Fig.
If FORCECXTEXP is true but ALLOWXWRDEXP is set false, then the
word-internal network of Fig. would be generated. Finally, if both
FORCECXTEXP and ALLOWXWRDEXP are set true, then the cross-word network
of Fig. would be generated.