The HMM Definition Language

To conclude this chapter, this section presents a formal description of the HMM definition language used by HTK. Syntax is described using an extended BNF notation in which alternatives are separated by a vertical bar (|), parentheses ( ) denote factoring, brackets [ ] denote options, and braces { } denote zero or more repetitions.

All keywords are enclosed in angle brackets^7.7 and the case of the keyword name is not significant. White space is not significant except within double-quoted strings.
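
The lexical conventions above can be illustrated with a small tokenizer. This is only a sketch in Python, not HTK's own reader: the `tokenize` function and its regular expression are hypothetical, and they assume that a token is either a double-quoted string, a keyword in angle brackets, a `~x` macro marker, or any other run of non-space characters.

```python
import re

# Hypothetical token pattern (an assumption, not taken from HTK):
# a double-quoted string, a <Keyword>, a ~x macro marker,
# or any other run of characters that is not white space.
TOKEN_RE = re.compile(r'"[^"]*"|<[^<>\s]+>|~[A-Za-z]|[^\s<>"~]+')

def tokenize(text):
    """Split an HMM definition into tokens, lower-casing keywords only."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.startswith("<"):
            tok = tok.lower()  # the case of a keyword name is not significant
        tokens.append(tok)
    return tokens

# White space between tokens is dropped; the space inside the quoted
# string is preserved.
sample = '~h "My Model"  <BEGINHMM>\n <NumStates>   3'
print(tokenize(sample))
```

Keywords are normalised to lower case so that later stages can compare them directly, while quoted names are passed through unchanged.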

The top level structure of a HMM definition is shown by the following rule.


    hmmdef = [ ~h macro ]
             <BeginHMM>
               [ globalOpts ]
               <NumStates> short
               state { state }
               transP
               [ duration ]
             <EndHMM>

A HMM definition consists of an optional set of global options followed by the <NumStates> keyword, whose argument specifies the number of states in the model inclusive of the non-emitting entry and exit states^7.8. The information for each state is then given in turn, followed by the parameters of the transition matrix and the model duration parameters, if any. The name of the HMM is given by the ~h macro. If the HMM is the only definition within a file, the ~h macro name can be omitted and the HMM name is assumed to be the same as the file name.

The global options are common to all HMMs. They can be given separately using a ~o option macro

    optmacro = ~o globalOpts

or they can be included in one or more HMM definitions. Global options may be repeated but no definition can change a previous definition. All global options must be defined before any other macro definition is processed. In practice this means that any HMM system which uses parameter tying must have a ~o option macro at the head of the first macro file processed.

The full set of global options is given below. Every HMM set must define the vector size (via <VecSize>), the stream widths (via <StreamInfo>) and the observation parameter kind. However, if only the stream widths are given, then the vector size will be inferred. If only the vector size is given, then a single stream of identical width will be assumed. All other options default to null.


    globalOpts = option { option }

    option = <HmmSetId> string |
             <StreamInfo> short { short } |
             <VecSize> short |
             <ProjSize> short |
             <InputXform> inputXform |
             <ParentXform> ~a macro |
             covkind |
             durkind |
             parmkind

The <HmmSetId> option allows the user to give the MMF an identifier. This is used as a sanity check to make sure that a TMF can be safely applied to this MMF. The arguments to the <StreamInfo> option are the number of streams (default 1) and then, for each stream, the width of that stream. The <VecSize> option gives the total number of elements in each input vector. <ProjSize> is the number of "nuisance" dimensions removed using, for example, an HLDA transform. The <ParentXform> option allows the semi-tied macro, if any, associated with the model set to be specified. If both <VecSize> and <StreamInfo> are included then the sum of all the stream widths must equal the input vector size.
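
The defaulting and consistency rules for the vector size and stream widths can be sketched as follows. The function name `resolve_sizes` and its arguments are hypothetical, chosen only for illustration; the sketch assumes the two options have already been read from the global options, with `None` standing for an option that was not given.

```python
def resolve_sizes(vec_size=None, stream_widths=None):
    """Return (vec_size, stream_widths) after applying the defaulting rules."""
    if vec_size is None and stream_widths is None:
        raise ValueError("need <VecSize>, <StreamInfo>, or both")
    if stream_widths is None:
        stream_widths = [vec_size]      # single stream of identical width
    elif vec_size is None:
        vec_size = sum(stream_widths)   # vector size inferred from the widths
    elif sum(stream_widths) != vec_size:
        raise ValueError("stream widths must sum to the vector size")
    return vec_size, list(stream_widths)

print(resolve_sizes(vec_size=39))                    # (39, [39])
print(resolve_sizes(stream_widths=[12, 12, 12, 3]))  # (39, [12, 12, 12, 3])
```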

The covkind defines the kind of the covariance matrix


    covkind = <DiagC> | <InvDiagC> | <FullC> |
              <LLTC> | <XformC>

where InvDiagC is used internally. LLTC and XformC are not used in HTK Version 3.4. Setting the covariance kind as a global option forces all components to have this kind. In particular, it prevents mixing full and diagonal covariances within a HMM set.

The durkind denotes the type of duration model used according to the following rules


    durkind = <nullD> | <poissonD> | <gammaD> | <genD>

For anything other than nullD, a duration vector must be supplied for the model or each state as described below. Note that no current HTK tool can estimate or use such duration vectors.

The parameter kind is any legal parameter kind including qualified forms (see section 5.1)


    parmkind = <basekind{_D|_A|_T|_E|_N|_Z|_O|_V|_C|_K}>

    basekind = <discrete> | <lpc> | <lpcepstra> | <mfcc> | <fbank> |
               <melspec> | <lprefc> | <lpdelcep> | <user>

where the syntax rule for parmkind is non-standard in that no spaces are allowed between the base kind and any subsequent qualifiers. As noted in chapter 5, lpdelcep is provided only for compatibility with earlier versions of HTK and its further use should be avoided.
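
Because the base kind and its qualifiers form a single token, a reader has to split the keyword itself. A minimal Python sketch is given below; the helper `split_parmkind` is hypothetical, and the sets of base kinds and qualifier letters are copied from the rules above rather than from any HTK source file.

```python
# Base kinds and qualifier letters as listed in the parmkind rules above.
BASE_KINDS = {"discrete", "lpc", "lpcepstra", "mfcc", "fbank",
              "melspec", "lprefc", "lpdelcep", "user"}
QUALIFIERS = set("DATENZOVCK")

def split_parmkind(token):
    """Split e.g. '<MFCC_E_D_A>' into ('mfcc', ['E', 'D', 'A'])."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("parameter kind must be enclosed in angle brackets")
    base, *quals = token[1:-1].split("_")
    quals = [q.upper() for q in quals]   # keyword case is not significant
    if base.lower() not in BASE_KINDS:
        raise ValueError(f"unknown base kind: {base}")
    bad = [q for q in quals if q not in QUALIFIERS]
    if bad:
        raise ValueError(f"unknown qualifiers: {bad}")
    return base.lower(), quals

print(split_parmkind("<MFCC_E_D_A>"))  # ('mfcc', ['E', 'D', 'A'])
```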

Each state of each HMM must have its own section defining the parameters associated with that state


    state = <State> short stateinfo

where the short following <State> is the state number. State information can be defined in any order. The syntax is as follows


    stateinfo = ~s macro |
                [ mixes ] [ weights ] stream { stream } [ duration ]

    macro = string

A stateinfo definition consists of an optional specification of the number of mixture components, an optional set of stream weights, followed by a block of information for each stream, optionally terminated with a duration vector. Alternatively, ~s macro can be written where macro is the name of a previously defined macro.

The optional mixes in a stateinfo definition specify the number of mixture components (or discrete codebook size) for each stream of that state


    mixes = <NumMixes> short { short }

where there should be one short for each stream. If this specification is omitted, it is assumed that all streams have just one mixture component.

The optional weights in a stateinfo definition define a set of exponent weights for each independent data stream. The syntax is


    weights = ~w macro | <SWeights> short vector

    vector = float { float }

where the short gives the number of weights (which should match the value given in the <StreamInfo> option) and the vector contains the stream weights $\gamma_s$ (see section 7.1).

The definition of each stream depends on the kind of HMM set. In the normal case, it consists of a sequence of mixture component definitions optionally preceded by the stream number. If the stream number is omitted then it is assumed to be 1. For tied-mixture and discrete HMM sets, special forms are used.


    stream = [ <Stream> short ]
             (mixture { mixture } | tmixpdf | discpdf)

The definition of each mixture component consists of a Gaussian pdf optionally preceded by the mixture number and its weight


    mixture = [ <Mixture> short float ] mixpdf

If the <Mixture> part is missing then mixture 1 is assumed and the weight defaults to 1.0.

The tmixpdf option is used only for fully tied mixture sets. Since the mixpdf parts are all macros in a tied mixture system and since they are identical for every stream and state, it is only necessary to know the mixture weights. The tmixpdf syntax allows these to be specified in the following compact form


    tmixpdf = <TMix> macro weightList

    weightList = repShort { repShort }

    repShort = short [ * char ]

where each short is a mixture component weight scaled so that a weight of 1.0 is represented by the integer 32767. The optional asterisk followed by a char is used to indicate a repeat count. For example, 0*5 is equivalent to 5 zeroes. The Gaussians which make up the pool of tied mixtures are defined using ~m macros called macro1, macro2, macro3, etc.
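
The compact weight list can be expanded mechanically. The following Python sketch uses two hypothetical helpers, `expand_weight_list` and `to_weights`; it assumes the list has already been split into tokens such as `32767` and `0*5`.

```python
def expand_weight_list(tokens):
    """Expand repShort tokens such as '0*5' into a flat list of shorts."""
    shorts = []
    for tok in tokens:
        value, _, count = tok.partition("*")
        # No '*' means the value occurs exactly once.
        shorts.extend([int(value)] * (int(count) if count else 1))
    return shorts

def to_weights(shorts):
    """Map scaled shorts back to weights, with 32767 representing 1.0."""
    return [s / 32767 for s in shorts]

shorts = expand_weight_list(["32767", "0*5"])
print(shorts)              # [32767, 0, 0, 0, 0, 0]
print(to_weights(shorts))  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Only the expansion step depends on the repeat-count syntax; the scaling in `to_weights` applies to tied-mixture weights as described above.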

Discrete probability HMMs are defined in a similar way


    discpdf = <DProb> weightList

The only difference is that the weights in the weightList are scaled log probabilities as defined in section 7.6.

The definition of a Gaussian pdf requires the mean vector to be given and one of the possible forms of covariance


    mixpdf = ~m macro | mean cov [ <GConst> float ]

    mean = ~u macro | <Mean> short vector

    cov = var | inv | xform

    var = ~v macro | <Variance> short vector

    inv = ~i macro |
          (<InvCovar> | <LLTCovar>) short tmatrix

    xform = ~x macro | <Xform> short short matrix

    matrix = float { float }

    tmatrix = matrix

In mean and var, the short preceding the vector defines the length of the vector; in inv, the short preceding the tmatrix gives the size of this square upper triangular matrix; and in xform, the two shorts preceding the matrix give the number of rows and columns. The optional <GConst>^7.9 gives that part of the log probability of a Gaussian that can be precomputed. If it is omitted, then it will be computed during load-in; including it simply saves some time. HTK tools which output HMM definitions always include this field.
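
To make the role of the precomputed constant concrete, the following Python sketch evaluates a diagonal-covariance Gaussian. It assumes the usual convention that the constant is $n \log(2\pi) + \log|\Sigma|$; that formula is not stated in this section, so it should be read as an assumption, and the function names are hypothetical.

```python
import math

def gconst_diag(variances):
    """Precomputable term for a diagonal covariance (assumed form):
    n*log(2*pi) + sum of log variances."""
    n = len(variances)
    return n * math.log(2 * math.pi) + sum(math.log(v) for v in variances)

def log_prob_diag(x, mean, variances, gconst=None):
    """Log density of x; the constant is recomputed only if not supplied."""
    if gconst is None:
        gconst = gconst_diag(variances)
    dist = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    return -0.5 * (gconst + dist)

# The constant depends only on the variances, so it can be stored once
# per Gaussian and reused for every observation.
g = gconst_diag([1.0, 2.0, 0.5])
print(log_prob_diag([0.1, 0.2, 0.3], [0.0, 0.0, 0.0], [1.0, 2.0, 0.5], g))
```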

In addition to defining the output distributions, a state can have a duration probability distribution defined for it. However, no current HTK tool can estimate or use these.


    duration = ~d macro | <Duration> short vector

Alternatively, as shown by the top level syntax for a hmmdef, duration parameters can be specified for a whole model.

The transition matrix is defined by


    transP = ~t macro | <TransP> short matrix

where the short in this case should be equal to the number of states in the model.
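
A reader for the explicit form of the transition matrix might look like the sketch below. The function `parse_transp` is hypothetical; it assumes the tokens start at the <TransP> keyword and that the matrix entries are listed row by row.

```python
def parse_transp(tokens, num_states):
    """Parse ['<TransP>', 'N', f, f, ...] into an N x N list of rows."""
    if tokens[0].lower() != "<transp>":
        raise ValueError("expected <TransP>")
    size = int(tokens[1])
    if size != num_states:
        raise ValueError("<TransP> size must equal <NumStates>")
    values = [float(t) for t in tokens[2:2 + size * size]]
    if len(values) != size * size:
        raise ValueError("too few matrix entries")
    # Assumption: entries are stored in row-major order.
    return [values[r * size:(r + 1) * size] for r in range(size)]

tokens = "<TransP> 3  0.0 1.0 0.0  0.0 0.6 0.4  0.0 0.0 0.0".split()
for row in parse_transp(tokens, num_states=3):
    print(row)
```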

To support HMM adaptation (as described in chapter 9) baseclasses and regression class trees are defined. A baseclass is defined as


    baseClass = ~b macro baseopts classes

    baseopts = <MMFIdMask> string <Parameters> baseKind <NumClasses> int

    baseKind = MIXBASE | MEANBASE | COVBASE

    classes = <Class> int itemlist { classes }

where itemlist is a list of mixture components specified using the same conventions as the HHEd command described in section 10.3. A regression class tree may also exist for an HMM set. This is defined by


    regTree = ~r macro <BaseClass> baseclasses node

    baseclasses = ~b macro | baseopts classes

    node = (<Node> int int int { int } | <TNode> int int int { int }) { node }

For the definition of a node (<Node>) in node, the first integer is the node number and the second is the number of children, followed by the child node numbers^7.10. The integers in the definition of a terminal node (<TNode>) define the node number, the number of base classes associated with the terminal, and the base class index numbers.
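
The node layout just described (node number, a count, then that many integers) can be read with a short loop. The following Python sketch is illustrative only: `parse_nodes` is a hypothetical helper and the example tree is invented.

```python
def parse_nodes(tokens):
    """Return {node_number: (kind, items)}, where items are child node
    numbers for 'node' entries and base class indices for 'tnode' entries."""
    tree, i = {}, 0
    while i < len(tokens):
        kind = tokens[i].lower()
        if kind not in ("<node>", "<tnode>"):
            raise ValueError(f"unexpected token: {tokens[i]}")
        number, count = int(tokens[i + 1]), int(tokens[i + 2])
        items = [int(t) for t in tokens[i + 3:i + 3 + count]]
        if len(items) != count:
            raise ValueError("truncated node definition")
        tree[number] = (kind.strip("<>"), items)
        i += 3 + count
    return tree

# Invented example: a root with two terminal children, one base class each.
tokens = "<Node> 1 2 2 3 <TNode> 2 1 1 <TNode> 3 1 2".split()
print(parse_nodes(tokens))
```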

Adaptation transforms are defined using


    adaptXForm = ~a macro adaptOpts <XformSet> xformset

    adaptOpts = <AdaptKind> adaptKind <BaseClass> baseclasses [ <ParentXForm> parentxform ]

    parentxform = ~a macro | adaptOpts <XformSet> xformset

    adaptKind = TREE | BASE

    xformset = <XFormKind> xformKind <NumXForms> int { linxform }

    xformKind = MLLRMEAN | MLLRCOV | MLLRVAR | CMLLR | SEMIT

    linxform = <LinXForm> int <VecSize> int [ <OFFSET> xformbias ]
               <BlockInfo> int int { int } block { block }

    xformbias = ~y macro | <Bias> short vector

    block = <Block> int xform

In the definition of <BlockInfo>, the first integer is the number of blocks, followed by the size of each of the blocks. For examples of the adaptation transform format see section 9.1.5.

Finally the input transform is defined by


    inputXform = ~j macro | inhead inmatrix

    inhead = <MMFIdMask> string parmkind [ <PreQual> ]

    inmatrix = <LinXform> <VecSize> int <BlockInfo> int int { int } block { block }

    block = <Block> int xform

where the integer following <VecSize> is the number of dimensions after applying the linear transform and must match the vector size of the HMM definition. The first integer after <BlockInfo> is the number of blocks; this is followed by the number of output dimensions from each of the blocks.
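
The relationship between the block sizes and the vector size can be checked with a few lines of Python. The helper `parse_block_info` below is hypothetical. The requirement that the block output sizes sum to the vector size is an inference from the description above (each block contributes its own output dimensions), not a rule stated explicitly in this section.

```python
def parse_block_info(ints, vec_size):
    """ints is the integer list after <BlockInfo>: the number of blocks,
    then one output size per block. Returns the list of block sizes."""
    num_blocks, sizes = ints[0], list(ints[1:])
    if len(sizes) != num_blocks:
        raise ValueError("expected one size per block")
    # Assumed consistency check: the blocks together produce vec_size outputs.
    if sum(sizes) != vec_size:
        raise ValueError("block output sizes must sum to <VecSize>")
    return sizes

print(parse_block_info([3, 13, 13, 13], vec_size=39))  # [13, 13, 13]
```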
