Summary

This section summarises the file formats, parameter kinds, qualifiers and configuration parameters used by HTK. Table 5.1 lists the audio speech file formats which can be read by the HWAVE module. Table 5.2 lists the basic parameter kinds supported by the HPARM module, and the parameter conversion figure shows the automatic conversions that can be performed by appropriate choice of source and target parameter kinds. Table 5.3 lists the available qualifiers for parameter kinds. All of these except _C and _K are used to describe the target kind. The source kind may already have some of them; HPARM adds the rest as needed. Note that HPARM can also delete qualifiers when converting from source to target. The _C and _K qualifiers are only used in external files, to indicate compression and an attached checksum. HPARM adds them to the target form during output, and only in response to setting the configuration parameters SAVECOMPRESSED and SAVEWITHCRC. Adding the _C or _K qualifiers to the target kind simply causes an error. Finally, Tables 5.4 and 5.5 list all of the configuration parameters along with their meanings and default values.

Name	Description
`HTK`	The standard HTK file format
`TIMIT`	As used in the original prototype TIMIT CD-ROM
`NIST`	The standard SPHERE format used by the US NIST
`SCRIBE`	Subset of the European SAM standard used in the SCRIBE CD-ROM
`SDES1`	The Sound Designer 1 format defined by Digidesign Inc.
`AIFF`	Audio interchange file format
`SUNAU8`	Subset of 8-bit ".au" and ".snd" formats used by Sun and NeXT
`OGI`	Format used by Oregon Graduate Institute, similar to TIMIT
`WAV`	Microsoft WAVE files used on PCs
`ESIG`	Entropic Esignal file format
`AUDIO`	Pseudo format to indicate direct audio input
`ALIEN`	Pseudo format to indicate unsupported file; the alien header size must be set via the environment variable `HDSIZE`
`NOHEAD`	As for the ALIEN format but header size is zero

Table 5.1 Supported File Formats

Kind	Meaning
`WAVEFORM`	scalar samples (usually raw speech data)
`LPC`	linear prediction coefficients
`LPREFC`	linear prediction reflection coefficients
`LPCEPSTRA`	LP derived cepstral coefficients
`LPDELCEP`	LP cepstra + delta coef (obsolete)
`IREFC`	LPREFC stored as 16-bit (short) integers
`MFCC`	mel-frequency cepstral coefficients
`FBANK`	log filter-bank parameters
`MELSPEC`	linear filter-bank parameters
`USER`	user defined parameters
`DISCRETE`	vector quantised codebook symbols
`PLP`	perceptual linear prediction coefficients
`ANON`	matches actual parameter kind

Table 5.2 Supported Parameter Kinds

Module	Name	Default	Description
HAUDIO	`LINEIN`	`T`	Select line input for audio
HAUDIO	`MICIN`	`F`	Select microphone input for audio
HAUDIO	`LINEOUT`	`T`	Select line output for audio
HAUDIO	`SPEAKEROUT`	`F`	Select speaker output for audio
HAUDIO	`PHONESOUT`	`T`	Select headphones output for audio
	`SOURCEKIND`	`ANON`	Parameter kind of source
	`SOURCEFORMAT`	`HTK`	File format of source
	`SOURCERATE`	`0.0`	Sample period of source in 100ns units
HWAVE	`NSAMPLES`		Num samples in alien file input via a pipe
HWAVE	`HEADERSIZE`		Size of header in an alien file
HWAVE	`STEREOMODE`		Select channel: `RIGHT` or `LEFT`
HWAVE	`BYTEORDER`		Define byte order `VAX` or other
	`NATURALREADORDER`	`F`	Enable natural read order for HTK files
	`NATURALWRITEORDER`	`F`	Enable natural write order for HTK files
	`TARGETKIND`	`ANON`	Parameter kind of target
	`TARGETFORMAT`	`HTK`	File format of target
	`TARGETRATE`	`0.0`	Sample period of target in 100ns units
HPARM	`SAVECOMPRESSED`	`F`	Save the output file in compressed form
HPARM	`SAVEWITHCRC`	`T`	Attach a checksum to output parameter file
HPARM	`ADDDITHER`	`0.0`	Level of noise added to input signal
HPARM	`ZMEANSOURCE`	`F`	Zero mean source waveform before analysis
HPARM	`WINDOWSIZE`	`256000.0`	Analysis window size in 100ns units
HPARM	`USEHAMMING`	`T`	Use a Hamming window
HPARM	`PREEMCOEF`	`0.97`	Set pre-emphasis coefficient
HPARM	`LPCORDER`	`12`	Order of LPC analysis
HPARM	`NUMCHANS`	`20`	Number of filterbank channels
HPARM	`LOFREQ`	`-1.0`	Low frequency cut-off in fbank analysis
HPARM	`HIFREQ`	`-1.0`	High frequency cut-off in fbank analysis
HPARM	`USEPOWER`	`F`	Use power not magnitude in fbank analysis
HPARM	`NUMCEPS`	`12`	Number of cepstral parameters
HPARM	`CEPLIFTER`	`22`	Cepstral liftering coefficient
HPARM	`ENORMALISE`	`T`	Normalise log energy
HPARM	`ESCALE`	`0.1`	Scale log energy
HPARM	`SILFLOOR`	`50.0`	Energy silence floor (dB)
HPARM	`DELTAWINDOW`	`2`	Delta window size
HPARM	`ACCWINDOW`	`2`	Acceleration window size
HPARM	`VQTABLE`	`NULL`	Name of VQ table
HPARM	`SAVEASVQ`	`F`	Save only the VQ indices
HPARM	`AUDIOSIG`	`0`	Audio signal number for remote control

Table 5.4 Configuration Parameters
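
Several entries in the configuration table above (`SOURCERATE`, `TARGETRATE`, `WINDOWSIZE`) are given in units of 100ns, which is easy to misread. The following Python sketch only illustrates the arithmetic. The helper names are hypothetical and not part of HTK, and the 16 kHz sample rate is an assumed example value.

```python
# Illustrative arithmetic only; these helpers are not part of HTK.
HTK_UNITS_PER_SECOND = 10_000_000  # 1 s = 10^7 units of 100 ns


def sample_rate_to_period(sample_rate_hz: float) -> float:
    """Return the sample period in 100 ns units for a sample rate in Hz."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def period_to_milliseconds(period_units: float) -> float:
    """Convert a duration in 100 ns units to milliseconds."""
    return period_units / 10_000  # 1 ms = 10^4 units of 100 ns


def window_length_in_samples(window_units: float, period_units: float) -> float:
    """Return how many sample periods fit in an analysis window."""
    return window_units / period_units


if __name__ == "__main__":
    # Assumed example: audio sampled at 16 kHz.
    source_period = sample_rate_to_period(16_000)
    print(source_period)                                       # 625.0
    # Default WINDOWSIZE from the table, expressed in milliseconds.
    print(period_to_milliseconds(256000.0))                    # 25.6
    print(window_length_in_samples(256000.0, source_period))   # 409.6
```

With these assumptions, the default `WINDOWSIZE` of `256000.0` corresponds to a 25.6 ms analysis window.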

Module	Name	Default	Description
HPARM	`USESILDET`	`F`	Enable speech/silence detector
HPARM	`MEASURESIL`	`T`	Measure background noise level prior to sampling
HPARM	`OUTSILWARN`	`T`	Print a warning message to `stdout` before measuring audio levels
HPARM	`SPEECHTHRESH`	`9.0`	Threshold for speech above silence level (dB)
HPARM	`SILENERGY`	`0.0`	Average background noise level (dB)
HPARM	`SPCSEQCOUNT`	`10`	Window over which speech/silence decision reached
HPARM	`SPCGLCHCOUNT`	`0`	Maximum number of frames marked as silence in window which is classified as speech whilst expecting start of speech
HPARM	`SILSEQCOUNT`	`100`	Number of frames classified as silence needed to mark end of utterance
HPARM	`SILGLCHCOUNT`	`2`	Maximum number of frames marked as silence in window which is classified as speech whilst expecting silence
HPARM	`SILMARGIN`	`40`	Number of extra frames included before and after start and end of speech marks from the speech/silence detector
HPARM	`V1COMPAT`	`F`	Set Version 1.5 compatibility mode
	`TRACE`	`0`	Trace setting

Table 5.5 Configuration Parameters (cont)
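
As a rough illustration of how the parameters in the configuration tables above might be collected into a configuration file, the Python sketch below renders a dictionary of settings as text. It assumes a simple one-per-line `NAME = VALUE` layout with `T`/`F` for booleans, as in the Default column. `format_value` and `render_config` are hypothetical helpers, not part of HTK, and the chosen values are examples only.

```python
# Hypothetical helpers for illustration; not part of HTK.

def format_value(value) -> str:
    """Render a Python value using T/F for booleans, as in the tables."""
    if isinstance(value, bool):
        return "T" if value else "F"
    return str(value)


def render_config(settings: dict) -> str:
    """Render settings as 'NAME = VALUE' lines (assumed file layout)."""
    lines = [f"{name} = {format_value(value)}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "TARGETKIND": "MFCC_E_D_A",   # no _C or _K here; those are added on output
        "TARGETRATE": 100000.0,       # example value, in 100 ns units
        "WINDOWSIZE": 256000.0,       # default from the table
        "NUMCHANS": 20,               # default from the table
        "SAVECOMPRESSED": True,       # request compressed output
        "SAVEWITHCRC": True,          # request an attached checksum
    }
    print(render_config(example), end="")
```

Compression and the checksum are requested here through `SAVECOMPRESSED` and `SAVEWITHCRC` rather than by adding qualifiers to the target kind, in line with the description of `_C` and `_K` in this section.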

Qualifier	Meaning
`_A`	Acceleration coefficients appended
`_C`	External form is compressed
`_D`	Delta coefficients appended
`_E`	Log energy appended
`_K`	External form has checksum appended
`_N`	Absolute log energy suppressed
`_T`	Third differential coefficients appended
`_V`	VQ index appended
`_Z`	Cepstral mean subtracted
`_0`	Cepstral C0 coefficient appended

Table 5.3 Parameter Kind Qualifiers
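
A parameter kind written with qualifiers, such as `MFCC_E_D_A`, is a base kind followed by qualifier suffixes. The Python sketch below shows one way to split such a string and to reject `_C` or `_K` in a target kind, using only the names listed in the tables of this section. It is a simplified illustration, not HTK's own parsing code, and the function names are hypothetical.

```python
# Simplified illustration; not HTK's own parsing logic.
BASE_KINDS = {
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC", "MFCC",
    "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP", "ANON",
}

QUALIFIERS = {
    "A": "Acceleration coefficients appended",
    "C": "External form is compressed",
    "D": "Delta coefficients appended",
    "E": "Log energy appended",
    "K": "External form has checksum appended",
    "N": "Absolute log energy suppressed",
    "T": "Third differential coefficients appended",
    "V": "VQ index appended",
    "Z": "Cepstral mean subtracted",
    "0": "Cepstral C0 coefficient appended",
}

# Qualifiers that only appear in external files.
EXTERNAL_ONLY = {"C", "K"}


def parse_kind(kind: str):
    """Split e.g. 'MFCC_E_D_A' into ('MFCC', ['E', 'D', 'A'])."""
    base, *quals = kind.split("_")
    if base not in BASE_KINDS:
        raise ValueError(f"unknown base kind: {base}")
    for q in quals:
        if q not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: _{q}")
    return base, quals


def check_target_kind(kind: str):
    """Parse a target kind and reject the external-only qualifiers."""
    base, quals = parse_kind(kind)
    bad = [q for q in quals if q in EXTERNAL_ONLY]
    if bad:
        raise ValueError("not allowed in a target kind: " + ", ".join("_" + q for q in bad))
    return base, quals


if __name__ == "__main__":
    base, quals = check_target_kind("MFCC_E_D_A")
    print(base, [QUALIFIERS[q] for q in quals])
```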