This section summarises the various file formats, parameter kinds, qualifiers
and configuration parameters used by HTK. The first table below lists the
audio speech file formats which can be read by the HWAVE module. The second
table lists the basic parameter kinds supported by the HPARM module, and the
accompanying figure shows the various automatic conversions that can be
performed by appropriate choice of source and target parameter kinds. The
third table lists the available qualifiers for parameter kinds. Apart from
_C and _K, these qualifiers are used to describe the target kind. The source
kind may already have some of them; HPARM adds the rest as needed. Note that
HPARM can also delete qualifiers when converting from source to target. The
_C and _K qualifiers are only used in external files to indicate compression
and an attached checksum. HPARM adds them to the target form during output,
and only in response to setting the configuration parameters SAVECOMPRESSED
and SAVEWITHCRC. Adding the _C or _K qualifiers to the target kind simply
causes an error. Finally, the last two tables list all of the configuration
parameters along with their meaning and default values.

| Name | Description |
| --- | --- |
| HTK | The standard HTK file format |
| TIMIT | As used in the original prototype TIMIT CD-ROM |
| NIST | The standard SPHERE format used by the US NIST |
| SCRIBE | Subset of the European SAM standard used in the SCRIBE CD-ROM |
| SDES1 | The Sound Designer 1 format defined by Digidesign Inc. |
| AIFF | Audio interchange file format |
| SUNAU8 | Subset of 8-bit ".au" and ".snd" formats used by Sun and NeXT |
| OGI | Format used by Oregon Graduate Institute, similar to TIMIT |
| WAV | Microsoft WAVE files used on PCs |
| ESIG | Entropic Esignal file format |
| AUDIO | Pseudo format to indicate direct audio input |
| ALIEN | Pseudo format to indicate unsupported file; the alien header size must be set via the environment variable HDSIZE |
| NOHEAD | As for the ALIEN format but header size is zero |

| Kind | Meaning |
| --- | --- |
| WAVEFORM | scalar samples (usually raw speech data) |
| LPC | linear prediction coefficients |
| LPREFC | linear prediction reflection coefficients |
| LPCEPSTRA | LP derived cepstral coefficients |
| LPDELCEP | LP cepstra + delta coefficients (obsolete) |
| IREFC | LPREFC stored as 16-bit (short) integers |
| MFCC | mel-frequency cepstral coefficients |
| FBANK | log filter-bank parameters |
| MELSPEC | linear filter-bank parameters |
| USER | user defined parameters |
| DISCRETE | vector quantised codebook symbols |
| PLP | perceptual linear prediction coefficients |
| ANON | matches actual parameter kind |

| Qualifier | Meaning |
| --- | --- |
| _A | Acceleration coefficients appended |
| _C | External form is compressed |
| _D | Delta coefficients appended |
| _E | Log energy appended |
| _K | External form has checksum appended |
| _N | Absolute log energy suppressed |
| _T | Third differential coefficients appended |
| _V | VQ index appended |
| _Z | Cepstral mean subtracted |
| _0 | Cepstral C0 coefficient appended |
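
The qualifier rules described in this section can be illustrated with a small sketch. The helper below is hypothetical (`parse_parm_kind` and its `is_target` flag are not part of HTK); it only mirrors the tables above, splitting a kind string such as MFCC_E_D_A into a base kind and its qualifiers and rejecting _C and _K when the string is meant as a target kind.

```python
# Base kinds and qualifiers transcribed from the tables in this section.
BASE_KINDS = {
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC", "MFCC",
    "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP", "ANON",
}
QUALIFIERS = {
    "A": "acceleration coefficients appended",
    "C": "external form is compressed",
    "D": "delta coefficients appended",
    "E": "log energy appended",
    "K": "external form has checksum appended",
    "N": "absolute log energy suppressed",
    "T": "third differential coefficients appended",
    "V": "VQ index appended",
    "Z": "cepstral mean subtracted",
    "0": "cepstral C0 coefficient appended",
}
# _C and _K only describe external files, so a target kind must not use them.
EXTERNAL_ONLY = {"C", "K"}


def parse_parm_kind(text, is_target=False):
    """Split a kind string such as 'MFCC_E_D_A' into (base, [qualifiers]).

    Hypothetical helper: it follows the rules stated in this section and is
    not HTK's own parser.
    """
    base, *quals = text.split("_")
    if base not in BASE_KINDS:
        raise ValueError(f"unknown base kind: {base!r}")
    for q in quals:
        if q not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: _{q}")
        if is_target and q in EXTERNAL_ONLY:
            raise ValueError(f"_{q} is not allowed in a target kind")
    return base, quals


if __name__ == "__main__":
    print(parse_parm_kind("MFCC_E_D_A"))  # ('MFCC', ['E', 'D', 'A'])
```

Calling `parse_parm_kind("MFCC_E_K", is_target=True)` raises `ValueError`, which corresponds to the error HTK reports when _C or _K is added to the target kind. The same string is accepted when `is_target` is left at `False`, as it would be for the kind recorded in an external file.
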

| Module | Name | Default | Description |
| --- | --- | --- | --- |
| HAUDIO | LINEIN | T | Select line input for audio |
| HAUDIO | MICIN | F | Select microphone input for audio |
| HAUDIO | LINEOUT | T | Select line output for audio |
| HAUDIO | SPEAKEROUT | F | Select speaker output for audio |
| HAUDIO | PHONESOUT | T | Select headphones output for audio |
| | SOURCEKIND | ANON | Parameter kind of source |
| | SOURCEFORMAT | HTK | File format of source |
| | SOURCERATE | 0.0 | Sample period of source in 100ns units |
| HWAVE | NSAMPLES | | Num samples in alien file input via a pipe |
| HWAVE | HEADERSIZE | | Size of header in an alien file |
| HWAVE | STEREOMODE | | Select channel: RIGHT or LEFT |
| HWAVE | BYTEORDER | | Define byte order VAX or other |
| | NATURALREADORDER | F | Enable natural read order for HTK files |
| | NATURALWRITEORDER | F | Enable natural write order for HTK files |
| | TARGETKIND | ANON | Parameter kind of target |
| | TARGETFORMAT | HTK | File format of target |
| | TARGETRATE | 0.0 | Sample period of target in 100ns units |
| HPARM | SAVECOMPRESSED | F | Save the output file in compressed form |
| HPARM | SAVEWITHCRC | T | Attach a checksum to output parameter file |
| HPARM | ADDDITHER | 0.0 | Level of noise added to input signal |
| HPARM | ZMEANSOURCE | F | Zero mean source waveform before analysis |
| HPARM | WINDOWSIZE | 256000.0 | Analysis window size in 100ns units |
| HPARM | USEHAMMING | T | Use a Hamming window |
| HPARM | PREEMCOEF | 0.97 | Set pre-emphasis coefficient |
| HPARM | LPCORDER | 12 | Order of LPC analysis |
| HPARM | NUMCHANS | 20 | Number of filterbank channels |
| HPARM | LOFREQ | -1.0 | Low frequency cut-off in fbank analysis |
| HPARM | HIFREQ | -1.0 | High frequency cut-off in fbank analysis |
| HPARM | USEPOWER | F | Use power not magnitude in fbank analysis |
| HPARM | NUMCEPS | 12 | Number of cepstral parameters |
| HPARM | CEPLIFTER | 22 | Cepstral liftering coefficient |
| HPARM | ENORMALISE | T | Normalise log energy |
| HPARM | ESCALE | 0.1 | Scale log energy |
| HPARM | SILFLOOR | 50.0 | Energy silence floor (dB) |
| HPARM | DELTAWINDOW | 2 | Delta window size |
| HPARM | ACCWINDOW | 2 | Acceleration window size |
| HPARM | VQTABLE | NULL | Name of VQ table |
| HPARM | SAVEASVQ | F | Save only the VQ indices |
| HPARM | AUDIOSIG | 0 | Audio signal number for remote control |

| Module | Name | Default | Description |
| --- | --- | --- | --- |
| HPARM | USESILDET | F | Enable speech/silence detector |
| HPARM | MEASURESIL | T | Measure background noise level prior to sampling |
| HPARM | OUTSILWARN | T | Print a warning message to stdout before measuring audio levels |
| HPARM | SPEECHTHRESH | 9.0 | Threshold for speech above silence level (dB) |
| HPARM | SILENERGY | 0.0 | Average background noise level (dB) |
| HPARM | SPCSEQCOUNT | 10 | Window over which speech/silence decision reached |
| HPARM | SPCGLCHCOUNT | 0 | Maximum number of frames marked as silence in window which is classified as speech whilst expecting start of speech |
| HPARM | SILSEQCOUNT | 100 | Number of frames classified as silence needed to mark end of utterance |
| HPARM | SILGLCHCOUNT | 2 | Maximum number of frames marked as silence in window which is classified as speech whilst expecting silence |
| HPARM | SILMARGIN | 40 | Number of extra frames included before and after start and end of speech marks from the speech/silence detector |
| HPARM | V1COMPAT | F | Set Version 1.5 compatibility mode |
| | TRACE | 0 | Trace setting |
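
Several of the parameters above (SOURCERATE, TARGETRATE, WINDOWSIZE) are expressed in 100ns units, which are easy to misread. The following sketch shows the arithmetic only; the function names are hypothetical, and the 10 ms frame period and 16 kHz sample rate used in the example are assumptions chosen for illustration, not HTK defaults.

```python
# One second contains 10,000,000 units of 100 ns; one millisecond contains 10,000.
UNITS_PER_SECOND = 10_000_000
UNITS_PER_MS = 10_000


def ms_to_htk_units(ms):
    """Convert a duration in milliseconds to 100 ns units."""
    return ms * UNITS_PER_MS


def htk_units_to_ms(units):
    """Convert a duration in 100 ns units to milliseconds."""
    return units / UNITS_PER_MS


def sample_period_units(sample_rate_hz):
    """Sample period in 100 ns units for a given sampling rate in Hz."""
    return UNITS_PER_SECOND / sample_rate_hz


def frames_to_ms(n_frames, target_rate_units):
    """Duration covered by n_frames when the frame period is target_rate_units."""
    return n_frames * htk_units_to_ms(target_rate_units)


if __name__ == "__main__":
    # The default WINDOWSIZE of 256000.0 corresponds to a 25.6 ms window.
    print(htk_units_to_ms(256000.0))
    # Assumed 16 kHz audio gives the value one would write for SOURCERATE.
    print(sample_period_units(16000))
    # Assumed 10 ms frame period: the default SILMARGIN of 40 frames in ms.
    print(frames_to_ms(40, ms_to_htk_units(10)))
```

Frame-count parameters such as SPCSEQCOUNT, SILSEQCOUNT and SILMARGIN only translate into durations once a frame period has been chosen, which is why `frames_to_ms` takes the period as an explicit argument.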