8.2 Program 'trainhmm.py'

Once tagged training data has been created using the program tagdata.py and edited by a user, a hidden Markov model (HMM) can be created using trainhmm.py.

The program is called from the command line with:

python trainhmm.py

All settings are defined within the module itself, as shown in the code example at the end of this section. The following list describes the configuration settings that must be defined.

The format of the training data input file 'hmm_train_file' must be as follows:

The following code example shows the part of the trainhmm.py program that the user must modify to suit her or his requirements.

# ====================================================================
# Define a project logger

init_febrl_logger(log_file_name = 'febrl-trainhmm.log',
                     file_level = 'WARN',
                  console_level = 'INFO',
                      clear_log = True,
                parallel_output = 'host')

# ====================================================================
# Set up Febrl and create a new project (or load a saved project)

hmm_febrl = Febrl(description = 'HMM training Febrl instance',
                   febrl_path = '.')

hmm_project = hmm_febrl.new_project(name = 'HMM-Train',
                             description = 'Training module for HMMs',
                               file_name = 'hmm.fbr')

# ====================================================================
# Define settings for HMM training

# Name of the file containing training records  - - - - - - - - - - -
#
hmm_train_file = 'hmm'+dirsep+'address-train.csv'

# Name of the HMM file to be written  - - - - - - - - - - - - - - - -
#
hmm_model_file = 'test-address.hmm'

# Name of the HMM - - - - - - - - - - - - - - - - - - - - - - - - - -
#
hmm_name = 'Test Address HMM'

# Component: Can either be 'name' or 'address'  - - - - - - - - - - -
#
hmm_component = 'address'

# HMM smoothing method, can be either None, 'laplace' or 'absdiscount'
#
hmm_smoothing = 'absdiscount'
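
Because all settings are edited by hand inside the module, a mistyped value (for example 'Laplace' instead of 'laplace') is easy to introduce. The following is a minimal sketch of how the settings above could be checked before training starts. The helper check_hmm_settings and the two constants are hypothetical names introduced here for illustration and are not part of Febrl. The allowed values are taken from the comments in the listing above. The sketch also assumes that 'dirsep' in the listing is the platform's directory separator (os.sep).

```python
import os

# Allowed values, taken from the comments in the settings listing above.
VALID_COMPONENTS = ('name', 'address')
VALID_SMOOTHING = (None, 'laplace', 'absdiscount')


def check_hmm_settings(train_file, model_file, name, component, smoothing):
    """Return a list of problems found in the settings (empty if none).

    Hypothetical helper for illustration only; it is not part of Febrl.
    """
    problems = []
    if not train_file:
        problems.append('hmm_train_file must not be empty')
    if not model_file:
        problems.append('hmm_model_file must not be empty')
    if not name:
        problems.append('hmm_name must not be empty')
    if component not in VALID_COMPONENTS:
        problems.append("hmm_component must be 'name' or 'address', "
                        "got %r" % (component,))
    if smoothing not in VALID_SMOOTHING:
        problems.append("hmm_smoothing must be None, 'laplace' or "
                        "'absdiscount', got %r" % (smoothing,))
    return problems


# Assumption: 'dirsep' in the listing equals os.sep, so os.path.join
# builds the same relative path as 'hmm'+dirsep+'address-train.csv'.
hmm_train_file = os.path.join('hmm', 'address-train.csv')

problems = check_hmm_settings(train_file=hmm_train_file,
                              model_file='test-address.hmm',
                              name='Test Address HMM',
                              component='address',
                              smoothing='absdiscount')

for problem in problems:
    print('Setting error: ' + problem)
```

With the values from the listing the returned list is empty and nothing is printed. The check only covers the values of the settings. It does not test whether the training file exists or whether its contents are in the required format.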