Once tagged training data has been created using the program tagdata.py and edited by a user, a hidden Markov model (HMM) can be created using trainhmm.py.
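As background, training an HMM from tagged data can be sketched as plain counting. This assumes that the tags are the observed symbols and the hmm_state values are the hidden states, so maximum-likelihood estimates come from counting which state starts a record, state-to-state transitions, and tags emitted in each state. This is a generic illustration, not the code inside trainhmm.py. The function names `train_hmm` and `normalise` and the tag and state values are hypothetical, and no explicit end state is modelled.

```python
from collections import Counter, defaultdict


def normalise(counter):
    """Turn a Counter of outcomes into a dict of relative frequencies."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def train_hmm(sequences):
    """Maximum-likelihood HMM estimates from tagged training sequences.

    Each sequence is a list of (tag, state) pairs: tags are the observed
    symbols, states are the hidden HMM states (an assumption of this sketch).
    """
    start_counts = Counter()             # which state begins a record
    trans_counts = defaultdict(Counter)  # state -> following state
    emit_counts = defaultdict(Counter)   # state -> tag observed in it
    for seq in sequences:
        if not seq:
            continue
        start_counts[seq[0][1]] += 1
        for tag, state in seq:
            emit_counts[state][tag] += 1
        for (_, prev_state), (_, next_state) in zip(seq, seq[1:]):
            trans_counts[prev_state][next_state] += 1
    start_prob = normalise(start_counts)
    trans_prob = {s: normalise(c) for s, c in trans_counts.items()}
    emit_prob = {s: normalise(c) for s, c in emit_counts.items()}
    return start_prob, trans_prob, emit_prob


# Hypothetical training records for the address component.
training = [
    [("NU", "wfnu"), ("UN", "wfna1"), ("WT", "wfty")],
    [("NU", "wfnu"), ("UN", "wfna1"), ("UN", "wfna2"), ("WT", "wfty")],
]
start_prob, trans_prob, emit_prob = train_hmm(training)
print(trans_prob["wfna1"])  # {'wfty': 0.5, 'wfna2': 0.5}
```

Unseen transitions and emissions get probability zero here, which is what the smoothing options described below are meant to address.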
The program is called from the command line with:

    python trainhmm.py
All settings are made within the module itself, as shown in the code example at the end of this section. The following list describes the configuration settings that must be defined:
- A project logger is initialised first, and a Febrl object and a new project are defined next. Generally there is no need to modify these settings.
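- The name of the file containing the tagged training records is given with 'hmm_train_file'. Such a file can be created using the program tagdata.py as described in Section 8.1.
- The name of the file the trained HMM is written to is given with 'hmm_model_file'.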
- A name for the HMM is set with the 'hmm_name' option. Note that this name is not the file name.
- The HMM component, set with 'hmm_component', should be the same as the one used to tag the training records in 'hmm_train_file'. Possible values are 'name' or 'address'.
- The smoothing method is selected with the 'hmm_smoothing' option. Possible values are None for no smoothing, 'laplace' for Laplace smoothing or 'absdiscount' for absolute discount smoothing. Both smoothing methods are described in .
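To make the two smoothing options concrete, here is a minimal sketch of both applied to one row of counts (for example the transitions out of a single state) over a fixed set of possible outcomes. These are common textbook formulations, assumed here for illustration; the exact formulas and discount value used by Febrl may differ. The function names and the discount of 0.5 are hypothetical.

```python
def laplace_smooth(counts, outcomes):
    """Add-one (Laplace) smoothing over a fixed set of possible outcomes."""
    total = sum(counts.get(o, 0) for o in outcomes)
    return {o: (counts.get(o, 0) + 1) / (total + len(outcomes)) for o in outcomes}


def absdiscount_smooth(counts, outcomes, discount=0.5):
    """Absolute discounting: subtract a constant from every seen (integer)
    count and share the freed probability mass equally among unseen outcomes."""
    if not 0 < discount < 1:
        raise ValueError("discount must lie strictly between 0 and 1")
    total = sum(counts.get(o, 0) for o in outcomes)
    if total == 0:
        return {o: 1 / len(outcomes) for o in outcomes}  # no data: uniform
    seen = [o for o in outcomes if counts.get(o, 0) > 0]
    unseen = [o for o in outcomes if counts.get(o, 0) == 0]
    if not unseen:
        return {o: counts[o] / total for o in outcomes}  # nothing to redistribute
    freed = discount * len(seen) / total
    probs = {o: (counts[o] - discount) / total for o in seen}
    probs.update({o: freed / len(unseen) for o in unseen})
    return probs


# Hypothetical transition counts out of one state, over four possible states.
states = ["wfnu", "wfty", "wfna1", "wfna2"]
counts = {"wfnu": 3, "wfty": 1}
print(laplace_smooth(counts, states))      # wfnu 0.5, wfty 0.25, others 0.125
print(absdiscount_smooth(counts, states))  # wfnu 0.625, wfty 0.125, others 0.125
```

Laplace smoothing adds the same pseudo-count everywhere, so it shifts more mass towards unseen outcomes when data are sparse; absolute discounting only takes a fixed amount from each seen count.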
The format of the training data input file 'hmm_train_file' must be as follows:
- Comment lines start with a hash character ('#'). Blank lines are allowed and are skipped.
- All other lines contain one training record as a sequence of tag and hmm_state pairs, where tag is one of the possible tags as listed in Appendix B, and hmm_state is one of the possible states from the state lists in Appendix A (either for the name or the address component).
- Any unknown tag or state in the training data will result in an error and the program stops.
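The rules above can be sketched as a small parser that skips comment and blank lines and rejects unknown tags or states. This assumes each record line is a comma-separated list of `tag:state` pairs; that separator format, the function name `parse_training_lines`, and the small tag and state sets are hypothetical stand-ins for the full lists in Appendices A and B.

```python
# Hypothetical subsets of the tags (Appendix B) and address states (Appendix A).
VALID_TAGS = {"NU", "UN", "WT"}
VALID_STATES = {"wfnu", "wfna1", "wfna2", "wfty"}


def parse_training_lines(lines):
    """Parse tagged training lines into lists of (tag, state) pairs.

    Assumes records look like 'NU:wfnu, UN:wfna1, WT:wfty'.
    Raises ValueError on malformed pairs or unknown tags/states.
    """
    records = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        record = []
        for pair in line.split(","):
            tag, sep, state = pair.strip().partition(":")
            if not sep:
                raise ValueError(f"line {line_no}: malformed pair {pair.strip()!r}")
            tag, state = tag.strip(), state.strip()
            if tag not in VALID_TAGS:
                raise ValueError(f"line {line_no}: unknown tag {tag!r}")
            if state not in VALID_STATES:
                raise ValueError(f"line {line_no}: unknown state {state!r}")
            record.append((tag, state))
        records.append(record)
    return records


sample = """
# 42 meyer road
NU:wfnu, UN:wfna1, WT:wfty

NU:wfnu, UN:wfna1, UN:wfna2, WT:wfty
""".splitlines()
records = parse_training_lines(sample)
print(len(records))  # 2
```

Failing fast on the first unknown tag or state mirrors the behaviour described above, where the program stops with an error.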
The following code example shows the part of the trainhmm.py program that needs to be modified by the user according to her or his needs.
    # ====================================================================
    # Define a project logger

    init_febrl_logger(log_file_name = 'febrl-trainhmm.log',
                      file_level = 'WARN',
                      console_level = 'INFO',
                      clear_log = True,
                      parallel_output = 'host')

    # ====================================================================
    # Set up Febrl and create a new project (or load a saved project)

    hmm_febrl = Febrl(description = 'HMM training Febrl instance',
                      febrl_path = '.')

    hmm_project = hmm_febrl.new_project(name = 'HMM-Train',
                                        description = 'Training module for HMMs',
                                        file_name = 'hmm.fbr')

    # ====================================================================
    # Define settings for HMM training

    # Name of the file containing training records - - - - - - - - - - -
    #
    hmm_train_file = 'hmm'+dirsep+'address-train.csv'

    # Name of the HMM file to be written - - - - - - - - - - - - - - - -
    #
    hmm_model_file = 'test-address.hmm'

    # Name of the HMM - - - - - - - - - - - - - - - - - - - - - - - - - -
    #
    hmm_name = 'Test Address HMM'

    # Component: Can either be 'name' or 'address' - - - - - - - - - - -
    #
    hmm_component = 'address'

    # HMM smoothing method, can be either None, 'laplace' or 'absdiscount'
    #
    hmm_smoothing = 'absdiscount'
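Because a misspelt value such as 'absdiscout' is easy to overlook, the settings can be checked before training starts. The helper below is a hypothetical sketch, not part of trainhmm.py; it only checks the allowed values listed in this section, and `os.path.join` stands in for the Febrl `dirsep` concatenation.

```python
import os

VALID_COMPONENTS = ("name", "address")
VALID_SMOOTHING = (None, "laplace", "absdiscount")


def check_hmm_settings(hmm_train_file, hmm_model_file, hmm_name,
                       hmm_component, hmm_smoothing):
    """Return a list of problems found in the HMM training settings."""
    problems = []
    for label, value in (("hmm_train_file", hmm_train_file),
                         ("hmm_model_file", hmm_model_file),
                         ("hmm_name", hmm_name)):
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{label} must be a non-empty string")
    if hmm_component not in VALID_COMPONENTS:
        problems.append(f"hmm_component must be one of {VALID_COMPONENTS}")
    if hmm_smoothing not in VALID_SMOOTHING:
        problems.append(f"hmm_smoothing must be one of {VALID_SMOOTHING}")
    return problems


# The values from the trainhmm.py example above.
settings = dict(hmm_train_file=os.path.join("hmm", "address-train.csv"),
                hmm_model_file="test-address.hmm",
                hmm_name="Test Address HMM",
                hmm_component="address",
                hmm_smoothing="absdiscount")
print(check_hmm_settings(**settings))  # []
print(check_hmm_settings(**{**settings, "hmm_smoothing": "absdiscout"}))
```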