6.1.3 Step 3: Segmentation

Once the word and tag lists are available, the tags are used to segment the input words into the correct output fields. For names, both a simple rule-based approach and a probabilistic hidden Markov model (HMM) approach are implemented in Febrl (see Appendix C for a description of the rule-based system). For addresses, which tend to have more complex and variable layouts and formats than names, only HMM-based processing is available. The results of HMM-based address processing are good enough that rule-based address processing is unlikely to be implemented (at least by the authors). The HMM approach is discussed in more detail in Chapter 7.
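
To illustrate the general idea of tag-driven segmentation, the following minimal Python sketch assigns the words of a name to output fields according to their tags. It is not Febrl's implementation: the function name segment_name, the tag codes (TI for title, GN for given name, SN for surname, UN for unknown), the output field names, and the fallback rule for unknown words are all assumptions made for this example only.

```python
def segment_name(words, tags):
    """Toy rule-based segmentation: place each word in a field based on its tag.

    All tag codes and field names here are hypothetical, chosen for illustration.
    """
    fields = {"title": [], "given_name": [], "surname": []}
    last = len(words) - 1
    for i, (word, tag) in enumerate(zip(words, tags)):
        if tag == "TI":
            fields["title"].append(word)
        elif tag == "GN":
            fields["given_name"].append(word)
        elif tag == "SN":
            fields["surname"].append(word)
        else:
            # Assumed fallback: an untagged final word is taken as the surname,
            # any other untagged word as a given name.
            target = "surname" if i == last else "given_name"
            fields[target].append(word)
    # Join the words collected for each field into a single string.
    return {name: " ".join(parts) for name, parts in fields.items()}


result = segment_name(["dr", "peter", "miller"], ["TI", "GN", "UN"])
print(result)
```

Fixed rules of this kind are easy to follow for short inputs such as names, but they become hard to maintain as the number of possible tag sequences grows, which is one reason a probabilistic approach is attractive for addresses.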