14.2 Tagging Look-up Table

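A tagging look-up table file contains one or more blocks of entries, with all entries in a block being assigned the same tag. Tagging look-up table files should have the file extension '.tbl'. The format of these files is illustrated by the following examples, which are extracted from the name_misc.tbl and territory.tbl files. As the examples show, each block starts with a 'tag=<...>' line, each entry consists of a key followed by a colon and an optional comma-separated list of variants (which may continue over several lines), and text following a '#' character is a comment.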

# ====================================================================

tag=<SP>  # Tag for separator elements
              and : 
               or : 
         known as : kn as, kn, known

tag=<BO>  # Tag for 'baby of' and similar sequences
             baby : 
          baby of : 
         daughter : 
      daughter of : 
              son : 
           son of : 

tag=<NE>  # Tag for word 'nee' (born as) or surname or givenname (?)
              nee : 

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

tag=<TR>  # Tag for territory words

other territories : o/t, o t, other territory, other terr

  new south wales : n s w, new s w, new south w, nsw, n south w,
                    n south wales, new south wa, n south wa,
                    new s wa, n s wales, new s wales

       queensland : q l d, q land, queen land, queens land, qld,
                    queenland

  south australia : s a, s australia, s australian, sa,
                    south australian, southern australia,
                    southern australian

         victoria : vi, vict, vic

western australia : w australia, w australian, wa,
                    western australian, west australia,
                    west australian
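The file layout shown above can be read with a few lines of code. The following is only a minimal sketch written for Python 3, not the loader used by the lookup.py module. The function name parse_tag_table is hypothetical, and the sketch assumes that an entry continues on the next line whenever a line ends with a comma, that the key itself is stored along with its variants, and that the angle brackets around the tag are dropped.

```python
def parse_tag_table(text):
    """Parse tagging look-up table text into {word_tuple: (key, tag)}.

    Hypothetical helper for illustration only, not part of lookup.py.
    """
    table = {}
    tag = None
    pending = ''  # Collects an entry that continues over several lines

    for raw_line in text.splitlines():
        line = raw_line.split('#', 1)[0].strip()  # Remove comments
        if not line:
            continue  # Skip empty and comment-only lines

        if line.startswith('tag='):  # Start of a new block
            tag = line[4:].strip().strip('<>')
            continue

        pending += ' ' + line
        if line.endswith(','):  # Assumed: variant list continues
            continue

        key, _, variants = pending.partition(':')
        key = ' '.join(key.split())  # Normalise whitespace in the key

        # The key itself and all its variants map to the same (key, tag)
        for form in [key] + variants.split(','):
            words = tuple(form.split())
            if words:
                table[words] = (key, tag)
        pending = ''

    return table


sample = """
tag=<SP>  # Tag for separator elements
              and :
         known as : kn as, kn, known

tag=<TR>  # Tag for territory words
       queensland : q l d, q land, queen land, queens land, qld,
                    queenland
"""

table = parse_tag_table(sample)
print(table[('kn', 'as')])    # ('known as', 'SP')
print(table[('queenland',)])  # ('queensland', 'TR')
```

Keys are stored as tuples of words in this sketch, so that multi-word entries such as 'known as' can be looked up in the same way as single words.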

It is possible to load more than one tagging look-up table file into one combined tagging look-up table by simply giving a list of file names when the table is loaded, as shown in the example below. If an entry is listed in different files with different tags, an error is triggered.
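The conflict rule described above can be illustrated as follows. This is a sketch under stated assumptions (Python 3) and not the code in lookup.py: the function name merge_tag_tables is hypothetical, each table is assumed to be a dictionary mapping a tuple of words to a (key, tag) pair, and a ValueError stands in for whatever error the module actually raises.

```python
def merge_tag_tables(tables):
    """Combine several {word_tuple: (key, tag)} dictionaries into one.

    Hypothetical helper: raises ValueError if the same entry appears
    with different tags, following the rule described in the text.
    """
    combined = {}
    for table in tables:
        for words, (key, tag) in table.items():
            if words in combined and combined[words][1] != tag:
                raise ValueError('Entry %r has conflicting tags: %r and %r'
                                 % (words, combined[words][1], tag))
            combined[words] = (key, tag)
    return combined


# Illustrative data only (tags taken from the examples in this section)
territories = {('victoria',): ('victoria', 'TR')}
separators = {('and',): ('and', 'SP')}
clashing = {('victoria',): ('victoria', 'GM')}

combined = merge_tag_tables([territories, separators])
print(len(combined))  # 2

try:
    merge_tag_tables([territories, clashing])
except ValueError as err:
    print('Error:', err)
```

An entry that appears in several files with the same tag is accepted in this sketch, since only differing tags are described as an error.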

The default value for the attribute default is an empty string '', i.e. if a key that does not exist in the table is looked up, an empty string is returned. The default value can be changed when a tagging look-up table is initialised using the default argument, as shown in the example below.
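The effect of the default attribute can be mimicked with a small dictionary subclass. The class DefaultingTable below is a hypothetical stand-in written for Python 3, used only to show the look-up behaviour described above; it makes no claim about how lookup.py implements it.

```python
class DefaultingTable(dict):
    """Dictionary that returns a fixed default for unknown keys.

    Hypothetical illustration, not the TagLookupTable class itself.
    """

    def __init__(self, default=''):
        dict.__init__(self)
        self.default = default

    def __missing__(self, key):
        # Called by dict for keys that are not in the table
        return self.default


plain_table = DefaultingTable()
print(repr(plain_table['xyg0542w']))  # ''

name_table = DefaultingTable(default='missing')
name_table[('peter',)] = ('peter', 'GM')
print(name_table[('peter',)])   # ('peter', 'GM')
print(name_table['xyg0542w'])   # missing
```

Looking up an unknown key in this sketch does not add it to the table.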

After one or more tagging files have been loaded into a tagging look-up table, the attribute max_key_length is set to the length, in words, of the longest key in the look-up table. If, for example, the longest key in a look-up table is 'south west rocks', then the value of max_key_length would be 3.
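The value of max_key_length can be thought of as the result of the following calculation. This is a sketch for Python 3 that assumes keys are stored as tuples of words; the function name longest_key_length is hypothetical and the table contents are taken from the territory examples in this section.

```python
def longest_key_length(table):
    """Return the length in words of the longest key (0 if empty)."""
    return max((len(words) for words in table), default=0)


territory_table = {
    ('victoria',): ('victoria', 'TR'),
    ('vic',): ('victoria', 'TR'),
    ('new', 'south', 'wales'): ('new south wales', 'TR'),
    ('s', 'australia'): ('south australia', 'TR'),
}

print(longest_key_length(territory_table))  # 3
print(longest_key_length({}))               # 0
```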

Assuming the lookup.py module has been imported using the import lookup command, a tagging look-up table can be initialised and loaded from several files as shown in the following example. It is also assumed that the febrl.py module has been imported, so that the directory separator character 'dirsep' used in the example is available.

# ====================================================================

name_tagging_table = lookup.TagLookupTable(name = 'NameTagTable',
                                        default = 'missing')

name_tagging_table.load(['data'+dirsep+'givenname_f.tbl',
                         'data'+dirsep+'givenname_m.tbl',
                         'data'+dirsep+'name_prefix.tbl',
                         'data'+dirsep+'name_misc.tbl',
                         'data'+dirsep+'saints.tbl',
                         'data'+dirsep+'surname.tbl',
                         'data'+dirsep+'title.tbl'])

print name_tagging_table.length

print name_tagging_table.max_key_length

print name_tagging_table[('peter',)]  # Prints: ('peter', 'GM')

print name_tagging_table['xyg0542w']  # Assume not in table, 'missing'
                                      # will be returned