14.3 Frequency Look-up Table

The third type of look-up table file is a list of words with corresponding frequency counts. Such a file contains two columns separated by a comma, making it a simple CSV (comma-separated values) file of the kind a spreadsheet can export. The first column contains the words and the second column contains the corresponding frequency counts, which must be positive integers. These files should have the file extension '.csv'. The following example is extracted from a surname frequency look-up table.

# ====================================================================

dijkstra,3
miller,4325
smith,22540
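To make the file format concrete, the following minimal sketch parses such two-column data with Python's standard csv module. It is not the code used by the lookup.py module. The function name parse_frequency_rows is hypothetical, the sketch is written for Python 3, and skipping blank lines and lines starting with '#' is an assumption made for this illustration.

```python
import csv
import io

def parse_frequency_rows(text):
    """Parse 'word,count' lines into a dictionary mapping word -> count.

    Hypothetical helper for illustration only, not part of lookup.py.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        # Assumption: blank lines and '#' comment lines are ignored
        if not row or row[0].lstrip().startswith('#'):
            continue
        word = row[0].strip()
        count = int(row[1])  # Second column must be an integer
        if count <= 0:
            raise ValueError('Frequency count must be positive: %r' % (row,))
        counts[word] = count
    return counts

# The three example lines shown above
sample = "dijkstra,3\nmiller,4325\nsmith,22540\n"
surname_counts = parse_frequency_rows(sample)
```

After this runs, surname_counts maps each surname to its integer count, for example 'miller' to 4325.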

A probability distribution for a given frequency look-up table is computed internally after loading such a file by summing up all the frequency counts and then dividing each frequency count by this sum.
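The calculation described above can be sketched in a few lines of plain Python. This only illustrates the arithmetic, using the three example counts from the surname file. The variable names are chosen for this sketch and do not correspond to attributes of the frequency look-up table class.

```python
# Example counts taken from the surname file shown earlier
counts = {'dijkstra': 3, 'miller': 4325, 'smith': 22540}

# Sum up all frequency counts: 3 + 4325 + 22540 = 26868
total = sum(counts.values())

# Divide each frequency count by the sum to get its probability
probabilities = {}
for word, count in counts.items():
    probabilities[word] = count / float(total)
```

The resulting probabilities add up to 1, and a frequent name such as 'smith' receives a much larger share than a rare one such as 'dijkstra'.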

It is possible to load more than one frequency look-up table file into one combined frequency look-up table by giving a list of file names when the table is loaded, as shown in the example at the end of this section. If an entry is listed in more than one file, its frequency counts are added up.
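The way counts from several files are combined can be illustrated with the standard collections.Counter class. This is a sketch of the adding-up rule only, not the loading code of the lookup.py module. The function name merge_frequency_tables is hypothetical and the counts are made-up values, not data from real surname files.

```python
from collections import Counter

def merge_frequency_tables(tables):
    """Combine several word -> count dictionaries, adding up the counts
    of entries that appear in more than one of them.

    Hypothetical helper for illustration only, not part of lookup.py.
    """
    combined = Counter()
    for table in tables:
        combined.update(table)  # Counter.update adds counts per key
    return dict(combined)

# Made-up counts standing in for the contents of two files
english = {'miller': 200, 'smith': 500}
french = {'miller': 46, 'leroc': 42}

combined = merge_frequency_tables([english, french])
```

Here 'miller' appears in both inputs, so its combined count is 200 + 46 = 246, while 'smith' and 'leroc' keep their original counts.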

The attribute default has the value 1 unless it is changed, i.e. if a value that does not exist in the table is looked up, the default value 1 is returned. The default value can be changed with the default argument when a frequency look-up table is initialised.
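The default behaviour described above resembles a dictionary that returns a fixed value for missing keys. The following sketch shows one way to obtain this in plain Python through the __missing__ hook of dict subclasses. The class name FrequencyDict is hypothetical, and the sketch makes no claim about how the frequency look-up table class implements its default internally.

```python
class FrequencyDict(dict):
    """Dictionary returning a default count for words it does not contain.

    Hypothetical class for illustration only, not part of lookup.py.
    """

    def __init__(self, counts, default=1):
        dict.__init__(self, counts)
        self.default = default

    def __missing__(self, key):
        # Called by dict item access when the key is absent;
        # the key is not stored in the dictionary
        return self.default

table = FrequencyDict({'miller': 4325, 'smith': 22540})
known = table['miller']         # Count stored in the table
unknown = table['deutschmann']  # Not in the table, so the default is used

zero_table = FrequencyDict({'miller': 4325}, default=0)
```

Looking up a missing word does not add it to the table, so the number of entries is unchanged by such look-ups.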

After a frequency look-up table has been loaded from one or more files, the total sum of all frequency counts is stored in the attribute sum.

Assuming the lookup.py module has been imported with the command import lookup, a frequency look-up table can be initialised and loaded from several files as shown in the following example.

# ====================================================================

name_freq_table = lookup.FrequencyLookupTable(name = 'NameFreqTable')

name_freq_table.load(['surname_english.csv','surname_french.csv'])

print name_freq_table.sum     # Sum of all frequency counts
print name_freq_table.length  # Number of entries in the table

print name_freq_table['miller']       # Prints for example 246
print name_freq_table['leroc']        # Prints for example 42
print name_freq_table['deutschmann']  # Should print the default value 1,
                                      # assuming it's not in the table