6.7 Phone Number Cleaning and Standardisation

Phone numbers are cleaned and standardised using a regular expression and rules-based approach. A phone number can consist of a country code (possibly preceded by an international direct dial (IDD) prefix), followed by an area code, then the actual number (whose number of digits depends on the country and sometimes even on the area), and possibly an extension. The routines for phone number standardisation are implemented in the phonenum.py module.
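The cleaning step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in phonenum.py: the function name split_phone_string, the recognised extension markers (ext, extension, x) and the IDD prefixes ('+', 0011, 011, 00) are hypothetical choices made for this example. The sketch removes an extension, strips separators such as spaces, dashes and brackets, and detects whether the number starts with an IDD prefix.

```python
import re

# Hypothetical extension markers; Febrl's actual rules may differ.
_EXT_RE = re.compile(r'\s*(?:ext\.?|extension|x)\s*(\d+)\s*$', re.IGNORECASE)

# Assumed IDD prefixes: '+', 0011 (Australia), 011 (Canada/US) and
# 00 (many other countries). Longer prefixes are listed first.
_IDD_RE = re.compile(r'^(?:\+|0011|011|00)')


def split_phone_string(raw):
    """Split a raw phone string into (is_international, digits, extension)."""
    text = raw.strip()
    extension = ''
    match = _EXT_RE.search(text)
    if match:
        extension = match.group(1)
        text = text[:match.start()]
    # Keep only digits and '+' so separators (spaces, dashes, dots,
    # brackets) are removed.
    cleaned = re.sub(r'[^\d+]', '', text)
    idd = _IDD_RE.match(cleaned)
    is_international = idd is not None
    if idd:
        cleaned = cleaned[idd.end():]
    # Drop any stray '+' characters left after the prefix.
    return is_international, cleaned.replace('+', ''), extension
```

For example, "+61 2 6125 1111 ext 123" is split into an international digit string '61261251111' with extension '123', while "(02) 6125-1111" is treated as a domestic number '0261251111'.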

Assuming the input phone number is given as a single string, the Febrl phone number standardiser parses it into the five output fields shown in Table 6.3. The phone number cleaning and parsing method has a list of all international country codes built in, as well as two routines that specifically parse Australian and Canadian/US phone numbers.
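The following sketch illustrates how a cleaned digit string might be mapped onto the five output fields of Table 6.3 using a country code table plus country-specific rules. It is hypothetical throughout: the function parse_phone_digits, the four-entry COUNTRY_CODES table (Febrl's built-in list covers all countries), the country name strings 'australia' and 'canada/usa', and the digit-length rules for Australia (trunk prefix 0, one-digit area code, eight-digit number) and Canada/US (optional leading 1, three-digit area code, seven-digit number) are assumptions for illustration only.

```python
# Hypothetical country-code table: a tiny subset for illustration only.
COUNTRY_CODES = {
    '1': 'canada/usa',
    '44': 'united kingdom',
    '61': 'australia',
    '64': 'new zealand',
}
# Reverse lookup used when falling back to the default country.
NAME_TO_CODE = {name: code for code, name in COUNTRY_CODES.items()}


def parse_phone_digits(digits, is_international, extension='',
                       default_country='australia'):
    """Map a cleaned digit string onto the five phone output fields."""
    if is_international:
        # E.164 country codes are 1-3 digits long and prefix-free, so at
        # most one prefix length can match an entry in the table.
        for length in (1, 2, 3):
            if digits[:length] in COUNTRY_CODES:
                country_code, rest = digits[:length], digits[length:]
                break
        else:
            raise ValueError('unknown country code in %r' % digits)
    else:
        # No country code given: fall back to the default country.
        country_code, rest = NAME_TO_CODE[default_country], digits

    country_name = COUNTRY_CODES[country_code]
    area_code = ''
    if country_name == 'australia':
        if rest.startswith('0'):
            rest = rest[1:]  # drop the domestic trunk prefix '0'
        if len(rest) == 9:   # assumed: 1-digit area code + 8-digit number
            area_code, rest = rest[:1], rest[1:]
    elif country_name == 'canada/usa':
        if len(rest) == 11 and rest.startswith('1'):
            rest = rest[1:]  # drop the long-distance prefix '1'
        if len(rest) == 10:  # assumed: 3-digit area code + 7-digit number
            area_code, rest = rest[:3], rest[3:]
    # Other countries (and special cases such as mobiles): no rules here.

    return {
        'phone_country_code': country_code,
        'phone_country_name': country_name,
        'phone_area_code': area_code,
        'phone_number': rest,
        'phone_extension': extension,
    }
```

For example, parse_phone_digits('0261251111', False) has no country code, so it falls back to the default country and yields area code '2' and number '61251111'. Because country codes are prefix-free, trying prefix lengths 1, 2 and 3 in turn is enough to find the single matching code.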

The following arguments need to be set when a phone number standardiser is initialised.

The following example code shows how a phone number standardiser is initialised.

# ====================================================================

phone_std = PhoneNumStandardiser(name = 'Phone-Num-std',
                          description = 'Phone number standardiser',
                         input_fields = 'phone_num',
                        output_fields = ['phone_country_code',
                                         'phone_country_name',
                                         'phone_area_code',
                                         'phone_number',
                                         'phone_extension'],
                      default_country = 'australia')