ANU Computer Science Technical Reports
TR-CS-06-02
Peter Christen.
A comparison of personal name matching: Techniques and practical issues.
September 2006.
Abstract: Finding and matching personal names is at
the core of an increasing number of applications: from text and Web mining,
information retrieval and extraction, search engines, to deduplication and
data linkage systems. Variations and errors in names make exact string
matching problematic, and approximate matching techniques based on phonetic
encoding or pattern matching have to be applied. When compared to general
text, however, personal names have different characteristics that need to be
considered. In this paper we discuss the characteristics of personal
names and present potential sources of variations and errors. We give an
overview of a comprehensive set of commonly used, as well as some recently
developed, name matching techniques. Experimental comparisons on four large
name data sets indicate that there is no clear best technique. We provide a
series of recommendations that will help researchers and practitioners select
a name matching technique suitable for a given data set.
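To make the two families of approximate techniques mentioned above concrete, here is a minimal sketch in Python. It assumes classic American Soundex as the example of phonetic encoding and Levenshtein edit distance as the example of pattern matching. These are common textbook variants chosen for illustration, not necessarily the exact implementations evaluated in the report. The function names `soundex`, `levenshtein` and `edit_similarity` are hypothetical, and the normalisation `1 - distance / max_length` is one common choice assumed here.

```python
def soundex(name: str) -> str:
    """Classic American Soundex code (illustrative variant)."""
    # Letter-to-digit groups; vowels and y get no code and act as separators.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()           # the first letter is kept as-is
    prev = codes.get(letters[0], "")      # its code still blocks an identical next code
    for ch in letters[1:]:
        if ch in "hw":
            continue                      # h and w do not separate equal codes
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit                      # a vowel resets prev, so codes may repeat
        if len(result) == 4:
            break
    return result.ljust(4, "0")           # pad with zeros to length four


def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for a, b in (("Meier", "Meyer"), ("Christen", "Kristen")):
        print(a, b, soundex(a), soundex(b), edit_similarity(a, b))
```

With this variant, "Meier" and "Meyer" share the code M600. "Christen" (C623) and "Kristen" (K623) receive different codes because Soundex keeps the first letter verbatim, yet their edit similarity is still 0.75. This hints at why the two families can disagree on the same pair of names.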