The Australian National University

Student research opportunities

Automated Data Cleaning Via Logic

Project Code: CECS_700

This project is available at the following levels:
CS single semester, Honours, Summer Scholar, Masters

Keywords:

data cleaning, constraint satisfaction, resolution

Supervisor:

Professor Rajeev Gore

Outline:

Data is usually collected as tables, but such tables often contain
many errors caused by mistyping or by misunderstood questions. For
example, a record may claim that a 5-year-old is married.

The Fellegi-Holt method of data cleaning is a standard way to find
the minimal changes required to correct a record. We have shown that
the essence of the Fellegi-Holt method of data cleaning is an old
method from automated deduction called propositional resolution.
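
As a rough illustration of the resolution step, the sketch below treats each
edit (consistency rule) as a propositional clause and derives the implied
edits by resolving pairs of clauses until nothing new appears. The field
names (young, married, has_spouse), the "~" notation for negation, and the
helper names are all hypothetical and chosen only for this example; this is
not the project's implementation.

```python
from itertools import combinations

def negate(lit):
    """Flip a literal: 'x' <-> '~x'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all non-tautological resolvents of two clauses."""
    resolvents = set()
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            # Skip tautologies such as {x, ~x, ...}: they carry no information.
            if not any(negate(x) in r for x in r):
                resolvents.add(frozenset(r))
    return resolvents

def saturate(clauses):
    """Apply resolution repeatedly until no new clause is produced."""
    known = set(clauses)
    while True:
        new = set()
        for a, b in combinations(known, 2):
            new |= resolve(a, b) - known
        if not new:
            return known
        known |= new

# Hypothetical edits written as clauses (sets of literals):
#   "nobody is both young and married"    -> ~young OR ~married
#   "anyone with a spouse is married"     -> married OR ~has_spouse
edits = {
    frozenset({"~young", "~married"}),
    frozenset({"married", "~has_spouse"}),
}
implied = saturate(edits) - edits
print(implied)
```

Here the only implied edit is the clause ~young OR ~has_spouse, that is,
"nobody is both young and recorded as having a spouse", which was not
stated explicitly but follows from the two given edits.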

Goals of this project

The project is to implement a prototype of the Fellegi-Holt method
of data cleaning using fast SAT solvers or fast consequence finders.
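
To make the target of such a prototype concrete, here is a minimal sketch of
the "minimal changes" search on a single record. It uses exhaustive
enumeration as a stand-in for a SAT solver, so it is only suitable for toy
inputs. The record fields, the clause encoding with "~" for negation, and
the function names are assumptions made for this example.

```python
from itertools import combinations, product

def satisfies(record, clauses):
    """True if the record makes at least one literal true in every clause."""
    def holds(lit):
        return not record[lit[1:]] if lit.startswith("~") else record[lit]
    return all(any(holds(lit) for lit in clause) for clause in clauses)

def minimal_repairs(record, clauses):
    """Return the smallest sets of fields whose values can be changed
    so that the record satisfies every clause (brute force)."""
    fields = sorted(record)
    for k in range(len(fields) + 1):
        found = []
        for subset in combinations(fields, k):
            # Try every assignment to the chosen fields, keeping the rest fixed.
            for values in product([False, True], repeat=k):
                candidate = {**record, **dict(zip(subset, values))}
                if satisfies(candidate, clauses):
                    found.append(set(subset))
                    break
        if found:
            return found  # all repairs of the smallest size k
    return []  # the clauses are unsatisfiable

# Hypothetical edits: "~young OR ~married" and "married OR ~has_spouse".
clauses = [{"~young", "~married"}, {"married", "~has_spouse"}]
record = {"young": True, "married": True, "has_spouse": False}
print(minimal_repairs(record, clauses))
```

For this record the search reports two minimal repairs of size one: change
"married" or change "young". A SAT solver or consequence finder would
replace the inner enumeration, which grows exponentially with the number of
fields.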

Requirements/Prerequisites

A good background in maths will be useful.

Student Gain

There is a high chance that this could lead to a conference publication
and/or a PhD here working on data cleaning via logic.


Contact:



Updated: 8 January 2013