Automated Data Cleaning Via Logic

Description

Data is often collected as tables, and those tables frequently contain errors caused by mistyping or by respondents misunderstanding the questions. For example, a record may claim that a 5-year-old is married. The Fellegi-Holt method is a standard approach to data cleaning: given a record that violates the consistency rules, it finds the minimal changes required to correct it. We have shown that the essence of the Fellegi-Holt method is propositional resolution, a long-established technique from automated deduction.
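To make the idea of "minimal changes" concrete, here is a small Python sketch. It is not the Fellegi-Holt algorithm itself: it simply tries every set of fields in order of increasing size until the record becomes consistent. The field names, domains and rules (`DOMAINS`, `EDITS`) are hypothetical and chosen only to match the married-child example above.

```python
from itertools import combinations, product

# Hypothetical field domains, invented for this illustration.
DOMAINS = {
    "age_group": ["child", "adult"],
    "marital_status": ["single", "married"],
    "employment": ["none", "employed"],
}

# Each rule describes an INCONSISTENT combination of values.
EDITS = [
    lambda r: r["age_group"] == "child" and r["marital_status"] == "married",
    lambda r: r["age_group"] == "child" and r["employment"] == "employed",
]


def violates(record):
    """True if the record matches at least one inconsistency rule."""
    return any(edit(record) for edit in EDITS)


def minimal_repairs(record):
    """Return all (changed_fields, repaired_record) pairs that change the
    fewest fields. Brute force: exponential in the number of fields."""
    fields = list(DOMAINS)
    for k in range(len(fields) + 1):
        found = []
        for subset in combinations(fields, k):
            # Only consider values that differ from the current ones.
            choices = [[v for v in DOMAINS[f] if v != record[f]] for f in subset]
            for values in product(*choices):
                candidate = dict(record, **dict(zip(subset, values)))
                if not violates(candidate):
                    found.append((subset, candidate))
        if found:
            return found
    return []


record = {"age_group": "child", "marital_status": "married", "employment": "employed"}
for changed, repaired in minimal_repairs(record):
    print(changed, repaired)
```

In this toy record two rules fail, yet changing the single field `age_group` repairs both, whereas changing `marital_status` or `employment` alone leaves one rule violated. Finding such a smallest covering set of fields without enumerating every candidate is where the logical machinery comes in.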

Goals

The project is to implement a prototype of the Fellegi-Holt method of data cleaning using fast SAT solvers or fast consequence finders.
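As a rough illustration of what a consequence finder does, the Python sketch below closes a small set of propositional clauses under resolution. It is a naive toy, not the prototype the project aims for, and it does not use a SAT solver. The clause encoding (strings with a `~` prefix for negation) and the proposition names are assumptions made for this example.

```python
def negate(lit):
    """Flip a literal: 'p' <-> '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(c1, c2):
    """Return all non-tautological resolvents of two clauses."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            # Discard tautologies such as {p, ~p, ...}.
            if not any(negate(x) in r for x in r):
                out.add(frozenset(r))
    return out


def saturate(clauses):
    """Close a clause set under resolution (exponential in general)."""
    known = set(clauses)
    while True:
        new = set()
        for a in known:
            for b in known:
                new |= resolve(a, b)
        if new <= known:
            return known
        known |= new


# Hypothetical rules: a child is not married; anyone in primary school is a child.
rules = {
    frozenset({"~child", "~married"}),
    frozenset({"child", "~primary_school"}),
}
for clause in sorted(saturate(rules), key=sorted):
    print(sorted(clause))
```

Resolving the two rules on `child` yields the implied rule that nobody is both married and in primary school, a constraint that mentions neither rule's shared field. A practical prototype would need far more efficient machinery than this quadratic loop, which is why the project targets fast SAT solvers or consequence finders.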

Requirements

A good background in maths will be useful.

Gain

There is a high chance that this could lead to a conference publication and/or a PhD here working on data cleaning via logic.

Keywords

data cleaning, constraint satisfaction, resolution

Updated: 15 May 2018 / Responsible Officer: Head of School / Page Contact: CECS Marketing