Student research opportunities
Automated Data Cleaning Via Logic
Project Code: CECS_700
This project is available at the following levels:
CS single semester, Honours, Summer Scholar, Masters
Keywords:
data cleaning, constraint satisfaction, resolution
Supervisor:
Professor Rajeev Gore
Outline:
Data is usually collected as tables, but such tables often contain
errors caused by mistyping or by respondents misunderstanding the
questions. For example, a record may claim that a 5-year-old is
married.
The Fellegi-Holt method of data cleaning is a standard way to find
the minimal changes required to correct a record. We have shown that
the essence of the Fellegi-Holt method of data cleaning is an old
method from automated deduction called propositional resolution.
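To illustrate the link with resolution, here is a minimal sketch of propositional resolution applied to two consistency rules (edits). The encoding is an assumption made for this example only: clauses are sets of string literals, "~" marks negation, and the field names age5, under16 and married are hypothetical. It is not the project's own implementation.

```python
from itertools import combinations

def negate(lit):
    """Return the complementary literal ("~p" <-> "p")."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All non-tautological resolvents of two clauses."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            # Discard tautologies such as {p, ~p, ...}
            if not any(negate(x) in r for x in r):
                out.add(frozenset(r))
    return out

def saturate(clauses):
    """Apply resolution until no new clause can be derived."""
    known = set(clauses)
    while True:
        new = set()
        for a, b in combinations(known, 2):
            new |= resolvents(a, b) - known
        if not new:
            return known
        known |= new

# Hypothetical edits: a 5 year old is under 16; nobody under 16 is married.
edits = {
    frozenset({"~age5", "under16"}),
    frozenset({"~under16", "~married"}),
}
closure = saturate(edits)
# The implied edit "not (age5 and married)" is derived by resolving on under16.
print(frozenset({"~age5", "~married"}) in closure)  # True
```

The derived clause plays the role of an implied edit: it rules out the record in the example above even though no single original rule mentions both age 5 and marriage.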
Goals of this project
The project is to implement a prototype for the Fellegi-Holt method
of data cleaning using fast SAT solvers or fast consequence finders.
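As a rough sketch of the task such a prototype would solve, the code below finds the smallest sets of fields whose values must change for a record to satisfy every edit. It uses brute-force search as a stand-in for a SAT solver or consequence finder, and it assumes Boolean fields only. The record, the field names and the clause encoding are hypothetical.

```python
from itertools import combinations

def satisfies(record, clauses):
    """A clause is a list of (field, value) literals; one must hold."""
    return all(any(record[f] == v for f, v in clause) for clause in clauses)

def minimal_repairs(record, clauses):
    """Smallest sets of Boolean fields to flip so all clauses hold."""
    fields = sorted(record)
    for k in range(len(fields) + 1):
        found = []
        for subset in combinations(fields, k):
            candidate = dict(record)
            for f in subset:
                candidate[f] = not candidate[f]
            if satisfies(candidate, clauses):
                found.append(set(subset))
        if found:  # stop at the first size that admits a repair
            return found
    return []

# Hypothetical record that violates "nobody under 16 is married".
record = {"under16": True, "married": True, "employed": False}
edits = [[("under16", False), ("married", False)]]
print(minimal_repairs(record, edits))  # [{'married'}, {'under16'}]
```

The search is exponential in the number of fields, which is why the project proposes fast SAT solvers or consequence finders instead.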
Requirements/Prerequisites
A good background in maths will be useful.
Student Gain
There is a high chance that this could lead to a conference publication
and/or a PhD here working on data cleaning via logic.



