Data is usually collected as tables, but such tables often contain many errors caused by mistyping or by misunderstanding of questions. For example, a record may claim that a 5-year-old is married. The Fellegi-Holt method of data cleaning is a standard way to find the minimal changes required to correct a record. We have shown that the essence of the Fellegi-Holt method is an old method from automated deduction called propositional resolution.
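To make the idea of "minimal changes" concrete, here is a small illustrative sketch. It is not the Fellegi-Holt algorithm itself, nor a resolution or SAT-based implementation: it simply tries every set of fields in order of increasing size until it finds a repair that satisfies all rules. The field names, value domains, and rules are hypothetical, chosen only to mirror the married-child example.

```python
from itertools import combinations, product

# Hypothetical fields and their allowed values (illustrative only).
DOMAINS = {
    "age_group": ["child", "adult"],
    "marital": ["single", "married"],
    "employment": ["none", "employed"],
}

# Each rule ("edit") lists a combination of values that must not occur together.
EDITS = [
    {"age_group": "child", "marital": "married"},
    {"age_group": "child", "employment": "employed"},
]


def violates(record, edit):
    """True if the record matches every value in the forbidden combination."""
    return all(record[field] == value for field, value in edit.items())


def is_consistent(record):
    """True if the record violates none of the edits."""
    return not any(violates(record, edit) for edit in EDITS)


def minimal_repairs(record):
    """Return all repairs that change the fewest fields (brute-force search)."""
    fields = list(DOMAINS)
    for size in range(len(fields) + 1):
        found = []
        for subset in combinations(fields, size):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                # Only count candidates where every chosen field really changes.
                changed = all(candidate[f] != record[f] for f in subset)
                if changed and is_consistent(candidate):
                    found.append((subset, candidate))
        if found:
            return found
    return []


record = {"age_group": "child", "marital": "married", "employment": "employed"}
repairs = minimal_repairs(record)
for changed_fields, repaired in repairs:
    print(changed_fields, repaired)
```

In this toy record, changing the age group alone resolves both violations, whereas keeping the age group would require changing two other fields. The exhaustive search grows exponentially with the number of fields, which is why the project looks to SAT solvers or consequence finders instead.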
The project is to implement a prototype of the Fellegi-Holt method of data cleaning using fast SAT solvers or fast consequence finders.
A good background in maths will be useful.
There is a high chance that this could lead to a conference publication and/or a PhD here working on data cleaning via logic.
data cleaning, constraint satisfaction, resolution