Exploring Distributed Databases
Henning Koehler (University of Queensland)
CSIRO ICT
DATE: 2009-08-25
TIME: 16:00:00 - 17:00:00
LOCATION: CSIT Seminar Room, N101
ABSTRACT:
A central problem that arises when integrating unfamiliar database systems, or constructing queries over them, is identifying relationships between tables and attributes. As manual comparison of attributes is expensive, this process should be automated as far as possible. Of particular interest here are relationships between attributes that allow us to join tables, such as foreign keys, as this is fundamental for further integration steps. For exploration purposes, knowledge of such relationships allows quick identification of similar data sets and makes it easier to understand a new system. In this talk we will present a prototype which matches attributes across multiple distributed databases and allows users to explore them by following join paths, examining samples, and providing feedback for future use by others. Our system employs a newly developed sampling approach, which allows matching of clean as well as "dirty" data (i.e., where the content is the same but may contain typos or be formatted differently), without compromising efficiency.
BIO:
Dr. Henning Koehler is a Research Fellow at the University of Queensland and part of the Data and Knowledge Engineering group. He obtained his Master's in Mathematics in 2003 and his PhD in Information Systems in 2007. His research interests include graph algorithms, database design, dependency theory, data integration, sampling, multi-dimensional search/indexing and data provenance. He is currently working on a CSIRO Water for a Healthy Country Flagship Collaboration project with Prof Xiaofang Zhou.


