Who 'dat? Identity resolution in large email collections
Doug Oard (CSIRO ICT Centre)
DATE: 2010-04-19
TIME: 14:00:00 - 15:00:00
LOCATION: CSIT Seminar Room, N101
ABSTRACT:
Automated techniques that can support the human activities of search and sense-making in large email collections are of increasing importance for a broad range of uses, including historical scholarship, law enforcement and intelligence applications, and "e-discovery" incident to civil litigation. In this talk, I'll briefly describe some of the work to date on searching large email collections, and then for most of the talk I will focus on the more challenging task of support for sense-making. Specifically, I'll describe joint work with Tamer Elsayed to automatically resolve the identity of people who are mentioned ambiguously (e.g., just by first name) in a collection of email from a failed corporation (Enron). Our results indicate that for people who are well represented in the collection we can use a generative model to guess the right identity about 80% of the time, and for others we are right about half the time. I'll conclude the talk with a few remarks on our next directions for techniques, evaluation, and additional types of collections to which similar ideas might be applied.
BIO:
I am an Associate Professor in the College of Information
Studies and the Institute for Advanced Computer Studies at
the University of Maryland, College Park. My research
interests include cross-language information retrieval,
speech retrieval, and information filtering.
