The Australian National University

Who 'dat?: Identity resolution in large email collections

Doug Oard (CSIRO ICT Centre)

CSIRO ICT

DATE: 2010-04-19
TIME: 14:00:00 - 15:00:00
LOCATION: CSIT Seminar Room, N101

ABSTRACT:
Automated techniques that can support the human activities of search and sense-making in large email collections are of increasing importance for a broad range of uses, including historical scholarship, law enforcement and intelligence applications, and lawyers involved in "e-discovery" incident to civil litigation. In this talk, I'll briefly describe some of the work to date on searching large email collections, and then for most of the talk I will focus on the more challenging task of support for sense-making. Specifically, I'll describe joint work with Tamer Elsayed to automatically resolve the identity of people who are mentioned ambiguously (e.g., just by first name) in a collection of email from a failed corporation (Enron). Our results indicate that for people who are well represented in the collection we can use a generative model to guess the right identity about 80% of the time, and for others we are right about half the time. I'll conclude the talk with a few remarks on our next directions for techniques, evaluation, and additional types of collections to which similar ideas might be applied.


BIO:
I am an Associate Professor in the College of Information Studies and the Institute for Advanced Computer Studies at the University of Maryland, College Park. My research interests include cross-language information retrieval, speech retrieval, and information filtering.



Updated: 6 April 2010