The Work of the Information Retrieval Research Group at the University of Sunderland
Michael Oakes (University of Sunderland, England)
CSIRO ICT
DATE: 2006-07-14
TIME: 15:00:00 - 16:00:00
LOCATION: CSIT Seminar Room, N101
ABSTRACT:
Fadi Yamout describes a new Relevance Feedback technique called Weight Propagation (WP) that outperforms existing techniques in terms of computation time and quality of the retrieval results. A relevant document will propagate positive weights to neighbouring documents in vector space, and negative weights to documents deemed non-relevant. The weights propagated are inversely related to the distance between the documents. The documents are then re-ranked according to the net weights received.
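The re-ranking step can be illustrated with a minimal sketch. It is not the WP implementation itself. It assumes that documents judged relevant propagate positive weights and documents judged non-relevant propagate negative weights, that distance is Euclidean, and that each propagated weight is 1 / (1 + distance). The abstract says only that weights are inversely related to distance, so that weighting function, the function name weight_propagation and the toy vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weight_propagation(docs, relevant, non_relevant):
    """Re-rank unjudged documents by the net weight they receive.

    docs: dict mapping document id -> vector (tuple of floats).
    relevant / non_relevant: ids of documents the user has judged.
    Assumption: propagated weight = +/- 1 / (1 + distance).
    """
    judged = set(relevant) | set(non_relevant)
    net = {doc_id: 0.0 for doc_id in docs if doc_id not in judged}
    sources = [(r, 1.0) for r in relevant] + [(n, -1.0) for n in non_relevant]
    for source_id, sign in sources:
        for doc_id in net:
            distance = euclidean(docs[source_id], docs[doc_id])
            net[doc_id] += sign / (1.0 + distance)
    ranked = sorted(net, key=lambda doc_id: net[doc_id], reverse=True)
    return ranked, net


# Toy collection: "b" lies near the relevant document "a",
# "c" lies near the non-relevant document "d".
docs = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0), "d": (6.0, 0.0)}
ranked, net = weight_propagation(docs, relevant=["a"], non_relevant=["d"])
print(ranked, net)
```

In this sketch only unjudged documents are re-ranked, so a judged document never competes with the documents it influences. That is a design choice made for the example.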
Chufeng Chen has developed a browser for searching through personal collections of digital photographs. Factors related to human episodic memory, time and location, are used to separate the photographs into discrete events. A user study was performed to compare this event-based browser with four commercial browsers, according to speed of search, user satisfaction, recall and precision. A comparison was also made between automatic annotation of images using a gazetteer and user-generated annotations.
George Ke has performed experiments on automatic classification of emails in the ENRON corpus. His PERC system is a hybrid of two clustering approaches: the centroid method, which overcomes the problem of data sparseness (emails tend to be short), and k-nearest neighbours (kNN), which allows the topic of an email folder to drift over time.
Fennie Liang has worked on automatic summarisation for the gisting of web pages, to help users make quicker, more accurate judgements of web pages without having to access the pages themselves. Her Query Terms Collocation (QTC) algorithm extracts the sentences in the original web pages that most closely match the query. The resulting summaries are judged according to representativeness (how well they represent the contents of their corresponding pages) and judgeability (whether they help the user decide if the original page is relevant).
In my own research on corpus linguistics, the chi-squared test is used to find the vocabulary most typical of seven different ICAME corpora, each representing the English used in a particular country. In a closely related study, Leech and Fallon (1992) found differences between the vocabulary used in the Brown Corpus of American English and that used in the LOB Corpus of British English. They were mainly interested in those vocabulary differences which they assumed to be due to cultural differences between the United States and Britain, but we are equally interested in vocabulary differences which reveal linguistic preferences in the various countries in which English is spoken.
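As a rough illustration of how a chi-squared comparison of this kind can be computed, the sketch below calculates the Pearson chi-squared statistic for one word across several corpora, using a 2 x k table (the word versus all other words, by corpus). The function name chi_squared_for_word and the counts are hypothetical and are not taken from the ICAME corpora. The talk does not say how the statistic was thresholded or corrected, so that step is left out.

```python
def chi_squared_for_word(word_counts, corpus_sizes):
    """Pearson chi-squared statistic for one word across k corpora.

    word_counts[i]: occurrences of the word in corpus i.
    corpus_sizes[i]: total number of word tokens in corpus i.
    The contingency table has two rows (this word, all other words)
    and one column per corpus.
    """
    total_word = sum(word_counts)
    total_tokens = sum(corpus_sizes)
    total_other = total_tokens - total_word
    if total_word == 0 or total_other == 0:
        return 0.0  # the word is absent or is the only word: nothing to compare
    statistic = 0.0
    for count, size in zip(word_counts, corpus_sizes):
        cells = ((count, total_word), (size - count, total_other))
        for observed, row_total in cells:
            # Expected count under independence of word and corpus.
            expected = row_total * size / total_tokens
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: a word seen 20 times in one 100-token corpus
# and 10 times in another 100-token corpus.
print(chi_squared_for_word([20, 10], [100, 100]))
```

A larger statistic means the word's distribution across the corpora departs further from what the corpus sizes alone would predict. Which corpus the word is typical of can then be read from the cell where the observed count most exceeds the expected count.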
BIO:
Michael Oakes' Ph.D. thesis, from the University of Liverpool, was entitled "Automated Assistance in the Formulation of Search Statements for Bibliographic Databases". He was the Research Associate on the CRATER Automatic Alignment Project, led by Prof. Tony McEnery at Lancaster University, which explored statistical techniques to find which portions of translated texts corresponded to each other. He has worked on two projects concerned with the automatic production of summaries: the Concept-Based Abstracting Project, led by Dr. Chris Paice at Lancaster University, which produced summaries of journal articles about agriculture, and the TRESTLE project, led by Prof. Rob Gaizauskas at the University of Sheffield, which looked at extracting the key information from newsfeeds in the field of pharmacology. In 1998 he wrote the book "Statistics for Corpus Linguistics", in the series "Edinburgh Textbooks in Empirical Linguistics", Edinburgh University Press.
Since 2001 he has been a Senior Lecturer in Computing at the University of Sunderland.


