ANU Computer Science Technical Reports
TR-CS-96-08
David Hawking and Paul Thistlewaite.
Relevance weighting using distance between term occurrences.
August 1996.
Abstract: Recent work has achieved promising retrieval
performance using distance between term occurrences as a primary estimator of
document relevance. A major benefit of this approach is that relevance
scoring does not rely on collection frequency statistics. This report proposes
a theoretical framework for lexical spans that encompasses these approaches
and suggests a number of important directions for future experimental work.
Building on the formalism, it proposes approaches to issues such as scoring
partial spans, treating repeated term occurrences within spans, and assessing
the importance of ordering. It also considers the practical application of
the formalism both to locating and scoring concept intersections and to
locating phrases (with an estimate of confidence) despite intervening or
substituted words.
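As a rough illustration of distance-based relevance scoring, the sketch below finds minimal lexical spans (the shortest windows of a tokenized document that contain every query term) and scores each span by how tightly it packs the terms. The function names `minimal_spans` and `span_score`, and the n/length span score, are hypothetical choices made for this example. They are not the formulas defined in this report. Partial spans, ordering, and repeated occurrences, which the report treats in detail, are not modeled here.

```python
from collections import Counter


def minimal_spans(tokens, terms):
    """Return (start, end) index pairs (inclusive, 0-based) of minimal spans.

    A span is minimal if it contains every query term and neither its first
    nor its last token can be dropped without losing a term.
    """
    needed = set(terms)
    counts = Counter()   # occurrences of each needed term inside the window
    covered = 0          # number of distinct needed terms inside the window
    left = 0
    spans = []
    for right, tok in enumerate(tokens):
        if tok not in needed:
            continue  # a span never needs to end on a non-query token
        counts[tok] += 1
        if counts[tok] == 1:
            covered += 1
        if covered < len(needed):
            continue
        # Drop tokens from the left while the window still covers every term.
        while tokens[left] not in needed or counts[tokens[left]] > 1:
            if tokens[left] in needed:
                counts[tokens[left]] -= 1
            left += 1
        # The window is minimal only if the right-hand term occurs once in it.
        if counts[tok] == 1:
            spans.append((left, right))
    return spans


def span_score(tokens, terms):
    """Hypothetical document score: sum of n / span_length over minimal spans.

    A span whose n terms are adjacent scores 1.0; wider spans score less.
    Only positions within this document are used, with no collection statistics.
    """
    n = len(set(terms))
    return sum(n / (end - start + 1) for start, end in minimal_spans(tokens, terms))


if __name__ == "__main__":
    doc = "the cat sat on the mat with a cat".split()
    print(minimal_spans(doc, ["cat", "mat"]))  # [(1, 5), (5, 8)]
    print(span_score(doc, ["cat", "mat"]))
```

Because the score depends only on where the query terms fall within a single document, it needs no collection-wide frequency statistics, which is the property the abstract highlights.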