Title: Department of Computer Science Seminar
Date: Wednesday, May 31, 2000
Time: 4:00 pm to 5:00 pm
Venue: Room N101, CSIT Building [108]
Speaker: Nick Craswell (PhD student at DCS, ANU)
Description: "Distributed Information Retrieval on the Web"

Abstract

There are tens of thousands of search servers on the Web. Many of them offer the most comprehensive, or only, coverage of a useful document collection or database (for example, at the ACM Digital Library, britannica.com, amazon.com). A search broker helps users access such resources by selecting a useful subset of servers, querying them concurrently and presenting their results in a single relevance-ranked list.

However, there has been a gap between theory and practice in broker research. In theory, it is possible for a broker to select from a large set of servers and generate a merged ranking using state-of-the-art ranking methods. These theoretical methods, though, rely on special cooperation from search servers, which is not forthcoming on the Web. So in practice, Web search brokers have addressed only a few servers and used merging methods of untested effectiveness.

I will present new methods which allow a broker to address any search server, provided that search results and documents are accessible. Two extensive evaluation experiments show that a Web-compatible broker can be as effective as any previously proposed broker (and more effective than existing Web brokers). I will also relate server coverage limitations to a taxonomy of "Dark Matter" on the Web: documents which are visible to some but not all observers.

URL: http://cs.anu.edu.au/lib/seminars/seminars00/dept20000531

Biography: Nick Craswell recently submitted his PhD thesis, "Methods for Distributed Information Retrieval", after three and a half years of study at the Australian National University.
http://pastime.anu.edu.au/nick/