Australian National University
Faculty of Engineering and Information Technology
COMP3410
Information Technology in Electronic Commerce
Tutorial 5 - Information Retrieval
The first part of this session is a tutorial to help you better understand
the material covered in the lectures on information retrieval. The
second part is a short hands-on lab.
Tutorial
- What are the main components of an IR system?
- What is a web crawler and how does it work?
-
Given the following three documents:
- Document 0: All of the web is spam
- Document 1: Some of the web is spam
- Document 2: Spam is context dependent
create a postings file and an inverted index.
-
Given the query "spam context", process this query using
the inverted index created in the previous question.
- What are query dependent and query independent evidence?
-
What is the difference between the whole web, the linked web and the
attainable web?
-
What is dark matter on the web? Why do we care about dark matter?
-
What is a query log? Why is it useful?
-
Anchor text, tags and click descriptions are examples of external
evidence. Can you think of other kinds of external evidence? Do
you think such evidence is useful?
- How does PageRank work?
- What are some examples of web spam that you have come across?
- What is adversarial information retrieval?
-
What is the main source of income for search engine companies
like Google?
-
Discuss some of the ways of cutting costs during query processing.
-
What are some of the privacy and legal issues associated with
web IR?
-
Do you think that passage retrieval is useful? Why is XML retrieval
important in this context?
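For the inverted index and query processing questions above, the following Python sketch may help you check your hand-worked answer. It is only one possible approach and makes several assumptions that are not part of the question: terms are lowercased and split on whitespace, there is no stop-word removal or stemming, each postings list stores document ids only (no term frequencies or positions), and the query is treated as a conjunction (AND) of its terms. The names tokenize, build_index, intersect and and_query are illustrative.

```python
from collections import defaultdict

# The three documents from the tutorial question.
DOCS = {
    0: "All of the web is spam",
    1: "Some of the web is spam",
    2: "Spam is context dependent",
}


def tokenize(text):
    # Assumption: lowercase and split on whitespace; no stemming, no stop words.
    return text.lower().split()


def build_index(docs):
    """Return a dict mapping each term to a sorted list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sort terms (the dictionary) and ids (each postings list).
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def intersect(a, b):
    """Merge two sorted postings lists, keeping ids that appear in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Assumption: a document matches only if it contains every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Process the shortest postings lists first to keep intermediate results small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


index = build_index(DOCS)
for term, ids in index.items():
    print(f"{term:10s} -> {ids}")
print("spam context ->", and_query(index, "spam context"))
```

Under these assumptions only Document 2 contains both query terms. If the query were instead treated as a disjunction (OR), all three documents would match, since each contains "spam"; ranking them would then need extra evidence such as term frequencies.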
Lab
- Find some spam pages.
-
Use a side-by-side comparison to compare two of your favourite
search engines.
-
Using different queries, check whether the advertisements on a search
engine page change with the type of query.
Last modified: Wed Sep 9 11:47:27 EST 2009
Ramesh Sankaranarayana