Australian National University

Faculty of Engineering and Information Technology

COMP3410

Information Technology in Electronic Commerce


Tutorial 5 - Information Retrieval


The first part of this session is a tutorial to help you better understand the material covered in the lectures on information retrieval. The second part is a short hands-on lab.

Tutorial

  1. What are the main components of an IR system?
  2. What is a web crawler and how does it work?
  3. Given the following four documents, create a postings file and an inverted index.
  4. Given the query "spam context", process this query using the inverted index created in the previous question.
  5. What is query-dependent evidence and what is query-independent evidence?
  6. What is the difference between the whole web, the linked web and the attainable web?
  7. What is dark matter on the web? Why do we care about dark matter?
  8. What is a query log? Why is it useful?
  9. Anchor text, tags and click descriptions are examples of external evidence. Can you think of other kinds of external evidence? Do you think such evidence is useful?
  10. How does PageRank work?
  11. What are some examples of web spam that you have come across?
  12. What is adversarial information retrieval?
  13. What is the main source of income for search engine companies like Google?
  14. Discuss some of the ways of cutting costs during query processing.
  15. What are some of the privacy and legal issues associated with web IR?
  16. Do you think that passage retrieval is useful? Why is XML retrieval important in this context?
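
As a starting point for questions 3 and 4, the following is a minimal sketch of building an inverted index and processing a two-term query against it. The three documents in `docs` are hypothetical stand-ins, not the four documents intended for question 3, and the function names are illustrative only. The sketch also assumes the query is treated as a conjunctive (AND) query over whitespace-separated, lower-cased terms; a real system may instead rank documents that match any of the terms.

```python
from collections import defaultdict

# Hypothetical stand-in documents; NOT the four documents from question 3.
docs = {
    1: "web spam is a problem",
    2: "context matters in retrieval",
    3: "spam pages ignore context",
}


def build_index(docs):
    """Map each term to a sorted postings list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Return ids of documents containing every query term (assumed AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = index.get(terms[0], [])
    for term in terms[1:]:
        result = intersect(result, index.get(term, []))
    return result


index = build_index(docs)
matches = and_query(index, "spam context")
print(index["spam"], index["context"], matches)
```

With these stand-in documents, the postings lists for the two query terms share only one document id, so the merge step returns a single match. Try the same steps by hand on the tutorial's documents before comparing with code.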

Lab

  1. Find some spam pages.
  2. Use a side-by-side comparison to compare two of your favourite search engines.
  3. Using different queries, check whether the advertisements on a search engine results page change with the type of query.


Last modified: Wed Sep 9 11:47:27 EST 2009
Ramesh Sankaranarayana