COMP8400 - Paper presentation
This presentation is worth 20% of your total course mark. It
will be marked out of 20 as indicated below.
Please note: Students are not expected to
choose a paper for their presentation until after the semester break!
Objectives
The objective of this paper presentation is for students to learn about
recent research in data mining, by selecting and reading a scientific
paper published in a relevant data mining conference or journal, and to
summarise and present the paper in both a short (10 minutes) talk and a
report that addresses various aspects of the paper.
The estimated time we expect you to spend on this assignment is around 20
hours in total (1 hour per mark). This includes browsing through the
papers provided on the conference and journal Web sites listed below,
selecting a paper of your interest, reading and summarising this paper,
possibly reading further material to clarify certain aspects of the paper,
and writing the report and producing the slides for the presentation.
Submission
You will have to submit three documents: your report, your
slides and the paper you have selected (all three in PDF
format, not in MS Word or MS PowerPoint!). These three documents
have to be named:
- u1234567-report.pdf
- u1234567-slides.pdf
- u1234567-paper.pdf
(please replace u1234567 with your ANU university ID).
You need to submit your slides before your presentation,
while the report and paper can be submitted afterwards (details further below).
Submission details:
- Your presentation can be a maximum of ten (10) minutes long and
should consist of five (5) slides, as detailed below.
- Your report can be at most four (4) A4 pages long, and must contain
the sections discussed below. Don't use a font smaller than 12 points.
Important:
Penalties
Penalties for late submissions of your report (if you submit it
after Friday 29 May 5 pm) are as follows:
| How late | less than 6 hours | 6 to 24 hours | 24 to 48 hours | 48 to 72 hours | more than 72 hours |
| Penalty from 10 marks | -0.5 | -1 | -2 | -4 | -8 (forget it!) |
Penalties for report submissions that are longer than 4 pages are as
follows:
| Number of pages | 5 | 6 | 7 or more |
| Penalty from 10 marks | -1 | -2 | -4 |
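As a rough illustration only, the two penalty tables above can be read as simple lookup rules. The Python sketch below is not an official marking script. The function names are hypothetical. It assumes that a boundary value (exactly 24, 48 or 72 hours) falls into the lower bracket, that both penalties are subtracted from the report mark, and that the mark cannot drop below zero. None of these three points is stated above.

```python
def late_penalty(hours_late: float) -> float:
    """Marks deducted (out of 10) for a report submitted late."""
    if hours_late <= 0:
        return 0.0   # on time: no penalty
    if hours_late < 6:
        return 0.5   # less than 6 hours
    if hours_late <= 24:
        return 1.0   # 6 to 24 hours (boundary handling is an assumption)
    if hours_late <= 48:
        return 2.0   # 24 to 48 hours
    if hours_late <= 72:
        return 4.0   # 48 to 72 hours
    return 8.0       # more than 72 hours


def length_penalty(pages: int) -> float:
    """Marks deducted (out of 10) for a report longer than 4 pages."""
    if pages <= 4:
        return 0.0
    if pages == 5:
        return 1.0
    if pages == 6:
        return 2.0
    return 4.0       # 7 or more pages


def report_mark(raw_mark: float, hours_late: float, pages: int) -> float:
    """Assumption: both penalties apply together, floored at zero."""
    return max(0.0, raw_mark - late_penalty(hours_late) - length_penalty(pages))
```

For example, under these assumptions a report worth 8 marks that is submitted 30 hours late and runs to 5 pages would lose 2 marks for lateness and 1 mark for length.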
Plagiarism
We do take plagiarism seriously! You should read the chapter in the
Department of Computer Science Student Handbook that discusses
assessment (Chapter 6, pages 18-24), particularly the sections headed
Misconduct in examinations (which also applies to assignments
and other forms of assessment) and Collaborations versus
misconduct in assignments.
If you do include material from other documents (e.g. graphics and
figures, tables or formulas extracted from a paper, a book or a Web
site), then you have to attribute it clearly, for example by writing
the name of the paper, book, etc. where you got it from onto your slides
or adding a reference to your report.
Tasks
- Select a paper from one of the sources provided below. You should
browse around and choose a paper that interests you. Once
you have found a paper you would like to present, e-mail me (the
course coordinator)
the title, author(s) and conference/journal of this paper. Please
either attach the paper as a PDF file to the e-mail, or include the
URL from where you got the paper in your e-mail.
I will check if this paper is appropriate, and whether it has already
been selected by another student (please check the list of
already selected papers below).
Alternatively, if you already have a paper you would like to present
from another source (not from the list given below), please send
the paper (as a PDF or PostScript file) as well as the URL where
you got it to the course coordinator and ask if this paper
is appropriate.
Please also e-mail me if you prefer to give your presentation
in the first session (currently scheduled for Thursday 21 May, 9-11)
or in the second session (currently scheduled for Thursday 28 May,
9-11).
- Report (10 marks)
- The report on your selected paper can be at most four (4) A4
pages long. Please do not use fonts smaller than 12 points.
- The first page should contain at the top your name and ANU
university ID, the paper title and its author(s), publication
year, and the conference or journal where the paper was published.
- Your report has to contain the following sections:
- Summary: In your own words (not the paper abstract!)
summarise the topic, content, contributions and outcomes or
results of the paper, as well as its citation count (for
example taken from
Google Scholar).
- Area and techniques: Give a short description of what
area this paper covers (like data cleaning, transformation,
pre-processing; clustering; associations; prediction;
classification; privacy; etc.) and what techniques are used
or proposed.
- Experiments and data sets: If the paper contains
experimental results, what data sets are used for these
experiments (synthetic or real data, sourced from where,
publicly available or not, etc.).
- Measurements: What quality measures are used
(for example accuracy, precision, recall, support,
confidence, etc.)?
- Criticism: What are the main points of criticism of this
paper? This might include, for example, only limited
experiments on small data sets, claims that are not
supported by theory or experiments, etc.
- Presentation (10 marks: 5 for slides and 5 for oral presentation)
- You should summarise your selected paper in a short presentation
of maximum 10 minutes duration.
- Your presentation should have the following 5 slides:
- Your name and ANU student ID, the paper title and its
author(s), publication year, and the conference or journal
where the paper was published, as well as its citation count
(for example taken from
Google Scholar).
- A summary of the paper (for example as a dot list of its
main content and contributions).
- The area and techniques used or proposed in the paper, as well
as data sets used for experiments.
- Outcomes and/or results of the paper.
- Your criticism of the paper, for example if it does not provide
an experimental evaluation, or only gives experiments on small
data sets, or parts of the paper are written unclearly, or not
enough detail is given, etc.
Presentation schedule
- We will have two two-hour sessions this year, with five presentations
per hour (ten per session). They are scheduled to be on:
- Thursday 21 May, 9-11.
- Thursday 28 May, 9-11.
- All student presentations will be held in the CSIT seminar room N101
(where the COMP8400 lectures are held).
Sources of papers (conference and journal repositories)
You should be able to access electronic versions of papers in these
conferences and journals from within the ANU. If you have problems,
please let the course coordinator know.
Selected papers
The following papers have already been selected by a student:
- Title: Dynamics of a Collaborative Rating System
Authors: Kristina Lerman
Conference: WebKDD 2007
- Title: Mining frequent patterns without candidate generation
Authors: Jiawei Han, Jian Pei, Yiwen Yin
Conference: ACM SIGMOD 2000
- Title: Probabilistic Latent Semantic Visualization: Topic Model
for Visualizing Documents
Authors: Tomoharu Iwata, Takeshi Yamada, and Naonori Ueda
Conference: ACM SIGKDD 2008, Las Vegas
- Title: OntoDM: An Ontology of Data Mining
Authors: Panov, P.; Dzeroski, S.; Soldatova, L.
Conference: ICDMW 2008 Workshops, Pisa, Italy
- Title: Application of Data Mining Techniques for Medical Image
Classification
Authors: Maria-Luiza Antonie, Osmar R. Zaiane and Alexandru Coman
Conference: ACM SIGKDD 2001 Workshops
- Title: Efficient Anonymity Preserving Data Collection
Authors: Justin Brickell and Vitaly Shmatikov
Conference: ACM SIGKDD, Philadelphia, 2006
- Title: Schema exchange: Generic mappings for transforming data
and metadata
Authors: Paolo Papotti and Riccardo Torlone
Journal: Elsevier Data & Knowledge Engineering
- Title: Combined Association Rule Mining
Authors: Huaifeng Zhang, Yanchang Zhao, Longbing Cao, Chengqi Zhang
Conference: PAKDD 2008, Osaka
- Title: Mining Quality-Aware Subspace Clusters
Authors: Ying-Ju Chen, Yi-Hong Chu, Ming-Syan Chen
Conference: PAKDD 2008, Osaka.
- Title: Multi-label Lazy Associative Classification
Authors: Adriano Veloso, Wagner Meira Jr., Marcos Goncalves and
Mohammed Zaki
Conference: PAKDD 2007, Nanjing.
- Title: Mining Frequent Itemsets from Uncertain Data
Authors: Chun-Kit Chui, Ben Kao, and Edward Hung
Conference: PAKDD 2007, Nanjing.
- Title: Distributed Data Clustering can be Efficient and Exact
Authors: George Forman and Bin Zhang
Journal: SIGKDD Explorations, vol. 2, 2000.
- Title: Cost-efficient mining techniques for data streams
Authors: Mohamed Gaber, Shonali Krishnaswamy, Arkady Zaslavsky
Conference: Workshop on Australasian Information Security, Data
Mining and Web Intelligence, and Software Internationalisation, 2004.
- Title: Influence and correlation in Social Networks
Authors: Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian
Conference: ACM SIGKDD 2008, Las Vegas.
- Title: Clustering of Streaming Time Series is Meaningless
Authors: Jessica Lin, Eamonn Keogh and Wagner Truppel
Conference: ACM DMKD 2003.
- Title: SPARCL: Efficient and Effective Shape-based Clustering
Authors: Vineet Chaoji, Mohammad Hasan, Saeed Salem, and
Mohammed J. Zaki
Conference: IEEE ICDM, Pisa, 2008.
- Title: Relational Data Pre-Processing Techniques for Improved
Securities Fraud Detection
Authors: Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor,
David Jensen, Henry G. Goldberg, John Komoroske
Conference: ACM SIGKDD 2007, San Jose.
- Title: An Innovative Concept for Image Information Mining
Authors: Mihai Datcu and Klaus Seidel
Conference: MDM/KDD 2002: International Workshop on Multimedia
Data Mining (with ACM SIGKDD 2002)
- Title: Ranking and classifying attractiveness of photos in
folksonomies
Authors: Jose San Pedro, Stefan Siersdorfer
Conference: WWW '09: Proceedings of the 18th international
conference on World wide web
Last modified: 26/05/2009, 08:45