ANU College of Engineering and Computer Science
School of Computer Science
COMP3420 - Tutorial 4 - Week 9 (4-8 May 2009)
Association rules mining and Introduction to Rattle
Objectives
The objectives of this first data mining tutorial/lab are to conduct a small association rules mining example manually, and to get familiar with the open source data mining tool Rattle by conducting a data exploration project on a small example data set.
Rattle Background
Rattle is a freely available software tool that provides a graphical user interface on top of the R statistical programming language. Rattle thus facilitates access to the many data mining and statistical functionalities of R. For more on R and how to get it, please see the COMP8400 further material and resources Web page (the Rattle data mining tool is also used in the ANU Masters course COMP8400, Algorithms and Techniques for Data Mining).
The main developer of Rattle is Graham Williams, senior data miner at the Australian Taxation Office (ATO) in Canberra. Rattle has been, and is being, used in data mining courses at the Australian National University, the University of Canberra, and at Yale University. It is also used for practical data mining at the ATO and various other organisations. Graham will be giving a guest lecture in COMP8400 in the last week of the semester, and all COMP3420 students are encouraged to attend this guest lecture.
Note that Rattle, like many other data mining tools, contains a very large number of algorithms, techniques, settings and options. In COMP3420, we will not use all of them, and you are not required to be familiar with all of them. You are, however, encouraged to explore these techniques and options, and to read about them in the Rattle documentation or the R help pages.
The Rattle software and its manual can be downloaded and accessed from:
The Rattle version installed in the labs is 2.4.0. Note that this is also the version assumed for the second assignment. If you do your assignments on your laptop or home/office computer using a different version of Rattle, make sure that you state in your assignment submission which version you used.
Preliminaries
I suggest that you create a directory comp3420 in your ANU student account on the lab machines, on your personal laptop or desktop, or on a portable storage device such as a USB memory stick. Within this comp3420 directory, create sub-directories named tutorial4, tutorial5, and tutorial6, in order to have your COMP3420 data mining work nicely structured.
You should have a look at the Rattle Data Miner documentation. Note that both Rattle and its documentation are under development, and not all functionality and chapters are complete yet. Any feedback on errors, typos and other issues is much appreciated (you can e-mail Peter Christen, who will then contact Graham Williams).
Preferably, you should browse through the Rattle Data Miner documentation before the tutorial/lab session. Also, if you plan to use your own laptop in the labs, please try to install Rattle before the lab (install instructions are provided in the Rattle Data Miner documentation). If you want to dig deeper into the functionalities of Rattle, please have a look at the R statistical language it is based on.
Task 1: Association rules mining
This task will be discussed in the tutorial/lab on the whiteboard, but preferably you should try to solve it before the tutorial/lab. The slides of the lecture Introduction to association mining will give you an example of how to solve this task. This task is similar to the second task in assignment 2. The objective is to manually conduct association rules mining on the following small example database:
TID | Item set
--------------------------
1 | ['a', 'c', 'd']
2 | ['a', 'b', 'c']
3 | ['a', 'b', 'c']
4 | ['b', 'c', 'd', 'e']
5 | ['b', 'c', 'd', 'e']
6 | ['b', 'c', 'd']
7 | ['a', 'c', 'e']
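For checking manual answers afterwards, the database above can be expressed in Python and mined by brute force. The following is only a minimal sketch: it enumerates every candidate item set level by level rather than using Apriori candidate generation, and the thresholds MIN_SUPPORT and MIN_CONFIDENCE, as well as all function names, are assumptions chosen for illustration and are not taken from the tutorial or the lecture slides. Support is counted as the number of transactions containing an item set, and the confidence of a rule X -> Y is support(X and Y together) divided by support(X).

```python
from itertools import combinations

# The seven transactions from the table above, keyed by TID.
transactions = {
    1: {"a", "c", "d"},
    2: {"a", "b", "c"},
    3: {"a", "b", "c"},
    4: {"b", "c", "d", "e"},
    5: {"b", "c", "d", "e"},
    6: {"b", "c", "d"},
    7: {"a", "c", "e"},
}

MIN_SUPPORT = 3       # assumed absolute support count (not from the tutorial)
MIN_CONFIDENCE = 0.7  # assumed confidence threshold (not from the tutorial)


def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for items in transactions.values() if set(itemset) <= items)


def frequent_itemsets(min_support):
    """Map each frequent item set (a sorted tuple) to its support count."""
    all_items = sorted(set().union(*transactions.values()))
    frequent = {}
    for size in range(1, len(all_items) + 1):
        # Brute force: count every combination of this size.
        level = {
            candidate: support_count(candidate)
            for candidate in combinations(all_items, size)
        }
        level = {c: s for c, s in level.items() if s >= min_support}
        if not level:
            break  # no frequent set of this size, so no larger one either
        frequent.update(level)
    return frequent


def association_rules(frequent, min_confidence):
    """List (antecedent, consequent, support, confidence) for each rule."""
    found = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                consequent = tuple(i for i in itemset if i not in antecedent)
                # Every subset of a frequent item set is itself frequent,
                # so the antecedent is guaranteed to be a key of `frequent`.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, consequent, count, confidence))
    return found


frequent = frequent_itemsets(MIN_SUPPORT)
for itemset, count in frequent.items():
    print(itemset, count)

for antecedent, consequent, count, confidence in association_rules(
    frequent, MIN_CONFIDENCE
):
    print(f"{antecedent} -> {consequent}  "
          f"support={count}  confidence={confidence:.2f}")
```

Changing the two threshold constants shows how the number of frequent item sets and rules grows or shrinks, which is a useful sanity check against the counts obtained by hand.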
You should do the following steps manually:
Task 2: Working with Rattle
In the tutorial/lab, go through the following practical steps in order to become familiar with the Rattle data mining tool.
Tasks
This first lab basically consists of working through the chapters Interacting with Rattle, Data, Exploring Data, and Transforming Data. The last three of these chapters correspond to the Data, Explore, and Transform tabs in Rattle.
Last modified: 13/05/2009, 10:46
Please direct all enquiries to: webmaster@cs.anu.edu.au
Page authorised by: Dean, FEIT
The Australian National University, CRICOS Provider Number 00120C