The Australian National University
ANU College of Engineering and Computer Science
Department of Computer Science

COMP8400 - Lab 4 - New: Thursday 21 May

Support Vector Machines in Rattle

Objectives

The objectives of this fourth lab are to experiment with the support vector machines (SVM) package available in R and Rattle, in order to better understand the issues involved with this data mining technique; to compare the SVM classification results with the results from decision trees; and to gain more experience with the different evaluation methods for supervised classification available in the Rattle tool.

Preliminaries

If you haven't done so yet, I suggest you create a lab4 directory (folder) within your comp8400 directory.

For this lab, we will mainly use the audit.csv data set which you have used previously in lab 2 and lab 3. If you want to use another data set to conduct more experiments please do so.

The support vector machine classifier in Rattle is based on the R package kernlab (Kernel Methods Lab), and specifically on the ksvm class from this package. You can get help on this class by typing the following two commands into the R console (the terminal window where you started R and Rattle):

  • library(kernlab)
  • help(ksvm)
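
To see what Rattle does behind the scenes, the following is a minimal sketch of calling ksvm directly from the R console. It assumes the kernlab package is installed. The built-in iris data (restricted to two species) is used purely as a stand-in for audit.csv, and the 70/30 split and the seed are illustrative choices, not Rattle's settings.

```r
# Sketch only: fit an SVM with kernlab::ksvm outside Rattle.
# Assumption: two-species iris stands in for audit.csv.
library(kernlab)

set.seed(42)
two_class <- droplevels(iris[iris$Species != "setosa", ])

# 70/30 train/test split, similar in spirit to Rattle's sampling step
train_idx <- sample(nrow(two_class), size = round(0.7 * nrow(two_class)))
train <- two_class[train_idx, ]
test  <- two_class[-train_idx, ]

# "C-svc" is the standard classification SVM; "rbfdot" is the radial basis kernel
model <- ksvm(Species ~ ., data = train, type = "C-svc", kernel = "rbfdot")

cat("Support vectors:", nSV(model), "of", nrow(train), "training records\n")
cat("Training error:", error(model), "\n")

# Confusion matrix on the held-out records
predicted <- predict(model, test)
print(table(Actual = test$Species, Predicted = predicted))
```

Changing the `kernel` argument (for example to "polydot" or "vanilladot") is one way to reproduce the kernel experiments from the Rattle interface at the console.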

To re-familiarise yourself with the evaluation of (classification) models, you might want to read the corresponding chapter Evaluation and Deployment in the Rattle Data Miner documentation before the lab.
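
As a quick refresher, the sketch below shows how a confusion matrix and the usual summary figures can be computed in base R. The labels are made up for illustration and are not results from audit.csv; the variable names are likewise only illustrative.

```r
# Hypothetical labels for illustration only; not results from audit.csv.
actual    <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 0, 0, 1, 1, 1, 0, 0), levels = c(0, 1))

# Rows are the true classes, columns the predicted classes
confusion <- table(Actual = actual, Predicted = predicted)
print(confusion)

accuracy   <- sum(diag(confusion)) / sum(confusion)
error_rate <- 1 - accuracy
recall     <- confusion["1", "1"] / sum(confusion["1", ])  # true positive rate
precision  <- confusion["1", "1"] / sum(confusion[, "1"])

cat("Accuracy:", accuracy, " Error rate:", error_rate, "\n")
cat("Recall:", recall, " Precision:", precision, "\n")
```

Comparing these figures on the training and testing samples is a simple way to spot a model that fits the training data much better than unseen data.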


Tasks

  1. Perform the first six (6) steps of the Tasks section of lab 3 (basically: start Rattle, load the audit.csv data set, select the 2-class supervised classifier mode, select the data set variables and their roles, select sampling, and explore the data set if you want to re-familiarise yourself with it).

  2. Now go to the Model tab and make sure the SVM type radio button is selected. There are two main parameters you can modify: the Kernel function (the mathematical function at the core of the SVM) and the Class Weights. Please read the Rattle documentation section on support vector machines for more information.

    Note that the current Rattle version contains a bug that results in an error/crash when the class weights are specified (Graham, the developer of Rattle, is working on this). Therefore, we will not use the Class Weights parameter in this lab.

  3. To build an SVM, click on Execute and inspect what is printed into the main Rattle output area. How many support vectors are required (out of how many training records)?

  4. Go to the Evaluate tab and examine the confusion matrix results you get with this SVM (make sure the Testing button is activated and not the Training one). Write them down so you can compare them with the results from other SVMs you will construct later on in this lab.

    Also, do you remember the accuracy you achieved with decision trees on this data set in lab 3?

  5. Now experiment with the Kernel function, and for each SVM you build examine the resulting confusion matrix. Which one gives you the best results? Also check the Training error printed on the Model page. Is there a correlation between training and testing error?

  6. Next select the Tree classifier (as already done in lab 3) and re-create the best decision tree classifier you got in lab 3. Once you have done this, go to the Evaluate tab and you will see that you can now also tick the Tree model box.

    Make sure both the SVM and Tree boxes are ticked, select Confusion and click on Execute. This should give you two confusion matrices each (two for the decision tree and two for the SVM). Which one is the better classifier?

  7. Next select ROC and click on Execute. A graph should pop up containing two curves, one for the decision tree and one for the SVM. Compare these curves: again, which is the better classifier, and how do they differ?

    For more information about ROC graphs please have a look at the paper provided as additional material to this lab.

  8. Finally, let's look at the risk charts implemented in Rattle (please read the linked documentation on risk charts first). Go back to the Data tab and select the Adjustment variable (attribute) as the Risk variable (make sure you click on Execute before you go back to the Model tab).

  9. Again build your 2-class classifiers (decision tree and SVM), then go to the Evaluate tab and select Risk (make sure both Tree and SVM are ticked). Once you click on Execute, two risk charts will pop up. Analyse them to see which of the two classifiers is better.

  10. If you have time, you might want to use different data sets, e.g. from the UCI Machine Learning Repository, and explore how you can build SVMs and decision trees on them.

  11. At the end of the lab, quit Rattle as described in the first lab sheet.
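
For task 7, it can help to see how an ROC curve is derived from classifier scores. The sketch below uses base R only. The labels and the two score vectors are made up for illustration (they are not results from audit.csv), and `roc_points` and `auc` are hypothetical helper names, not functions from Rattle or kernlab.

```r
# Hypothetical data for illustration only; not results from audit.csv.
actual     <- c(1, 1, 0, 1, 0, 0, 1, 0)
score_svm  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
score_tree <- c(0.8, 0.4, 0.6, 0.9, 0.3, 0.7, 0.1, 0.2)

# Sort records by decreasing score and accumulate true/false positive rates.
# Assumes no tied scores, which holds for the example vectors above.
roc_points <- function(actual, score) {
  actual <- actual[order(score, decreasing = TRUE)]
  data.frame(
    fpr = c(0, cumsum(actual == 0) / sum(actual == 0)),
    tpr = c(0, cumsum(actual == 1) / sum(actual == 1))
  )
}

# Area under the curve by the trapezoidal rule
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

auc_svm  <- auc(roc_points(actual, score_svm))
auc_tree <- auc(roc_points(actual, score_tree))
cat("AUC (SVM scores): ", auc_svm, "\n")
cat("AUC (tree scores):", auc_tree, "\n")
```

A curve that lies closer to the top-left corner (and so has a larger area under it) ranks the positive cases ahead of the negative ones more consistently. Plotting `tpr` against `fpr` for both classifiers on the same axes gives a picture comparable to the one Rattle draws.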


Last modified: 15/05/2009, 09:09