COMP8400 - Lab 4 - New:
Thursday 21 May
Support Vector Machines in Rattle
Objectives
The objectives of this fourth lab are to experiment with the support vector
machines (SVM) package available in R and Rattle, in order to
better understand the issues involved with this data mining technique; to
compare the SVM classification results with the results from decision trees;
and to gain more experience with the different evaluation methods
for supervised classification available in the Rattle tool.
Preliminaries
If you haven't done so yet, I suggest you create a lab4
directory (folder) within your comp8400 directory.
For this lab, we will mainly use the audit.csv data set, which you used
previously in lab 2 and lab 3. If you want to conduct more experiments
with another data set, please do so.
The support vector machine classifier in Rattle is based on the R package
kernlab (Kernel Methods Lab), and specifically on the ksvm class from this
package. You can get help on this class by typing the following two commands
into the R console (the terminal window where you started R and Rattle):
- library(kernlab)
- help(ksvm)
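As a rough illustration of what happens behind the Rattle interface, the following sketch calls ksvm directly from the R console. It assumes the kernlab package is installed and uses R's built-in iris data, reduced to two classes, as a stand-in for audit.csv. The variable names (two_class, train, test, model) and the 70/30 split are chosen here for illustration only, and the exact call that Rattle issues may differ.

```r
# Minimal sketch: fit a two-class SVM with kernlab's ksvm().
# Assumption: two species from the built-in iris data stand in for audit.csv.
library(kernlab)

# Keep two of the three species so that the problem is binary.
two_class <- droplevels(subset(iris, Species != "setosa"))

set.seed(42)  # make the random train/test split reproducible
train_idx <- sample(nrow(two_class), size = 70)
train <- two_class[train_idx, ]
test  <- two_class[-train_idx, ]

# "C-svc" is standard C-classification; "rbfdot" is the radial basis kernel.
model <- ksvm(Species ~ ., data = train, type = "C-svc", kernel = "rbfdot")

print(model)   # summary of the fitted model
nSV(model)     # number of support vectors
error(model)   # training error
```

To try this on the lab data instead, you could load audit.csv with read.csv() and replace Species with the name of the target column in your copy of the file.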
To re-familiarise yourself with the evaluation of (classification)
models, you might want to read the corresponding chapter
Evaluation and Deployment in the Rattle Data Miner
documentation before the lab.
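As a quick refresher, the sketch below shows how a confusion matrix and the usual measures derived from it can be computed in plain R. The two label vectors are made up for illustration and do not come from audit.csv; the names actual, predicted and confusion are likewise chosen here.

```r
# Hypothetical actual and predicted labels for ten test records
# (made up for illustration; 1 is the positive class).
actual    <- factor(c(1, 0, 0, 1, 0, 0, 1, 0, 0, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0), levels = c(0, 1))

# Confusion matrix: rows are actual classes, columns are predicted classes.
confusion <- table(Actual = actual, Predicted = predicted)
print(confusion)

tp <- as.numeric(confusion["1", "1"])  # true positives
tn <- as.numeric(confusion["0", "0"])  # true negatives
fp <- as.numeric(confusion["0", "1"])  # false positives
fn <- as.numeric(confusion["1", "0"])  # false negatives

accuracy  <- (tp + tn) / sum(confusion)  # fraction of correct predictions
precision <- tp / (tp + fp)              # correct among predicted positives
recall    <- tp / (tp + fn)              # true positive rate
cat("Accuracy:", accuracy, " Precision:", precision, " Recall:", recall, "\n")
```

The true positive rate computed here, together with the false positive rate fp / (fp + tn), is what an ROC graph plots as the decision threshold of a classifier is varied.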
Tasks
- Perform the first six (6) steps from the Tasks section of
lab 3 (basically start Rattle, load
the audit.csv data set, select the
2-class supervised classifier mode, select the data set variables and
their roles, select sampling, and explore the data set if you want to
re-familiarise yourself with it).
- Now go to the Model tab and make sure the SVM type radio
button is selected. As you can see, there are two main parameters you
can modify: the Kernel function (the mathematical function
that is at the core of the SVM) and the Class Weights.
Please read the Rattle documentation section on support vector machines
for more information.
Note that the current Rattle version contains a bug that results
in an error/crash when the class weights are specified (Graham, the
developer of Rattle, is working on this). Therefore, we will not
use the Class Weights parameter in this lab.
- To build an SVM, click on Execute and inspect what is printed in
the main Rattle output area. How many support vectors are
required (out of how many training records)?
- Go to the Evaluate tab and examine the confusion matrix results
you get with this SVM (make sure the Testing button is activated
and not the Training one). Write them down so you can compare
them with the results from other SVMs you will construct later on in
this lab.
Also, do you remember the accuracy you achieved with decision trees on
this data set in lab 3?
- Now experiment with the Kernel function, and for each SVM you
build examine the resulting confusion matrix. Which one gives you the
best results? Also check the Training error printed on the
Model page. Is there a correlation between training and testing
error?
- Next select the Tree classifier (as already done in
lab 3) and re-create the best decision tree
classifier you got in lab 3. Once you have done this, go to the
Evaluate tab and you will see that you can now also tick the
Tree model box.
Make sure both the SVM and Tree boxes are ticked, select
Confusion and click on Execute. This should give you two
confusion matrices each (two for the decision tree and two for the SVM).
Which one is the better classifier?
- Next select ROC. Once you have clicked on Execute, a graph should pop up
which contains two curves, one for the decision tree and one for the SVM.
Compare these curves: again, which is the better classifier, and how do
they differ?
For more information about ROC graphs please have a look at the
paper provided as additional material to this lab.
- Finally, let's look at the risk charts implemented in Rattle (please
read the linked documentation on risk charts). Go back to the Data tab
and select the Adjustment variable (attribute) as the Risk
variable (make sure you click on Execute before you go back to the
Model tab).
- Again build your 2-class classifiers (decision tree and SVM), and then go
to the Evaluate tab and select Risk (make sure both
Tree and SVM are ticked). Once you click on Execute
you will see two risk charts pop up. Analyse them to see which of the
two classifiers is better.
- If you have time, you might want to use different data sets, e.g. from the
UCI Machine Learning repository, and explore how you can build SVMs and
decision trees on them.
- At the end of the lab, quit Rattle as described in the
first lab sheet.
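If you would like to repeat the kernel and decision tree comparison from the tasks above at the R console, the following sketch may serve as a starting point. It assumes the kernlab and rpart packages are installed and again uses two classes of the built-in iris data as a stand-in for audit.csv. The helper test_error, the choice of four kernels and all variable names are illustrative assumptions, not what Rattle itself runs.

```r
# Sketch: compare several SVM kernels and a decision tree on held-out data.
# Assumption: two species from the built-in iris data stand in for audit.csv.
library(kernlab)  # ksvm()
library(rpart)    # rpart() decision trees

two_class <- droplevels(subset(iris, Species != "setosa"))
set.seed(42)  # reproducible train/test split
train_idx <- sample(nrow(two_class), size = 70)
train <- two_class[train_idx, ]
test  <- two_class[-train_idx, ]

# Hypothetical helper: fraction of test records that are misclassified.
test_error <- function(predicted) {
  mean(as.character(predicted) != as.character(test$Species))
}

kernels <- c("rbfdot", "polydot", "vanilladot", "laplacedot")
results <- data.frame(kernel = kernels,
                      train_error = NA_real_,
                      test_error = NA_real_)

# Build one SVM per kernel and record its training and testing error.
for (i in seq_along(kernels)) {
  svm_model <- ksvm(Species ~ ., data = train,
                    type = "C-svc", kernel = kernels[i])
  results$train_error[i] <- error(svm_model)
  results$test_error[i]  <- test_error(predict(svm_model, test))
}
print(results)

# Decision tree on the same split, evaluated on the same test records.
tree_model <- rpart(Species ~ ., data = train, method = "class")
tree_pred  <- predict(tree_model, test, type = "class")
print(table(Actual = test$Species, Predicted = tree_pred))
cat("Tree test error:", test_error(tree_pred), "\n")
```

Comparing the train_error and test_error columns gives a first impression of whether a low training error goes together with a low testing error for a given kernel.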
Last modified: 15/05/2009, 09:09