Why do students fail? Data mining the DCS student database
(Research project, suitable for PhD, MPhil or Honours)
The ANU Department of Computer
Science (DCS) has been collecting student course enrolments,
marks and grades through its FAculty Information System
(FAIS) for several years.
The aim of this research project is to develop data mining
techniques for analysing the FAIS database, with the objective of
better understanding student performance, progress and retention.
Questions of interest to the DCS include, for example: Why do
students stop studying computer science? What correlations are
there between their marks in different courses? Can we predict
that a student will have problems in a certain course based on
her or his past performance? Can we identify students who might
be at risk of failing or dropping out of computer science courses
early in their studies?
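As an illustration of the question about correlations between marks in different courses, the following is a minimal sketch that computes a Pearson correlation coefficient over students who took both courses. The record layout, the course names COURSE_A and COURSE_B, and all marks are invented for this example and do not reflect the FAIS schema or real data.

```python
from statistics import mean


def paired_marks(records, course_a, course_b):
    """Return two aligned mark lists for students who took both courses."""
    xs, ys = [], []
    for marks in records.values():
        if course_a in marks and course_b in marks:
            xs.append(marks[course_a])
            ys.append(marks[course_b])
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length mark lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two marks")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("marks must vary in both courses")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical marks, invented for illustration (not FAIS data).
records = {
    "s1": {"COURSE_A": 52, "COURSE_B": 58},
    "s2": {"COURSE_A": 67, "COURSE_B": 71},
    "s3": {"COURSE_A": 80, "COURSE_B": 76},
    "s4": {"COURSE_A": 45},  # did not take COURSE_B, so excluded
    "s5": {"COURSE_A": 91, "COURSE_B": 88},
}

xs, ys = paired_marks(records, "COURSE_A", "COURSE_B")
print(f"r = {pearson(xs, ys):.2f} over {len(xs)} students")
```

A single coefficient like this only captures a linear relationship between two courses; the multi-relational analysis described in the project would need considerably more than this.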
This project will include the following components:
- Exploration, development and application of techniques that
allow data mining of the multi-relational data that is
available in FAIS. This will include using the open source
data mining tool Rattle as well as development of specific
tools and techniques required for this project (likely using
the Python programming language).
- Development of a data generator - based on real, summarised
FAIS data - that will allow creation of synthetic data to be
used for later testing and evaluation of the data mining
techniques to be developed in this project.
- Analysis of real, anonymised FAIS student data. This work
will be done on a secure compute server and will require
that ethics approval for this research has been obtained
from the Human Ethics committee at the ANU Research Office.
Note: Parts of this project depend on the outcome of the Human
Ethics approval, so the project might change.
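The data generator component could, in its simplest form, sample synthetic enrolment records from summarised grade counts. The sketch below assumes a hypothetical summary format (grade counts per course) with invented course names, grade labels and numbers; the real FAIS summaries may look quite different.

```python
import random

# Hypothetical summary: number of students per grade in each course.
# Course names, grade labels and counts are invented for illustration.
GRADE_COUNTS = {
    "COURSE_A": {"HD": 10, "D": 20, "CR": 30, "P": 30, "N": 10},
    "COURSE_B": {"HD": 5, "D": 15, "CR": 35, "P": 30, "N": 15},
}


def generate_enrolments(grade_counts, n_students, seed=0):
    """Create (student_id, course, grade) records for synthetic students.

    Each grade is drawn with probability proportional to its summarised
    count, so no real student record is ever copied.
    """
    rng = random.Random(seed)  # seeded so that test data is reproducible
    records = []
    for i in range(n_students):
        student_id = f"synthetic-{i:04d}"
        for course, counts in grade_counts.items():
            grades = list(counts)
            weights = list(counts.values())
            grade = rng.choices(grades, weights=weights, k=1)[0]
            records.append((student_id, course, grade))
    return records


synthetic = generate_enrolments(GRADE_COUNTS, n_students=100, seed=42)
print(synthetic[:4])
```

This sketch samples each course independently, so it does not reproduce correlations between a student's grades in different courses. A generator intended for evaluating the techniques in this project would presumably need to model such dependencies.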
Data mining of discussion forums
(Research project, suitable for PhD, MPhil or Honours)
Increasingly, people communicate with each other using
electronic media such as e-mail, SMS, discussion forums,
bulletin boards or chat rooms. So far, only limited research has
been done in trying to find patterns in such online discussions.
This project aims to investigate data mining techniques in order
to find patterns in publicly available online discussion forums
(for example forums that discuss new movies, DVDs, games, music,
or electronic products). The challenges with such data are that
participants often use nicknames, and that typos, abbreviations,
slang expressions and emoticons, like ;-) or ;-(, are common.
Questions that this research could address are: Who are the
participants in an online discussion? What topic are they
discussing? Can we find new trends being discussed? Who is
starting new trends? How are the participants interacting? When
and for how long are participants online?
This research is fairly open and involves many challenges,
including: extracting the participants and what they write;
extracting the conversations in a discussion forum (there might
be several discussions going on at the same time); or finding the
topics of a discussion by using external data (e.g. by querying a
search engine). The research therefore involves techniques such
as entity resolution, link analysis, time series analysis, text
mining, and information retrieval.
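To give a flavour of the entity resolution challenge, the sketch below decides whether two forum nicknames plausibly refer to the same participant by normalising them and comparing them with Python's standard difflib module. The normalisation rule, the 0.85 threshold and the example nicknames are assumptions made for illustration only; they are not part of the project description.

```python
import re
from difflib import SequenceMatcher


def normalise(nick):
    """Lower-case a nickname and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", nick.lower())


def same_participant(nick_a, nick_b, threshold=0.85):
    """Guess whether two nicknames belong to the same participant.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    a, b = normalise(nick_a), normalise(nick_b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


# Invented nicknames showing case, punctuation and typo variation.
pairs = [
    ("Movie_Fan42", "moviefan42"),
    ("MovieFan42", "MovieFam42"),
    ("MovieFan42", "gamer_girl"),
]
for a, b in pairs:
    print(a, b, same_participant(a, b))
```

String similarity alone will confuse distinct participants with similar nicknames and miss participants who change names entirely, which is one reason the project also lists link analysis and time series analysis as relevant techniques.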