The Australian National University

Student research opportunities

Developing Elaborate Topic Models

Project Code: CECS_811

This project is available at the following levels:
Masters, PhD
Please note that this project is only for higher degree (postgraduate) applicants.

Keywords:

non-parametric statistics, topic models, document analysis, machine learning, software engineering

Supervisor:

Dr Wray Buntine

Outline:

Topic models provide statistical summaries of documents. They are latent-variable Bayesian models, often built with non-parametric methods and fitted by Markov chain Monte Carlo (MCMC) sampling or variational methods. Topic models have been developed for a range of tasks, including structured documents, time-stamped sequences of collections, collocations (multi-word terms) and text segmentation.
Each of these seems to require a lot of customisation, even though the underlying theory is quite similar. Surely we can develop a general framework (theory and tools) to make this task easier.
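
For readers new to the area, here is a minimal sketch of the kind of sampler that each variant ends up customising. It assumes the plainest topic model, latent Dirichlet allocation, fitted by collapsed Gibbs sampling. The function name, the toy corpus and the hyperparameter values are illustrative assumptions, not part of the project or of any existing tool.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch only).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). Returns the final count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n[d][k]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n[k][w]
    topic_total = [0] * n_topics                              # n[k]
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word

# Toy corpus (assumed): word ids 0-2 and 3-5 tend to occur together.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 2, 1, 0], [4, 5, 3, 3]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(doc_topic)
```

A model for time stamps, aspects or segments would typically change the count tables and the conditional probability in the inner loop, and that per-model rewriting is the customisation a general framework would aim to reduce.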

NOTE 1: not really an Honours project, but some variation could work.

NOTE 2: feel free to be creative and suggest an alternative!

Goals of this project

An engineering theory and software tools for rapid development of rich topic and text analysis models.

Requirements/Prerequisites

Basic Bayesian statistical computing methods (MCMC, distributions), exposure to document analysis or machine learning, good programming experience.

Background Literature

See recent papers by the group and the ALTA 2011 invited talk, both available from my NICTA website.
Some core variations are Topics Over Time (see Wang and McCallum, KDD 2006) and the Topic-Aspect Model (see Paul and Girju, AAAI 2010, although other variants exist).

Links

My website with pointers to literature

Updated: 1 July 2012