Student research opportunities
Developing Elaborate Topic Models
Project Code: CECS_811
This project is available at the following levels:
Masters, PhD
Please note that this project is only for higher degree (postgraduate) applicants.
Keywords:
non-parametric statistics, topic models, document analysis, machine learning, software engineering
Supervisor:
Dr Wray Buntine
Outline:
Topic models provide statistical summaries of documents. They are latent-variable Bayesian statistical models, developed using non-parametric methods, Markov chain Monte Carlo (MCMC) sampling and variational methods. Topic models have been developed for a range of tasks including structured documents, time-stamped sequences of collections, collocations (multi-word terms) and text segmentation.
Each of these seems to require a lot of customisation even though the theory is quite similar. Surely we can develop a general framework (theory and tools) to make this task easier.
NOTE 1: not really an Honours project, but some variation could work.
NOTE 2: feel free to be creative and suggest an alternative!
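As a rough illustration of the shared core that the model variants in the outline build on, the sketch below shows a collapsed Gibbs sampler (one MCMC method) for a basic topic model with a fixed number of topics. It is a minimal sketch and not the group's software. It is parametric rather than non-parametric, and the function name, parameter names and toy corpus are all hypothetical choices made for this example.

```python
import random


def gibbs_topic_model(docs, num_topics, vocab_size,
                      alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for a basic LDA-style topic model.

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). alpha and beta are symmetric Dirichlet
    hyperparameters (assumed values, chosen for illustration only).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assign = []                                                 # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assign, doc_topic, topic_word


# Toy corpus: three short documents over a six-word vocabulary.
toy_docs = [[0, 1, 0, 2], [3, 4, 3, 5], [0, 2, 1, 1]]
assign, doc_topic, topic_word = gibbs_topic_model(toy_docs, num_topics=2, vocab_size=6)
```

Variants such as time-stamped or structured-document models typically change the count tables and the form of the conditional in the inner loop, which is the kind of per-model customisation a general framework would aim to factor out.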
Goals of this project
An engineering theory and software tools for rapid development of rich topic and text analysis models.
Requirements/Prerequisites
Basic Bayesian statistical computing methods (MCMC, distributions), exposure to document analysis or machine learning, good programming experience.
Background Literature
See recent papers by the group and the ALTA 2011 invited talk, available on my NICTA website.
Some core variations are Topics Over Time (see Wang and McCallum, KDD 2006) and the Topic-Aspect Model (see Paul and Girju, AAAI 2010, although other variants exist).



