Student research opportunities
Linguistic Topic Models
Project Code: CECS_812
This project is available at the following levels:
Honours, Masters, PhD
Keywords:
text analysis, topic models, non-parametric methods, natural language processing, machine learning
Supervisor:
Dr Wray Buntine
Outline:
Topic models provide statistical summaries of documents with a more semantic bent. They are developed as latent-variable Bayesian statistical models, using non-parametric methods, Markov chain Monte Carlo (MCMC) sampling and variational methods. How might these models be extended to give more statistical support to standard natural language processing tasks such as parsing, collocation analysis, unsupervised part-of-speech processing and sentiment analysis? Moreover, how might standard pseudo-semantic resources such as WordNet and Wikipedia be incorporated as well?
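As a rough illustration of the MCMC side of topic modelling, here is a minimal sketch of a collapsed Gibbs sampler for basic latent Dirichlet allocation (LDA) with a fixed number of topics. It is a standard textbook sampler, not the group's own code; the function name `lda_gibbs`, the hyperparameter defaults and the toy corpus are illustrative assumptions. Non-parametric variants (for example hierarchical Dirichlet process topic models) instead let the number of topics grow with the data rather than fixing `num_topics` in advance.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical helper).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc-topic counts, topic-word counts, per-token topic assignments).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens assigned to each topic
    z = []                              # topic assignment for every token

    # Initialise each token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


if __name__ == "__main__":
    # Toy corpus (assumed): word ids 0-2 and 3-5 form two disjoint "themes".
    docs = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 2, 1, 0], [3, 4, 5, 3, 4, 5], [5, 4, 3, 3, 4, 5]]
    n_dk, n_kw, z = lda_gibbs(docs, num_topics=2, vocab_size=6)
    for k, counts in enumerate(n_kw):
        print(f"topic {k}: word counts {counts}")
```

On a corpus like this one would typically expect each topic to concentrate on one of the two word groups after enough sweeps, though a single chain is not guaranteed to separate them for every seed. Extensions of the kind the project asks about would change the generative model (and hence the full conditional), not this basic resample-one-token-at-a-time loop.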
NOTE 1: this is not really suited to an Honours project, but some variation of it could work.
NOTE 2: feel free to be creative and suggest an alternative!
Goals of this project
Develop topical latent variable models for a chosen task in natural language processing.
Requirements/Prerequisites
Basic Bayesian statistical computing methods (MCMC, probability distributions), exposure to natural language processing or machine learning, and some programming experience.
Background Literature
See recent papers by the group and the ALTA 2011 invited talk, available on my NICTA website.



