The Australian National University

Student research opportunities

Linguistic Topic Models

Project Code: CECS_812

This project is available at the following levels:
Honours, Masters, PhD

Keywords:

text analysis, topic models, non-parametric methods, natural language processing, machine learning

Supervisor:

Dr Wray Buntine

Outline:

Topic models provide statistical summaries of documents with a more semantic bent. They are built as Bayesian latent variable models, using non-parametric methods, Markov chain Monte Carlo (MCMC) sampling and variational methods. How might these models be extended to provide more statistical support for standard tasks in natural language processing such as parsing, collocation analysis, unsupervised part-of-speech processing and sentiment analysis? Moreover, how might standard pseudo-semantic resources such as WordNet and Wikipedia be used as well?
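
For readers new to the area, the sketch below illustrates the kind of MCMC sampling mentioned in the outline. It is an assumption made for illustration, not the group's method: it uses latent Dirichlet allocation (LDA) with a fixed number of topics and a collapsed Gibbs sampler, which is simpler than the non-parametric models the project targets. The function name gibbs_lda, the hyperparameters alpha and beta, and the toy corpus are all hypothetical.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). Returns (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assign = []                                                 # topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional probability of each topic
                # given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


if __name__ == "__main__":
    # Hypothetical toy corpus over a four-word vocabulary.
    toy_docs = [[0, 1, 0, 1], [2, 3, 2], [0, 1, 3]]
    doc_topic, topic_word = gibbs_lda(toy_docs, num_topics=2, vocab_size=4)
    print(doc_topic)
    print(topic_word)
```

The returned count tables can be normalised (after adding alpha and beta) to estimate per-document topic proportions and per-topic word distributions. Extensions of the kind the project asks about would change the generative model and hence the conditional distribution computed inside the loop.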

NOTE 1: not really an Honours project, but some variation could work.

NOTE 2: feel free to be creative and suggest an alternative!

Goals of this project

Develop topical latent variable models for a task in natural language processing.

Requirements/Prerequisites

Basic Bayesian statistical computing methods (MCMC, distributions), exposure to natural language processing or machine learning, some programming experience.

Background Literature

See recent papers by the group and the ALTA 2011 invited talk, available on my NICTA website.

Links

My website with pointers to literature

Contact:



Updated: 8 May 2013