Student research opportunities
Unsupervised Part-of-Speech Training with Non-parametric Methods
Project Code: CECS_818
This project is available at the following levels:
Honours, Masters
Keywords:
non-parametric statistics, part of speech models, machine learning
Supervisor:
Dr Wray Buntine
Outline:
One of the classic problems in natural language processing is unsupervised inference of parts of speech, i.e., attempting to infer word classes such as noun and verb without tagged text. Recent research in this area by Blunsom and Cohn uses Bayesian non-parametric methods, specifically Pitman-Yor processes. The project is to implement and test these methods, then try various extensions using techniques from our group.
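For students unfamiliar with Pitman-Yor processes, the following is a minimal sketch of the Chinese restaurant seating scheme commonly used to sample from one. It shows only the basic building block, not the hierarchical HMM of the cited paper. The function name and the parameter names (`discount`, `concentration`) are illustrative choices, not taken from any existing codebase.

```python
import random


def sample_pitman_yor_seating(num_customers, discount, concentration, seed=None):
    """Seat customers one at a time and return the list of table sizes.

    Illustrative sketch: with n customers already seated at K tables,
    the next customer joins table k with probability
    (n_k - discount) / (n + concentration) and opens a new table with
    probability (concentration + discount * K) / (n + concentration).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(num_customers):
        if not table_sizes:
            # The first customer always opens the first table.
            table_sizes.append(1)
            continue
        # Unnormalised weights: one per existing table, plus one for a new table.
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * len(table_sizes))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes


if __name__ == "__main__":
    sizes = sample_pitman_yor_seating(1000, discount=0.5, concentration=1.0, seed=0)
    print(len(sizes), "tables; largest:", max(sizes))
```

Setting `discount` to 0 reduces this to the ordinary Dirichlet-process Chinese restaurant process; larger discounts give heavier-tailed table sizes, one reason such priors are often used for word frequencies.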
Requirements/Prerequisites
Gibbs sampling. Good programming skills. Basic exposure to natural language processing.
Background Literature
"A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction", Phil Blunsom and Trevor Cohn. ACL 2011, Portland, Oregon.
Links
Blunsom and Cohn's paper
My website with pointers to literature



