The Australian National University

Speeding up Latent Dirichlet Allocation

David Newman (Research Scientist, NICTA VRL; on leave from the University of California, Irvine)

NICTA SML SEMINAR

DATE: 2009-03-02
TIME: 11:00:00 - 12:00:00
LOCATION: NICTA - 7 London Circuit

ABSTRACT:
Latent Dirichlet Allocation (aka the Topic Model) is a popular model for discrete data such as text collections. The time complexity of the Gibbs-sampled topic model is linear in both the size of the collection, N, and the number of topics, K. We would like to speed up LDA, both for computing topic models of very large collections and for computing topic models in near real time (for smaller collections).
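To make the O(N*K) cost concrete, here is a minimal sketch of one collapsed Gibbs sampling sweep for LDA (function and variable names are my own, not from the talk): for each of the N tokens, the conditional distribution over K topics is evaluated in full before drawing a new assignment.

```python
import numpy as np

def gibbs_sweep(docs, z, ndk, nkw, nk, alpha=0.1, beta=0.01):
    """One collapsed Gibbs sampling sweep over all tokens.

    docs: list of (doc, word) pairs for each token; z: current topic of
    each token; ndk: doc-topic counts; nkw: topic-word counts; nk: topic
    totals. The inner step is O(K) per token, so a sweep costs O(N*K).
    """
    V = nkw.shape[1]
    for i, (d, w) in enumerate(docs):
        k = z[i]
        # Remove the token's current assignment from the counts.
        ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
        # Unnormalized conditional p(z_i = k | rest): K terms.
        p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
        # Draw the new topic by inverting the cumulative distribution.
        k = int(np.searchsorted(np.cumsum(p), np.random.rand() * p.sum()))
        # Add the token back under its new topic.
        z[i] = k
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
```

The two speedups discussed in the talk attack exactly this loop: distributing the outer loop over tokens across processors, and terminating the inner loop over K topics early.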

I will discuss two approaches to speeding up LDA. The first is to distribute the LDA computation over P processors and devise a sensible method for concurrent Gibbs sampling (for up to a P-fold speedup). The second is to take advantage of three specific facts that allow us to terminate the per-token computation over K topics early (for a 5-8 times speedup).
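The first approach can be sketched as follows. In the approximate distributed scheme, documents are partitioned over P processors; each processor Gibbs-samples its partition against a stale local copy of the global topic-word counts, and after each sweep the global counts are rebuilt by folding every processor's count delta back in. This reconciliation step might look like (a sketch under my own naming, not the talk's implementation):

```python
import numpy as np

def merge_counts(nkw_global, local_copies):
    """Reconcile per-processor topic-word counts after a parallel sweep.

    nkw_global: the K x V topic-word count matrix all P processors started
    the sweep from. local_copies: each processor's updated copy after
    sampling its own document partition. The new global counts are the old
    counts plus the sum of every processor's local delta.
    """
    deltas = [local - nkw_global for local in local_copies]
    return nkw_global + sum(deltas)
```

Because each processor samples against slightly stale counts, the result is an approximation to sequential Gibbs sampling, traded for near-linear parallel speedup.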

BIO:
David Newman is a Research Scientist at NICTA VRL, currently on leave from the Dept of Computer Science at the University of California, Irvine. His research interests include machine learning, data mining and text mining. Newman received his PhD from Princeton University and was a Postdoctoral Scholar at Caltech.

http://www.ics.uci.edu/~newman

Updated: 2 March 2009