ANU Computer Science Technical Reports
TR-CS-01-01
Peter Christen, Markus Hegland, Stephen Roberts, and Irfan Altas.
A scalable parallel FEM surface fitting algorithm for data
mining.
October 2001.
[POSTSCRIPT (260132 bytes)] [PDF (385105 bytes)] [EPrints archive]
Abstract: The development of automatic techniques to
process and detect patterns in very large data sets is a major task in data
mining. An essential subtask is the interpolation of surfaces, which can be
done with multivariate regression. Thin plate splines provide a very good
method to determine an approximating surface. Unfortunately, obtaining
standard thin plate splines requires the solution of a dense linear system of
order n, where n is the number of observations. Thus, standard thin plate
splines are not practical, as the number of observations for data mining
applications is often in the millions. We have developed a finite element
approximation of a thin plate spline that can handle data sizes with millions
of records. Each observation record has to be read from an external file once
only and there is no need to store the data in memory. The resolution of the
finite element method can be chosen independently from the number of data
records. An overlapping domain partitioning is applied to achieve
parallelism. Our algorithm is scalable both in the number of data points as
well as with the number of processors. We present first results on a Sun
shared-memory multiprocessor.
Technical Reports <Technical-DOT-Reports-AT-cs-DOT-anu.edu.au>
Last modified: Tue May 31 12:56:00 EST 2011