Surface Least Squares
Originally posted by Dan on 07:00 Wed 14 March 2007, last modified 10:24 Sat 12 May 2007.
File under: curve fitting math maximum liklihood phd programming python regression
The method of least squares, or even simply maximum likelihood is one of the more powerful tools available to a statistician. It’s so powerful, because its simplicity means it can be used in a variety of regression problems.
Regression simply means line fitting, and lines are just a graphical way to represent a model, which is the mathematical way to describe the relationship between an independent variable and one or more dependent variables. There is a lot of text on linear straight line fitting, so I’m not going to go into too much detail. I will however briefly discuss the principle behind least squares.
Consider a set of measurements, or pairs of datum points {xi, yi} where y is the dependent variable. If we think that a linear relationship exists between x and y, i.e. y = a + bx, then we can use least squares to calcuated the most likely values of a and b. This is done by minimising the deviations Δ y between the observed values yi and their corresponding fitted values, so that Δ y = yi - y(xi) = yi - a - bxi.
The data is a sample from a parent population, and if we have taken enough measurements, then it is reasonable to assume that such a parent population has a Gaussian distribution, with mean y0(xi) and standard deviation σi. With this assumption in mind, the probability Pi of making the observed measurement yi with standard deviation σi for the observations about the actual value y0(xi) is

Therefore the probability of making the observed set of N values of yi is the product of the probabilities for each observation, and remembering the laws of indices, we sum the indices when multiplying terms:

The maximum likelihood estimates of a and b are those values that maximize this probability. As the first factor is constant, not dependent on the parameters a and b, we can ignore it, and the optimization problem, reduces to minisizing the sum in the exponential. This term is the so called "goodness of fit" or χ2 parameter.
The literature reveals a lot of research on such optimization problems, as minimizing χ2, is in fact a problem of solving a set of simultaneous equations. So far this (brief) discussion, has assumed a straight line relationship between x and y, however, the theory can be applied to non-linear relationships, and indeed surfaces. My programming tool of choice at the moment is Python, it has an excellent library for scientific computation, Scipy, or at least it will be excellent when it’s all finished. It includes (actually wraps) a least square algorithm (the popular Levenberg-Marquardt algorithm) and will solve problems of this nature. However, getting it to work over surfaces is not as obvious as you might think, hence why I'm actually writing this post.
It's actually ridiculously easy to do this in Python, once you know how the least squares functions works. Here is my function:
""" The residuals from a surface fit """
# coefficients of the surface polynomial
a,b,c,d,e,f = p
# The error associated with these parameters and the polynomial
err = z - a*x**2 - b*y**2 - c*x*y - d*x - e*y - f
return err
def performLeastSquares(data):
xi = array([d[0][0] for d in data])
yi = array([d[0][1] for d in data])
zi = array([d[1] for d in data])
params = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
lstsqResult = optimize.leastsq(residual, params, args=(xi,yi,zi))
return lstsqResult[0]
Firstly we have a function, that returns the deviates, or residuals given a set of parameters, note here at that the variables x, y and z are all Numpy arrays. So, given the parameters p which is a tuple of 6 doubles, we subtract the model from the observed y value: Δ zi = zi - ax2 - by2 - cxy - dx - ey - f
This function is the first argument to the least squares function, after which is passed an initial set of parameters. Lastly the arrays x, y and z (and any other arguments needed for the residual function) must be passed. And that's it!
Once this runs, it will return the 6 parameters which you can then plot in your favourite program. Below is an example, the green points are the original data, and the red surface is the best 2nd order surface polynomial that fits the them.
