Statistical estimation from an optimization viewpoint

Content:

Statistics and optimization have been closely linked from the very outset. The search for a `best' estimator (least squares, maximum likelihood, etc.) certainly relies on optimization tools. Conversely, statistics has often provided the motivation for the development of algorithmic procedures for certain classes of optimization problems. However, it is only relatively recently, more specifically in connection with the development of an approximation and sampling theory for stochastic programming problems, that the full connection has come to light. This in turn suggests a more comprehensive approach to the formulation of statistical estimation questions. This expository paper reviews some of the features of this approach.

Parametric and nonparametric estimation problems are in some sense at the opposite ends of what fits under the `statistical estimation' umbrella. Usually, some partial information is available about the unknown distribution, but not quite enough to be able to pinpoint the parametric class to which the true distribution function belongs. For example, one might know, or suspect, that this distribution function is associated with a unimodal density function. One might know, or stipulate, bounds on certain quantiles, etc. In the same way that knowledge about the parametric class plays an important role in the (final) choice of the estimate, whatever information is available should be exploited in the choice of a `best' estimate. In the formulation of the estimation problem, `available information' gets translated in terms of `constraints' that restrict the choice of the estimate to certain subclasses of distribution functions.

To justify the choice of an estimator, one generally appeals to asymptotic analysis: one proves consistency and, if possible, derives a convergence rate that makes it possible to approximate the distribution of the error.

Although asymptotic analysis has been the mainstay of `mathematical statistics,' when dealing with practical applications statisticians have often made use of estimators that might not (yet) have been subjected to full asymptotic analysis. One of the reasons for this technological gap between theory and practice is that to carry out the asymptotic analysis, the estimator should be `nice': simple, smooth, etc. And practice might suggest or dictate a choice which doesn't quite fulfill these requirements. In particular, this is usually the case when the estimator is the argmax function, i.e., a mapping which associates with a sample the solution of an optimization problem (based on these observations). This is exactly the estimator analyzed in this paper.
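
As a purely illustrative sketch of such an argmax estimator (not an example taken from the paper), consider maximum likelihood for the rate of an exponential distribution when the only side information is a pair of bounds on the median. The exponential model, the function names `log_likelihood` and `constrained_mle_rate`, the sample values, and the use of a ternary search are all assumptions made for this illustration. Because the median of an exponential distribution with rate r equals ln(2)/r, the bounds on the median translate into an interval constraint on r, and the estimate is the maximizer of the log-likelihood over that interval.

```python
import math


def log_likelihood(rate, sample):
    """Log-likelihood of an i.i.d. exponential sample at the given rate."""
    return len(sample) * math.log(rate) - rate * sum(sample)


def constrained_mle_rate(sample, median_lo, median_hi, tol=1e-10):
    """Argmax of the log-likelihood over rates whose median lies in
    [median_lo, median_hi] (hypothetical helper for illustration only)."""
    # Median of Exp(rate) is ln(2)/rate, so median bounds become rate bounds.
    lo = math.log(2) / median_hi
    hi = math.log(2) / median_lo
    # Ternary search: valid because the log-likelihood is strictly concave
    # in the rate, so it has a single maximizer on the interval.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, sample) < log_likelihood(m2, sample):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Made-up observations; their unconstrained maximum likelihood rate is 5/6.
sample = [0.5, 1.5, 1.0, 2.0, 1.0]
loose = constrained_mle_rate(sample, 0.1, 10.0)  # constraint inactive
tight = constrained_mle_rate(sample, 0.5, 0.7)   # constraint active
```

With the loose bounds the constraint is inactive and the estimate agrees with the unconstrained value 5/6. With the tight bounds the maximizer sits on the boundary of the feasible interval, at ln(2)/0.7, roughly 0.990. The estimator is then a non-smooth function of the sample, which is the kind of behavior that makes the standard asymptotic analysis harder to apply.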

June 30, 1998