Why We Need Lots of Optimization Algorithms

Next: Optimizing With COOOL Up: The CWP Object-Oriented Optimization Previous: Overview

# Why We Need Lots of Optimization Algorithms

Just as no one seismic migration algorithm would be the ideal choice in all cases, no one optimization algorithm can possibly be efficient or even successful in all cases of interest. If all our problems involved objective functions that were quadratic, with analytic derivatives, then life would be simple. But in practice we know that many of the problems we face in geophysics are vastly more complicated than this. What's worse, often the function we are attempting to optimize is not available analytically, and is only known in the form of a computer code that evaluates this function point by point. What makes some problems hard and some easy is not always clear.

Figure 1: An overview of objective functions. A: unimodal. B: essentially unimodal but with parasitic local extrema. C: fundamentally multimodal, small number of local extrema. D: significant null-space effects. E: fundamentally multimodal, huge number of local extrema. F: lacking any useful structure, brute force probably required. To aid in the visualization, all the function examples shown are are functions of only two dimensions. In practice, we are faced with functions of hundreds or thousands of unknowns.

For now, let us focus on the unconstrained optimization of functions defined on dimensional Euclidean spaces. It is clear that geometry plays a key role in determining the difficulty of a problem. Can we follow the gradient up-hill or down-hill to our target or will we be trapped in local extrema? Are these local extrema important in that they represent significant features in the model or are they irrelevant roadblocks on our way to the global extremum?

Figure 1 gives a simple overview of some of these geometrical issues. This list is not exhaustive, but illustrates some of the practical issues that arise in attempting to find the appropriate tool for a given problem. We have drawn the picture to emphasize maximization, but obviously there is no fundamental distinction between maximization and minimization. Let us look at each case individually.

• A: Unimodal functions. These are functions which have a single extremum. The archetype of such functions is the dimensional quadratic form. Non-quadratic, but still unimodal, functions can usually be optimized by making a sequence of quadratic approximations. If the matrix of second derivatives of a quadratic form is known, then special-purpose algorithms can be used.
• B: Essentially unimodal functions. Because of noise and other factors, it is often the case in practice that the simple global structure of a problem is masked by parasitic local optima. We do not regard these local optima as representing important features of the model and so, if we could somehow smooth the objective function, we would be more confident that we had determined the extremum we seek.
• C: Functions that have a small number of significant local optima. Sometimes the local optima represent significant features in the model (for example, fundamental ambiguities can exist in inverse calculations; the global optimum may not be fundamentally more significant than other local optima). In this case we should not simply smooth the objective function; we must find these local optima. If their number is relatively small, then we might be able to rely on hill climbing from randomly chosen starting points.
• D: Functions with significant null-space effects. In the event that the objective function becomes flat in the neighborhood of the current point, then this flat region represents perturbations to the model which have little or no influence on the objective function. In this case it is important to map out these regions and characterize the ambiguities that they represent.
• E: Functions with a huge number of significant local optima. It may happen that there are a large number of local optima, many of which are significant. We cannot ignore them by smoothing the objective functions, and it may be that random hill climbing is too inefficient. This is where Monte Carlo methods such as Simulated Annealing and Genetic Algorithms are normally used.
• F: Functions whose global structure provides no useful information. If the objective function is essentially flat, except for an isolated, deep optimum, then the global structure of the function is of no use in finding the desired model. Unless an alternative parameterization can be found in which the function has some global structure, such as in B, extensive brute force random searching or enumeration may be the only alternative.

This simple picture must be taken with a grain of salt when considering optimization problems with many unknowns. Much of our geometric intuition fails when confronted with high-dimensional spaces. For example, if you calculate the volumes of dimensional spheres of radius and and take their difference, you will quickly discover that as gets large, all the volume of an sphere becomes concentrated in a thin shell just inside the radius . So if one is thinking of sampling points within the domain of a function of, let's say, 1000 unknowns, then just how to do so efficiently is not obvious. In fact, this is one of the fundamental limitations of Monte Carlo methods which, almost by definition, must sample the space of possible models globally.

Next: Optimizing With COOOL Up: The CWP Object-Oriented Optimization Previous: Overview

Sun Feb 25 12:08:00 MST 1996