Report

TRUST: A Deterministic Algorithm for Global Optimization

See allHide authors and affiliations

Science  16 May 1997:
Vol. 276, Issue 5315, pp. 1094-1097
DOI: 10.1126/science.276.5315.1094

Abstract

An approach to solving continuous global optimization problems was developed. It builds on two innovative concepts, subenergy tunneling and non-Lipschitzian terminal repellers, to ensure escape from local minima in a fast, reliable, and computationally efficient manner. The generally applicable methodology is embodied in the TRUST (terminal repeller unconstrained subenergy tunneling) algorithm, which is deterministic, scalable, and easy to implement. Benchmark results show that TRUST is faster and more accurate than previously reported global optimization techniques. An application of TRUST to a large-scale exploratory seismology problem of substantial computational complexity (that is, residual statics corrections) is also reported.

Examples of the role played by nonlinear optimization in the application of mathematics to practical problems abound in every field of scientific, technologic, economic, or social interest, including physical sciences (elasticity theory, hydrodynamics, celestial mechanics), computer science (design of VLSI circuits), geophysics (determination of unknown geologic parameters from subsurface or underwater measurements), biology (protein folding), industrial technology (optimal control of regulation systems, optimization of industrial flow through the factory line), and economics (optimal warehouse maintenance, optimal transportation routes).

Typically, the overall performance of a system is described by a multivariate function called the objective function. Optimality is achieved when the objective function attains its global extremum, which can be a maximum or a minimum, depending on the problem. For most real-life applications, the objective function depends on a large number of state variables or parameters. Finding the extrema, and in particular, the absolute extremum, of such a function turns out to be painstakingly tedious. The primary difficulty is that the global extremum (for example, a minimum) of a real function is—despite its name—a local property. Significant alteration of the location and magnitude of the global minimum can be carried out without affecting the locations and magnitudes of the other minima. Thus, short of exhaustive search, it would appear to be very difficult to design unfallible methods to locate the absolute minimum for an arbitrary function.

In recent years there has been a surge of interest in global optimization (1-6), and substantial progress has been achieved in breaking new theoretical ground (7-15). Nonetheless, many of the difficulties mentioned above have persisted.

The generic global optimization problem considered here can be stated as follows. Let f(x) : D → ℛ be a bounded function with a finite number of discontinuities, and x be an n -dimensional state vector. For n = 1, we denote the state variable by x. At any discontinuity pointx δ, the function f is required to satisfy the inequality limxxδ inf f(x) ≥ f(x δ) (lower semicontinuity condition). Hereafter, ℛ will denote the real numbers, f(x) will be referred to as the objective function, and the set D will be referred to as the set of feasible solutions (or search space). The goal is to find the state vectorx gm that minimizes f(x) in D. The index gm stands for global minimum. Without loss of generality, we shall take D as the hyperparallelepiped D = {xii ≤ xi ≤ βi +; i = 1, 2, … , n}, where βi and βi +denote, respectively, the lower and upper bound of the i th state variable.

We shall address the unconstrained global optimization problem in terms of the evolution of a deterministic dynamical system that combines subenergy tunneling and non-Lipschitzian “terminal” repellers. We define the subenergy tunneling transformation of the function f(x) by the following nonlinear monotonic mapping:Embedded Image(1)In Eq. 1, f̂(x) = f(x) − f(x*), a is a constant that affects the asymptotic behavior but not the monotonicity of the transformation, and x* is a fixed value of x, whose selection will be discussed below. Esub(x, x*) has the same discontinuity and extremal points as f(x) and the same relative ordering of the local and global minima, that is, Esub(x, x*) is a transformation of f(x) that preserves all properties relevant for optimization. In addition, this transformation is designed to ensure that (i) Esub(x,x*) quickly approaches zero for large, positivef̂(x); and (ii) Esub(x, x*) rapidly tends toward f̂(x) wheneverf̂(x) ≪ 0.

The second concept we have built on is terminal repellers. Suppose that x obeys an evolution given by the dynamical system dx/dt ≡ =g(x). The solutions of the corresponding stationary system, g(x) = 0, are called equilibrium points. An equilibrium point x eq of the dynamical system =g(x) is termed an attractor (repeller) if no (at least one) eigenvalue of the n × n matrix ℳ, ℳ = ∂g(x eq)/∂x, has a positive real part. Typically, a certain amount of regularity (Lipschitz condition) is required to guarantee the existence of a unique solution for each initial condition x(0), and in those cases, the system’s relaxation time to an attractor, or escape time from a repeller, is theoretically infinite. If the regularity condition at equilibrium points is violated, singular solutions are induced such that each solution approaches the terminal attractor or escapes from the terminal repeller in finite time (16). For example, the dynamical system x˙ = x1/3 has a repelling unstable equilibrium point at x = 0, which violates the Lipschitz condition. Any initial condition, no matter how close to the repelling point x = 0, will escape the repeller to reach any finite point x0 in a finite time, t0∼ x0 2/3. Terminal repellers, in conjunction with the subenergy tunneling, are the foundation of our global optimization algorithm.

We now assemble the above concepts into the TRUST (terminal repeller unconstrained subenergy tunneling) global optimization scheme. Let f(x) be a function one wishes to globally minimize over D. We define the TRUST virtual objective functionEmbedded Image Embedded Image(2) Embedded ImageIn the above expression θ is the Heaviside function, which is equal to 1 for positive values of the argument and zero otherwise. The first term on the right-hand side of Eq. 2corresponds to the subenergy tunneling function; the second term is the repeller energy term. The parameter ρ > 0 quantifies the strength of the repeller. Application of gradient descent to E(x,x*) results in the dynamical system (i = 1, … n)Embedded Image Embedded Image(3)By its very structure, E(x,x*) transforms the current local minimum of f(x) into a global maximum but preserves all lower local minima. Thus, when gradient descent is applied to the function E(x, x*), the new dynamics, initialized at a small perturbation from the local minimum of f(x), will escape this critical point to a lower valley of f(x) with a lower local minimum. Hence, application of gradient descent to E(x, x*) defined in Eq.2, as opposed to the original function f(x), results in a system that has a global descent property. This is the main idea behind constructing the TRUST virtual objective function.

The basic algorithm proceeds as follows. Initially, x*is chosen to be one corner of the hyperparallelepiped D; for example, x* i = βi , for i = 1, … n. A repeller is placed atx*. It should be noted that the repelling terms in the multidimensional case can be interpreted as hyperplane repellers and are active whenever f̂(x) ≥ 0. The initial state of the system is set to x* + d, where d is a small perturbation that drives the system into D. We assume d has a uniform sign during the basic stage of the optimization. Depending on the relative values of f(x*) and f(x* + d), the dynamical system will initially be in a tunneling phase or a gradient descent phase. Subsequent switchings between the two phases are autonomous.

Our tunneling idea is different in character and implementation from other tunneling methods based on classical diffusion, quantum effects, or metastability. Instead of the state of the system jumping over or diffusing through a barrier, the subenergy tunneling eliminates the barrier and flattens the landscapes (ideally to a plane). Then, instead of propagation of the state through that plane by diffusion, one uses the extremely fast engine of the terminal repeller to move the state on the downward slope of the virtual surface this repeller induces.

A first enhancement to this basic paradigm allows for Nϕ componentwise direction reversals of the solution flow at each boundary of the domain. The number Nϕ is specified. Because with each lower minimum identified a larger portion of the dynamical flow will be in a repeller rather than gradient descent mode, successive traversals of the domain become significantly less expensive.

A second enhancement is instantiated immediately after the occurrence of Nϕ reflections. It involves replacing the constant direction of multidimensional repelling with an ensemble of one-dimensional (1D) tunneling paths. As an illustration of this concept, assume that the system is descending toward the μ-th local minimum, μ x. The solution evolves on the hypersurface obtained by collapsing f(x) by means of the subenergy tunneling transformation. Specifically, we integrateEmbedded Image(4)whereEmbedded Image(5)After convergence to μ x, a terminal repeller is positioned there, that is,μ x* is set equal toμ x. Several schemes can be considered for implementing the 1D tunneling away from repellerμ x*. The simplest scheme consists in evaluating f[i](x) on a uniform grid. Alternately, one can integrate the differential equationEmbedded Image(6)from the initial conditionμx* i + di. In the above expressions, the notations f[i](x) andμ[i](x) are to indicate that all components of x except xi are kept fixed. Then, the first value xi for which the argument of the Heaviside function θ(μ[i]) becomes negative yields the initial condition, denotedx (μ+1), for reaching the next local minimumμ+1 x by means of Eq. 4. If θ(μ[i]) remains positive over both domains [βi ,μx* i) and (μx* i, βi +], the next parameter xi+1 is explored with, for example, Eq. 6. The choice of the dimension to be explored first (that is, the choice of i) can be random.

The successive tunneling and descent processes continue until a suitable stopping criterion is satisfied. For the 1D case when the state flows out of the domain boundary [for example, when x > β+ (assuming positive flow)], the last local minimum found is taken as the global minimum. A formal proof is given in (8, 13). The multidimensional stopping criterion is quite similar to the 1D case. When no new initial condition is obtained [when θ(f̂[i]) > 0, for i = 1, … n], the last local minimum found is taken as the global minimum.

For 1D problems, the TRUST algorithm is designed to sweep the whole search space and thus find the global minimum for any bounded lower semicontinuous function with a finite number of discontinuities (13). Indeed, in an ideal implementation, the replacement of the function f(x) with the function E(x, x*) ensures that (i) the flow follows the field lines in the negative gradient regions; (ii) when the objective function presents a discontinuity, the dynamical system (Eq. 3) can only flow down (like in a waterfall) or tunnel; (iii) once a (local) minimum is found, the regions above this minimum are flattened—ideally to a horizontal line; and (iv) the repeller then bends the horizontal portion down from the local minimum, and the system moves away from it, until a new region with negative gradient or waterfall is encountered and the process restarts.

A direct extension of the 1D scheme to multidimensional global optimization problems does not guarantee that the global optimum will always be found. This is due primarily to the tacit assumption (13) of constant direction of repelling, d(d ∈ ℛn), from a local minimumx*. The resulting trajectory of the solution in D during tunneling is then inflected only by the presence of very strong surface gradients. Consequently, a global minimum may, for example, not capture a trajectory that tunnels near the periphery of its basin of attraction if the surface gradients there are weak.

A formal convergence proof for the multidimensional case has not yet been obtained. In practice, however, because of its global descent property, the system dynamics always escapes local minima valleys with the help of the repeller effect and flows into lower valleys of the objective function using the information it gets from the tunneling process.

In practice, solving multidimensional optimization problems is a trade-off problem—namely, to find the optimal balance between accuracy and cost. In a strategy that realizes measurable trade-offs between these requirements, the multidimensional problem is reduced to the 1D case for which a formal convergence proof exists (8, 13). The first step is to transform a function of n variables into a function of one variable. In the second step, one applies the algorithm of Eq. 3, with n = 1, to this new function.

The idea is to construct a 1D curve that “covers” the n -dimensional phase space of the problem. An approximate realization has been proposed (7). Instead of actually “filling” the n -dimensional domain, the curve is only required to pass within a given distance (for example, πα) from any point in the domain. The parameter α is related to the available or desired accuracy, and it ensures that the curve has finite length and is differentiable. An explicit formula for this transformation can be found in (7), but other recipes can be devised, depending on the shape of the n -dimensional domain, optimization function, or other extraneous criteria. Such a reduction implies a worst case complexity that is proportional to the length of the 1D space and exponential in problem dimension, that is, O[(β/α)n]. Once the problem is reduced, one can apply the 1D version of TRUST, which is guaranteed to converge, within an accuracy determined by α.

To assess the TRUST algorithm, we carried out benchmark problems using several standard multidimensional test functions (12, 13). In Tables 1 and 2, the performance of TRUST is compared with the best competing global optimization methods, where “best” indicates the best widely reported reproducible results we could find for the particular test function. The criterion for comparison is the number of function evaluations. The improvement over our previously published results (13) for such benchmark problems reflects the current advances in the TRUST algorithm. The results in Tables 1 and 2 demonstrate that TRUST is substantially faster than these state-of-the-art methods and produces consistent and accurate results.

Table 1

Number of function evaluations required by different methods to reach a global minimum of standard test functions. Abbreviations for the functions: BR, Branin; CA, Camelback; GP, Goldstein-Price; RA, Rastrigin; SH, Shubert; and H3, Hartman. Abbreviations for the methods: SDE is the stochastic method of Aluffi-Pentini et al. (22); EA denotes the evolution algorithms of Yong et al. (23) or Schneider et al. (24); MLSL is the multiple-level single-linkage method of Kan and Timmer (25); IA is the interval arithmetic technique of Ratschek and Rokne (1); TUN is the tunneling method of Levy and Montalvo (26); and TS refers to the taboo search scheme of Cvijovic and Klinowski (12).

View this table:
Table 2

Comparison of the number of function evaluations (given in parentheses), and precision of each variable (x 1 to x 5) for TRUST, the fast simulated annealing (FSA) algorithm of Szu and Hartley (27), and the stochastic approximation paradigm (SAS) of Styblinski and Tang (28). Global minimum FSA and SAS results are from (28).

View this table:

One of the primary limitations of conventional global optimization algorithms is their lack of stopping criteria. This limitation is circumvented in benchmark problems, where the value and coordinates of the global minima are known in advance. The achievement of a desired accuracy (such as, ε = 10−6) is then considered as a suitable termination condition (2). For consistent comparisons, this condition was also used in TRUST, rather than its general stopping criterion. For each function, corners of the domain were taken as initial conditions; each result then represents the average number of evaluations required for convergence to the global minimum of the particular function. The TRUST calculations were performed with the value a = 2, for which the subenergy tunneling transformation achieves its most desirable asymptotic behavior (8). The dynamical equations were integrated with an adaptive scheme, that, within the basin of attraction of a local minimum, considers the local minimum as a terminal attractor. Typical base values for the key parameters Δt (time step used in integrating Eq. 3) and ρ were 0.05 and 10, respectively.

To assess the performance of TRUST for a large-scale practical application, we selected the problem of residual statics corrections for seismic data. In many geophysical tasks seismic energy is detected by receivers that are regularly spaced along a grid that covers the domain being explored. A source is positioned at some grid location to produce a shot. Time series data are collected from the detectors for each shot; then the source is moved to another grid node for the next shot.

A major degradation of seismic signals usually arises from near-surface geologic irregularities (17-19). These include uneven soil densities, topography, and significant lateral variations in the velocity of seismic waves. The most important consequence of such irregularities is a distorted image of the subsurface structure that is due to misalignment of signals caused by unpredictable delays in recorded travel times of seismic waves in the vertical neighborhood of every source and receiver. The quality of the seismic analysis is improved with timing adjustments (statics corrections). One typically distinguishes between field statics, which correspond to corrections that can be derived directly from topographic and well measurements, and residual statics, which incorporate adjustments that must be inferred statistically from the seismic data. The common occurrence of severe residual statics (where the dominant period of the recorded data is significantly exceeded) and the significant noise contamination render the automatic identification of large static shifts very difficult. This problem has generally been formulated in terms of global optimization, and to date, Monte-Carlo techniques (11) (such as simulated annealing and genetic algorithms) have provided the primary tools for seeking a potential solution. Such an approach, however, is extremely expensive.

The statics correction problem can be summarized as follows. Acoustic signals are shot from Ns source locations and received by Nr sensors. Each signal reflects at a midpoint k, k = 1, … , Nk. A trace t corresponds to seismic energy traveling from a source st to a receiver rtthrough a midpoint kt. We denote by Dft the (complex) Fourier coefficient of frequency f (f = 1, … , Nf) for trace t, t = 1, … , Nt ≤ NrNs. Ideally, after they have been corrected for normal move-out (18), all traces corresponding to the same midpoint carry coherent information. If there were no need for statics corrections, all signals, stacked by their common midpoint, should be in phase and yield a maximum for the total powerEmbedded Image Embedded Image(7)In Eq. 7, the statics correctionsS = (S1, … , SNs) and R = (R, … , RNr) are now considered independent variables. Their optimum values are found by maximizing the power E. Equation 7 highlights the multimodal nature of E, which, even for relatively low-dimensional Sand R, exhibits a very large number of local minima. This is illustrated in Fig. 1.

Figure 1

One-dimensional slice through a 154-dimensional objective function associated with a residual statics problem.

To assess the performance of TRUST, we considered a problem involving Ns = 77 shots and Nr = 77 receivers. A data set consisting of Nt = 1462 synthetic seismic traces folded over Nk = 133 common midpoint gathers was obtained from CogniSeis Corporation (20). The data set uses Nf = 49 Fourier components for data representation. Even though this set is somewhat smaller than typical collections obtained during seismic surveys by the oil industry, it is representative of the extreme complexity underlying residual statics problems. To derive a quantitative estimate of TRUST’s performance, let Ek denote the total contribution to the stack power arising from midpoint k, and let Bk refer to the upper bound of Ek in terms of S and R. Using a polar coordinates representation for the trace data Dft, that is, Dft = αft exp (iwft), we can prove (21) thatEmbedded Image(8)The TRUST results, illustrated in Fig. 2, show the significant improvement in the coherence factor of each common gather. This factor is the ratio Ek/Bk and characterizes the overall quality of the seismic image.

Figure 2

The coherence factors, that is, the dimensionless ratios Ek/Bk , are plotted for each common gather by using the initial and the optimal time shifts (residual statics). Ideally, at the global optimum, these ratios should be equal to 1. Our results, represented by the dotted line, are indeed quite close to unity.

In conclusion, the TRUST methodology for solving unconstrained global function optimization problems proves to be a powerful tool not only for academic problems; it has the robustness and consistency required by large-scale, real-life applications.

REFERENCES AND NOTES

View Abstract

Navigate This Article