Estimation of Nonlinear-Regression Parameters¶
Problem Specification¶
In an additive nonlinear regression model, the elements of the random vector \mathbf{Y} are expressed as follows:

(1)\quad Y_i = g(\mathbf{x}_i, \mathbf{\beta}) + \varepsilon_i\,, \quad i=1,2,\ldots,n\,,

where \mathbf{x}_i^T=(x_1, x_2, \ldots, x_k) is the i-th row of the regressor matrix \mathbf{X}, \mathbf{\beta} is the vector of parameters, g is a given function nonlinear in its parameters, and the \varepsilon_i are iid random variables with zero means. Estimating the parameters by the least squares method means finding estimates of \mathbf{\beta} that minimize the residual sum of squares Q(\mathbf{\beta}) given by the following equation:

(2)\quad Q(\mathbf{\beta}) = \sum_{i=1}^{n} \left[Y_i - g(\mathbf{x}_i, \mathbf{\beta})\right]^2\,.

Because Q(\mathbf{\beta}) need not be unimodal, the estimation of \mathbf{\beta} is a global optimization problem. Iterative deterministic algorithms (Levenberg-Marquardt or Gauss-Newton) used in standard statistical packages often fail to find the true solution of the problem.
For a given objective function f: D \rightarrow {\mathbb R}, \ D \subset {\mathbb R}^{d}, the point {\mathbf x}^{*} is to be found such that

f(\mathbf{x}^{*}) \leq f(\mathbf{x}) \quad \forall\, \mathbf{x} \in D\,.

The point \mathbf{x}^{*} is called the global minimum, and D is the search space, defined as a d-dimensional box, D=\prod_{i=1}^{d} [a_i,b_i], \ a_i < b_i, \ i=1,2,\ldots,d.
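The residual sum of squares from Eq. (2) can be sketched as a short Python function. The one-regressor exponential model g(x, \beta) = \beta_1 e^{\beta_2 x} and the data below are purely illustrative, not taken from the text:

```python
import math

# A minimal sketch of the residual sum of squares Q(beta) from Eq. (2),
# using a hypothetical exponential model g(x, beta) = b1 * exp(b2 * x).

def g(x, beta):
    b1, b2 = beta
    return b1 * math.exp(b2 * x)

def rss(beta, xs, ys):
    # Q(beta) = sum_i (Y_i - g(x_i, beta))^2
    return sum((y - g(x, beta)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noise-free data, beta = (2.0, 0.5)

print(rss((2.0, 0.5), xs, ys))  # zero at the generating parameters
print(rss((1.0, 1.0), xs, ys))  # larger Q at a different beta
```

Minimizing this Q over \beta is exactly the global optimization problem described above.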
Description of Implemented Algorithms¶
Controlled Random Search¶
The implemented adaptive population-based search algorithms [8] are based on the controlled random search (CRS) proposed by Price [4]. The CRS algorithm can be written in pseudo-code as follows.
Algorithm 1: Controlled Random Search
1  generate P;                // population of N points in D at random
2  find x_max;                // the point in P with the highest function value
3  repeat
4      y := newTrialPoint(D); // generate a new trial point y from D by using a heuristic
5      if f(y) < f(x_max) then begin
6          x_max := y;        // replace x_max in population with y
7          find x_max;        // find new x_max
8      end
9  until stopping condition;
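The pseudo-code above can be sketched as runnable Python. The quadratic test function, the single reflection-style heuristic standing in for newTrialPoint, and all parameter values here are illustrative assumptions, not the authors' implementation:

```python
import random

# A runnable sketch of Algorithm 1 (CRS) on a simple quadratic test function.

def f(x):
    return sum(xi ** 2 for xi in x)  # global minimum 0 at the origin

def crs(f, bounds, n_points, max_evals, rng):
    d = len(bounds)
    P = [[rng.uniform(a, b) for a, b in bounds] for _ in range(n_points)]
    evals = 0
    while evals < max_evals:            # stand-in for "stopping condition"
        # heuristic: reflect the worst of d+1 random points over the centroid
        S = rng.sample(P, d + 1)
        S.sort(key=f)
        worst = S[-1]
        centroid = [sum(p[j] for p in S[:-1]) / d for j in range(d)]
        y = [centroid[j] + rng.uniform(0.5, 1.5) * (centroid[j] - worst[j])
             for j in range(d)]
        y = [min(max(y[j], bounds[j][0]), bounds[j][1]) for j in range(d)]
        evals += 1
        x_max = max(P, key=f)           # point with the highest function value
        if f(y) < f(x_max):
            P[P.index(x_max)] = y       # replace x_max in the population with y
    return min(P, key=f)

rng = random.Random(1)
best = crs(f, [(-5.0, 5.0)] * 2, n_points=20, max_evals=2000, rng=rng)
print(best, f(best))
```

The population only ever replaces its worst member, so it contracts monotonically toward the minimum.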
Any non-deterministic rule generating a new trial point \mathbf{y} \in D can play the role of the heuristic mentioned at line 4. Many different heuristics can be used and, moreover, the heuristics can alternate during the course of the search.
Suppose we have H heuristics that can be chosen at each iteration step at random with probability q_h, h=1,2,\ldots,H. The probabilities q_h are proportional to the previous success rates. The heuristic is successful in the current step if it generates a trial point \mathbf{y} such that f(\mathbf{y}) < f_{\mathrm{max}}, f_{\mathrm{max}}=f(\mathbf{x}_{\mathrm{max}}). The success is weighted by the relative change in the objective function value

(3)\quad w_h = \frac{f_{\mathrm{max}} - f(\mathbf{y})}{f_{\mathrm{max}} - f_{\mathrm{min}}}\,,

where f_{\mathrm{min}} is the minimum function value in the population. Thus w_h \in (0, \ 1] and the corresponding probability q_h is evaluated as

(4)\quad q_h = \frac{w_0 + W_h}{\sum_{j=1}^{H} (w_0 + W_j)}\,,

where W_h is the sum of the weights w_h earned in previous searching steps and w_0>0 is an input parameter of the algorithm. In order to avoid degeneration of the evolutionary process, the current values of q_h are reset to their starting values (q_h=1/H) whenever any probability q_h decreases below a given limit \delta>0.
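The competition mechanism of Eqs. (3)-(4) can be sketched as simple bookkeeping. The number of heuristics, the simulated stream of successes, and the function values below are illustrative:

```python
import random

# Sketch of competing heuristics: cumulative success weights W_h, selection
# probabilities q_h = (w0 + W_h) / sum_j (w0 + W_j), and the reset to 1/H
# when any q_h drops below delta.

H = 4
w0 = 0.5
delta = 0.05
W = [0.0] * H  # cumulative success weights

def probabilities(W, w0):
    total = sum(w0 + Wh for Wh in W)
    return [(w0 + Wh) / total for Wh in W]

def record_success(W, h, f_y, f_max, f_min):
    # Eq. (3): weight a success by its relative improvement
    W[h] += (f_max - f_y) / (f_max - f_min)

def select_heuristic(W, w0, rng):
    q = probabilities(W, w0)
    return rng.choices(range(len(W)), weights=q, k=1)[0]

rng = random.Random(7)
# simulate heuristic 0 succeeding repeatedly
for _ in range(50):
    record_success(W, 0, f_y=1.0, f_max=2.0, f_min=1.0)

q = probabilities(W, w0)
if min(q) < delta:       # some heuristic has almost died out
    W = [0.0] * H        # reset: every q_h returns to 1/H
q = probabilities(W, w0)
h = select_heuristic(W, w0, rng)
print(q, h)
```

After 50 successes of heuristic 0 the remaining probabilities fall below \delta, which triggers the reset to q_h = 1/H.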
Several sets of heuristics were experimentally checked on test functions [6] and nonlinear regression tasks [7]. Four competing heuristics selected on the basis of these experimental results were used in the implementation of the CRS algorithm for nonlinear regression parameter estimation. Three of them are based on a randomized reflection in a simplex S (d+1 points chosen from P) proposed in [2]. A new trial point \mathbf{y} is generated from the simplex by the relation

(5)\quad \mathbf{y} = \mathbf{g} + U\,(\mathbf{g} - \mathbf{x}_{H})\,,

where \mathbf{x}_{H}=\arg \max_{\mathbf{x} \in S} f(\mathbf{x}) and \mathbf{g} is the centroid of the remaining d points of the simplex S. The multiplication factor U is a random variable distributed uniformly in [s, \alpha-s), with \alpha>0 and s being input parameters, 0<s<\alpha/2. In two of the heuristics, all the d+1 points of the simplex are chosen at random from P: the first uses \alpha=2 and s=0.5, the second uses \alpha=5 and s=1.5. In the third heuristic, one point of the simplex is the point of P with the minimum objective function value and the remaining d points of S are chosen at random from the remaining points of P; its input parameters are set to \alpha=2 and s=0.5. The fourth competing heuristic is based on differential evolution [5]. A point \mathbf{u} is generated according to

(6)\quad \mathbf{u} = \mathbf{r}_{1} + F\,(\mathbf{r}_{2} - \mathbf{r}_{3})\,,

where \mathbf{r}_{1}, \mathbf{r}_{2} and \mathbf{r}_{3} are three distinct points taken randomly from P and F>0 is an input parameter. The elements y_{j}, j=1,2,\ldots,d, of the trial point \mathbf{y} are built up by the crossover of a randomly taken \mathbf{x} (not coinciding with the current \mathbf{r}_1, \mathbf{r}_2, and \mathbf{r}_3) and \mathbf{u} using the following rule:

(7)\quad y_{j} = \begin{cases} u_{j} & \text{if } U_{j} \leq C \text{ or } j = l\,, \\ x_{j} & \text{otherwise}\,, \end{cases}

where l is a randomly chosen integer from \{1,2,\ldots,d\}, U_1, U_2,\ldots,U_d are independent random variables distributed uniformly in [0,\ 1], and C \in [0,\ 1] is an input parameter affecting the number of elements to be exchanged by the crossover. Ali and Törn [1] suggested adapting the value of the scaling factor F during the search according to the equation

(8)\quad F = \begin{cases} \max\!\left(F_{\mathrm{min}},\ 1 - \left|\dfrac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right|\right) & \text{if } \left|\dfrac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right| < 1\,, \\[2ex] \max\!\left(F_{\mathrm{min}},\ 1 - \left|\dfrac{f_{\mathrm{min}}}{f_{\mathrm{max}}}\right|\right) & \text{otherwise}\,, \end{cases}

where f_{\mathrm{min}}, f_{\mathrm{max}} are the minimum and maximum objective function values in the population, respectively, and F_{\mathrm{min}} is an input parameter ensuring F\in[F_{\mathrm{min}},1). The heuristic uses (8) for the evaluation of F with F_{\mathrm{min}}=0.4 and C=0.9.
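The two families of trial-point generators described above can be sketched in Python. The function names, the test simplex, and the parameter values passed in at the bottom are illustrative assumptions:

```python
import random

# Sketch of the randomized reflection of Eq. (5) and the differential-evolution
# move of Eqs. (6)-(7) with the adaptive F of Eq. (8).

def reflection_point(S, f, alpha, s, rng):
    # Eq. (5): y = g + U*(g - x_H), U uniform in [s, alpha - s)
    d = len(S[0])
    x_H = max(S, key=f)                      # worst point of the simplex
    rest = [p for p in S if p is not x_H]
    g = [sum(p[j] for p in rest) / d for j in range(d)]
    U = rng.uniform(s, alpha - s)
    return [g[j] + U * (g[j] - x_H[j]) for j in range(d)]

def de_point(x, r1, r2, r3, F, C, rng):
    d = len(x)
    u = [r1[j] + F * (r2[j] - r3[j]) for j in range(d)]   # Eq. (6)
    l = rng.randrange(d)
    # Eq. (7): binomial crossover of x and u
    return [u[j] if rng.random() <= C or j == l else x[j] for j in range(d)]

def adaptive_F(f_min, f_max, F_min):
    # Eq. (8): keeps F in [F_min, 1); assumes f_min != 0
    if abs(f_max / f_min) < 1:
        return max(F_min, 1 - abs(f_max / f_min))
    return max(F_min, 1 - abs(f_min / f_max))

rng = random.Random(3)
f = lambda x: sum(xi ** 2 for xi in x)
S = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]     # a 2-d simplex (d + 1 = 3 points)
y = reflection_point(S, f, alpha=2.0, s=0.5, rng=rng)
z = de_point([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 1.0],
             F=0.5, C=0.9, rng=rng)
print(y, z)
print(adaptive_F(f_min=1.0, f_max=2.0, F_min=0.4))
```

With f_min=1 and f_max=2, Eq. (8) gives F = max(0.4, 1 - 1/2) = 0.5.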
This algorithm is implemented as the crs4hc function in Matlab [3]. Recommended values of its control parameters are:

population size N=10\;d,

stopping condition R^2_{\mathrm{max}}-R^2_{\mathrm{min}}< \varepsilon or ne \geq 40000\;d, where ne is the number of objective function evaluations, \varepsilon=1 \times 10^{-15}, and R^2_{\mathrm{max}}, R^2_{\mathrm{min}} are the maximum and the minimum values in the population of the determination index

R^2= 1-Q(\mathbf{\beta})\Big/\sum_{i=1}^{n}(Y_i-\overline{Y})^2\,,

parameters controlling the competition of heuristics: w_0=0.5 and \delta=0.05 (the limit below which the probabilities q_h are reset to their initial values).
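The determination index and the recommended stopping test can be sketched directly; the data and the population R² values below are illustrative:

```python
# Sketch of the determination index R^2 and the stopping condition quoted above.

def r_squared(Q, ys):
    y_bar = sum(ys) / len(ys)
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - Q / total

def stopped(r2_values, n_evals, d, eps=1e-15, max_eval_factor=40000):
    # stop when R^2_max - R^2_min < eps or ne >= 40000 * d
    return (max(r2_values) - min(r2_values) < eps) or (n_evals >= max_eval_factor * d)

ys = [1.0, 2.0, 3.0, 4.0]
print(r_squared(0.0, ys))                   # perfect fit -> R^2 = 1
print(stopped([0.5, 0.9], n_evals=0, d=2))  # population still spread -> False
print(stopped([0.9, 0.9], n_evals=0, d=2))  # converged population -> True
```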
CRS with Adaptive Stopping Condition¶
An adaptive stopping condition is implemented in the second algorithm. The value \varepsilon starts from the input value \varepsilon_0 and can be decreased whenever 1-R^2 < \gamma \times \varepsilon, where \gamma \gg 1 is another input parameter. The algorithm written in pseudo-code below presents this adaptation in more detail. The reduction of \gamma when the inner loop makes no progress (line 14) prevents the outer while loop from cycling endlessly.
Algorithm 2: CRS with Adaptive Stopping Condition
 1  generate P;                    // population of N points in D at random
 2  epsilon := epsilon_0;
 3  pass := false;
 4  find x_max;                    // find x with the highest function value
 5  while ((1 - R^2) < (gamma * epsilon)) and not pass do begin  // outer loop
 6      while R_max^2 - R_min^2 > epsilon do begin               // inner loop
 7          y := newTrialPoint(D); // generate a new trial point y from D
 8          if f(y) < f(x_max) then begin
 9              x_max := y;        // replace x_max in population with y
10              find x_max;        // find new x_max
11          end;
12          pass := true;
13      end;                       // end of inner loop
14      if not pass then gamma := gamma / 10;
15      if ((1 - R^2) < (gamma * epsilon)) and pass then begin
16          epsilon := epsilon / 10;
17          pass := false;
18      end;
19  end;                           // end of outer loop
This algorithm is implemented as the crs4hce function in Matlab. Recommended values of the additional input parameters are \varepsilon_0=1 \times 10^{-9} and \gamma=1 \times 10^7.
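The adaptive control of \varepsilon and \gamma can be sketched with the inner CRS loop abstracted away. The simulated inner_search below and its behaviour are illustrative assumptions, not results of the actual crs4hce code:

```python
# Sketch of the adaptive stopping control of Algorithm 2. inner_search(eps)
# stands in for the inner CRS loop and returns True iff it performed at least
# one iteration (i.e. the population spread exceeded eps on entry).

def adaptive_stopping(inner_search, one_minus_r2, eps0=1e-9, gamma=1e7):
    eps = eps0
    passed = False
    while (one_minus_r2() < gamma * eps) and not passed:
        passed = inner_search(eps)
        if not passed:
            gamma /= 10              # guard against an endless outer loop
        if one_minus_r2() < gamma * eps and passed:
            eps /= 10                # the fit is still good: tighten epsilon
            passed = False
    return eps, gamma

state = {"spread": 1.0, "fit": 1e-12}  # R^2_max - R^2_min and 1 - R^2

def inner_search(eps):
    # simulated inner loop: shrinks the population spread below eps
    if state["spread"] <= eps:
        return False
    state["spread"] = eps / 2
    return True

eps, gamma = adaptive_stopping(inner_search, lambda: state["fit"])
print(eps, gamma)
```

With a very good fit (1 - R² = 10⁻¹²), \varepsilon keeps tightening until \gamma\varepsilon falls to the level of 1 - R², while \gamma itself is never reduced because the inner loop always makes progress.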
References¶
[1] | Ali, M.M., Törn, A.: Population set based global optimization algorithms: Some modifications and numerical studies, Computers and Operations Research 31 (2004) 1703 – 1725. |
[2] | Křivý, I., Tvrdík, J.: The controlled random search algorithm in optimizing regression models. Comput. Statist. and Data Anal. 20 (1995) 229 – 234. |
[3] | MATLAB, version 2006b. The MathWorks, Inc., 2006. |
[4] | Price, W. L.: A controlled random search procedure for global optimization. Computer J. 20 (1977) 367 – 370. |
[5] | Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optimization 11 (1997) 341 – 359. |
[6] | Tvrdík, J.: Generalized controlled random search and competing heuristics. MENDEL 2004, 10th International Conference on Soft Computing, (Matoušek R. and Ošmera P. eds). University of Technology, Brno (2004) 228 – 233. |
[7] | Tvrdík, J., Křivý, I.: Comparison of algorithms for nonlinear regression estimates. In: Antoch, J. (ed) COMPSTAT 2004. Physica-Verlag, Heidelberg New York (2004) 1917 – 1924. |
[8] | Tvrdík, J., Křivý, I., Mišík, L.: Adaptive Population-based Search: Application to Estimation of Nonlinear Regression Parameters. COMPUT STAT DATA AN. 52 (2007) 713 – 724. |