Estimation of Nonlinear-Regression Parameters¶
Problem Specification¶
In an additive nonlinear regression model, the elements of the random vector \mathbf{Y} are expressed as follows:

(1)\quad Y_i = g(\mathbf{x}_i, \mathbf{\beta}) + \varepsilon_i\,, \quad i=1,2,\ldots,n\,,

where \mathbf{x}_i^T=(x_1, x_2, \ldots, x_k) is the i-th row of the regressor matrix \mathbf{X}, \mathbf{\beta} is the vector of parameters, g is a given function nonlinear in its parameters, and the \varepsilon_i are iid random variables with zero means. Estimating the parameters by the least squares method means finding estimates of \mathbf{\beta} that minimize the residual sum of squares Q(\mathbf{\beta}) given by the following equation:

(2)\quad Q(\mathbf{\beta}) = \sum_{i=1}^{n} \left[Y_i - g(\mathbf{x}_i, \mathbf{\beta})\right]^2\,.

Because Q(\mathbf{\beta}) need not be unimodal, the estimation of \mathbf{\beta} is a global optimization problem. Iterative deterministic algorithms (Levenberg-Marquardt or Gauss-Newton) used in standard statistical packages often fail to find the true solution of the problem.
For a given objective function f: D \rightarrow {\mathbb R}, \ D \subset {\mathbb R}^{d}, the point {\mathbf x}^{*} is to be found such that

f(\mathbf{x}^{*}) \leq f(\mathbf{x}) \quad \forall\, \mathbf{x} \in D\,.

The point \mathbf{x}^{*} is called the global minimum, and D is the search space, defined as a d-dimensional box, D=\prod_{i=1}^{d} [a_i,b_i], \ a_i < b_i, \ i=1,2,\ldots,d.
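The residual sum of squares from Eq. (2) can be sketched as a short Python function. The one-regressor exponential model g(x, \beta) = \beta_1 e^{\beta_2 x} and the data below are purely illustrative, not taken from the text:

```python
import math

# A minimal sketch of the residual sum of squares Q(beta) from Eq. (2),
# using a hypothetical exponential model g(x, beta) = b1 * exp(b2 * x).

def g(x, beta):
    b1, b2 = beta
    return b1 * math.exp(b2 * x)

def rss(beta, xs, ys):
    # Q(beta) = sum_i (Y_i - g(x_i, beta))^2
    return sum((y - g(x, beta)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noise-free data, beta = (2.0, 0.5)

print(rss((2.0, 0.5), xs, ys))  # zero at the generating parameters
print(rss((1.0, 1.0), xs, ys))  # larger Q at a different beta
```

Minimizing this Q over \beta is exactly the global optimization problem described above.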
Description of Implemented Algorithms¶
Controlled Random Search¶
The implemented adaptive population-based search algorithms [8] are based on the controlled random search (CRS) proposed by Price [4]. The CRS algorithm can be written in pseudo-code as follows.
Algorithm 1: Controlled Random Search
1  generate P;                // population of N points in D at random
2  find x_max;                // the point in P with the highest function value
3  repeat
4      y := newTrialPoint(D); // generate a new trial point y from D by using a heuristic
5      if f(y) < f(x_max) then begin
6          x_max := y;        // replace x_max in population with y
7          find x_max;        // find new x_max
8      end
9  until stopping condition;
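The pseudo-code above can be sketched as runnable Python. The quadratic test function, the single reflection-style heuristic standing in for newTrialPoint, and all parameter values here are illustrative assumptions, not the authors' implementation:

```python
import random

# A runnable sketch of Algorithm 1 (CRS) on a simple quadratic test function.

def f(x):
    return sum(xi ** 2 for xi in x)  # global minimum 0 at the origin

def crs(f, bounds, n_points, max_evals, rng):
    d = len(bounds)
    P = [[rng.uniform(a, b) for a, b in bounds] for _ in range(n_points)]
    evals = 0
    while evals < max_evals:            # stand-in for "stopping condition"
        # heuristic: reflect the worst of d+1 random points over the centroid
        S = rng.sample(P, d + 1)
        S.sort(key=f)
        worst = S[-1]
        centroid = [sum(p[j] for p in S[:-1]) / d for j in range(d)]
        y = [centroid[j] + rng.uniform(0.5, 1.5) * (centroid[j] - worst[j])
             for j in range(d)]
        y = [min(max(y[j], bounds[j][0]), bounds[j][1]) for j in range(d)]
        evals += 1
        x_max = max(P, key=f)           # point with the highest function value
        if f(y) < f(x_max):
            P[P.index(x_max)] = y       # replace x_max in the population with y
    return min(P, key=f)

rng = random.Random(1)
best = crs(f, [(-5.0, 5.0)] * 2, n_points=20, max_evals=2000, rng=rng)
print(best, f(best))
```

The population only ever replaces its worst member, so it contracts monotonically toward the minimum.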
Any non-deterministic rule generating a new trial point \mathbf{y} \in D can play the role of the heuristic mentioned at line 4. Many different heuristics can be used and, moreover, the heuristics can alternate during the course of the search.
Suppose we have H heuristics that can be chosen at each iteration step at random with probability q_h, h=1,2,\ldots,H. The probabilities q_h are proportional to the previous success rates. The heuristic is successful in the current step if it generates a trial point \mathbf{y} such that f(\mathbf{y}) < f_{\mathrm{max}}, f_{\mathrm{max}}=f(\mathbf{x}_{\mathrm{max}}). The success is weighted by the relative change in the objective function value

(3)\quad w_h = \frac{f_{\mathrm{max}} - f(\mathbf{y})}{f_{\mathrm{max}} - f_{\mathrm{min}}}\,,

where f_{\mathrm{min}} is the minimum function value in the population. Thus w_h \in (0, \ 1] and the corresponding probability q_h is evaluated as

(4)\quad q_h = \frac{w_0 + W_h}{\sum_{j=1}^{H} (w_0 + W_j)}\,,

where W_h is the sum of the weights w_h earned in previous searching steps and w_0>0 is an input parameter of the algorithm. In order to avoid degeneration of the evolutionary process, the current values of q_h are reset to their starting values (q_h=1/H) whenever any probability q_h decreases below a given limit \delta>0.
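The competition mechanism of Eqs. (3)-(4) can be sketched as simple bookkeeping. The number of heuristics, the simulated stream of successes, and the function values below are illustrative:

```python
import random

# Sketch of competing heuristics: cumulative success weights W_h, selection
# probabilities q_h = (w0 + W_h) / sum_j (w0 + W_j), and the reset to 1/H
# when any q_h drops below delta.

H = 4
w0 = 0.5
delta = 0.05
W = [0.0] * H  # cumulative success weights

def probabilities(W, w0):
    total = sum(w0 + Wh for Wh in W)
    return [(w0 + Wh) / total for Wh in W]

def record_success(W, h, f_y, f_max, f_min):
    # Eq. (3): weight a success by its relative improvement
    W[h] += (f_max - f_y) / (f_max - f_min)

def select_heuristic(W, w0, rng):
    q = probabilities(W, w0)
    return rng.choices(range(len(W)), weights=q, k=1)[0]

rng = random.Random(7)
# simulate heuristic 0 succeeding repeatedly
for _ in range(50):
    record_success(W, 0, f_y=1.0, f_max=2.0, f_min=1.0)

q = probabilities(W, w0)
if min(q) < delta:       # some heuristic has almost died out
    W = [0.0] * H        # reset: every q_h returns to 1/H
q = probabilities(W, w0)
h = select_heuristic(W, w0, rng)
print(q, h)
```

After 50 successes of heuristic 0 the remaining probabilities fall below \delta, which triggers the reset to q_h = 1/H.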
Several sets of heuristics were experimentally checked on test functions [6] and nonlinear regression tasks [7]. Four competing heuristics selected on the basis of these experimental results were used in the implementation of the CRS algorithm for nonlinear regression parameter estimation. Three of them are based on a randomized reflection in a simplex S (d+1 points chosen from P) proposed in [2]. A new trial point \mathbf{y} is generated from the simplex by the relation

(5)\quad \mathbf{y} = \mathbf{g} + U\,(\mathbf{g} - \mathbf{x}_{H})\,,

where \mathbf{x}_{H}=\arg \max_{\mathbf{x} \in S} f(\mathbf{x}) and \mathbf{g} is the centroid of the remaining d points of the simplex S. The multiplication factor U is a random variable distributed uniformly in [s, \alpha-s), with \alpha>0 and s being input parameters, 0<s<\alpha/2. In two of the heuristics, all the d+1 points of the simplex are chosen at random from P: the first uses \alpha=2 and s=0.5, the second uses \alpha=5 and s=1.5. In the third heuristic, one point of the simplex is the point of P with the minimum objective function value and the remaining d points of S are chosen at random from the remaining points of P; its input parameters are set to \alpha=2 and s=0.5. The fourth competing heuristic is based on differential evolution [5]. A point \mathbf{u} is generated according to

(6)\quad \mathbf{u} = \mathbf{r}_{1} + F\,(\mathbf{r}_{2} - \mathbf{r}_{3})\,,

where \mathbf{r}_{1}, \mathbf{r}_{2} and \mathbf{r}_{3} are three distinct points taken randomly from P and F>0 is an input parameter. The elements y_{j}, j=1,2,\ldots,d, of the trial point \mathbf{y} are built up by the crossover of a randomly taken \mathbf{x} (not coinciding with the current \mathbf{r}_1, \mathbf{r}_2, and \mathbf{r}_3) and \mathbf{u} using the following rule:

(7)\quad y_{j} = \begin{cases} u_{j} & \text{if } U_{j} \leq C \text{ or } j = l\,, \\ x_{j} & \text{otherwise}\,, \end{cases}

where l is a randomly chosen integer from \{1,2,\ldots,d\}, U_1, U_2,\ldots,U_d are independent random variables distributed uniformly in [0,\ 1], and C \in [0,\ 1] is an input parameter affecting the number of elements to be exchanged by the crossover. Ali and Törn [1] suggested adapting the value of the scaling factor F during the search according to the equation

(8)\quad F = \begin{cases} \max\!\left(F_{\mathrm{min}},\ 1 - \left|\dfrac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right|\right) & \text{if } \left|\dfrac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right| < 1\,, \\[2ex] \max\!\left(F_{\mathrm{min}},\ 1 - \left|\dfrac{f_{\mathrm{min}}}{f_{\mathrm{max}}}\right|\right) & \text{otherwise}\,, \end{cases}

where f_{\mathrm{min}}, f_{\mathrm{max}} are the minimum and maximum objective function values in the population, respectively, and F_{\mathrm{min}} is an input parameter ensuring F\in[F_{\mathrm{min}},1). The heuristic uses (8) for the evaluation of F with F_{\mathrm{min}}=0.4 and C=0.9.
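The two families of trial-point generators described above can be sketched in Python. The function names, the test simplex, and the parameter values passed in at the bottom are illustrative assumptions:

```python
import random

# Sketch of the randomized reflection of Eq. (5) and the differential-evolution
# move of Eqs. (6)-(7) with the adaptive F of Eq. (8).

def reflection_point(S, f, alpha, s, rng):
    # Eq. (5): y = g + U*(g - x_H), U uniform in [s, alpha - s)
    d = len(S[0])
    x_H = max(S, key=f)                      # worst point of the simplex
    rest = [p for p in S if p is not x_H]
    g = [sum(p[j] for p in rest) / d for j in range(d)]
    U = rng.uniform(s, alpha - s)
    return [g[j] + U * (g[j] - x_H[j]) for j in range(d)]

def de_point(x, r1, r2, r3, F, C, rng):
    d = len(x)
    u = [r1[j] + F * (r2[j] - r3[j]) for j in range(d)]   # Eq. (6)
    l = rng.randrange(d)
    # Eq. (7): binomial crossover of x and u
    return [u[j] if rng.random() <= C or j == l else x[j] for j in range(d)]

def adaptive_F(f_min, f_max, F_min):
    # Eq. (8): keeps F in [F_min, 1); assumes f_min != 0
    if abs(f_max / f_min) < 1:
        return max(F_min, 1 - abs(f_max / f_min))
    return max(F_min, 1 - abs(f_min / f_max))

rng = random.Random(3)
f = lambda x: sum(xi ** 2 for xi in x)
S = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]     # a 2-d simplex (d + 1 = 3 points)
y = reflection_point(S, f, alpha=2.0, s=0.5, rng=rng)
z = de_point([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 1.0],
             F=0.5, C=0.9, rng=rng)
print(y, z)
print(adaptive_F(f_min=1.0, f_max=2.0, F_min=0.4))
```

With f_min=1 and f_max=2, Eq. (8) gives F = max(0.4, 1 - 1/2) = 0.5.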
This algorithm is implemented as the crs4hc function in Matlab [3]. Recommended values of its control parameters are:

population size N=10\;d,

stopping condition R^2_{\mathrm{max}}-R^2_{\mathrm{min}}< \varepsilon or ne \geq 40000\;d, where ne is the number of objective function evaluations, \varepsilon=1 \times 10^{-15}, and R^2_{\mathrm{max}}, R^2_{\mathrm{min}} are the maximum and the minimum values in the population of the determination index

R^2= 1-Q(\mathbf{\beta})\Big/\sum_{i=1}^{n}(Y_i-\overline{Y})^2\,,

parameters controlling the competition of heuristics: w_0=0.5 and \delta=0.05 (the limit below which the probabilities q_h are reset to their initial values).
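The determination index and the recommended stopping test can be sketched directly; the data and the population R² values below are illustrative:

```python
# Sketch of the determination index R^2 and the stopping condition quoted above.

def r_squared(Q, ys):
    y_bar = sum(ys) / len(ys)
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - Q / total

def stopped(r2_values, n_evals, d, eps=1e-15, max_eval_factor=40000):
    # stop when R^2_max - R^2_min < eps or ne >= 40000 * d
    return (max(r2_values) - min(r2_values) < eps) or (n_evals >= max_eval_factor * d)

ys = [1.0, 2.0, 3.0, 4.0]
print(r_squared(0.0, ys))                   # perfect fit -> R^2 = 1
print(stopped([0.5, 0.9], n_evals=0, d=2))  # population still spread -> False
print(stopped([0.9, 0.9], n_evals=0, d=2))  # converged population -> True
```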
CRS with Adaptive Stopping Condition¶
An adaptive stopping condition is implemented in the second algorithm. The value \varepsilon starts from the input value \varepsilon_0 and can be decreased whenever 1-R^2 < \gamma \times \varepsilon, where \gamma \gg 1 is another input parameter. The algorithm written in pseudo-code below presents this adaptation in more detail. The reduction of \gamma when the inner loop makes no progress (line 14) prevents the outer while loop from cycling endlessly.
Algorithm 2: CRS with Adaptive Stopping Condition
 1  generate P;                    // population of N points in D at random
 2  epsilon := epsilon_0;
 3  pass := false;
 4  find x_max;                    // find x with the highest function value
 5  while ((1 - R^2) < (gamma * epsilon)) and not pass do begin  // outer loop
 6      while R_max^2 - R_min^2 > epsilon do begin               // inner loop
 7          y := newTrialPoint(D); // generate a new trial point y from D
 8          if f(y) < f(x_max) then begin
 9              x_max := y;        // replace x_max in population with y
10              find x_max;        // find new x_max
11          end;
12          pass := true;
13      end;                       // end of inner loop
14      if not pass then gamma := gamma / 10;
15      if ((1 - R^2) < (gamma * epsilon)) and pass then begin
16          epsilon := epsilon / 10;
17          pass := false;
18      end;
19  end;                           // end of outer loop
This algorithm is implemented as the crs4hce function in Matlab. Recommended values of the additional input parameters are \varepsilon_0=1 \times 10^{-9} and \gamma=1 \times 10^7.
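The adaptive control of \varepsilon and \gamma can be sketched with the inner CRS loop abstracted away. The simulated inner_search below and its behaviour are illustrative assumptions, not results of the actual crs4hce code:

```python
# Sketch of the adaptive stopping control of Algorithm 2. inner_search(eps)
# stands in for the inner CRS loop and returns True iff it performed at least
# one iteration (i.e. the population spread exceeded eps on entry).

def adaptive_stopping(inner_search, one_minus_r2, eps0=1e-9, gamma=1e7):
    eps = eps0
    passed = False
    while (one_minus_r2() < gamma * eps) and not passed:
        passed = inner_search(eps)
        if not passed:
            gamma /= 10              # guard against an endless outer loop
        if one_minus_r2() < gamma * eps and passed:
            eps /= 10                # the fit is still good: tighten epsilon
            passed = False
    return eps, gamma

state = {"spread": 1.0, "fit": 1e-12}  # R^2_max - R^2_min and 1 - R^2

def inner_search(eps):
    # simulated inner loop: shrinks the population spread below eps
    if state["spread"] <= eps:
        return False
    state["spread"] = eps / 2
    return True

eps, gamma = adaptive_stopping(inner_search, lambda: state["fit"])
print(eps, gamma)
```

With a very good fit (1 - R² = 10⁻¹²), \varepsilon keeps tightening until \gamma\varepsilon falls to the level of 1 - R², while \gamma itself is never reduced because the inner loop always makes progress.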
References¶
[1] | Ali, M.M., Törn, A.: Population set based global optimization algorithms: Some modifications and numerical studies, Computers and Operations Research 31 (2004) 1703 – 1725. |
[2] | Křivý, I., Tvrdík, J.: The controlled random search algorithm in optimizing regression models. Comput. Statist. and Data Anal. 20 (1995) 229 – 234. |
[3] | MATLAB, version 2006b. The MathWorks, Inc., 2006. |
[4] | Price, W. L.: A controlled random search procedure for global optimization. Computer J. 20 (1977) 367 – 370. |
[5] | Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optimization 11 (1997) 341 – 359. |
[6] | Tvrdík, J.: Generalized controlled random search and competing heuristics. MENDEL 2004, 10th International Conference on Soft Computing, (Matoušek R. and Ošmera P. eds). University of Technology, Brno (2004) 228 – 233. |
[7] | Tvrdík, J., Křivý, I.: Comparison of algorithms for nonlinear regression estimates. In: Antoch, J. (ed) COMPSTAT 2004. Physica-Verlag, Heidelberg New York (2004) 1917 – 1924. |
[8] | Tvrdík, J., Křivý, I., Mišík, L.: Adaptive Population-based Search: Application to Estimation of Nonlinear Regression Parameters. COMPUT STAT DATA AN. 52 (2007) 713 – 724. |