2. Levenberg-Marquardt algorithm#
2.1. Problem statement#
There are several families of minimization algorithms [bib1]. For relatively regular problems, the most widely used are descent methods. Their principle is to generate iteratively a sequence \({({c}^{k})}_{k\in N}\) defined by:
\({c}^{k+1}={c}^{k}+{\alpha }^{k}{g}^{k}\) eq 2.1-1
such that, for \(f(x)=J({c}^{k}+x{g}^{k}),x\in {R}_{+}^{\text{*}}\):
\(f(x)\) is decreasing around \({0}^{+}\)
\(f({\alpha }^{k})=\underset{x>0}{\text{Min}}f(x)\)
\({g}^{k}\) is the descent direction at step \(k\). It is the way \({g}^{k}\) is chosen that determines the nature, and therefore the efficiency, of the algorithm, these techniques being mainly based on first- or second-order approximations of \(J\). The Levenberg-Marquardt algorithm works with a second-order approximation of the functional.
2.2. Resolution#
In the context of parameter recalibration, we work with least-squares cost functionals of the form:
\(J(c)=\sum _{n=1}^{N}{j}_{n}^{2}(c)\) eq 2.2-1
where for example \({j}_{n}(c)=({F}_{n}^{\text{calc}}(c)-{F}_{n}^{\text{exp}})\), with obvious notations.
The particular feature of these cost functions is that the form of their first and second derivatives is known:
\(({\nabla }_{c}J(c)){}_{i}\text{}=2\sum _{n=1}^{N}{j}_{n}(c)\frac{\partial {j}_{n}}{\partial {c}_{i}}\) eq 2.2-2
\((H(c)){}_{\text{ij}}\text{}=2\sum _{n=1}^{N}(\frac{\partial {j}_{n}}{\partial {c}_{i}}\text{.}\frac{\partial {j}_{n}}{\partial {c}_{j}}+{j}_{n}(c)\frac{{\partial }^{2}{j}_{n}}{\partial {c}_{i}\partial {c}_{j}})\) eq 2.2-3
Thus, assuming that the second term in the previous equation is negligible compared with the first (which is exact when the \({j}_{n}\) are linear in \(c\): this term is then zero), we can write:
\((H(c)){}_{\text{ij}}\text{}\approx 2\sum _{n=1}^{N}\frac{\partial {j}_{n}}{\partial {c}_{i}}\text{.}\frac{\partial {j}_{n}}{\partial {c}_{j}}\) eq 2.2-4
At this level, it is interesting to introduce the sensitivity matrix or Jacobian matrix defined by:
\(A=\left[\begin{array}{cccc}\frac{\partial {j}_{1}}{\partial {c}_{1}}& \frac{\partial {j}_{1}}{\partial {c}_{2}}& \cdots & \frac{\partial {j}_{1}}{\partial {c}_{n}}\\ \frac{\partial {j}_{2}}{\partial {c}_{1}}& \frac{\partial {j}_{2}}{\partial {c}_{2}}& \cdots & \frac{\partial {j}_{2}}{\partial {c}_{n}}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial {j}_{N}}{\partial {c}_{1}}& \frac{\partial {j}_{N}}{\partial {c}_{2}}& \cdots & \frac{\partial {j}_{N}}{\partial {c}_{n}}\end{array}\right]\) eq 2.2-5
We can thus express the gradient and the Hessian by:
\({\nabla }_{c}J({c}^{k})=2{A}^{T}j\) eq 2.2-6
\(H({c}^{k})\approx 2{A}^{T}A\) eq 2.2-7
with \(j={\left[{j}_{1},\text{.}\text{.}\text{.},{j}_{N}\right]}^{T}\).
Let us then write the second-order Taylor expansion of \(J\):
\(J(c)\approx J({c}^{k})+{(c-{c}^{k})}^{T}\text{.}{\nabla }_{c}J({c}^{k})+\frac{1}{2}{(c-{c}^{k})}^{T}H({c}^{k})\text{.}(c-{c}^{k})\) eq 2.2-8
Let \({g}^{k}=c-{c}^{k}\) be the descent step at point \({c}^{k}\); it must satisfy the stationarity condition of the quadratic approximation:
\({\nabla }_{c}J({c}^{k})+H({c}^{k}){g}^{k}=0\) eq 2.2-9
Using the expressions of the gradient and of the Hessian of \(J\) above, we can write:
\(({A}^{T}A){g}^{k}=-{A}^{T}j\) eq 2.2-10
Solving this equation leads to the algorithm known as Gauss-Newton, which is very efficient but nevertheless has some drawbacks:
\(({A}^{T}A)\) can be nearly singular, in which case the solution may not exist.
We have no control over \({g}^{k}\), which can be too large and therefore take the parameters outside the admissible domain.
To overcome these drawbacks, it is preferable to use the Levenberg-Marquardt algorithm, which proposes a regularization of the Gauss-Newton algorithm:
\(({A}^{T}A+\lambda I){g}^{k}=-{A}^{T}j\) eq 2.2-11
where \(\lambda\) is a scalar and \(I\) is the identity matrix.
Note that if \(\lambda =0\), we recover the direction given by Gauss-Newton, and if \(\lambda \to +\infty\), we recover the direction opposite to the gradient of \(J\), i.e. the steepest-descent direction.
The Levenberg-Marquardt algorithm therefore consists, starting from a "fairly high" value of \(\lambda\), in reducing it (by a factor of 10, for example) at each decrease of \(J\). We thus move gradually from a steepest-descent algorithm to the Gauss-Newton algorithm. The procedure can be summarized as follows (a minimal code sketch is given after the procedure):
Choosing a starting point \({c}^{0}\) and an initial value of \(\lambda\)
In iteration \(k\), solve
\(\begin{array}{}({A}^{T}A+\lambda I){g}^{k}=-{A}^{T}j\\ {c}^{k+1}={c}^{k}+{g}^{k}\end{array}\)
If \(J({c}^{k+1})<J({c}^{k})\), then \(\lambda =\lambda /10\); otherwise \(\lambda =\lambda \times 10\)
Convergence test
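The sketch below, in Python/NumPy, mirrors this procedure. The names `residuals` and `jacobian` are hypothetical placeholders for routines returning the vector \(j(c)\) and its Jacobian \(A(c)\); the default values of `lam`, `tol` and `max_iter` are illustrative only, and this is not the MACR_RECAL implementation.

```python
import numpy as np

def levenberg_marquardt(residuals, jacobian, c0, lam=1.0e-3, max_iter=100, tol=1.0e-8):
    """Minimal sketch of the loop above for J(c) = sum_n j_n(c)**2."""
    c = np.asarray(c0, dtype=float)
    J_old = np.sum(residuals(c) ** 2)
    for _ in range(max_iter):
        j = residuals(c)
        A = jacobian(c)
        # Regularized Gauss-Newton system, eq 2.2-11
        g = np.linalg.solve(A.T @ A + lam * np.eye(c.size), -A.T @ j)
        c = c + g
        J_new = np.sum(residuals(c) ** 2)
        # lambda update of the procedure above
        lam = lam / 10.0 if J_new < J_old else lam * 10.0
        J_old = J_new
        # crude convergence test on the step size (the criterion actually
        # used in MACR_RECAL is the gradient test of section 2.3.6)
        if np.linalg.norm(g) < tol:
            break
    return c
```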
Note:
Above, we considered the Levenberg-Marquardt algorithm from the perspective of a regularization of the Gauss-Newton algorithm. A different light can be shed on this algorithm by viewing it as a trust-region algorithm [bib2]. Indeed, it can easily be shown that the system [éq 2.2-11] is the stationarity condition of the minimization problem:
Determine \({g}^{k}\) such that \({g}^{k}=\text{ArgMin}({g}^{kT}\text{.}{A}^{T}j+\frac{1}{2}{g}^{kT}{A}^{T}A{g}^{k})\) subject to \(\parallel {g}^{k}\parallel \le {D}^{k}\).
Where \({D}^{k}=\parallel -{({A}^{T}A+\lambda I)}^{-1}{A}^{T}j\parallel\) and \(\lambda \ge 0\).
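To make the link explicit, one can introduce a multiplier \(\mu \ge 0\) for the constraint and require the stationarity of the Lagrangian with respect to \({g}^{k}\) (a standard argument, sketched here for convenience):

\(\frac{\partial }{\partial {g}^{k}}\left({g}^{kT}{A}^{T}j+\frac{1}{2}{g}^{kT}{A}^{T}A{g}^{k}+\frac{\mu }{2}({g}^{kT}{g}^{k}-{({D}^{k})}^{2})\right)={A}^{T}j+({A}^{T}A+\mu I){g}^{k}=0\)

which is exactly [éq 2.2-11] with \(\mu\) playing the role of \(\lambda\).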
The procedure presented above is a very simple form of the Levenberg-Marquardt algorithm, in which several practical questions are left open:
How to define the functional \(J\) when several tests are available?
How to choose the initial value of \(\lambda\)?
How to make \(\lambda\) evolve more finely?
How to define the domain of evolution of each parameter?
We will specify these various points in the following.
2.3. Practical implementation#
2.3.1. Definition of the functional#
In a recalibration, the user often has several different measurements at his disposal: discrete physical quantities, possibly of different natures, measured during one or more tests. They are functions of a given parameter denoted \(t\) (time, abscissa,…) and can therefore be represented by:
\({F}_{1}^{\text{exp}}=\left[\begin{array}{cc}{t}_{1}& {f}_{1}^{\text{exp}}({t}_{1})\\ ⋮& ⋮\\ {t}_{N}& {f}_{1}^{\text{exp}}({t}_{N})\end{array}\right]{F}_{2}^{\text{exp}}=\left[\begin{array}{cc}{t}_{1}& {f}_{2}^{\text{exp}}({t}_{1})\\ ⋮& ⋮\\ {t}_{M}& {f}_{2}^{\text{exp}}({t}_{M})\end{array}\right]{F}_{L}^{\text{exp}}=\left[\begin{array}{cc}{t}_{1}& {f}_{L}^{\text{exp}}({t}_{1})\\ ⋮& ⋮\\ {t}_{P}& {f}_{L}^{\text{exp}}({t}_{P})\end{array}\right]\) eq 3.1-1
Each of these experimental measurements has its counterpart
\({F}_{1}^{\text{calc}}({c}^{k})=\left[\begin{array}{cc}{\tilde{t}}_{1}& {f}_{1}^{\text{calc}}({c}^{k},{\tilde{t}}_{1})\\ ⋮& ⋮\\ {\tilde{t}}_{I}& {f}_{1}^{\text{calc}}({c}^{k},{\tilde{t}}_{I})\end{array}\right]{F}_{2}^{\text{calc}}({c}^{k})=\left[\begin{array}{cc}{\tilde{t}}_{1}& {f}_{2}^{\text{calc}}({c}^{k},{\tilde{t}}_{1})\\ ⋮& ⋮\\ {\tilde{t}}_{J}& {f}_{2}^{\text{calc}}({c}^{k},{\tilde{t}}_{J})\end{array}\right]{F}_{L}^{\text{calc}}({c}^{k})=\left[\begin{array}{cc}{\tilde{t}}_{1}& {f}_{L}^{\text{calc}}({c}^{k},{\tilde{t}}_{1})\\ ⋮& ⋮\\ {\tilde{t}}_{K}& {f}_{L}^{\text{calc}}({c}^{k},{\tilde{t}}_{K})\end{array}\right]\) eq 3.1-2
calculated for a given set of parameters \({c}^{k}\). Note that the calculated quantities are not necessarily the same in number as the measured ones, nor evaluated at the same values of the parameter \(t\). The least-squares functional to be minimized can then be defined by:
\(J({c}^{k})=\frac{\sum _{i=1}^{N}{(\frac{{f}_{1}^{\text{exp}}({t}_{i})-{f}_{1}^{\text{calc}}({c}^{k},{t}_{i})}{{f}_{1}^{\text{exp}}({t}_{i})})}^{2}+\sum _{i=1}^{M}{(\frac{{f}_{2}^{\text{exp}}({t}_{i})-{f}_{2}^{\text{calc}}({c}^{k},{t}_{i})}{{f}_{2}^{\text{exp}}({t}_{i})})}^{2}+\text{.}\text{.}\text{.}+\sum _{i=1}^{P}{(\frac{{f}_{L}^{\text{exp}}({t}_{i})-{f}_{L}^{\text{calc}}({c}^{k},{t}_{i})}{{f}_{L}^{\text{exp}}({t}_{i})})}^{2}}{J({c}^{0})}\) eq 3.1-3
It is important to note that:
if a calculated quantity \({f}_{j}^{\text{calc}}\) is not defined at an instant \({t}_{i}\), its value there is obtained by linear interpolation;
if an experimental value \({f}_{j}^{\text{exp}}({t}_{i})\) is zero, the difference \({f}_{j}^{\text{exp}}({t}_{i})-{f}_{j}^{\text{calc}}({c}^{k},{t}_{i})\) is not divided by it and enters the functional as is;
the functional \(J\) is normalized so as to be equal to 1 at the start of the recalibration iterations.
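These conventions can be illustrated by the following sketch for a single measured quantity; the helper `relative_residuals` and its arguments are hypothetical and do not come from the MACR_RECAL source.

```python
import numpy as np

def relative_residuals(t_exp, f_exp, t_calc, f_calc):
    """Relative residuals of one measured quantity (hypothetical helper).

    The calculated curve is linearly interpolated at the experimental
    instants (t_calc is assumed increasing); where the experimental value
    is zero, the raw difference is kept instead of the relative one.
    """
    f_interp = np.interp(t_exp, t_calc, f_calc)      # linear interpolation
    diff = f_exp - f_interp
    denom = np.where(f_exp != 0.0, f_exp, 1.0)       # avoid dividing by zero
    return np.where(f_exp != 0.0, diff / denom, diff)

# J(c^k) is the sum, over all measured quantities, of the squared components
# of these vectors, divided afterwards by its initial value J(c^0).
```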
2.3.2. Expression of the Jacobian matrix#
For the calculation of the Jacobian matrix, we define the error vector \(j\) by:
\(j=\left[\begin{array}{c}\frac{{f}_{1}^{\text{exp}}({t}_{1})-{f}_{1}^{\text{calc}}({c}^{k},{t}_{1})}{{f}_{1}^{\text{exp}}({t}_{1})}\\ ⋮\\ \frac{{f}_{1}^{\text{exp}}({t}_{N})-{f}_{1}^{\text{calc}}({c}^{k},{t}_{N})}{{f}_{1}^{\text{exp}}({t}_{N})}\\ \frac{{f}_{2}^{\text{exp}}({t}_{1})-{f}_{2}^{\text{calc}}({c}^{k},{t}_{1})}{{f}_{2}^{\text{exp}}({t}_{1})}\\ ⋮\\ ⋮\\ \frac{{f}_{L}^{\text{exp}}({t}_{P})-{f}_{L}^{\text{calc}}({c}^{k},{t}_{P})}{{f}_{L}^{\text{exp}}({t}_{P})}\end{array}\right]\) eq 3.2-1
That is, setting:
\({j}_{i}^{K}=\frac{{f}_{K}^{\text{exp}}({t}_{i})-{f}_{K}^{\text{calc}}({c}^{k},{t}_{i})}{{f}_{K}^{\text{exp}}({t}_{i})}\) eq 3.2-2
We then find the expression for the Jacobian matrix of [éq 2.2-5]:
\({\rm A}=\left[\begin{array}{cccc}\frac{\partial {j}_{1}^{1}}{\partial {c}_{1}}& \frac{\partial {j}_{1}^{1}}{\partial {c}_{2}}& \cdots & \frac{\partial {j}_{1}^{1}}{\partial {c}_{n}}\\ \vdots & \vdots & & \vdots \\ \frac{\partial {j}_{N}^{1}}{\partial {c}_{1}}& \frac{\partial {j}_{N}^{1}}{\partial {c}_{2}}& \cdots & \frac{\partial {j}_{N}^{1}}{\partial {c}_{n}}\\ \frac{\partial {j}_{1}^{2}}{\partial {c}_{1}}& \frac{\partial {j}_{1}^{2}}{\partial {c}_{2}}& \cdots & \frac{\partial {j}_{1}^{2}}{\partial {c}_{n}}\\ \vdots & \vdots & & \vdots \\ \frac{\partial {j}_{P}^{L}}{\partial {c}_{1}}& \frac{\partial {j}_{P}^{L}}{\partial {c}_{2}}& \cdots & \frac{\partial {j}_{P}^{L}}{\partial {c}_{n}}\end{array}\right]\) eq 3.2-3
Where the terms are computed by forward finite differences:
\(\frac{\partial {j}_{1}^{1}}{\partial {c}_{i}}({c}_{1},\text{.}\text{.}\text{.},{c}_{i},\text{.}\text{.}\text{.},{c}_{n})\approx \frac{{j}_{1}^{1}({c}_{1},\text{.}\text{.}\text{.},{c}_{i}+\alpha {c}_{i},\text{.}\text{.}\text{.},{c}_{n})-{j}_{1}^{1}({c}_{1},\text{.}\text{.}\text{.},{c}_{i},\text{.}\text{.}\text{.},{c}_{n})}{\alpha {c}_{i}}\) eq 3.2-4
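A possible sketch of this computation in Python/NumPy is given below; `residuals` is the same hypothetical routine as before, and the default perturbation coefficient `alpha` as well as the fallback used for a zero parameter are illustrative assumptions.

```python
import numpy as np

def jacobian_fd(residuals, c, alpha=1.0e-4):
    """Forward finite-difference Jacobian of the residual vector (sketch).

    Each parameter c_i is perturbed by the relative amount alpha * c_i,
    as in eq 3.2-4.
    """
    c = np.asarray(c, dtype=float)
    j0 = residuals(c)
    A = np.empty((j0.size, c.size))
    for i in range(c.size):
        dc = alpha * c[i] if c[i] != 0.0 else alpha   # fallback for c_i = 0
        cp = c.copy()
        cp[i] += dc
        A[:, i] = (residuals(cp) - j0) / dc
    return A
```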
2.3.3. Regularization of the linear system#
Here we address the problem of determining and changing the regularization parameter \(\lambda\). To do this, we define:
\({\lambda }_{\text{max}}=\text{Max}(\text{eigenvalues of }{A}^{T}A)\), \({\lambda }_{\text{min}}=\text{Min}(\text{eigenvalues of }{A}^{T}A)\), \(\mathrm{cond}={\lambda }_{\mathrm{max}}/{\lambda }_{\mathrm{min}}\) if \({\lambda }_{\mathrm{min}}\ne 0\)
\(Q(c)=J({c}^{k})+{(c-{c}^{k})}^{T}\text{.}{A}^{T}j+\frac{1}{2}{(c-{c}^{k})}^{T}\text{.}({A}^{T}A+\lambda I)\text{.}(c-{c}^{k})\)
2.3.3.1. Initial value of \(\lambda\)#
Knowing the above quantities, the following algorithm is defined:
If \({\lambda }_{\text{min}}=0\), then \(\lambda =1.E-3{\lambda }_{\text{max}}\)
Otherwise
If \(\mathrm{cond}<1.E5\), then \(\lambda =1.E-16{\lambda }_{\text{max}}\)
Otherwise \(\lambda =∣(1.E5{\lambda }_{\text{min}}-{\lambda }_{\text{max}})∣/10001\)
Note:
In the latter case, the value assigned to \(\lambda\) has the effect of bringing the condition number of the regularized matrix \({A}^{T}A+\lambda I\) down to the order of \(1.E5\)
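The rules above can be sketched as follows (the function name `initial_lambda` is hypothetical):

```python
import numpy as np

def initial_lambda(A):
    """Initial regularization parameter, following the rules of 2.3.3.1 (sketch)."""
    eigvals = np.linalg.eigvalsh(A.T @ A)        # eigenvalues of the symmetric matrix A^T A
    lam_max, lam_min = eigvals.max(), eigvals.min()
    if lam_min == 0.0:
        return 1.0e-3 * lam_max
    if lam_max / lam_min < 1.0e5:                # cond < 1.E5
        return 1.0e-16 * lam_max
    return abs(1.0e5 * lam_min - lam_max) / 10001.0
```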
2.3.3.2. Evolution of the value of \(\lambda\)#
To change \(\lambda\), we define the \({R}^{k}=\frac{J({c}^{k})-J({c}^{k+1})}{Q({c}^{k})-Q({c}^{k+1})}\) ratio, which makes it possible to assess the validity of the quadratic approximation of \(J\): the closer it is to 1, the more valid this approximation is. We deduce the following algorithm [bib2]:
If \({R}^{k}<0.25\), then \(\lambda \mathrm{=}\lambda \mathrm{\times }10\)
If \({R}^{k}>0.75\), then \(\lambda \mathrm{=}\lambda \mathrm{/}15\)
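A direct transcription of this rule, with hypothetical names, could read:

```python
def update_lambda(lam, J_k, J_k1, Q_k, Q_k1):
    """Update of lambda based on the ratio R^k of section 2.3.3.2 (sketch)."""
    R = (J_k - J_k1) / (Q_k - Q_k1)   # quality of the quadratic approximation
    if R < 0.25:
        lam *= 10.0                   # poor model: move towards steepest descent
    elif R > 0.75:
        lam /= 15.0                   # good model: move towards Gauss-Newton
    return lam
```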
2.3.4. Limitations of the domain of evolution of the parameters#
For various reasons, such as guaranteeing the physical validity of the parameters (strictly positive Young's modulus, Poisson's ratio between \(0\) and \(0.5\),…), it is necessary to limit their domain of evolution. We therefore require \(c\) to remain in an admissible domain \(O\), a closed convex subset of \({R}^{n}\). This imposes constraints on the parameters:
\({c}_{\text{sup}}>{c}^{k}+{g}^{k}>{c}_{\text{inf}}\)
After dualizing these conditions by introducing Lagrange multipliers \({\mu }_{\text{inf}}\) and \({\mu }_{\text{sup}}\), the following system is solved:
Find \({g}^{k}\), \({\mu }_{\text{inf}}\), \({\mu }_{\text{sup}}\) such that
\(\begin{array}{c}({A}^{T}A+\lambda I){g}^{k}+{\mu }_{\text{inf}}+{\mu }_{\text{sup}}=-{A}^{T}j\\ \{\begin{array}{}{c}^{k}+{g}^{k}>{c}_{\text{inf}}\\ {\mu }_{\text{inf}}>0\\ {({c}^{k}+{g}^{k}-{c}_{\text{inf}})}_{i}{({\mu }_{\text{inf}})}_{i}=0\text{ }\forall i\in [\mathrm{1,}n]\end{array}\\ \{\begin{array}{}{c}^{k}+{g}^{k}<{c}_{\text{sup}}\\ {\mu }_{\text{sup}}<0\\ {({c}^{k}+{g}^{k}-{c}_{\text{sup}})}_{i}{({\mu }_{\text{sup}})}_{i}=0\text{ }\forall i\in [\mathrm{1,}n]\end{array}\end{array}\)
This problem is solved using an active-set (active constraints) algorithm. For more details on this algorithm, refer to [bib3] or [bib4].
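As a rough illustration of the idea only (not the algorithm of [bib3] or [bib4]), the sketch below fixes on its bound each parameter whose updated value would violate that bound and re-solves the regularized system on the free ones; a complete active-set method would in addition examine the signs of the multipliers before releasing constraints. All names and arguments are hypothetical.

```python
import numpy as np

def bounded_lm_step(A, j, lam, c, c_inf, c_sup):
    """Crude active-set-style sketch of the bound-constrained LM step.

    A, j   : Jacobian and residual vector at c
    c_inf, c_sup : arrays of lower and upper bounds
    """
    n = c.size
    H = A.T @ A + lam * np.eye(n)
    b = -A.T @ j
    free = np.ones(n, dtype=bool)
    target = c.copy()                      # values imposed on fixed parameters
    for _ in range(n + 1):
        g = target - c                     # free components are 0, fixed ones sit on their bound
        if free.any():
            rhs = b - H @ g                # move the fixed contributions to the right-hand side
            g[free] = np.linalg.solve(H[np.ix_(free, free)], rhs[free])
        c_new = c + g
        low = free & (c_new < c_inf)
        high = free & (c_new > c_sup)
        if not (low.any() or high.any()):
            return g                       # all bounds satisfied
        target[low] = c_inf[low]           # fix the violating parameters on their bound
        target[high] = c_sup[high]
        free &= ~(low | high)
    return np.clip(c + g, c_inf, c_sup) - c
```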
2.3.5. Dimensioning#
It is often necessary to identify parameters of various physical natures. The orders of magnitude of these parameters can be extremely different. This can cause very strong differences in the orders of magnitude of the gradient and Hessian components of the cost functional and compromise the resolution.
To overcome this difficulty, it is imperative to dimension the unknowns before starting the resolution. Here is a simple and effective procedure.
Let \({c}^{0}={\left[{c}_{1}^{0},{c}_{2}^{0},\text{.}\text{.}\text{.},{c}_{n}^{0}\right]}^{T}\) be the initial vector of the parameters to be recalibrated. The dimensioning matrix \(D\) is defined by:
\(D=\left[\begin{array}{ccccc}{c}_{1}^{0}& & & & \\ & {c}_{2}^{0}& & & \\ & & \ddots & & \\ & & & {c}_{n-1}^{0}& \\ & & & & {c}_{n}^{0}\end{array}\right]\) eq 3.5-1
So, if the \({c}_{i}^{0}\) are all non-zero, we can define the dimensionless unknowns by:
\({\tilde{c}}^{0}={D}^{-1}\text{.}{c}^{0}\) eq 3.5-2
Likewise, a dimensionless cost functional is introduced:
\(\tilde{\mathrm{J}}(\tilde{\mathrm{c}})\mathrm{=}\mathrm{J}(\mathrm{D}\text{.}\tilde{\mathrm{c}})\mathrm{=}\mathrm{J}(\mathrm{c})\) eq 3.5-3
As well as its gradient:
\({\mathrm{\nabla }}_{\tilde{\mathrm{c}}}\tilde{\mathrm{J}}(\tilde{\mathrm{c}})\mathrm{=}\frac{\mathrm{\partial }\tilde{\mathrm{J}}(\tilde{\mathrm{c}})}{\mathrm{\partial }\tilde{\mathrm{c}}}\mathrm{=}\frac{\mathrm{\partial }\mathrm{J}(\mathrm{D}\text{.}\tilde{\mathrm{c}})}{\mathrm{\partial }\tilde{\mathrm{c}}}\mathrm{=}\frac{\mathrm{\partial }\mathrm{J}(\mathrm{c})}{\mathrm{\partial }\mathrm{c}}\text{.}\frac{\mathrm{\partial }\mathrm{c}}{\mathrm{\partial }\tilde{\mathrm{c}}}\mathrm{=}\mathrm{D}\text{.}{\mathrm{\nabla }}_{c}\mathrm{J}(\mathrm{c})\) eq 3.5-4
And its Jacobian matrix:
\({\tilde{A}}_{\text{ij}}\mathrm{=}{A}_{\text{ij}}\mathrm{\times }{c}_{j}^{0}\) eq 3.5-5
From an algorithmic point of view, the Jacobian matrix is computed classically with the functional \(J\), then it is made dimensionless, as are the current parameters \(c\), before being passed to the minimization algorithm. At the output of the latter, the parameters \(\tilde{c}\) are scaled back to physical values to allow the computation of the functional \(J\).
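A sketch of these scaling operations, with hypothetical helper names:

```python
import numpy as np

def make_dimensionless(A, c, c0):
    """Non-dimensionalization by the initial parameters c0 (c0 assumed non-zero)."""
    c_tilde = c / c0          # eq 3.5-2, i.e. D^{-1} c with D = diag(c0)
    A_tilde = A * c0          # eq 3.5-5: column j of A multiplied by c_j^0
    return A_tilde, c_tilde

def back_to_physical(c_tilde, c0):
    """Scale the dimensionless parameters back before evaluating J."""
    return c_tilde * c0
```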
2.3.6. Convergence criterion#
The convergence criterion used in MACR_RECAL consists in testing the decrease of the gradient of the functional. The use of this criterion is naturally justified by the fact that the objective of the recalibration algorithm is to drive this gradient to zero.
\(\frac{\mid \mid {\nabla }_{c}J({c}^{k})\mid \mid }{\mid \mid {\nabla }_{c}J({c}^{0})\mid \mid }<\mathrm{Prec.}\) eq 3.6-1
Where \(\mathrm{Prec}\) is by default taken to be equal to 1.E-3.
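In code, this test reduces to a one-liner (hypothetical helper):

```python
import numpy as np

def converged(grad_k, grad_0, prec=1.0e-3):
    """Convergence test eq 3.6-1 on the relative norm of the gradient."""
    return np.linalg.norm(grad_k) < prec * np.linalg.norm(grad_0)
```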
2.4. Global algorithm#
In order to make explicit the sequence of the various operations described above, the recalibration algorithm is presented formally below (a compact sketch assembling the helpers of the previous subsections is given after the outline):
Initializations
\(k=0\)
calculation of \(A\), dimensioning of \(A\)
Calculation of initial \(\lambda\)
Global iterations
Dimensioning of \({c}^{k}\)
Solving the Levenberg-Marquardt equation
Enforcement of the parameter bounds
Scaling \({c}^{k+1}\) back to physical values
Calculation of \(J({c}^{k+1})\)
Update \(\lambda\)
Calculation of \(A\), dimensioning of \(A\)
Convergence test
\(k=k+1\)
End
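As a purely illustrative wrap-up, the outline above can be assembled from the hypothetical helpers sketched in the previous subsections (`jacobian_fd`, `initial_lambda`, `update_lambda`, `converged`), which must therefore be in scope; bounds handling is omitted for brevity, and none of this is the actual MACR_RECAL source.

```python
import numpy as np

def recalibrate(residuals, c0, prec=1.0e-3, max_iter=50):
    """Illustrative driver following the outline above (sketch only)."""
    c0 = np.asarray(c0, dtype=float)
    c = c0.copy()
    J = lambda x: np.sum(residuals(x) ** 2)
    A = jacobian_fd(residuals, c) * c0            # Jacobian, then dimensioning (eq 3.5-5)
    lam = initial_lambda(A)
    grad0 = A.T @ residuals(c)                    # reference gradient for eq 3.6-1
    for _ in range(max_iter):
        j = residuals(c)
        g = np.linalg.solve(A.T @ A + lam * np.eye(c.size), -A.T @ j)
        c_new = (c / c0 + g) * c0                 # dimensionless step, scaled back
        # quadratic model Q at c^k and c^{k+1} (section 2.3.3.2)
        Q_k = J(c)
        Q_k1 = Q_k + g @ (A.T @ j) + 0.5 * g @ ((A.T @ A + lam * np.eye(c.size)) @ g)
        lam = update_lambda(lam, J(c), J(c_new), Q_k, Q_k1)
        c = c_new
        A = jacobian_fd(residuals, c) * c0        # new Jacobian, dimensioned again
        if converged(A.T @ residuals(c), grad0, prec):
            break
    return c
```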