4. Solving a rectangular linear system#

The two methods for solving the system [éq 1-1] presented in sections 4.1 (method of normal equations) and 4.2 (singular value decomposition) both aim at solving the normal equations [éq 2.1-5]. They differ not only in the algorithms they use (inversion versus pseudo-inversion), but also in their degree of generality and in their numerical properties, which are compared in section 4.3.

4.1. Method of normal equations#

Solving the system \(\text{Ax}=b\) by the method of normal equations consists in computing the least-squares solution [éq 2.4-1] in a "direct" way, that is, by using the relationship \(x={({A}^{T}A)}^{-1}{A}^{T}b\) directly. To do this, it is first necessary to compute \({A}^{T}A\) and \({A}^{T}b\), then to solve the resulting system either by an iterative method or by a factorization of \({A}^{T}A\).
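As an illustration, here is a minimal sketch of this approach, assuming a NumPy environment; the function name `solve_normal_equations` and the choice of a Cholesky factorization of \({A}^{T}A\) are illustrative assumptions, not prescriptions of the present document.

```python
import numpy as np

def solve_normal_equations(A, b):
    # Form the normal equations explicitly.
    AtA = A.T @ A                    # n x n symmetric matrix A^T A
    Atb = A.T @ b                    # right-hand side A^T b
    # Factor A^T A (Cholesky requires A of full column rank),
    # then solve by forward and backward substitution.
    L = np.linalg.cholesky(AtA)      # A^T A = L L^T
    y = np.linalg.solve(L, Atb)
    return np.linalg.solve(L.T, y)

# Example: overdetermined full-rank 3 x 2 system.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
print(solve_normal_equations(A, b))  # least-squares solution of Ax = b
```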

We can already note that this method is limited to the case where \({A}^{T}A\) is invertible, which restricts its field of application to systems [éq 1-1] whose matrix has full rank (see section 2.3). In particular, the method of normal equations can handle neither strictly underconstrained systems nor singular equi-constrained systems (see section 2.4).

4.2. Singular value decomposition method#

We saw in section 3.3 that the solution of the system \(\text{Ax}=b\) in the least-squares sense defined by [éq 2.4-1] can be characterized by the relationship \(x={({A}^{T}A)}^{+}{A}^{T}b\) [éq 3.3-3]. The solution method based on this property is called the singular value decomposition method because it constructs the pseudo-inverse [éq 3.3-2] of \({A}^{T}A\) via the SVD decomposition [éq 3.1-1] of the matrix \(A\).

Since any matrix admits a singular value decomposition, any system of type [éq 1-1] can be solved in the sense of [éq 2.4-1] by this method, which therefore has, at the very least, the advantage of generality over the method of normal equations.

That is not all. The singular value decomposition method, unlike the method of normal equations, does not require the explicit construction of the matrix \({A}^{T}A\) and the vector \({A}^{T}b\) (we will see in section 4.3 the numerical interest of this property). Indeed, it is easy to verify that the matrix \(\Sigma\) of singular values in the factorization [éq 3.1-1] satisfies the identity \({\Sigma }^{+}{({\Sigma }^{T})}^{+}{\Sigma }^{T}={\Sigma }^{+}\). Premultiplying \({A}^{T}\) by the pseudo-inverse of \({A}^{T}A\) and taking the factorization [éq 3.1-1] into account, we obtain \({({A}^{T}A)}^{+}{A}^{T}=P{\Sigma }^{+}{({\Sigma }^{T})}^{+}{P}^{T}P{\Sigma }^{T}{Q}^{T}\), which, by orthogonality of \(P\) and the identity above, gives \({({A}^{T}A)}^{+}{A}^{T}={A}^{+}\). Therefore, the characterizations [éq 3.3-3] and [éq 2.4-1] of the sought solution are equivalent to the characterization:

\(x\text{ is a solution of }\text{Ax}=b\iff x={A}^{+}b\) éq 4.2-1
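As an illustration, a minimal NumPy sketch of this solution method is given below; the function name `solve_svd` and the tolerance used to pseudo-invert the singular values are illustrative assumptions, not part of this document.

```python
import numpy as np

def solve_svd(A, b, tol=1e-12):
    # Thin SVD of A; the pseudo-inverse is applied through these factors,
    # without ever forming A^T A explicitly.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # Pseudo-inversion of the singular values: invert only those above
    # a relative tolerance, set the others to zero.
    s_inv = np.where(s > tol * s[0], 1.0 / s, 0.0)
    return Vh.T @ (s_inv * (U.T @ b))   # x = A^+ b

# Example on a full-rank overdetermined system; the result matches
# np.linalg.lstsq(A, b, rcond=None)[0].
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
print(solve_svd(A, b))
```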

4.3. Comparison of the method of normal equations with the singular value decomposition method#

In the two previous sections, we have seen that, algebraically, the singular value decomposition method is more general and simpler than the method of normal equations. We will now see, following [bib3] p. 336, that it is also numerically superior to it. This superiority is expressed, on the one hand, in terms of the stability not only of the solution process but also of the construction of the problem and, on the other hand, at a less critical level, in terms of suitability for the treatment of sparse matrices.

4.3.1. Conditioning#

The conditioning (condition number) of a matrix \(A\) of dimension \(m\times n\) is defined as the ratio of its extreme non-zero singular values:

\(\text{cond}(A)=\frac{{\mu }_{1}}{{\mu }_{r}}\)

where \(r\) is the rank of the \(A\) matrix.

The results presented in [bib3] p. 184, which use the normal equations as an analysis tool and not as a computational tool, show that the perturbation of the solution of the optimization problem [éq 2.1-1] due to rounding errors may be proportional to \(\text{cond}{(A)}^{2}\). However, the classical results on the stability of the solution of a linear system with respect to these same errors show a proportionality to the condition number of the matrix. Thus, when the normal equations are solved directly, we obtain an error that is always proportional to \(\text{cond}({A}^{T}A)=\text{cond}{(A)}^{2}\), which is not as good as \(\text{cond}(A)\).
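This squaring of the condition number is easy to observe numerically; the following sketch, assuming NumPy and an arbitrarily chosen test matrix, simply compares \(\text{cond}(A)\) with \(\text{cond}({A}^{T}A)\).

```python
import numpy as np

# Build a moderately ill-conditioned rectangular matrix by scaling one
# column, then compare the condition numbers of A and of A^T A.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
A[:, 0] *= 1e-4                      # widen the spread of singular values

cond_A = np.linalg.cond(A)           # ratio of extreme singular values of A
cond_AtA = np.linalg.cond(A.T @ A)   # condition number of the normal-equations matrix
print(cond_A, cond_AtA, cond_A**2)   # cond_AtA is approximately cond_A squared
```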

The singular value decomposition method only uses orthogonal transformations (see section 4), so it does not alter the initial conditioning of the problem [éq 1-1] and is therefore, from this point of view, more attractive than the method of normal equations.

4.3.2. Loss of precision#

We have just seen that rounding errors lead to a greater degradation of the solution when it is computed using the normal equations rather than by singular value decomposition. The following example, taken from [bib2], shows that the very construction of the system of normal equations [éq 2.1-5] is itself affected by rounding errors.

Consider the following matrix:

\(A=\left[\begin{array}{cc}1& 1\\ \varepsilon & 0\\ 0& \varepsilon \end{array}\right]\) leading to \({A}^{T}A=\left[\begin{array}{cc}1+{\varepsilon }^{2}& 1\\ 1& 1+{\varepsilon }^{2}\end{array}\right]\)

whose singular values are \({\mu }_{1}=\sqrt{2+{\varepsilon }^{2}}\) and \({\mu }_{2}=\mid \varepsilon \mid\), so that the rank of \(A\) is \(2\) as soon as \(\varepsilon \ne 0\). If \(\varepsilon\) satisfies \({\varepsilon }^{2}<{\varepsilon }_{\text{mach}}<\varepsilon\), where \({\varepsilon }_{\text{mach}}\) is the machine precision, then all the coefficients of \({A}^{T}A\) will be computed as equal to \(1\) and the computed singular values will be, at best, \({\mu }_{1}=\sqrt{2}\) and \({\mu }_{2}=0\). It follows that the numerical rank computed via the normal equations will be \(1\), whereas that computed by an SVD decomposition of the matrix \(A\) would be equal to \(2\).
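The following sketch, assuming NumPy and double-precision arithmetic (so that \({\varepsilon }_{\text{mach}}\approx 2\times {10}^{-16}\)), reproduces this behaviour with \(\varepsilon ={10}^{-8}\).

```python
import numpy as np

# With eps^2 below machine precision, 1 + eps^2 rounds to 1, so A^T A is
# stored as a singular matrix, whereas the SVD of A itself still detects
# a numerical rank of 2.
eps = 1e-8                                   # eps^2 = 1e-16 < eps_mach < eps
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

AtA = A.T @ A                                # every coefficient rounds to 1
rank_normal = np.linalg.matrix_rank(AtA)     # rank seen by the normal equations -> 1
rank_svd = np.linalg.matrix_rank(A)          # rank from the SVD of A -> 2
print(AtA, rank_normal, rank_svd)
```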

4.3.3. Sparse structure#

At a less critical level, the construction of the matrix of normal equations induces a fill-in of the associated system that the method based on the singular value decomposition avoids.
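A classical illustration of this fill-in, sketched below with NumPy (the test matrix is an illustrative choice, not taken from this document), is a sparse matrix with a single dense row: \(A\) has only \(2n\) non-zero coefficients, while \({A}^{T}A\) is completely full.

```python
import numpy as np

# A sparse (n+1) x n matrix: an identity block (one nonzero per row)
# plus one dense row appended at the bottom.
n = 6
A = np.eye(n + 1, n)
A[-1, :] = 1.0

print(np.count_nonzero(A))        # 2n = 12 nonzeros in A
print(np.count_nonzero(A.T @ A))  # n*n = 36 nonzeros: A^T A is completely filled
```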

4.3.4. Conclusion#

The following table summarizes the discussion of the previous subsections:

|                  | Generality        | Conditioning             | Loss of precision when constructing the problem | Fill-in |
|------------------|-------------------|--------------------------|--------------------------------------------------|---------|
| Normal equations | full-rank systems | \(\text{cond}{(A)}^{2}\) | possible                                         | yes     |
| SVD              | any system        | \(\text{cond}(A)\)       | impossible                                       | no      |