2. Solution of a rectangular linear system#
In this section we define a concept of solution for the linear system [éq 1-1] that enjoys both existence and uniqueness. The approach proceeds in two stages:
First, using a least squares approach, we build a differentiable and convex optimization problem (section 2.1) that always has at least one solution (section 2.2). Situation (2) of paragraph 1 is thereby eliminated;
Then, after analyzing the uniqueness property (section 2.3) and seeing that it is not always guaranteed, we impose an additional constraint (section 2.4) on the solution characterized in section 2.1 in order to restore uniqueness.
2.1. Least squares formalism#
The unique solution of a linear system \(\text{Ax}=b\) with a square, regular matrix achieves the minimum of the quantity \(\parallel \text{Ay}-b\parallel\) as \(y\) ranges over \({ℝ}^{n}\). This property suggests a concept of solution for a general linear system of type [éq 1-1] that gives it the same properties as in the particular case of a regular system. We will therefore say that a point \(x\) of \({ℝ}^{n}\) is a solution of the system [éq 1-1] if it is a solution of the optimization problem:
\(\parallel \text{Ax}-b\parallel =\underset{y\in {ℝ}^{n}}{\text{Min}}\parallel \text{Ay}-b\parallel\) eq 2.1-1
This approach is natural because it defines a solution whose residue \(r=\text{Ax}-b\) is zero when the right-hand side belongs to \(\text{Im}A\), and has minimum norm otherwise, which is the best that can be expected.
To analyze problem [éq 2.1-1], it is convenient to replace it with the following equivalent unconstrained optimization problem:
\(\text{find }x\in {ℝ}^{n}\text{ such that }J(x)=\underset{y\in {ℝ}^{n}}{\text{Min}}J(y)\) eq 2.1-2
where \(J(\text{.})\) is the functional defined by:
\(J:y\in {ℝ}^{n}\to J(y)=\frac{1}{2}{\parallel \text{Ay}-b\parallel }^{2}\)
The advantage of the problem [éq 2.1-2] lies in the fact that the \(J(\text{.})\) functional verifies the following properties:
\(J(\text{.})\) is twice continuously differentiable:
\(\text{DJ}(x):h\in {ℝ}^{n}\to \text{DJ}(x)h=({A}^{T}\text{Ax}-{A}^{T}\text{b,h})\in ℝ\) eq 2.1-3
\({D}^{2}J(x):(\text{h,k})\in {ℝ}^{n}\times {ℝ}^{n}\to {D}^{2}J(x)(\text{h,k})=({A}^{T}\text{Ah},k)\in ℝ\) eq 2.1-4
\(J(\text{.})\) is quadratic and convex.
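The derivative formula above can be checked numerically. The following sketch (a NumPy illustration, not part of the original development; matrix sizes and the random data are arbitrary) compares the analytic gradient \({A}^{T}\text{Ay}-{A}^{T}b\) of eq 2.1-3 with a central finite-difference approximation of \(J\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # arbitrary rectangular m x n matrix
b = rng.standard_normal(5)
y = rng.standard_normal(3)

def J(y):
    # J(y) = 1/2 ||A y - b||^2
    r = A @ y - b
    return 0.5 * r @ r

# Analytic gradient from eq 2.1-3: A^T A y - A^T b
grad = A.T @ A @ y - A.T @ b

# Central finite differences along each coordinate direction
eps = 1e-6
fd = np.array([(J(y + eps * e) - J(y - eps * e)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(fd, grad, atol=1e-5)
```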
Thus, the problem [éq 2.1-2] falls within the framework of differentiable and convex optimization so that we have the following results ([bib1] p. 156 and 146):
Convexity: any local optimum is in fact a global optimum, that is to say, a solution of [éq 2.1-2],
Differentiability: any local optimum verifies the Euler equation \(\text{DJ}(x)=0\) on \({ℝ}^{n}\) which, taking into account [éq 2.1-3], leads to the characterization by the so-called normal equations:
\({A}^{T}\text{Ax}={A}^{T}b\) eq 2.1-5
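As a numerical illustration (a sketch assuming NumPy, with arbitrary random data), one can verify on a full-column-rank rectangular example that solving the normal equations [éq 2.1-5] yields the least squares minimizer, and that the residue is orthogonal to \(\text{Im}A\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # full column rank with probability 1
b = rng.standard_normal(6)

# Solve the normal equations A^T A x = A^T b directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference least squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# The residue A x - b is orthogonal to Im A, i.e. A^T (A x - b) = 0
assert np.allclose(A.T @ (A @ x_normal - b), 0)
```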
2.2. Existence of optimums#
In [bib1] p. 171 one finds a proof of the existence of at least one solution to the normal equations [éq 2.1-5]. This proof relies on arguments designed to cover the infinite-dimensional case (projection theorem onto a closed convex set of a Hilbert space).
Since our case is much simpler, we give a proof of the result using only elementary algebraic arguments which, in addition, will be useful in paragraph 3. Showing that, for any element \(b\) of \({ℝ}^{m}\), the normal equations [éq 2.1-5] admit a solution is equivalent to establishing the inclusion \(\text{Im}{A}^{T}\subset \text{Im}{A}^{T}A\). Now, for any real matrix \(M\) of order \(m\times n\) we have \(\text{Im}{M}^{T}={(\text{Ker}M)}^{\perp }\) ([bib3] p. 28). Hence the inclusion to be established is equivalent to \({(\text{Ker}A)}^{\perp }\subset {(\text{Ker}{A}^{T}A)}^{\perp }\), which is itself equivalent to \(\text{Ker}{A}^{T}A\subset \text{Ker}A\). So let \(x\in \text{Ker}{A}^{T}A\); then \(\text{Ax}\in \text{Ker}{A}^{T}\), that is to say \(\text{Ax}\in {(\text{Im}A)}^{\perp }\). Since \(\text{Ax}\) also belongs to \(\text{Im}A\), it can only be zero, which means that \(x\in \text{Ker}A\) and ends the proof.
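The identity \(\text{Ker}{A}^{T}A=\text{Ker}A\) established above, equivalently \(\text{rg}({A}^{T}A)=\text{rg}(A)\), can be observed numerically even on a rank-deficient matrix (an illustrative sketch assuming NumPy; the matrix is constructed arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
# Build a rank-deficient 5 x 4 matrix: last column duplicates the first
A = rng.standard_normal((5, 4))
A[:, 3] = A[:, 0]

# rg(A^T A) = rg(A), which encodes Im A^T = Im A^T A
assert np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A)

# Extract a kernel vector of A^T A via the SVD and check it lies in Ker A
_, s, Vt = np.linalg.svd(A.T @ A)
x = Vt[-1]                 # right singular vector for the smallest singular value
assert np.allclose(A @ x, 0)
```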
At this stage of the argument, we can say that any system of type [éq 1-1] admits at least one solution in the sense of [éq 2.1-1], and all these solutions are characterized as solutions, in the Cramer sense, of the normal equations [éq 2.1-5]. Situation 2) of paragraph 1 is eliminated.
It remains to eliminate situation 3), that is, to guarantee uniqueness.
2.3. Uniqueness of the optimum and rank of the system#
It is clear that the normal equations [éq 2.1-5], characterizing the optimums we are looking for, admit a single solution if and only if \({A}^{T}A\) is regular. As \({A}^{T}A\) is always positive semi-definite, its invertibility is equivalent to its positive definiteness, so that, taking into account the expression [éq 2.1-4] for the second derivative of the functional \(J(\text{.})\), we recover the well-known uniqueness theorem for the optimum of problem [éq 2.1-2] with a twice continuously differentiable convex functional ([bib1] th 7.4-3 and 7.4-4).
In general, nothing prevents the matrix \({A}^{T}A\) from being singular; the solution of the system [éq 1-1] in the sense of [éq 2.1-2] is therefore not always unique. However, we have a criterion to detect this situation. In section 2.2 we established that \(\text{Im}{A}^{T}\subset \text{Im}{A}^{T}A\), and since the reverse inclusion is trivially true, we conclude the identity \(\text{Im}{A}^{T}=\text{Im}{A}^{T}A\). Introducing the rank \(\text{rg}(A)\) of the matrix \(A\), the dimension of its image space, we can then say that a necessary and sufficient condition for \({A}^{T}A\) to be invertible is that \(\text{rg}({A}^{T}A)=n\), which is equivalent to \(\text{rg}(A)=n\) because \(\text{rg}({A}^{T}A)=\text{rg}({A}^{T})=\text{rg}(A)\).
The advantage of this criterion is that it confines the analysis to the matrix \(A\) alone, without the need to explicitly form \({A}^{T}A\). This criterion also shows that the normal equations associated with a strictly underdetermined linear system (\(m<n\)) always admit an infinity of solutions. Indeed, the rank of a matrix is also equal to the number of independent columns it has; since these columns are vectors of \({ℝ}^{m}\), at most \(m\) of them can be independent, so the rank cannot reach the value \(n\) when \(m<n\).
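A small sketch (assuming NumPy, with an arbitrary random example) illustrating the criterion on a strictly underdetermined system, where \({A}^{T}A\) is necessarily singular:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 4))   # strictly underdetermined: m = 2 < n = 4

# rg(A) <= m < n, so rg(A) cannot reach n ...
assert np.linalg.matrix_rank(A) < 4

# ... and the 4 x 4 matrix A^T A is therefore singular
assert np.linalg.matrix_rank(A.T @ A) < 4
```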
2.4. Solution in the sense of least squares#
We have just seen that the set of points minimizing the residue of system [éq 1-1] does not necessarily reduce to a single point. To restore uniqueness, we refine the concept of solution of system [éq 1-1] given in section 2.1 by defining the solution in the sense of least squares as the minimum-norm element of the set of points that minimize the residue. This solution \(x\) is then characterized by:
\(x\in S\stackrel{\text{def}}{=}\left\{y\in {ℝ}^{n};{A}^{T}\text{Ay}={A}^{T}b\right\}\text{ and }\parallel x\parallel =\underset{y\in S}{\text{Inf}}\parallel y\parallel\)
This characterization is not satisfactory in practice because it requires solving a constrained optimization problem. We will replace it with another characterization that is more suitable, in the sense that it leads (see section 4.2) to a much simpler calculation procedure.
The set \(S\) above is the translate of the kernel of \({A}^{T}A\) by any one of the solution vectors of the normal equations [éq 2.1-5]. The additional minimum-norm condition is therefore interpreted as a simple projection: the least squares solution of system [éq 1-1] is nothing other than the projection of the origin of \({ℝ}^{n}\) onto the set of solutions of the normal equations. We can also characterize it as the intersection point of the set \(S\) with the orthogonal complement of the kernel of \({A}^{T}A\).
The definition of a solution to the [éq 1-1] system can then be summarized as follows:
\(x\text{ is a solution of }\text{Ax}=b\iff \{\begin{array}{c}{A}^{T}\text{Ax}={A}^{T}b\\ x\in {(\text{Ker}{A}^{T}A)}^{\perp }\end{array}\) eq 2.4-1
The first condition makes \(x\) a vector with minimum residue, while the second selects, among the vectors with minimum residue, the one of minimum norm.
The definition [éq 2.4-1] is a classical generalization of the concept of solution of a regular square system and gives any system of type [éq 1-1] one and only one solution.
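The two conditions of [éq 2.4-1] can be illustrated numerically. The Moore-Penrose pseudoinverse is known to produce the minimum-norm least squares solution, so it should satisfy both conditions; the sketch below (assuming NumPy, with arbitrary random data) checks this and also verifies that any other solution of the normal equations has a larger norm:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))   # underdetermined: infinitely many minimizers
b = rng.standard_normal(3)

x = np.linalg.pinv(A) @ b         # minimum-norm least squares solution

# First condition of eq 2.4-1: the normal equations hold
assert np.allclose(A.T @ A @ x, A.T @ b)

# Second condition: x is orthogonal to Ker A^T A = Ker A
_, _, Vt = np.linalg.svd(A)
kernel = Vt[3:]                   # n - rg(A) = 2 rows spanning Ker A
assert np.allclose(kernel @ x, 0)

# Any other solution of the normal equations has a strictly larger norm
x_other = x + kernel[0]
assert np.allclose(A.T @ A @ x_other, A.T @ b)
assert np.linalg.norm(x_other) > np.linalg.norm(x)
```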