14. Appendix 2. Gram-Schmidt orthogonalization#
14.1. Introduction#
We have seen several times in this document that the quality of the orthogonalization of a vector family is crucial both for the proper behavior of the algorithms and for the quality of the computed modes. Moreover, this task must be carried out repeatedly, hence the importance of a fast algorithm.
Simple orthonormalization algorithms derive from the classical Gram-Schmidt process, but they are often only "conditionally stable": the quality of their result depends on the conditioning of the matrix formed by the family to be orthonormalized. This lack of robustness can be a problem in particularly ill-conditioned situations. More expensive but robust algorithms based on projections or rotations are then preferred: Householder reflections and Givens rotations.
In practice, for the implementation of the IRA method in Code_Aster, we retained an iterative version of the Gram-Schmidt process, the IGSM of Kahan-Parlett [Par80]. It achieves a good compromise between robustness and computational cost: it is unconditionally stable and orthogonalizes to machine precision at, at most, twice the cost of a classical Gram-Schmidt (GS).
In the following paragraphs we detail how the basic algorithm works, as well as its two main variants. The first is used in CALC_MODES with OPTION among [“PROCHE”, “AJUSTE”, “SEPARE”], and the second in CALC_MODES with OPTION among [“BANDE”, “CENTRE”, “PLUS_PETITE”, “PLUS_GRANDE”, “TOUT”]. A very instructive comparison of these methods, based on a very simple example, is presented in [Vau00] (pp. 33-36).
14.2. Gram-Schmidt algorithm (GS)#
Given \(k\) independent vectors \({({\mathrm{x}}_{i})}_{i\mathrm{=}\mathrm{1,}k}\) of \({ℝ}^{n}\), we want to obtain \(k\) vectors \({({\mathrm{y}}_{i})}_{i\mathrm{=}\mathrm{1,}k}\) that are orthonormal (with respect to an arbitrary dot product) and span the same space. In other words, we want to obtain another orthonormal family generating the same space. The classical Gram-Schmidt orthonormalization process is as follows:
\(\begin{array}{l}\text{For } i\mathrm{=}\mathrm{1,}k \text{ do}\\ \quad \text{For } j\mathrm{=}\mathrm{1,}i\mathrm{-}1 \text{ do}\\ \quad \quad \text{compute } {r}_{\mathit{ji}}\mathrm{=}({\mathrm{y}}_{j},{\mathrm{x}}_{i})\\ \quad \text{End loop.}\\ \quad \text{compute } {\stackrel{ˆ}{\mathrm{y}}}_{i}\mathrm{=}{\mathrm{x}}_{i}\mathrm{-}\mathrm{\sum }_{j\mathrm{=}\mathrm{1,}i\mathrm{-}1}{r}_{\mathit{ji}}{\mathrm{y}}_{j},\\ \quad \text{compute } {r}_{\mathit{ii}}\mathrm{=}\mathrm{\parallel }{\stackrel{ˆ}{\mathrm{y}}}_{i}\mathrm{\parallel },\\ \quad \text{compute } {\mathrm{y}}_{i}\mathrm{=}\frac{{\stackrel{ˆ}{\mathrm{y}}}_{i}}{{r}_{\mathit{ii}}}\\ \text{End loop.}\end{array}\)
Algorithm 2.1. Gram-Schmidt algorithm (GS).
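To fix ideas, here is a minimal NumPy sketch of Algorithm 2.1. It assumes the Euclidean dot product and a full-rank matrix; the function name and the NumPy setting are ours and not part of Code_Aster, which works with an arbitrary (possibly matrix-weighted) dot product.

```python
import numpy as np

def classical_gram_schmidt(X):
    """Classical Gram-Schmidt (Algorithm 2.1), Euclidean dot product.

    X is n x k with independent columns; returns Q (orthonormal columns)
    and upper-triangular R such that X = Q @ R.
    """
    n, k = X.shape
    Q = np.zeros((n, k))
    R = np.zeros((k, k))
    for i in range(k):
        # r_ji = (y_j, x_i) for all j < i, computed from the ORIGINAL x_i
        R[:i, i] = Q[:, :i].T @ X[:, i]
        y = X[:, i] - Q[:, :i] @ R[:i, i]   # y_i_hat = x_i - sum_j r_ji y_j
        R[i, i] = np.linalg.norm(y)
        Q[:, i] = y / R[i, i]               # y_i = y_i_hat / r_ii
    return Q, R
```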
This process is simple but numerically very unstable: rounding errors produce vectors that are not orthogonal. In particular, when the initial vectors are nearly linearly dependent, large differences in magnitude appear in the second step of the process
\({\stackrel{ˆ}{\mathrm{y}}}_{i}\mathrm{=}{\mathrm{x}}_{i}\mathrm{-}\mathrm{\sum }_{j\mathrm{=}\mathrm{1,}i\mathrm{-}1}{r}_{\mathit{ji}}{\mathrm{y}}_{j}\)
From a numerical standpoint, controlling these differences in magnitude is very difficult.
Note:
Denoting by \(Q\) the matrix whose columns are the \({({\mathrm{y}}_{i})}_{i\mathrm{=}\mathrm{1,}k}\), we have thus explicitly constructed a QR factorization of the initial matrix \(X\) associated with the \({({\mathrm{x}}_{i})}_{i\mathrm{=}\mathrm{1,}k}\). This is in fact the aim of any orthogonalization process.
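Both the instability and the fact that the factorization \(X\mathrm{=}\mathit{QR}\) itself remains accurate can be observed on a classical nearly dependent family (a Läuchli-type example; the value of eps is ours), using the classical_gram_schmidt sketch above:

```python
# Nearly dependent columns: the classical process keeps X ~ Q @ R accurate,
# but Q loses its orthogonality almost completely.
eps = 1.0e-8
X = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Q, R = classical_gram_schmidt(X)
print(np.linalg.norm(X - Q @ R))            # tiny: the QR residual is fine
print(np.linalg.norm(Q.T @ Q - np.eye(3)))  # O(1): orthogonality is lost
```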
14.3. Modified Gram-Schmidt Algorithm (GSM)#
To eliminate these numerical instabilities, the previous algorithm is reorganized. Although mathematically equivalent to it, the resulting method is much more robust because it avoids large differences in magnitude between the vectors manipulated in the algorithm.
In the initial method, the coefficients used to build \({\mathrm{y}}_{i}\) are all computed from the original vector \({\mathrm{x}}_{i}\), without taking into account the alterations already applied to it. With Modified Gram-Schmidt, the orthogonalization proceeds more progressively, each projection taking the previous alterations into account, following the process below.
\(\begin{array}{l}\text{For } i\mathrm{=}\mathrm{1,}k \text{ do}\\ \quad {\stackrel{ˆ}{\mathrm{y}}}_{i}\mathrm{=}{\mathrm{x}}_{i}\\ \quad \text{For } j\mathrm{=}\mathrm{1,}i\mathrm{-}1 \text{ do}\\ \quad \quad \text{compute } {r}_{\mathit{ji}}\mathrm{=}({\mathrm{y}}_{j},{\stackrel{ˆ}{\mathrm{y}}}_{i}),\\ \quad \quad \text{compute } {\stackrel{ˆ}{\mathrm{y}}}_{i}\mathrm{=}{\stackrel{ˆ}{\mathrm{y}}}_{i}\mathrm{-}{r}_{\mathit{ji}}{\mathrm{y}}_{j},\\ \quad \text{End loop.}\\ \quad \text{compute } {r}_{\mathit{ii}}\mathrm{=}\mathrm{\parallel }{\stackrel{ˆ}{\mathrm{y}}}_{i}\mathrm{\parallel },\\ \quad \text{compute } {\mathrm{y}}_{i}\mathrm{=}\frac{{\stackrel{ˆ}{\mathrm{y}}}_{i}}{{r}_{\mathit{ii}}}\\ \text{End loop.}\end{array}\)
Algorithm 2.2. Modified Gram-Schmidt algorithm (GSM).
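Under the same assumptions as the sketch of Algorithm 2.1 (Euclidean dot product, independent columns, names ours), the modified version differs only in that each coefficient is computed from, and immediately subtracted from, the running vector:

```python
import numpy as np

def modified_gram_schmidt(X):
    """Modified Gram-Schmidt (Algorithm 2.2).

    Same input/output contract as classical_gram_schmidt, but the
    projections are removed one at a time from the running vector.
    """
    n, k = X.shape
    Q = np.zeros((n, k))
    R = np.zeros((k, k))
    for i in range(k):
        y = X[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ y     # r_ji = (y_j, y_i_hat), updated vector
            y -= R[j, i] * Q[:, j]    # subtract immediately, before next j
        R[i, i] = np.linalg.norm(y)
        Q[:, i] = y / R[i, i]
    return Q, R
```

On the Läuchli-type example above, \(\mathrm{\parallel }{Q}^{T}Q\mathrm{-}I\mathrm{\parallel }\) drops from \(O(1)\) with GS to roughly \({10}^{\mathrm{-}8}\), i.e. machine precision multiplied by a constant tied to the conditioning of \(X\), as stated below.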
With this method, the orthonormality of the basis vectors is much better; it can even be achieved to machine precision, up to a constant depending on the conditioning of \(X\). However, in particularly ill-conditioned situations, this "conditional" stability can quickly prove problematic, hence the use of the following iterative algorithm.
Note:
GSM is twice as efficient as a Householder method for obtaining a QR factorization of the initial matrix \(X\): it requires only about \(2{\mathrm{nk}}^{2}\) operations (with \(n\) the number of rows of the matrix).
14.4. Iterative Gram-Schmidt Algorithm (IGSM)#
To nevertheless guarantee orthogonality to within machine precision, it is recommended to perform a second orthogonalization. If, at the end of this second pass, orthogonality is still not achieved, it is pointless to start again: the quantities manipulated are then almost identical and their differences merely oscillate around zero. This approach is based on a theoretical analysis by W. Kahan, taken up by B. N. Parlett [Par80] (cf. pp. 105-110).
Algorithm 2.3. Iterative Gram-Schmidt algorithm of Kahan-Parlett (IGSM).
When using IRAM in Code_Aster, the criterion \(\alpha\) is set by the keyword PARA_ORTHO_SOREN (see §7.5). It can be shown that its validity interval is \(\mathrm{[}1.2\varepsilon ,0.83\mathrm{-}\varepsilon \mathrm{]}\), with \(\varepsilon\) the machine precision; it is generally given the value 0.717 (the default).
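As an illustration only (the actual Code_Aster routine is not reproduced here, and the Euclidean dot product is assumed), the accept/re-orthogonalize test described above can be sketched as follows, with alpha playing the role of PARA_ORTHO_SOREN:

```python
import numpy as np

def igsm_step(Q, x, alpha=0.717):
    """One Kahan-Parlett IGSM step, in the spirit of Algorithm 2.3.

    Orthogonalizes x against the orthonormal columns of Q, with at most
    one re-orthogonalization ("twice is enough").
    """
    y = x - Q @ (Q.T @ x)                     # first orthogonalization pass
    if np.linalg.norm(y) >= alpha * np.linalg.norm(x):
        return y / np.linalg.norm(y)          # accepted on the first pass
    z = y - Q @ (Q.T @ y)                     # second and last pass
    if np.linalg.norm(z) >= alpha * np.linalg.norm(y):
        return z / np.linalg.norm(z)
    # Still no significant component outside span(Q): further passes would
    # only chase rounding noise, so x is declared numerically dependent.
    return np.zeros_like(x)
```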
Notes:
The larger the value of this parameter, the less often reorthogonalization is triggered, at the expense of the quality of the process.
Unlike the "homemade" version developed for Lanczos, the stopping criterion here is based on the norms of the orthogonalized vectors rather than on inter-vector dot products, which are more sensitive to rounding effects. This, combined with dropping iterations beyond the second, which are useless, may explain the better efficiency of the Kahan-Parlett version.
According to D. C. Sorensen, the authorship of this method should rather be attributed to J. Daniel (Mathematics of Computation, vol. 30, 1976, pp. 772-795).