1. Description of the method#
The multifrontal method is a direct method for solving linear systems, particularly suitable for systems with a sparse matrix. It factorizes symmetric or non-symmetric matrices, which need not be positive definite. In the most general case, the multifrontal method uses the Gauss method with pivot searching.
The method implemented in Code_Aster factorizes real or complex matrices, symmetric or not (a non-symmetric matrix is accepted as long as its structure remains symmetric), and uses the algorithm called \({\mathit{LDL}}^{T}\) or \(\mathit{L.U}\), without pivot searching.
The resolution of a linear system is carried out in three steps:
renumbering the unknowns,
factorization of the matrix,
descent/ascent.
In the case where several linear systems with the same matrix are to be solved, only the descents/ascents need to be carried out. Likewise, if several matrices with the same structure are to be factored, the renumbering of the unknowns does not have to be done again. This preliminary phase, which conditions the performance of the factorization, has a significant cost; however, its relative weight (in terms of computation time) decreases with the size of the matrices to be factored.
We will proceed according to the following plan:
presentation of the classical \({\mathrm{LDL}}^{T}\) method adapted to full symmetric matrices. Notion of elimination,
extension of the method to sparse matrices, concept of filling,
presentation of the multi-frontal method.
1.1. Method LDLT for full matrices#
Let \(A\) be an invertible matrix of order \(n\); we know that there exists a lower triangular matrix \(L\) with unit diagonal and an upper triangular matrix \(U\) such that \(A=L\cdot U\).
If \(A\) is symmetric, this decomposition can be written as:
\({A=\mathrm{LDL}}^{T}\) eq 1.1-1
where \(D\) is a diagonal matrix and \({L}^{T}\) is the transpose of \(L\), with unit diagonal.
Let \((i,j)\in {[\mathrm{1,}n]}^{2}\); from [éq1.1-1] we deduce the expression for the coefficients of \(L\) and \(D\). Indeed, we can write:
\({A}_{\text{ij}}=\sum _{k=1}^{n}\sum _{l=1}^{n}{L}_{\text{ik}}{D}_{\text{kl}}{L}_{\text{lj}}^{T}\) eq 1.1-2
\(L\) being lower triangular with unit diagonal, and \(D\) being diagonal, [éq1.1-2] becomes:
\({A}_{\text{ij}}=\sum _{k=1}^{\text{min}(i,j)}{L}_{\text{ik}}{L}_{\text{jk}}{D}_{\text{kk}}\) eq 1.1-3
There are many ways to calculate the coefficients of \(L\) and \(D\) starting from [éq1.1-3]. We will see the so-called « by line » method usually used in Code_Aster, and the « by column » method used in the multifrontal method.
Line method
The lines of \(L\) and \(D\) are calculated jointly one after the other. Assume these lines known up to order \((i-1)\), and also assume that the coefficients in line \(i\) are known up to order \(j-1\); [éq1.1-3] is written, with \(j<i\):
\({A}_{\text{ij}}=\sum _{k=1}^{j-1}{L}_{\text{ik}}{L}_{\text{jk}}{D}_{\text{kk}}+{L}_{\text{ij}}{D}_{\text{jj}}{L}_{\text{jj}}\) eq 1.1-4
We thus have:
\({L}_{\text{ij}}=\frac{1}{{D}_{\text{jj}}}({A}_{\text{ij}}-\sum _{k=1}^{j-1}{L}_{\text{ik}}{L}_{\text{jk}}{D}_{\text{kk}})\) eq 1.1-5
and in a similar way:
\({D}_{\text{ii}}={A}_{\text{ii}}-\sum _{k=1}^{i-1}{L}_{\text{ik}}^{2}{D}_{\text{kk}}\) eq 1.1-6
With this method, each coefficient is calculated in a single pass (operation [éq1.1-5]), by fetching the previously computed coefficients and forming the dot product of \(({L}_{\text{jk}},k=\mathrm{1,}j-1)\) and \(({T}_{\text{k}},k=\mathrm{1,}j-1)\), with \({T}_{k}={L}_{\mathrm{ik}}\mathrm{.}{D}_{\mathrm{kk}}\).
This algorithm is illustrated by [Figure1.1-a].
Figure 1.1-a
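To fix ideas, here is a minimal Python/NumPy sketch of this row-by-row algorithm ([éq1.1-5] and [éq1.1-6]) for a full symmetric matrix; the function name `ldlt_by_rows` is ours, and no pivoting is performed, so the matrix is assumed to be factorizable as such.

```python
import numpy as np

def ldlt_by_rows(A):
    """Row-by-row LDL^T factorization of a full symmetric matrix (eq. 1.1-5 / 1.1-6).

    Returns L (unit lower triangular) and d (the diagonal of D).
    Sketch only: no pivoting, A is assumed factorizable as such.
    """
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        for j in range(i):
            # eq. 1.1-5: dot product of (L_jk) and (T_k = L_ik * D_kk), k < j
            t = L[i, :j] * d[:j]
            L[i, j] = (A[i, j] - L[j, :j] @ t) / d[j]
        # eq. 1.1-6: diagonal term D_ii
        d[i] = A[i, i] - (L[i, :i] ** 2) @ d[:i]
    return L, d

# quick check on a small symmetric matrix
A = np.array([[4., 1., 2.], [1., 3., 0.], [2., 0., 5.]])
L, d = ldlt_by_rows(A)
assert np.allclose(L @ np.diag(d) @ L.T, A)
```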
Column method
Let’s look at the following figure [Figure1.1-b]:
Figure 1.1-b
We will see this in detail later (Cf. Figure 1.3-g).
Assume that column \(j\), i.e. the terms \({D}_{\text{jj}}\) and \(({L}_{\text{ij}},i=j+\mathrm{1,}n)\), is known, and perform the following algorithm:
\(\mid \begin{array}{lc}\text{for }i=j+1\text{ to }n\text{, do}& \text{éq 1.1-7}\\ \quad {D}_{\text{ii}}={D}_{\text{ii}}-{L}_{\text{ij}}^{2}{D}_{\text{jj}}& \\ \quad \text{for }k=i+1\text{ to }n\text{, do}& \\ \quad \quad {L}_{\text{ki}}={L}_{\text{ki}}-{L}_{\text{kj}}{L}_{\text{ij}}{D}_{\text{jj}}\quad \text{(saxpy)}& \text{éq 1.1-8}\end{array}\)
Let us make three remarks:
operations [éq1.1-7] are called the elimination of the unknown \(j\). In fact, after [éq1.1-7], the terms of column \(j\) are never used again in the rest of the algorithm. The column method is sometimes referred to as the « looking forward » method: as soon as they are calculated, the terms of the matrix act on the following terms. Conversely, line methods are called « looking backward » methods: for each new calculation, we fetch the terms computed previously,
the operation [éq1.1-8] is a « saxpy » operation: the product of the constant \({L}_{\text{ij}}\cdot {D}_{\text{jj}}\) and the vector \(({L}_{\text{kj}},k=i+\mathrm{1,}n)\) is subtracted from the vector \(({L}_{\text{ki}},k=i+\mathrm{1,}n)\),
having done [éq1.1-7] for \(j\) fixed, let’s see what’s left to do to know column \((j+1)\),
\({D}_{\text{j+1,j+1}}\) is known (it can be easily verified),
simply divide column \(({L}_{\text{k,j+1}},k=j+\mathrm{2,}n)\) by \({D}_{\text{j+1,j+1}}\) to get the final value of this column.
(Above, we have used the same notation for the definitive terms \({L}_{\text{ki}}\) and for the working array of the same name, which contains the values of \({L}_{\text{ki}}\) modified during the successive eliminations.)
We therefore deduce the general factorization algorithm \({\mathrm{LDL}}^{T}\) by columns.
\(\mid \begin{array}{lc}\text{for }j=1\text{ to }n\text{, do}& \\ \quad \text{for }k=j+1\text{ to }n\text{, do}& \text{/normalization/}\\ \quad \quad {L}_{\text{kj}}={L}_{\text{kj}}/{D}_{\text{jj}}& \\ \quad \text{for }i=j+1\text{ to }n\text{, do}& \text{/elimination/}\\ \quad \quad {D}_{\text{ii}}={D}_{\text{ii}}-{L}_{\text{ij}}^{2}{D}_{\text{jj}}& \\ \quad \quad \text{for }k=i+1\text{ to }n\text{, do}& \\ \quad \quad \quad {L}_{\text{ki}}={L}_{\text{ki}}-{L}_{\text{kj}}{L}_{\text{ij}}{D}_{\text{jj}}& \text{/saxpy/}\end{array}\) eq 1.1-9
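The same factorization can thus be written column by column, following [éq1.1-9] literally. A minimal Python/NumPy sketch (the name `ldlt_by_columns` is ours; still no pivoting):

```python
import numpy as np

def ldlt_by_columns(A):
    """Column-oriented ("looking forward") LDL^T factorization, as in eq. 1.1-9.

    A working copy of A is overwritten: at the end its strict lower triangle
    holds L and its diagonal holds D.  Sketch only: no pivoting.
    """
    W = A.astype(float).copy()
    n = W.shape[0]
    for j in range(n):
        # normalization: column j currently holds L_kj * D_jj
        W[j+1:, j] /= W[j, j]
        # elimination of the unknown j: one saxpy per later column
        for i in range(j + 1, n):
            W[i, i] -= W[i, j] ** 2 * W[j, j]
            W[i+1:, i] -= W[i+1:, j] * (W[i, j] * W[j, j])
    return np.tril(W, -1) + np.eye(n), np.diag(W).copy()

# same small check as for the row-oriented version
A = np.array([[4., 1., 2.], [1., 3., 0.], [2., 0., 5.]])
L, d = ldlt_by_columns(A)
assert np.allclose(L @ np.diag(d) @ L.T, A)
```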
Before moving on to the concept of filling, it is appropriate to make a remark that will be useful later. Looking at [éq1.1-9], it appears that we can eliminate the unknown \(j\) even if the terms \(({L}_{\text{ki}},k=i+\mathrm{1,}n)\) and \({D}_{\text{ii}}\) are not yet available. In fact, all we have to do is keep the terms \((-{L}_{\text{kj}}{L}_{\text{ij}}{D}_{\text{jj}})\) and add them later to the terms \({L}_{\text{ki}}\). These terms \(({L}_{\text{kj}}{L}_{\text{ij}}{D}_{\text{jj}})\), with \(i\) varying from \(j+1\) to \(n\) and \(k\) from \(i+1\) to \(n\), form a matrix associated with the elimination of the unknown \(j\), which will later be called the frontal matrix \(j\).
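To illustrate this remark, the contribution of the elimination of the unknown \(j\) can be computed on its own, before the target columns exist. A small sketch (the helper name is hypothetical), assuming the normalized column \(({L}_{\text{kj}},k=j+\mathrm{1,}n)\) and the pivot \({D}_{\text{jj}}\) are available:

```python
import numpy as np

def frontal_update(col_j, d_j):
    """Terms produced by the elimination of the unknown j (cf. eq. 1.1-9). Sketch.

    col_j : the normalized column (L_kj, k = j+1..n)
    d_j   : the pivot D_jj
    Returns the dense matrix of terms L_kj * L_ij * D_jj; its lower triangle
    (diagonal included) is exactly what the elimination loop of eq. 1.1-9
    subtracts from the columns i > j.
    """
    return d_j * np.outer(col_j, col_j)
```

Keeping this small dense matrix aside and subtracting it only later, when the target columns are assembled, is precisely the role that the frontal matrices will play in the multifrontal method.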
1.2. Sparse matrices and filling#
It should be noted that if the initial matrix includes zero terms, the successive eliminations cause filling, that is to say that some terms \({L}_{\text{ki}}\) are non-zero while the corresponding initial term \({A}_{\text{ki}}\) is zero.
Suppose that before the elimination of the unknown \(j\), the term \({L}_{\text{ki}}\) is zero; if \({L}_{\text{kj}}\) and \({L}_{\text{ij}}\) are both non-zero, [éq1.1-9] shows that \({L}_{\text{ki}}\) will become non-zero. This filling has a graphical interpretation. Represent all the unknowns by the nodes of a graph, and connect the nodes \(k\) and \(i\) if and only if the term \({A}_{\text{ki}}\) of the initial matrix is non-zero. If \({A}_{\text{ki}}\) is zero, with \(i\) linked to \(j\) and \(k\) linked to \(j\), then from the graphical point of view the elimination of \(j\) consists in linking \(i\) and \(k\).
Figure [Figure1.2-a] illustrates this interpretation. The dashed edges are the ones created by eliminating \(j\). They correspond to new non-zero terms from matrix \(L\).
Figure 1.2-a: Eliminating node j connects all of its neighbors together
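This graphical rule can be simulated directly, which is what the renumbering phase described later does. A minimal sketch (the function name `simulate_fill` and the dictionary-of-sets graph representation are ours):

```python
def simulate_fill(adj, order):
    """Simulate the fill created by eliminating the nodes in the given order. Sketch.

    adj   : dict node -> set of neighbors (graph of the initial matrix)
    order : elimination order of the nodes
    Returns the set of fill edges (i, k), i.e. the terms L_ki that become
    non-zero although the initial term A_ki was zero.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # working copy
    fill = set()
    for j in order:
        nbrs = {v for v in adj.pop(j) if v in adj}    # not yet eliminated
        for i in nbrs:
            adj[i].discard(j)
        # eliminating j links all of its remaining neighbors together
        for i in nbrs:
            for k in nbrs - {i} - adj[i]:
                adj[i].add(k)
                adj[k].add(i)
                fill.add(tuple(sorted((i, k))))
    return fill

# toy graph in the spirit of Figure 1.2-a: node 0 plays the role of j
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(simulate_fill(adj, order=[0, 1, 2, 3]))   # the three edges 1-2, 1-3, 2-3
```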
1.3. Multifrontal method#
The multifrontal method is a direct method, of the Gauss type, which aims to exploit as much as possible the sparsity of the matrix to be factored. On the one hand, it seeks to minimize the filling by using an optimal renumbering; on the other hand, it extracts from the sparse structure of the matrix the information that allows it to eliminate (cf. page 5 note (1)) unknowns independently of one another.
Let us look at the simple case of [Figure 1.3-a], where the matrix has a single zero term, \({A}_{\text{21}}\).
Figure 1.3-a
Column 2 of \(L\) is not affected by the elimination of the unknown 1, because the coefficient \({L}_{\text{21}}\) is zero (see [éq1.1-9] above). As for the contributions to columns 3 and 4 of \(L\), those of the unknowns 1 and 2 are independent of their order of elimination (one has to look at [éq1.1-9] in detail). From this observation, we can introduce the concept of the elimination tree presented by I. Duff [bib1].
The elimination tree of the matrix \(A\) can be represented by figure [Figure 1.3-b].
Figure 1.3-b
This tree structure contains two concepts:
the independence of certain eliminations (here, of the variables 1 and 2), which will allow a possible parallelism of the operations,
the minimization of the operations to be performed (we can see on the tree structure that the term \({L}_{\text{21}}\) does not have to be calculated).
Given a sparse matrix whose filling is known, the elimination tree can be constructed as follows:
all the leaves of the tree (its lower extremities) correspond to the unknowns \(j\) such that \({L}_{\text{jk}}=0\) for \(k=1\) to \(j-1\). Here, 1 is indeed a leaf because there is no term \({L}_{\text{1k}}\) with \(k<1\), and 2 is also a leaf because \({L}_{\text{21}}=0\),
a node \(j\) has \(i\) for father if \(i\) is the lowest row number such that \({L}_{\text{ij}}\ne 0\) with \(i>j\). Here, 3 is the father of 1 and 2.
Notes:
From now on we use the vocabulary of graph theory: tree, leaf, node… Here the tree is drawn upside down, with its leaves at the bottom,
refer to [bib1] for more details and demonstrations of the validity of the method,
in the example above, the order of elimination between the unknowns (3) and (4) is fixed by the initial numbering, we could have swapped the rows and columns of the matrix and had 4 as the father of 1 and 2,
it should be noted that the construction of this tree must take into account the non-zero terms obtained by filling during the eliminations (more details on this topic can be found in [bib2]). It is not possible to build the elimination tree from the initial sparse matrix alone: the filling terms must also be known, as already mentioned above. The numerical factorization of the multifrontal method is therefore preceded by an important phase: the simulation of the eliminations, and thus of the creation of non-zero terms. This simulation is also called logical or symbolic elimination. It takes place during the first phase: the renumbering. We will now see the four phases of the multifrontal method, the most important being the first and the fourth.
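As an illustration of the construction rule above, here is a minimal sketch (hypothetical helper, 0-based indices) that derives the father of each unknown from the filled pattern of \(L\):

```python
def elimination_tree(lower_pattern, n):
    """Build the elimination tree from the filled pattern of L. Sketch.

    lower_pattern : set of pairs (i, j), i > j, such that L_ij != 0
                    (fill terms included, as stressed in the note above)
    Returns parent[j] = smallest i > j with L_ij != 0, or None for a root.
    """
    parent = [None] * n
    for j in range(n):
        rows = [i for (i, jj) in lower_pattern if jj == j]
        parent[j] = min(rows) if rows else None
    return parent

# example of Figures 1.3-a / 1.3-b (0-based): the only zero term is A_21,
# so the non-zero lower terms of L are L_31, L_41, L_32, L_42, L_43
pattern = {(2, 0), (3, 0), (2, 1), (3, 1), (3, 2)}
print(elimination_tree(pattern, 4))
# [2, 2, 3, None]: in 1-based numbering, 3 is the father of 1 and 2, and 4 of 3
```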
First phase: The « minimum degree » renumbering
The first purpose of this renumbering is to minimize the number of operations to be performed during the factorization. To do this, we simulate the elimination of the nodes, choosing as candidate for elimination the node of the graph with the lowest number of neighbors. Here we use the graph concept seen in figure [Figure 1.2-a]: the initial sparse matrix defines the graph we are working on, which is then updated (creation of links) each time a node is eliminated. This simulation of the filling makes it possible to achieve the second goal, which is the construction of the elimination tree seen in the previous paragraph. The third goal achieved is the creation of supernodes. This is an important concept that we are going to develop.
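A greedy sketch of this « minimum degree » simulation, on the same graph representation as before (the function name and the tie-breaking are ours; production orderings use quotient graphs and many refinements):

```python
def minimum_degree_order(adj):
    """Greedy "minimum degree" renumbering on the graph of the sparse matrix. Sketch.

    adj : dict node -> set of neighbors (graph of the initial matrix)
    At each step the node with the fewest neighbors is eliminated and its
    neighbors are linked together (fill simulation, cf. Figure 1.2-a).
    Returns the elimination order of the nodes.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        j = min(adj, key=lambda v: len(adj[v]))   # node of minimum degree
        nbrs = adj.pop(j)
        for i in nbrs:
            adj[i].discard(j)
        for i in nbrs:                            # simulate the fill
            adj[i] |= nbrs - {i}
        order.append(j)
    return order
```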
A supernode (SN) is composed of the set of nodes which, during the elimination, have the same neighbors in the sense of the elimination graph. During the elimination simulation, it is detected that, for example, the nodes \(i\), \(j\) and \(k\) are:
on the one hand, neighbors of one another (the terms \({A}_{\text{ij}}\), \({A}_{\text{ik}}\), \({A}_{\text{jk}}\), … are non-zero),
on the other hand, nodes having the same common neighbors (see [Figure 1.3-c]).
They then form a supernode and will be eliminated all together during the numerical factorization.
The multifrontal method, when it uses this concept of « supernode », thus becomes a block factorization method.
This concept has the following double advantage:
it reduces the (not negligible) cost of renumbering,
it reduces the cost of factorization by grouping the calculations (use of linear algebra routines such as BLAS-2 or BLAS-3).
We can see on [Figure 1.3-c] the structure of the columns of a supernode (non-zero terms in gray) within a virtual global sparse matrix. In fact, such a sparse matrix is never assembled: we work with local full matrices called frontal matrices, one frontal matrix per SN (we will come back to this concept in the « Numerical factorization » paragraph).
Figure 1.3-c
In summary, the first phase of the multifrontal method (renumbering by the « minimum degree ») consists of the following three actions:
a renumbering of the unknowns to be eliminated, in order to minimize filling. Eliminations are simulated by updating the lists of node neighbors each time,
in conjunction with the previous action, the elimination tree is constructed, and
we detect the « supernodes », which can be described as the equivalence classes of the relation « having the same neighbors ».
Note:
The elimination tree provided by the renumbering is expressed in terms of supernodes, because the numerical elimination to follow will be done by supernodes.
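As an illustration of the detection of supernodes described above, here is a simplified sketch that groups consecutive columns of the filled matrix when they have the same neighbors outside the group (the helper name and the 0-based column structure are ours; the actual criterion used during renumbering is more general):

```python
def detect_supernodes(col_struct, n):
    """Group consecutive columns of the filled matrix into supernodes. Sketch.

    col_struct[j] : set of row indices i > j with L_ij != 0 (fill included),
                    i.e. the neighbors of j in the elimination graph.
    Column j joins the current group when it is a neighbor of the previous
    column and has exactly the same neighbors outside the group.
    """
    supernodes = [[0]]
    for j in range(1, n):
        prev = supernodes[-1][-1]
        if j in col_struct[prev] and col_struct[prev] - {j} == col_struct[j]:
            supernodes[-1].append(j)
        else:
            supernodes.append([j])
    return supernodes
```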
Second phase: Symbolic factorization
This phase is not as fundamental as the previous one. It is a technical phase intended to build certain pointers, in particular the tables of global and local indices that establish the correspondence between the unknowns when assembling the frontal matrices.
Third phase: The execution sequence
It is also a technical phase. We saw in the previous paragraph that the method consists in going through the elimination tree, performing an elimination at each node of the tree. The result of this elimination is a frontal matrix, whose order is the number of neighbors of the eliminated SN. The storage of the frontal matrices is expensive in terms of memory occupancy.
The frontal matrix of a supernode (the result of its elimination) will be assembled into the frontal matrix of node \(i\), where \(i\) is the father of that supernode (we will see this phase in more detail with the numerical factorization). All frontal matrices must therefore be kept until they are used for the elimination of the father supernode (« SNpère »); the matrix of the father can then be stored in place of the matrices of its sons (« SNfils »).
There are several ways to walk through the tree while respecting the constraint « the son must be eliminated before the father ». The object of the third phase is to find the traversal order that minimizes the memory space required to store the frontal matrices ([bib2], page 2.12).
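One admissible traversal is simply a post-order walk of the elimination tree; the actual execution-sequence phase additionally chooses, at each node, the order of the child sub-trees so as to minimize the peak size of the stack of frontal matrices. A minimal sketch of such a walk (hypothetical names):

```python
def postorder(children, root):
    """Post-order walk of the elimination tree: every son before its father. Sketch.

    children : dict supernode -> list of child supernodes
    Returns one admissible elimination order; choosing the best order among
    the children (to minimize the frontal-matrix stack) is not done here.
    """
    order, stack = [], [(root, False)]
    while stack:
        node, visited = stack.pop()
        if visited:
            order.append(node)
        else:
            stack.append((node, True))
            for child in children.get(node, []):
                stack.append((child, False))
    return order

# elimination tree of Figure 1.3-d: SN1 and SN2 are the sons of SN3
print(postorder({3: [1, 2]}, root=3))   # [2, 1, 3]
```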
Fourth phase: Numerical factorization
This phase is the effective factorization, that is to say the calculation of the matrices \(L\) and \(D\). From now on we will no longer distinguish these two matrices: from a computer point of view, \(D\) is seen as the diagonal terms of \(L\).
Numerical factorization consists in going through the elimination tree; for each « Supernode », we perform:
the assembly, into the « father » frontal matrix, of the « son » frontal matrices,
the elimination of supernode columns.
Let us consider the example of [Figure 1.3-d], with its graph and its elimination tree (the nodes 8, 9, 10 are not shown in the drawing).
Figure 1.3-d: Example of a graph and elimination tree
In brackets, we read the numbers of the unknowns of each SN.
The elimination of SN1 consists of:
the assembly of columns \(1\) and \(2\) of the initial matrix, in a working matrix, called frontal matrix before elimination. This matrix is of order \(5\), linked to the unknowns \((\mathrm{1,2}\mathrm{,4}\mathrm{,5}\mathrm{,6})\). (Because \(1\) and \(2\) are linked to \((\mathrm{4,5}\mathrm{,6})\) only),
the elimination of SN1 (from columns \(1\) and \(2\), according to the formulas [éq1.1-7] and [éq1.1‑8] seen previously),
the storage of the two columns \(1\) and \(2\) of the factored matrix in a table FACTOR, which contains the columns of the global factored matrix,
the storage of the frontal matrix \(1\), of order \(3\), linked to the unknowns \((\mathrm{4,5}\mathrm{,6})\).
We do the same thing with SN2.
These two eliminations are illustrated by figure [Figure 1.3-e], where the two frontal matrices appear in gray with hatching.
Figure 1.3-e
The elimination of SN3 consists of:
the addition of columns \((\mathrm{4,5}\mathrm{,6}\mathrm{,7})\) of the initial matrix, to the work matrix of order 7, linked to the unknowns \((\mathrm{4,5}\mathrm{,6}\mathrm{,7}\mathrm{,8}\mathrm{,9}\mathrm{,10})\),
the assembly of the frontal matrices \(1\) and \(2\) into this working matrix,
the elimination of columns \((\mathrm{4,5}\mathrm{,6}\mathrm{,7})\),
obtaining four additional columns in FACTOR,
obtaining the frontal matrix \(3\), of order \(3\) (columns \(\mathrm{8,9}\mathrm{,10}\)).
Note that the frontal matrix \(3\) can be stored in place of the frontal matrices \(1\) and \(2\). The storage of these matrices requires a stack structure: one pushes at the end of an elimination and pops at assembly time. It is the maximum length of this stack that is minimized during the execution-sequence phase.
Figure [Figure1.3-f] illustrates the elimination of SN3.
Figure 1.3-f: Elimination of SN3
Note that the coefficient \({L}_{\text{74}}\) (white square in figure [Figure1.3-f]), comes from the elimination of SN3, and that the initial term \({A}_{\text{74}}\) is zero (dotted link in figure [Figure 1.3-d]).
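The assembly step described above (adding a son's frontal matrix into the father's, through the global/local index tables of the symbolic factorization) can be sketched as follows; the helper name `extend_add` and the explicit lists of global unknown numbers are ours:

```python
import numpy as np

def extend_add(parent_front, parent_unknowns, child_front, child_unknowns):
    """Assemble a son frontal matrix into the frontal matrix of its father. Sketch.

    parent_front    : dense frontal matrix of the father SN (modified in place)
    parent_unknowns : global unknown numbers carried by parent_front
    child_front / child_unknowns : same data for the son frontal matrix
    The local -> global correspondence plays the role of the index tables
    built during the symbolic factorization phase.
    """
    pos = {g: k for k, g in enumerate(parent_unknowns)}
    idx = np.array([pos[g] for g in child_unknowns])
    parent_front[np.ix_(idx, idx)] += child_front
    return parent_front

# example of Figure 1.3-e/1.3-f: the frontal matrix 1 carries the unknowns
# (4, 5, 6) and is assembled into the working matrix of SN3, built on the
# unknowns (4, 5, 6, 7, 8, 9, 10)
F3 = np.zeros((7, 7))
F1 = np.ones((3, 3))          # placeholder values for the contribution terms
extend_add(F3, [4, 5, 6, 7, 8, 9, 10], F1, [4, 5, 6])
```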
We saw in paragraph 1.1 that the elimination of a column consisted of a « saxpy » operation, adding to a vector the product of a vector by a scalar.
It is easy to see that the elimination of a supernode, i.e. of a group of columns, consists of « matrix-vector product » operations. These operations consume most of the computation time of the factorization. They are carried out by subroutines from the BLAS library, which is supplied in optimized form on most computers. Figure [Figure 1.3-g] shows the column \(j\) updated by the product of the matrix \(A\left[j+1:n,1:m\right]\) and the vector \(A\left[j,1:m\right]\).
The factorization being complete, the factored matrix is available in the form of a compact table containing the non-zero terms only. It is a « Morse » (compressed sparse) type of storage. The global index table mentioned in the symbolic factorization indicates, for each column of \(L\), the row numbers of each stored coefficient.
Figure 1.3-g: Updating column \(j\) by a matrix-vector product
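In NumPy terms, the update of Figure 1.3-g could be sketched as below (hypothetical names, 0-based indices, the \(m\) already-eliminated columns assumed normalized with their pivots stored in `d`):

```python
import numpy as np

def update_column(F, j, m, d):
    """Update column j of a working matrix by the m columns already eliminated. Sketch.

    F : working (frontal) matrix; its first m columns hold normalized L values
    d : pivots D_kk of those m columns
    One BLAS-2 like matrix-vector product replaces m individual saxpy's;
    the diagonal term F[j, j] is updated together with the column.
    """
    F[j:, j] -= F[j:, :m] @ (d[:m] * F[j, :m])
    return F
```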
1.4. Descent - Ascent#
Since the columns of \(L\) are stored in a compact manner, the descent is of the « saxpy » type and the ascent is of the dot-product type, these two operations being both indexed. That is to say, the descent algorithm is roughly the following (having previously initialized \(x\) with the right-hand side of the system):
\(\mid \begin{array}{l}\text{for }i=1\text{ to }n\text{, do}\\ \quad \text{for }k\in \text{column }i\text{, do}\\ \quad \quad x(\mathrm{global}(k))=x(\mathrm{global}(k))-{L}_{\text{ki}}\times x(i)\quad \text{(indexed saxpy)}\\ \text{for }i=1\text{ to }n\text{, do}\\ \quad x(i)=x(i)/{D}_{\text{ii}}\end{array}\)
The ascent, on the other hand, is written in the form:
\(\mid \begin{array}{l}\text{for }i\text{ from }n\text{ down to }1\text{, do}\\ \quad x(i)=x(i)-\sum _{k\in \text{column }i}{L}_{\text{ik}}\cdot x(\mathrm{global}(k))\quad \text{(indexed dot product)}\end{array}\)
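A minimal Python/NumPy sketch of these two algorithms, with \(L\) stored column by column together with the « global » row indices (the storage layout and function name below are ours, with 0-based indices):

```python
import numpy as np

def solve_ldlt_compact(cols, diag, b):
    """Descent, diagonal solve and ascent with L stored column by column. Sketch.

    cols[i] : pair (rows, values): the "global" row numbers and the values of
              the coefficients of column i of L stored below the diagonal.
    diag[i] : D_ii.
    """
    x = b.astype(float).copy()
    n = len(diag)
    for i in range(n):                    # descent: indexed saxpy
        rows, vals = cols[i]
        x[rows] -= vals * x[i]
    x /= diag                             # diagonal solve
    for i in range(n - 1, -1, -1):        # ascent: indexed dot product
        rows, vals = cols[i]
        x[i] -= vals @ x[rows]
    return x

# quick check with a hand-built factorization L, D
L = np.array([[1., 0., 0.], [0.5, 1., 0.], [0.25, 0.5, 1.]])
d = np.array([2., 4., 1.])
cols = [(np.array([1, 2]), np.array([0.5, 0.25])),
        (np.array([2]), np.array([0.5])),
        (np.array([], dtype=int), np.array([]))]
b = np.array([1., 2., 3.])
A = L @ np.diag(d) @ L.T
assert np.allclose(solve_ldlt_compact(cols, d, b), np.linalg.solve(A, b))
```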