Deterministic computation of the characteristic polynomial in the time of matrix multiplication
Vincent Neiger
. . . U. Limoges, FranceClément Pernet
. . . Univ Grenoble Alpes, FranceJournées Nationales de Calcul Formel 2021
Luminy, France (online), March 2, 2021
Outline
• Context, problem, state of the art
• Obstacles, overview of the approach
• Complexity, Spin-off results
Context, problem, state of the art
Outline
• Context, problem, state of the art
• Obstacles, overview of the approach
• Complexity, Spin-off results
Context, problem, state of the art
Context
• field K , algebraic complexity (counting operations in K )
• ω: exponent of MatMul over K : m × m by m × m in O(m
ω) Reductions: of most problems to matrix multiplication
Rank Det
PLUQ CharPoly
MinPoly
TRSM
MatMul LinSys
Inverse
× log m
LinSys Det Rank PLUQ T RSM Inverse
= O(MatMul)
MatMul = O(
Det , PLUQ , CharPoly ,
Inverse )
CharPoly = O(MatMul) ?
Context, problem, state of the art
Context
• field K , algebraic complexity (counting operations in K )
• ω: exponent of MatMul over K : m × m by m × m in O(m
ω) Reductions: of most problems to matrix multiplication
Rank Det
PLUQ CharPoly
MinPoly
TRSM
MatMul LinSys
Inverse
× log m
LinSys Det Rank PLUQ T RSM Inverse
= O(MatMul)
MatMul = O(
Det , PLUQ , CharPoly ,
Inverse )
CharPoly = O(MatMul) ?
Context, problem, state of the art
Context
• field K , algebraic complexity (counting operations in K )
• ω: exponent of MatMul over K : m × m by m × m in O(m
ω) Reductions: of most problems to matrix multiplication
Rank Det
PLUQ CharPoly
MinPoly
TRSM LinSys
Inverse
LinSys Det Rank PLUQ T RSM Inverse
= O(MatMul)
MatMul = O(
Det , PLUQ , CharPoly , )
CharPoly = O(MatMul) ?
Context, problem, state of the art
Context
• field K , algebraic complexity (counting operations in K )
• ω: exponent of MatMul over K : m × m by m × m in O(m
ω) Reductions: of most problems to matrix multiplication
Rank Det
PLUQ CharPoly
MinPoly
TRSM
MatMul LinSys
Inverse
LinSys Det Rank PLUQ T RSM Inverse
= O(MatMul)
MatMul = O(
Det , PLUQ , CharPoly ,
Inverse )
CharPoly = O(MatMul) ?
Context, problem, state of the art
Problem and result
Characteristic polynomial. . .
given M ∈ K
m×m, compute det (x I
m− M ) ∈ K [x]
• deterministic, general: O(m
ωlog (m)) [Keller-Gehrig 1985]
• deterministic, generic input: O(m
ω) [Giorgi & Jeannerod & Villard 2003]
• randomized, general: O(m
ω) [P. & Storjohann 2007]
. . . in the time of matrix multiplication
Deterministic charpoly algorithm in O(m ω )
using any MatMul algorithm in O(m
ω) with 2 < ω 6 3
Context, problem, state of the art
Motivation
Remark
In theory, for all ω, CharPoly = O(m
ω) is achieved by running
• any O(m
ωlog m) algorithm (like Keller-Gehrig’s)
• using an O(m
ω−ε) MatMul
More relevant model
Rely on a given MatMul algorithm, and reach the same exponent
• No cheating w.r.t. the presence of log
• Underlying hard question:
“How to compute an object related to a Krylov iteration without log ?”
In practice:
• Only a few subcubic time MatMul are available (mainly Strassen’s)
• [P. Storjohann’07] randomized algorithm inefficient with small fields
Context, problem, state of the art
Motivation
Remark
In theory, for all ω, CharPoly = O(m
ω) is achieved by running
• any O(m
ωlog m) algorithm (like Keller-Gehrig’s)
• using an O(m
ω−ε) MatMul
More relevant model
Rely on a given MatMul algorithm, and reach the same exponent
• No cheating w.r.t. the presence of log
• Underlying hard question:
“How to compute an object related to a Krylov iteration without log ?”
In practice:
• Only a few subcubic time MatMul are available (mainly Strassen’s)
• [P. Storjohann’07] randomized algorithm inefficient with small fields
Context, problem, state of the art
Motivation
Remark
In theory, for all ω, CharPoly = O(m
ω) is achieved by running
• any O(m
ωlog m) algorithm (like Keller-Gehrig’s)
• using an O(m
ω−ε) MatMul
More relevant model
Rely on a given MatMul algorithm, and reach the same exponent
• No cheating w.r.t. the presence of log
• Underlying hard question:
“How to compute an object related to a Krylov iteration without log ?”
In practice:
• Only a few subcubic time MatMul are available (mainly Strassen’s)
• [P. Storjohann’07] randomized algorithm inefficient with small fields
Context, problem, state of the art
Charpoly via linear algebra over K
Traces of Powers: [LeVerrier1840] [Faddeev’49, Souriau’48, ...] , O(m
4) or O(m
ω+1) used by [Csanky’75] to prove NC
2membership.
Determinant expansion: [Samuelson’42, Berkowitz’84] O(m
4) suited for division free algorithms with later developments in
[Abdlejaoued and Malaschonok’01, Kaltofen and Villard’05]
Krylov methods: [Danilevskij’37,Keller-Gehrig’85, P. and Storjohann’07]
• Deterministic O(m
3) or O(m
ωlog m)
• Generic O(m
ω)
• Las-Vegas probabilistic for large fields (|K| > 2 m
2) O(m
ω)
Context, problem, state of the art
Charpoly via linear algebra over K [x]
Determinant of a matrix A ∈ K [x] m×m of degree d d = 1
Evaluation-Interpolation: [folklore] O(m
ω+1)
Diagonalization: Smith Form [Storjohann 2003] O(m
ωlog (m)
2) Las Vegas randomized + additional logs for small fields
Triangularization:
• Generic: [Giorgi-Jeannerod-Villard 2003] : O(m
ω) diagonal of Hermite form must be 1, . . . , 1, det ( A )
• [Neiger-Labahn-Zhou 2017] via triangularization: O(m
ω) logarithmic factors in m and d
For polynomials of degree 6 d in K [x]:
PolMul in 6 M (d) f.ops. GCD in 6 M
0(d) ∈ O( M (d) log (d)) f.ops.
Context, problem, state of the art
Where do the log come from?
Explicit Krylov iteration: to compute a Krylov basis v Av . . . A m− 1 v
→ Fast matrix exponentiation: log m × O(m ω )
Polynomial matrix determinants: If Divide and conquer is applied
• on matrix dimension → no log (m)
• on degree (Polynomial arithmetic) → no log (m)
• on both dimension and degrees:
→ manageable if the total degree remains controlled. Hard case: long Krylov chains with small dimension drop
→ reminiscent of the failure to de-randomize [P. Storjohann’07]
Context, problem, state of the art
Where do the log come from?
Explicit Krylov iteration: to compute a Krylov basis v Av . . . A m− 1 v
→ Fast matrix exponentiation: log m × O(m ω ) Polynomial matrix determinants: If Divide and conquer is applied
• on matrix dimension → no log (m)
• on degree (Polynomial arithmetic) → no log (m)
• on both dimension and degrees:
→ manageable if the total degree remains controlled.
Hard case: long Krylov chains with small dimension drop
→ reminiscent of the failure to de-randomize [P. Storjohann’07]
Obstacles, overview of the approach
Outline
• Context, problem, state of the art
• Obstacles, overview of the approach
• Complexity, Spin-off results
Obstacles, overview of the approach
Partial block triangularization
[Giorgi-Jeannerod-Villard 2003, Zhou 2012, Neiger-Labahn-Zhou 2017]
Triangularization of m × m matrix A using m/ 2 × m/ 2 blocks ∗ ∗
K 1 K 2 A 1 A 2 A 3 A 4
=
R ∗
0 B
K 1 A 2 + K 2 A 4 row basis of [ A A
13
] kernel basis of [ A A
13
]
not computed
Property: det ( A ) = det ( R ) det ( B )
Obstacles, overview of the approach
Partial block triangularization
[Giorgi-Jeannerod-Villard 2003, Zhou 2012, Neiger-Labahn-Zhou 2017]
Triangularization of m × m matrix A using m/ 2 × m/ 2 blocks ∗ ∗
K 1 K 2 A 1 A 2 A 3 A 4
=
R ∗
0 B
K 1 A 2 + K 2 A 4 row basis of [ A A
13
] kernel basis of [ A A
13
]
not computed
Property: det ( A ) = det ( R ) det ( B )
Obstacles, overview of the approach
Generic case without log factor
[Giorgi-Jeannerod-Villard 2003, Zhou 2012, Neiger-Labahn-Zhou 2017]
Triangularization of m × m matrix A using m/ 2 × m/ 2 blocks ∗ ∗
K 1 K 2 A 1 A 2 A 3 A 4
=
R ∗
0 B
K 1 A 2 + K 2 A 4 row basis of [ A A
13
] kernel basis of [ A A
13
]
not computed
Property: det ( A ) = det ( R ) det ( B )
Generic input ⇒ det ( A ) without log (m) [Giorgi-Jeannerod-Villard 2003]
A
1and A
3are coprime ⇒ R = I
m/2⇒ det ( A ) = det ( B )
• Compute kernel [ K
1K
2]; deduce B by MatMul O(m
ωM
0(d))
• Recursively, compute det ( B ), return it
A and [ K
1K
2] have degree d ⇒ B has degree 2 d: controlled total degree
total cost O(m
ωM
0(d) + (m/ 2 )
ωM
0( 2 d) + · · · + M
0(md)) ⊂ O(m
ωM
0(d))
Obstacles, overview of the approach
General case with log factor
[Giorgi-Jeannerod-Villard 2003, Zhou 2012, Neiger-Labahn-Zhou 2017]
Triangularization of m × m matrix A using m/ 2 × m/ 2 blocks ∗ ∗
K 1 K 2 A 1 A 2 A 3 A 4
=
R ∗
0 B
K 1 A 2 + K 2 A 4 row basis of [ A A
13
] kernel basis of [ A A
13
]
not computed
Property: det ( A ) = det ( R ) det ( B ) Matrix degree not controlled: degree of B up to D = P
rdeg ( A ) 6 md but controlled average row degree: at most m D
General input ⇒ det ( A ) in O(m
ω Dm) [Labahn-Neiger-Zhou 2017]
Obstacles, overview of the approach
Be lazy: if hard to compute, don’t compute
[Giorgi-Jeannerod-Villard 2003, Zhou 2012, Neiger-Labahn-Zhou 2017]
Triangularization of m × m matrix A using m/ 2 × m/ 2 blocks
∗ ∗
K 1 K 2 A 1 A 2
A 3 A 4
=
R ∗
0 B
K 1 A 2 + K 2 A 4 row basis of [ A A
13
] kernel basis of [ A A
13
]
not computed
Property: det ( A ) = det ( R ) det ( B )
Obstacle: removing log factors in row basis computation
⇒ solution: remove row basis computation
I m/ 2 0 K 1 K 2
A 1 A 2 A 3 A 4
=
A 1 A 2
0 B
Property: det ( A ) = det ( A 1 ) det ( B )/ det ( K 2 )
Obstacles, overview of the approach
Further obstacles (brought by laziness)
I m/ 2 0 K 1 K 2
A 1 A 2 A 3 A 4
=
A 1 A 2
0 B
Property: det ( A ) = det ( A 1 ) det ( B )/ det ( K 2 )
no log (m) in the computation of A
1, B , K
2requires nonsingular A
1, otherwise det ( K
2) = 0
3 recursive calls in matrix size m/ 2 is , but requires P
rdeg ( A
1) 6 D/ 2 otherwise degree control is too weak. (this implies P
rdeg ( K
2) 6 D/ 2 )
Solution: require A in weak Popov form
(the characteristic matrix A = x I
m− M is in Popov form)
implies A
1nonsingular and P
rdeg ( A
1) 6 D/ 2 up to easy transformations both A
1and B are also in weak Popov form ⇒ suitable for recursive calls K
2is in “shifted reduced” form. . . find weak Popov P with same determinant
P = transpose of the − rdeg
rdeg(A4)( K
2)-Popov form of K
T2Obstacles, overview of the approach
Further obstacles (brought by laziness)
I m/ 2 0 K 1 K 2
A 1 A 2 A 3 A 4
=
A 1 A 2
0 B
Property: det ( A ) = det ( A 1 ) det ( B )/ det ( K 2 )
no log (m) in the computation of A
1, B , K
2requires nonsingular A
1, otherwise det ( K
2) = 0
3 recursive calls in matrix size m/ 2 is , but requires P
rdeg ( A
1) 6 D/ 2 otherwise degree control is too weak. (this implies P
rdeg ( K
2) 6 D/ 2 )
Solution: require A in weak Popov form
(the characteristic matrix A = xI
m− M is in Popov form)
implies A
1nonsingular and P
rdeg ( A
1) 6 D/ 2 up to easy transformations both A
1and B are also in weak Popov form ⇒ suitable for recursive calls K
2is in “shifted reduced” form. . . find weak Popov P with same determinant
P = transpose of the − rdeg
rdeg(A4)( K
2)-Popov form of K
T2Obstacles, overview of the approach
Further obstacles (brought by laziness)
I m/ 2 0 K 1 K 2
A 1 A 2 A 3 A 4
=
A 1 A 2
0 B
Property: det ( A ) = det ( A 1 ) det ( B )/ det ( K 2 )
no log (m) in the computation of A
1, B , K
2requires nonsingular A
1, otherwise det ( K
2) = 0
3 recursive calls in matrix size m/ 2 is , but requires P
rdeg ( A
1) 6 D/ 2 otherwise degree control is too weak. (this implies P
rdeg ( K
2) 6 D/ 2 )
Solution: require A in weak Popov form
(the characteristic matrix A = xI
m− M is in Popov form)
implies A
1nonsingular and P
rdeg ( A
1) 6 D/ 2 up to easy transformations
Complexity, Spin-off results
Outline
• Context, problem, state of the art
• Obstacles, overview of the approach
• Complexity, Spin-off results
Complexity, Spin-off results
Complexity
C (m , D) 6 2C m 2 , D
2
+ C m 2 , D
+O
m
ωM
0D m
6 O
m
ωM
0D m
where: M
0(d) = GCD (d) ∈ O( PolMul (d) log (d))
D
m
=
degdetm= avg row degree
m , D
m 2 , D m
2 , b D 2 c
m 4 , D m
4 , b D 2 c m 4 , b D 4 c
m 8 , D m
8 , b D 2 c m 8 , b D 4 c m 8 , b D 8 c
1, D . . . 1, b D
2
jc . . . 1, b 2 D
µc
1
1 2
1 4 4
1 6 12 8
1 2
j µj2
µO(m
log23) O(m
ωM
0(
mD)2
−3ε) O(m
ωM
0(
mD)2
−2ε) O(m
ωM
0(
mD)2
−ε)
O(m
ωM
0(
Dm))
1 2
1 2 2 4
1 2 4 8 4 8
Complexity, Spin-off results
Complexity
C (m , D) 6 2C m 2 , D
2
+ C m 2 , D
+O
m
ωM
0D m
6 O
m
ωM
0D m
where: M
0(d) = GCD (d) ∈ O( PolMul (d) log (d))
D
m
=
degdetm= avg row degree
m , D
m 2 , D m
2 , b D 2 c
m 4 , D m
4 , b D 2 c m 4 , b D 4 c
m 8 , D m
8 , b D 2 c m 8 , b D 4 c m 8 , b D 8 c
1, D . . . 1, b D c . . . 1, b D c
1
1 2
1 4 4
1 6 12 8
1 2
j µj2
µO(m
log23) O(m
ωM
0(
mD)2
−3ε) O(m
ωM
0(
mD)2
−2ε) O(m
ωM
0(
Dm)2
−ε)
O(m
ωM
0(
Dm))
1 2
1 2 2 4
1 2 4 8 4 8
Complexity, Spin-off results
Complexity
C (m , D) 6 2C m 2 , D
2
+ C m 2 , D
+O
m
ωM
0D m
6 O
m
ωM
0D m
where: M
0(d) = GCD (d) ∈ O( PolMul (d) log (d))
D
m
=
degdetm= avg row degree
m , D
m 2 , D m
2 , b D 2 c
m 4 , D m
4 , b D 2 c m 4 , b D 4 c
m 8 , D m
8 , b D 2 c m 8 , b D 4 c m 8 , b D 8 c
1
1 2
1 4 4
1 6 12 8
1 2
j µj2
µO(m
log23) O(m
ωM
0(
mD)2
−3ε) O(m
ωM
0(
mD)2
−2ε) O(m
ωM
0(
Dm)2
−ε)
O(m
ωM
0(
Dm))
1 2
1 2 2 4
1 2 4 8 4 8
Complexity, Spin-off results
Complexity
C (m , D) 6 2C m 2 , D
2
+ C m 2 , D
+O
m
ωM
0D m
6 O
m
ωM
0D m
where: M
0(d) = GCD (d) ∈ O( PolMul (d) log (d))
D
m
=
degdetm= avg row degree
m , D
m 2 , D m
2 , b D 2 c
m 4 , D m
4 , b D 2 c m 4 , b D 4 c
m 8 , D m
8 , b D 2 c m 8 , b D 4 c m 8 , b D 8 c
1, D . . . 1, b D c . . . 1, b D c
1
1 2
1 4 4
1 6 12 8
1 2
j µj2
µO(m
log23) O(m
ωM
0(
mD)2
−3ε) O(m
ωM
0(
mD)2
−2ε) O(m
ωM
0(
Dm)2
−ε)
O(m
ωM
0(
Dm))
1 2
1 2 2 4
1 2 4 8 4 8
Complexity, Spin-off results
Complexity
C (m , D) 6 2C m 2 , D
2
+ C m 2 , D
+O
m
ωM
0D m
6 O
m
ωM
0D m
where: M
0(d) = GCD (d) ∈ O( PolMul (d) log (d))
D
m
=
degdetm= avg row degree
m , D
m 2 , D m
2 , b D 2 c
m 4 , D m
4 , b D 2 c m 4 , b D 4 c
m 8 , D m
8 , b D 2 c m 8 , b D 4 c m 8 , b D 8 c
1
1 2
1 4 4
1 6 12 8
O(m
ωM
0(
mD)2
−3ε) O(m
ωM
0(
mD)2
−2ε) O(m
ωM
0(
Dm)2
−ε)
O(m
ωM
0(
Dm))
1 2
1 2 2 4
1 2 4 8 4 8
Complexity, Spin-off results
Complexity
C (m , D) 6 2C m 2 , D
2
+ C m 2 , D
+O
m
ωM
0D m
6 O
m
ωM
0D m
where: M
0(d) = GCD (d) ∈ O( PolMul (d) log (d))
D
m
=
degdetm= avg row degree
m , D
m 2 , D m
2 , b D 2 c
m 4 , D m
4 , b D 2 c m 4 , b D 4 c
m 8 , D m
8 , b D 2 c m 8 , b D 4 c m 8 , b D 8 c
1, D . . . 1, b D c . . . 1, b D c
1
1 2
1 4 4
1 6 12 8
1 2
j µj2
µO(m
log23) O(m
ωM
0(
mD) 2
−3ε) O(m
ωM
0(
mD)2
−2ε) O(m
ωM
0(
Dm)2
−ε)
O(m
ωM
0(
mD))
Since: P µ
j= 0 µ j
= 2 µ for ε > 0 s.t. M (d) = O(d ω− 1 − )
1 2
1 2 2 4
1 2 4 8 4 8
Complexity, Spin-off results
Complexity
C (m , D) 6 2C m 2 , D
2
+ C m 2 , D
+O
m
ωM
0D m
6 O
m
ωM
0D m
where: M
0(d) = GCD (d) ∈ O( PolMul (d) log (d))
D
m
=
degdetm= avg row degree
m , D
m 2 , D m
2 , b D 2 c
m 4 , D m
4 , b D 2 c m 4 , b D 4 c
m 8 , D m
8 , b D 2 c m 8 , b D 4 c m 8 , b D 8 c
1
1 2
1 4 4
1 6 12 8
O(m
ωM
0(
mD) 2
−3ε) O(m
ωM
0(
mD)2
−2ε) O(m
ωM
0(
Dm)2
−ε)
O(m
ωM
0(
mD))
for ε > 0 s.t. M (d) = O(d ω− 1 − )
1 2
1 2 2 4
1 2 4 8 4 8
Complexity, Spin-off results
New polynomial matrix transformations
Weak Popov to Popov
Input: s ∈ Z
m, a shift,
A ∈ K [x]
m×m, a matrix in s-weak Popov form Output: the s-Popov form of A
Requirement: −s > pivotDegree ( A )
Complexity: O(m
ωM
0(
mD)) , where D = P s
Improvement and generalization of [Sarkar-Storjohann 2011, Section 4]
support nonzero shifts and involve average degree
Dm• problem viewed as a change of shift with a priori known output degrees
• introduction of partial linearization techniques for kernel bases
Reduced to weak Popov Input: s ∈ Z
n, a shift
A ∈ K [x]
m×n, a matrix in s-reduced form Output: an s-weak Popov form of A
Complexity: O(m
ω−1n(
mD+ 1 )), where D = P
rdeg
s( A ) − m min (s)
Easy extension of [Sarkar-Storjohann 2011, Section 3] to shifted forms
Complexity, Spin-off results
New polynomial matrix transformations
Weak Popov to Popov
Input: s ∈ Z
m, a shift,
A ∈ K [x]
m×m, a matrix in s-weak Popov form Output: the s-Popov form of A
Requirement: −s > pivotDegree ( A )
Complexity: O(m
ωM
0(
mD)) , where D = P s
Improvement and generalization of [Sarkar-Storjohann 2011, Section 4]
support nonzero shifts and involve average degree
Dm• problem viewed as a change of shift with a priori known output degrees
• introduction of partial linearization techniques for kernel bases
Reduced to weak Popov
Input: s ∈ Z
n, a shift
Summary and perspectives
Summary
• CharPoly = O( MatMul )
• Determinant of reduced polynomial matrices in O(m ω M 0 ( m D ))
• Fast transformations between shifted forms of polynomial matrices
D
m
=
degdetm= average row degree
Perspectives
• Implementation and practical efficiency (small fields, degenerate instances, . . . )
• Approach without fast polynomial arithmetic
→ Exploit the quasiseparable struct. of linearized polynomial matrices
• Frobenius normal form & Smith normal form
Summary and perspectives
Summary
• CharPoly = O( MatMul )
• Determinant of reduced polynomial matrices in O(m ω M 0 ( m D ))
• Fast transformations between shifted forms of polynomial matrices
D
m