
Add tolerance to krylov_schur #3

Merged
merged 20 commits into msmdev:krylov_schur from Marius1311:krylov_schur on May 29, 2020

Conversation

Marius1311

This adds the possibility to change the convergence threshold used internally by SLEPc; the default is 1e-8, which is SLEPc's own default. It also relaxes the tolerance used for checking whether the rows of the membership vectors sum to one.

This solves my problems with krylov_schur not returning invariant subspaces for ill-conditioned matrices.
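For context, here is a minimal sketch of how such a convergence threshold can be passed to SLEPc via slepc4py (the matrix wrapping, function name, and defaults here are illustrative, not the actual code in this PR):

```python
from petsc4py import PETSc
from slepc4py import SLEPc

def leading_eigenpairs(P_csr, m, tol=1e-8):
    # wrap a scipy.sparse CSR matrix in a PETSc matrix
    A = PETSc.Mat().createAIJ(size=P_csr.shape,
                              csr=(P_csr.indptr, P_csr.indices, P_csr.data))

    E = SLEPc.EPS().create()
    E.setOperators(A)
    E.setType(SLEPc.EPS.Type.KRYLOVSCHUR)
    E.setDimensions(nev=m)
    E.setWhichEigenpairs(SLEPc.EPS.Which.LARGEST_MAGNITUDE)
    E.setTolerances(tol=tol)  # the convergence threshold discussed above
    E.solve()
    return E
```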

@Marius1311
Author

Marius1311 commented May 6, 2020

There are two other things that I don't really know where to discuss because I can't open issues here, so I will just mention them here.

  1. method='scipy' fails for ill-conditioned matrices. That's because of the function smallest_eigenvalue. Since our matrices always have 1 as their largest eigenvalue, for ill-conditioned matrices the smallest eigenvalue automatically becomes extremely small - on the order of 1/condition_number. That means that krylov_schur no longer converges here; it returns zero eigenvalues, and that makes this method fail. I thought about ways to circumvent this, and I think the easiest would be to not look for the smallest eigenvalue at all, but rather just sort the full spectrum. This makes hardly any difference in terms of performance for large matrices (3.5k x 3.5k), see below - in fact, it will be much faster for every even slightly ill-conditioned matrix, since finding their smallest eigenvalue is either impossible or very difficult.
  2. deviations between brandts and krylov_schur in terms of the returned final membership vectors grow with m, at least for my ill-conditioned matrix. I check the maximum deviation between the membership matrices as a function of m (a sketch of this check follows below). This is very small for m=3, O(1e-14). It then increases slightly until m=9, reaching O(1e-9). Then it increases dramatically: for m=10, I suddenly get O(1e-2). I continued until m=25, and the maximum deviation fluctuates between O(1e-2) and O(1e-1); it decreases and then increases again. First I thought that decreasing the tolerance for krylov_schur could solve this, but it didn't really have any effect. I don't have an intuition to explain this, especially the very sudden increase by 7 orders of magnitude. Do you @msmdev have any idea? Could this be caused by another part of GPCCA, or can it only be the Schur decomposition? Could anything here be not very robust?
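
For concreteness, the check in point 2 amounts to something like this (a minimal sketch; chi_b and chi_k stand for the membership matrices returned by brandts and krylov_schur for the same m - the names are illustrative):

```python
import numpy as np

def max_membership_deviation(chi_b, chi_k):
    # maximum elementwise deviation between the membership matrices
    # computed for the same m by the two methods
    return np.max(np.abs(chi_b - chi_k))
```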

comparing scipy.linalg.schur, with and without sorting

No sorting:

 %timeit R, Q = scipy.linalg.schur(P)

9.57 s ± 447 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sorting:

%timeit R, Q, sdim = scipy.linalg.schur(P, sort='iuc')

10.1 s ± 694 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Within one standard deviation, these two overlap...

@msmdev
Owner

msmdev commented May 7, 2020

> There are two other things that I don't really know where to discuss because I can't open issues here, so I will just mention them here.
>
> 1. method='scipy' fails for ill-conditioned matrices. That's because of the function smallest_eigenvalue. Since our matrices always have 1 as their largest eigenvalue, for ill-conditioned matrices the smallest eigenvalue automatically becomes extremely small - on the order of 1/condition_number. That means that krylov_schur no longer converges here; it returns zero eigenvalues, and that makes this method fail. I thought about ways to circumvent this, and I think the easiest would be to not look for the smallest eigenvalue at all, but rather just sort the full spectrum. This makes hardly any difference in terms of performance for large matrices (3.5k x 3.5k), see below - in fact, it will be much faster for every even slightly ill-conditioned matrix, since finding their smallest eigenvalue is either impossible or very difficult.

Sounds fine in principle, but does scipy.linalg.schur(P, sort='iuc') really return all top eigenvalues sorted (sorted by largest magnitude, or by largest real part?)? I don't really trust scipy with sorting, after having had a lot of trouble with scipy.linalg.schur(P, sort=lambda x: np.real(x) > cutoff) (at least with this option you can't sort the whole spectrum, but at most the eigenvalues above some cutoff - if the condition and the god of scipy let you...).

> 2. deviations between brandts and krylov_schur in terms of the returned final membership vectors grow with m, at least for my ill-conditioned matrix. I check the maximum deviation between the membership matrices as a function of m. This is very small for m=3, O(1e-14). It then increases slightly until m=9, reaching O(1e-9). Then it increases dramatically: for m=10, I suddenly get O(1e-2). I continued until m=25, and the maximum deviation fluctuates between O(1e-2) and O(1e-1); it decreases and then increases again. First I thought that decreasing the tolerance for krylov_schur could solve this, but it didn't really have any effect. I don't have an intuition to explain this, especially the very sudden increase by 7 orders of magnitude. Do you @msmdev have any idea? Could this be caused by another part of GPCCA, or can it only be the Schur decomposition? Could anything here be not very robust?

First of all: what is ill-conditioned here (how do you calculate it, and what magnitude do you get)?
Second: there are several places in GPCCA where ill-conditioning could cause trouble - this is why I check at these places (at least I think I check...).
Third: I'm not even sure who is causing the trouble here: brandts or krylov-schur...
Fourth: we really shouldn't work with severely ill-conditioned matrices (anything above 1/eps (eps = machine precision) is a no-go...). In my experiments I even ensured that the condition number wasn't above 1e6, to make sure that linalg functions work properly. It's not GPCCA causing the ruckus here, but plain numpy or scipy.linalg...
If you are having ill-conditioned matrices all the time, we should tweak the way the transition matrices are constructed, imposing an acceptable condition number...
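
As a side note, the condition number in question can be checked directly with numpy (a sketch using the dense 2-norm condition number, assuming the matrix fits in memory; the bound is the one mentioned above):

```python
import numpy as np

def check_condition(P, bound=1e6):
    # 2-norm condition number of P; anything approaching 1/eps
    # (~4.5e15 for float64) is a no-go, and even values above ~1e6
    # can make dense linalg routines unreliable
    kappa = np.linalg.cond(P)
    return kappa, kappa <= bound
```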


@msmdev
Owner

msmdev commented May 7, 2020

@marius have you tried dimension reduction before generating your transition matrix (e.g. via PCA or ICA and kmeans clustering or some more advanced clustering technique)?
A smaller and less sparse matrix is virtually always better conditioned...

@Marius1311
Author

Hi Bernhard, thanks for getting back to me. There are quite a few things here, so I'd say let's discuss all of them during our Skype call tomorrow.

The easy change I can make right now is to set the default tolerance used by krylov-schur to 1e-16.

@msmdev
Owner

msmdev commented May 7, 2020

Just to remind myself: if method='brandts', we should ensure that after sorting the Schur vectors in

Q, R, ap = sort_real_schur(Q, R, z=z, b=m)

the result is compared with the original Schur vectors from
R, Q = schur(P, output='real')

to ensure that we are still in the invariant subspace...
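
A minimal sketch of such a check, using scipy.linalg.subspace_angles (the routine the error message below also refers to); the function name and tolerance are illustrative:

```python
import numpy as np
from scipy.linalg import subspace_angles

def spans_invariant_subspace(P, Q, R, m, atol=1e-8):
    # X = first m sorted Schur vectors; X spans an invariant subspace
    # of P iff the column spaces of P @ X and X @ R[:m, :m] coincide,
    # i.e. all subspace angles between them are (near) zero
    X = Q[:, :m]
    angles = subspace_angles(P @ X, X @ R[:m, :m])
    return np.allclose(angles, 0.0, atol=atol)
```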

@Marius1311
Author

Right, if the method is brandts and we cut through a block of complex conjugates, the error I get is

m = 6: According to scipy.linalg.subspace_angles() X isn't an invariant subspace of P, since the subspace angles between the column spaces of P*X and X*R (resp. X, if you chose the Krylov-Schur method)aren't near zero. The subspace angles are: `[4.90140248e-03 2.39150618e-15 2.16468702e-15 1.87234927e-15
 1.71006757e-15 9.37945059e-16]`

and this doesn't tell me anything about a block of complex conjugates being split.

@Marius1311
Author

@msmdev, I have changed this PR to no longer expose the tolerance to the user, but instead to set it to 1e-16 internally.

@Marius1311
Author

Hi @msmdev, can we merge this, or would you like me to change anything?

@msmdev
Owner

msmdev commented May 11, 2020

Maybe you could change the sorting criterion in sorted_scipy_schur to 'iuc'?
Or would you prefer to open a new pull request for this?

@Marius1311
Author

No, you are right, I will change this here as well! I will also remove the smallest_eigenvalue calculation then, as it's no longer needed if we change the sorting criterion.

michalk8 added a commit to michalk8/msmtools that referenced this pull request May 16, 2020
@Marius1311
Author

Hi @msmdev, what's the sorting used in 'brandts'? Is it just distance from the unit circle, or distance from a target value, e.g. 1?

@msmdev
Owner

msmdev commented May 18, 2020

> Hi @msmdev, what's the sorting used in 'brandts'? Is it just distance from the unit circle, or distance from a target value, e.g. 1?

@Marius1311 Currently you can choose between "'LM': the m eigenvalues with the largest magnitude are sorted up." and "'LR': the m eigenvalues with the largest real part are sorted up."
The corresponding function is select at

def select(p, z):
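
The permalink above only pins the signature; roughly, a selector of this kind does something like the following (an illustrative sketch, not the exact msmtools implementation):

```python
import numpy as np

def select(p, z):
    # p: eigenvalue estimates of the not-yet-sorted part of the Schur form
    # z: sorting criterion
    if z == 'LM':   # largest magnitude first
        return np.argmax(np.abs(p))
    if z == 'LR':   # largest real part first
        return np.argmax(np.real(p))
    raise ValueError(f"unknown sorting criterion: {z!r}")
```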

@Marius1311
Author

Thanks! Could it be that this does not work in 100% of cases? I'm checking this on an example by taking the full output of brandts, i.e. the matrices R and Q of the full, sorted real Schur decomposition. I then turn this into a complex Schur decomposition using scipy.linalg.rsf2csf, take the absolute value of the diagonal, and check whether it is monotonically decreasing. This is mostly true, but not always. See my code below.

First, I compute the sorted real Schur decomposition:

from msmtools.util.sorted_schur import sorted_schur

# T is the transition matrix; sort the first m=10 eigenvalues up
R_b, Q_b = sorted_schur(T, m=10, method='brandts', z='LM')

Then, I need two utility functions to check the sorting:

import itertools
import numpy as np
from scipy.linalg import rsf2csf

def pairwise(iterable):
    # yield consecutive pairs (s0, s1), (s1, s2), ...
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

def check_sorting(R, Q):
    # convert the real Schur form to complex form; the eigenvalues
    # then appear on the diagonal of R_com
    R_com, _ = rsf2csf(R, Q)
    e_vals = np.abs(R_com.diagonal())

    # record, for each consecutive pair of magnitudes, whether they
    # are non-increasing (up to a small tolerance)
    greater_equal_bool = {}
    for i, (a, b) in enumerate(pairwise(e_vals)):
        greater_equal_bool[i] = {'left': a, 'right': b, 'greater': a >= (b - 1e-12)}

    return greater_equal_bool

I run these:

import pandas as pd

ge_bool = check_sorting(R_b, Q_b)
pd.DataFrame.from_dict(ge_bool, orient='index').head(50)

My output is:

left	right	greater
0	1.000000	0.990945	True
1	0.990945	0.975737	True
2	0.975737	0.861521	True
3	0.861521	0.853648	True
4	0.853648	0.853648	True
5	0.853648	0.797249	True
6	0.797249	0.784729	True
7	0.784729	0.742847	True
8	0.742847	0.733630	True
9	0.733630	0.733630	True
10	0.733630	0.689641	True
11	0.689641	0.683335	True
12	0.683335	0.621811	True
13	0.621811	0.606209	True
14	0.606209	0.606209	True
15	0.606209	0.557852	True
16	0.557852	0.557852	True
17	0.557852	0.544426	True
18	0.544426	0.524358	True
19	0.524358	0.499279	True
20	0.499279	0.499279	True
21	0.499279	0.501240	False
22	0.501240	0.484542	True
23	0.484542	0.484542	True
24	0.484542	0.460512	True
25	0.460512	0.460512	True
26	0.460512	0.434826	True
27	0.434826	0.434826	True
28	0.434826	0.435977	False
29	0.435977	0.435977	True
30	0.435977	0.443433	False
31	0.443433	0.432692	True
32	0.432692	0.420567	True
33	0.420567	0.411191	True
34	0.411191	0.411191	True
35	0.411191	0.405862	True
36	0.405862	0.405862	True
37	0.405862	0.397171	True
38	0.397171	0.393496	True
39	0.393496	0.369567	True
40	0.369567	0.369567	True
41	0.369567	0.375119	False
42	0.375119	0.366818	True
43	0.366818	0.366818	True
44	0.366818	0.357722	True
45	0.357722	0.357722	True
46	0.357722	0.327143	True
47	0.327143	0.327143	True
48	0.327143	0.349199	False
49	0.349199	0.349199	True

@Marius1311
Author

Would you say that counts as numerical imprecision? Would you expect the sorting to be perfect? It gets the first 20 eigenvalues right, which is something, I guess.

@msmdev
Owner

msmdev commented May 20, 2020

> Would you say that counts as numerical imprecision? Would you expect the sorting to be perfect? It gets the first 20 eigenvalues right, which is something, I guess.

This is quite different from the behavior I had with the original Brandts method in Matlab.
There I never had such issues - even when sorting a 2200x2200 Schur form completely. I remember I checked whether the sorting was correct: it was.
The current code is a quite direct translation of the original Matlab code.
Thus, the problem must lie either in some scipy/numpy function used as the analogue of a Matlab function inside sort_real_schur, or in scipy.linalg.rsf2csf.

Notably, the problem occurs (in the above example) only with complex conjugate pairs. This is interesting, but I can't reasonably explain it.
I would propose to take a close look at the source code of scipy.linalg.rsf2csf first...

@msmdev
Owner

msmdev commented May 20, 2020

Do you have comparable issues with Krylov-Schur?
Are there any issues with Krylov-Schur left so far?

@Marius1311
Author

I haven't checked this with krylov-schur yet, but will do.

@Marius1311
Author

Marius1311 commented May 22, 2020

Update from my side

method brandts

I must correct myself: sorting works. I made a mistake initially: I ran sorted_schur(T, m=15, method='brandts', z='LM'). Since it returned the full Schur decomposition, I assumed all of it must be sorted, but that's not what happens: in sort_real_schur(Q, R, z, b, inplace=False), only the first b(=m) eigenvalues are sorted. So if I set m=100, the False entries above, e.g. in row 21, disappear. I also checked sorting the whole matrix; that works too, for every single eigenvalue. So, the conclusion here: sorting using brandts works just as it should. It's also relatively fast, 2 min on my laptop for a 2.5k x 2.5k ill-conditioned matrix.

method krylov

This method also works. When I compute the first m=100 eigenvalues and the corresponding Schur vectors, they are all sorted correctly according to their magnitude.

comparing brandts and krylov

I compared the two methods both in terms of their returned eigenvalues and in terms of the real subspaces they span. On the level of eigenvalues, they agree fairly well. From krylov, I get the eigenvalues directly; for brandts, I take the sorted real Schur form, transform it into a complex Schur form using scipy, take the diagonal, and compare with krylov's eigenvalues. For the first 100, the maximum deviation in magnitude is 1e-5, the median is 1e-8. On the level of subspaces, they also agree well: if I take the subspace spanned by the first 20 Schur vectors returned by either method, then the maximum subspace angle between them is 1e-9.
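
A sketch of how this comparison can be set up (R_b, Q_b are the sorted real Schur form from brandts; evals_k, Q_k are the eigenvalues and Schur vectors from krylov - all names illustrative):

```python
import numpy as np
from scipy.linalg import rsf2csf, subspace_angles

def compare_methods(R_b, Q_b, evals_k, Q_k, n_vals=100, n_vecs=20):
    # brandts returns a sorted *real* Schur form; complexify it so the
    # eigenvalues can be read off the diagonal
    R_c, _ = rsf2csf(R_b, Q_b)
    evals_b = R_c.diagonal()

    # deviation in eigenvalue magnitude over the leading n_vals values
    dev = np.abs(np.abs(evals_b[:n_vals]) - np.abs(evals_k[:n_vals]))

    # largest principal angle between the spans of the leading Schur vectors
    angle = subspace_angles(Q_b[:, :n_vecs], Q_k[:, :n_vecs]).max()
    return dev.max(), np.median(dev), angle
```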

method scipy

This method does not work, however hard I try. I tried various sorting options (`sort='iuc'`, `'lhp'`, `lambda x: True`, `lambda x: np.abs(x) > 0.1`) and both complex and real output; I never get even the first 20 eigenvalues sorted according to their magnitude.

My conclusion is: brandts and krylov both work and produce consistent results. Using scipy for sorting does not work, and I would advocate simply removing this option. @msmdev, what's your opinion? If you agree, I will go ahead and create a corresponding PR. To keep things organized, however, I think we should merge this first and then remove scipy sorting in a separate PR.

@Marius1311
Author

I pushed a further change here: using eigenvectors rather than the backward iteration for the stationary distribution. In my case, this reduces the total computation time by a factor of 20, from 11 minutes to 30 seconds.

@Marius1311
Author

Marius1311 commented May 22, 2020

This makes the implementation practical for really large matrices. I just tried it on a 30k x 30k matrix and it ran in 30 seconds for three states and 39 seconds for five states, all on my laptop.

@msmdev
Owner

msmdev commented May 24, 2020

> Update from my side
>
> method brandts
>
> I must correct myself: sorting works. I made a mistake initially: I ran sorted_schur(T, m=15, method='brandts', z='LM'). Since it returned the full Schur decomposition, I assumed all of it must be sorted, but that's not what happens: in sort_real_schur(Q, R, z, b, inplace=False), only the first b(=m) eigenvalues are sorted. So if I set m=100, the False entries above, e.g. in row 21, disappear. I also checked sorting the whole matrix; that works too, for every single eigenvalue. So, the conclusion here: sorting using brandts works just as it should. It's also relatively fast, 2 min on my laptop for a 2.5k x 2.5k ill-conditioned matrix.

That's wonderful news!

> method krylov
>
> This method also works. When I compute the first m=100 eigenvalues and the corresponding Schur vectors, they are all sorted correctly according to their magnitude.

Perfect!

> comparing brandts and krylov
>
> I compared the two methods both in terms of their returned eigenvalues and in terms of the real subspaces they span. On the level of eigenvalues, they agree fairly well. From krylov, I get the eigenvalues directly; for brandts, I take the sorted real Schur form, transform it into a complex Schur form using scipy, take the diagonal, and compare with krylov's eigenvalues. For the first 100, the maximum deviation in magnitude is 1e-5, the median is 1e-8. On the level of subspaces, they also agree well: if I take the subspace spanned by the first 20 Schur vectors returned by either method, then the maximum subspace angle between them is 1e-9.

I think this is acceptable considering the use of scipy.linalg.rsf2csf. The more important criterion here (in my opinion) is the max subspace angle: 1e-9 is fair here.

> method scipy
>
> This method does not work, however hard I try. I tried various sorting options (`sort='iuc'`, `'lhp'`, `lambda x: True`, `lambda x: np.abs(x) > 0.1`) and both complex and real output; I never get even the first 20 eigenvalues sorted according to their magnitude.

That matches my troubles with this method. It's sad, since I had planned to eventually replace brandts completely with scipy, since it is considerably faster if one wants to sort a few thousand eigenvalues... but obviously it's not up to the task :(

> My conclusion is: brandts and krylov both work and produce consistent results. Using scipy for sorting does not work, and I would advocate simply removing this option. @msmdev, what's your opinion? If you agree, I will go ahead and create a corresponding PR. To keep things organized, however, I think we should merge this first and then remove scipy sorting in a separate PR.

I agree - let's do it that way :)

Marius1311 and others added 3 commits May 25, 2020 10:38
@Marius1311
Author

Okay cool, I will go ahead then and remove scipy for sorting.

@Marius1311
Author

These latest commits remove method='scipy' as an option. They also remove the smallest_eigenvalue function, since it's no longer needed and is unstable for ill-conditioned matrices. @msmdev, let me know whether there is anything else that needs to be done before merging.

I tested this version on an example and it works.

@Marius1311
Author

Hi @msmdev, from my side, everything is ready for merging. Is there anything else from your side?

@msmdev
Owner

msmdev commented May 26, 2020

Please excuse the delay - I'm currently quite sick (severe headaches), but I promise to tend to it this very day. :-)

Owner

@msmdev msmdev left a comment


It's all fine, except that I'm somewhat unsure about the changes in stationary_vector (see comments)...

@Marius1311
Author

The latest commit adds the three checks we discussed above.

@Marius1311
Author

I tested this already. Initially, the thresholds for the tests were too stringent so I relaxed them a bit. It works fine now.

@Marius1311
Author

The latest commit increases the tolerance for the irreducibility check and sorts the top 2 computed vectors by their real part, since I just realised in an example that scipy does not always return them in the right order.

With all of this included, I reliably get the stationary distribution of a 32k x 32k sparse, ill-conditioned matrix in under 4 s on my laptop.
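
For the record, a minimal sketch of the approach described in the last few comments (sparse computation of the top two eigenvectors of the transpose, an explicit re-sort by real part, and a simple irreducibility check; names and tolerances are illustrative, not the exact code in this PR):

```python
import numpy as np
from scipy.sparse.linalg import eigs

def stationary_distribution(P):
    # top two eigenpairs of P^T by largest real part; scipy does not
    # always return them in the right order, so re-sort explicitly
    vals, vecs = eigs(P.T, k=2, which='LR')
    order = np.argsort(-vals.real)
    vals, vecs = vals[order], vecs[:, order]

    # irreducibility check: the eigenvalue 1 must be simple
    if np.isclose(vals[1].real, 1.0, atol=1e-6):
        raise ValueError("matrix appears reducible: second eigenvalue ~ 1")

    # the stationary distribution is the normalized leading left
    # eigenvector; the division also fixes an overall sign flip
    pi = vecs[:, 0].real
    pi /= pi.sum()
    return pi
```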

@Marius1311
Author

@msmdev , anything missing or can we merge?

@msmdev msmdev merged commit 5a2c91d into msmdev:krylov_schur May 29, 2020
@Marius1311 Marius1311 deleted the krylov_schur branch May 30, 2020 11:11
@Marius1311 Marius1311 restored the krylov_schur branch May 30, 2020 11:11