
Add tolerance to krylov_schur #3

Merged
merged 20 commits into msmdev:krylov_schur from Marius1311:krylov_schur on May 29, 2020

Conversation

Marius1311

This adds the possibility to change the convergence threshold used internally by SLEPc; the default is 1e-8, which is SLEPc's own default. It also relaxes the tolerance used for checking whether the rows of the membership vectors sum to one.

This solves my problems with krylov_schur not returning invariant subspaces for ill-conditioned matrices.
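For context, here is a minimal sketch of how such a convergence threshold can be passed to SLEPc via slepc4py (the matrix wrapping, function name, and defaults here are illustrative, not the actual code in this PR):

```python
from petsc4py import PETSc
from slepc4py import SLEPc

def leading_eigenpairs(P_csr, m, tol=1e-8):
    # wrap a scipy.sparse CSR matrix in a PETSc matrix
    A = PETSc.Mat().createAIJ(size=P_csr.shape,
                              csr=(P_csr.indptr, P_csr.indices, P_csr.data))

    E = SLEPc.EPS().create()
    E.setOperators(A)
    E.setType(SLEPc.EPS.Type.KRYLOVSCHUR)
    E.setDimensions(nev=m)
    E.setWhichEigenpairs(SLEPc.EPS.Which.LARGEST_MAGNITUDE)
    E.setTolerances(tol=tol)  # the convergence threshold discussed above
    E.solve()
    return E
```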

@Marius1311
Author

Marius1311 commented May 6, 2020

There are two other things that I don't really know where to discuss because I can't open issues here, so I will just mention them here.

  1. method='scipy' fails for ill-conditioned matrices. That's because of the function smallest_eigenvalue. Since our matrices always have 1 as their largest eigenvalue, for ill-conditioned matrices the smallest eigenvalue automatically becomes extremely small - on the order of 1/condition_number. That means that krylov_schur no longer converges here; it returns zero eigenvalues, and that makes this method fail. I thought about ways to circumvent this, and I think the easiest would be to not look for the smallest eigenvalue at all, but rather just sort the full spectrum. This makes hardly any difference in terms of performance for large matrices (3.5k x 3.5k), see below - in fact, it will be much faster for every even slightly ill-conditioned matrix, since finding their smallest eigenvalue is either impossible or very difficult.
  2. deviations between brandts and krylov_schur in terms of the returned final membership vectors grow with m, at least for my ill-conditioned matrix. I check the maximum deviation between the membership matrices as a function of m (a sketch of this check follows below). This is very small for m=3, O(1e-14). It then increases slightly until m=9, reaching O(1e-9). Then it increases dramatically: for m=10, I suddenly get O(1e-2). I continued until m=25, and the maximum deviation fluctuates between O(1e-2) and O(1e-1); it decreases and then increases again. First I thought that decreasing the tolerance for krylov_schur could solve this, but it didn't really have any effect. I don't have an intuition to explain this, especially the very sudden increase by 7 orders of magnitude. Do you @msmdev have any idea? Could this be caused by another part of GPCCA, or can it only be the Schur decomposition? Could anything here be not very robust?
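
For concreteness, the check in point 2 amounts to something like this (a minimal sketch; chi_b and chi_k stand for the membership matrices returned by brandts and krylov_schur for the same m - the names are illustrative):

```python
import numpy as np

def max_membership_deviation(chi_b, chi_k):
    # maximum elementwise deviation between the membership matrices
    # computed for the same m by the two methods
    return np.max(np.abs(chi_b - chi_k))
```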

comparing scipy.linalg.schur, with and without sorting

No sorting:

 %timeit R, Q = scipy.linalg.schur(P)

9.57 s ± 447 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sorting:

%timeit R, Q, sdim = scipy.linalg.schur(P, sort='iuc')

10.1 s ± 694 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Within one standard deviation, these two overlap...

@msmdev
Owner

msmdev commented May 7, 2020

> There are two other things that I don't really know where to discuss because I can't open issues here, so I will just mention them here.
>
> 1. method='scipy' fails for ill-conditioned matrices. That's because of the function smallest_eigenvalue. Since our matrices always have 1 as their largest eigenvalue, for ill-conditioned matrices the smallest eigenvalue automatically becomes extremely small - on the order of 1/condition_number. That means that krylov_schur no longer converges here; it returns zero eigenvalues, and that makes this method fail. I thought about ways to circumvent this, and I think the easiest would be to not look for the smallest eigenvalue at all, but rather just sort the full spectrum. This makes hardly any difference in terms of performance for large matrices (3.5k x 3.5k), see below - in fact, it will be much faster for every even slightly ill-conditioned matrix, since finding their smallest eigenvalue is either impossible or very difficult.

Sounds fine in principle, but does scipy.linalg.schur(P, sort='iuc') really return all top eigenvalues sorted (sorted by largest magnitude, or by largest real part?)? I don't really trust scipy with sorting, after having had a lot of trouble with scipy.linalg.schur(P, sort=lambda x: np.real(x) > cutoff) (at least with this option you can't sort the whole spectrum, but at most the eigenvalues above some cutoff - if the condition and the god of scipy let you...).

> 2. deviations between brandts and krylov_schur in terms of the returned final membership vectors grow with m, at least for my ill-conditioned matrix. I check the maximum deviation between the membership matrices as a function of m. This is very small for m=3, O(1e-14). It then increases slightly until m=9, reaching O(1e-9). Then it increases dramatically: for m=10, I suddenly get O(1e-2). I continued until m=25, and the maximum deviation fluctuates between O(1e-2) and O(1e-1); it decreases and then increases again. First I thought that decreasing the tolerance for krylov_schur could solve this, but it didn't really have any effect. I don't have an intuition to explain this, especially the very sudden increase by 7 orders of magnitude. Do you @msmdev have any idea? Could this be caused by another part of GPCCA, or can it only be the Schur decomposition? Could anything here be not very robust?

First of all: what is ill-conditioned here (how do you calculate it, and what magnitude do you get)?
Second: there are several places in GPCCA where ill-conditioning could cause trouble - this is why I check at these places (at least I think I check...).
Third: I'm not even sure who is causing the trouble here: brandts or krylov-schur...
Fourth: we really shouldn't work with severely ill-conditioned matrices (anything above 1/eps (eps = machine precision) is a no-go...). In my experiments I even ensured that the condition number wasn't above 1e6, to make sure that linalg functions work properly. It's not GPCCA causing the ruckus here, but plain numpy or scipy.linalg...
If you are having ill-conditioned matrices all the time, we should tweak the way the transition matrices are constructed, imposing an acceptable condition number...
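
As a side note, the condition number in question can be checked directly with numpy (a sketch using the dense 2-norm condition number, assuming the matrix fits in memory; the bound is the one mentioned above):

```python
import numpy as np

def check_condition(P, bound=1e6):
    # 2-norm condition number of P; anything approaching 1/eps
    # (~4.5e15 for float64) is a no-go, and even values above ~1e6
    # can make dense linalg routines unreliable
    kappa = np.linalg.cond(P)
    return kappa, kappa <= bound
```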


@msmdev
Owner

msmdev commented May 7, 2020

@marius have you tried dimension reduction before generating your transition matrix (e.g. via PCA or ICA and kmeans clustering or some more advanced clustering technique)?
A smaller and less sparse matrix is virtually always better conditioned...

@Marius1311
Author

Hi Bernhard, thanks for getting back to me. There are quite a few things here, so I'd say let's discuss all of them during our Skype call tomorrow.

The easy change I can make right now is to set the default tolerance used by krylov-schur to 1e-16.

@msmdev
Owner

msmdev commented May 7, 2020

Just to remind myself: if method='brandts', we should ensure that after sorting the Schur vectors in

Q, R, ap = sort_real_schur(Q, R, z=z, b=m)

the result is compared with the original Schur vectors from
R, Q = schur(P, output='real')

to ensure that we are still in the invariant subspace...
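
A minimal sketch of such a check, using scipy.linalg.subspace_angles (the routine the error message below also refers to); the function name and tolerance are illustrative:

```python
import numpy as np
from scipy.linalg import subspace_angles

def spans_invariant_subspace(P, Q, R, m, atol=1e-8):
    # X = first m sorted Schur vectors; X spans an invariant subspace
    # of P iff the column spaces of P @ X and X @ R[:m, :m] coincide,
    # i.e. all subspace angles between them are (near) zero
    X = Q[:, :m]
    angles = subspace_angles(P @ X, X @ R[:m, :m])
    return np.allclose(angles, 0.0, atol=atol)
```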

@Marius1311
Author

Right, if the method is brandts and we cut through a block of complex conjugates, the error I get is

m = 6: According to scipy.linalg.subspace_angles() X isn't an invariant subspace of P, since the subspace angles between the column spaces of P*X and X*R (resp. X, if you chose the Krylov-Schur method)aren't near zero. The subspace angles are: `[4.90140248e-03 2.39150618e-15 2.16468702e-15 1.87234927e-15
 1.71006757e-15 9.37945059e-16]`

and this doesn't tell me anything about a block of complex conjugates being split.

@Marius1311
Author

@msmdev, I have changed this PR to no longer expose the tolerance to the user, but instead to set it to 1e-16 internally.

@Marius1311
Author

Hi @msmdev, can we merge this, or would you like me to change anything?

@msmdev
Owner

msmdev commented May 11, 2020

Maybe you could change the sorting criterion in sorted_scipy_schur to 'iuc'?
Or would you prefer to open a new pull request for this?

@Marius1311
Author

No, you are right, I will change this here as well! I will also remove the smallest_eigenvalue calculation then, as it's no longer needed if we change the sorting criterion.

michalk8 added a commit to michalk8/msmtools that referenced this pull request May 16, 2020
@Marius1311
Author

Hi @msmdev, what's the sorting used in 'brandts'? Is it just distance from the unit circle, or distance from a target value, e.g. 1?

@msmdev
Owner

msmdev commented May 18, 2020

> Hi @msmdev, what's the sorting used in 'brandts'? Is it just distance from the unit circle, or distance from a target value, e.g. 1?

@Marius1311 Currently you can choose between "'LM': the m eigenvalues with the largest magnitude are sorted up." and "'LR': the m eigenvalues with the largest real part are sorted up."
The corresponding function is select at

def select(p, z):
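
The permalink above only pins the signature; roughly, a selector of this kind does something like the following (an illustrative sketch, not the exact msmtools implementation):

```python
import numpy as np

def select(p, z):
    # p: eigenvalue estimates of the not-yet-sorted part of the Schur form
    # z: sorting criterion
    if z == 'LM':   # largest magnitude first
        return np.argmax(np.abs(p))
    if z == 'LR':   # largest real part first
        return np.argmax(np.real(p))
    raise ValueError(f"unknown sorting criterion: {z!r}")
```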

@Marius1311
Author

Thanks! Could it be that this does not work in 100% of cases? I'm checking this on an example by taking the full output of brandts, i.e. the matrices R and Q of the full, sorted real Schur decomposition. I then turn this into a complex Schur decomposition using scipy.linalg.rsf2csf, take the absolute value of the diagonal, and check whether it is monotonically decreasing. This is mostly true, but not always. See my code below.

First, I compute the sorted real Schur decomposition:

from msmtools.util.sorted_schur import sorted_schur

# T is the transition matrix; sort the first m=10 eigenvalues up
R_b, Q_b = sorted_schur(T, m=10, method='brandts', z='LM')

Then, I need two utility functions to check the sorting:

import itertools
import numpy as np
from scipy.linalg import rsf2csf

def pairwise(iterable):
    # yield consecutive pairs (s0, s1), (s1, s2), ...
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

def check_sorting(R, Q):
    # convert the real Schur form to complex form; the eigenvalues
    # then appear on the diagonal of R_com
    R_com, _ = rsf2csf(R, Q)
    e_vals = np.abs(R_com.diagonal())

    # record, for each consecutive pair of magnitudes, whether they
    # are non-increasing (up to a small tolerance)
    greater_equal_bool = {}
    for i, (a, b) in enumerate(pairwise(e_vals)):
        greater_equal_bool[i] = {'left': a, 'right': b, 'greater': a >= (b - 1e-12)}

    return greater_equal_bool

I run these:

import pandas as pd

ge_bool = check_sorting(R_b, Q_b)
pd.DataFrame.from_dict(ge_bool, orient='index').head(50)

My output is:

left	right	greater
0	1.000000	0.990945	True
1	0.990945	0.975737	True
2	0.975737	0.861521	True
3	0.861521	0.853648	True
4	0.853648	0.853648	True
5	0.853648	0.797249	True
6	0.797249	0.784729	True
7	0.784729	0.742847	True
8	0.742847	0.733630	True
9	0.733630	0.733630	True
10	0.733630	0.689641	True
11	0.689641	0.683335	True
12	0.683335	0.621811	True
13	0.621811	0.606209	True
14	0.606209	0.606209	True
15	0.606209	0.557852	True
16	0.557852	0.557852	True
17	0.557852	0.544426	True
18	0.544426	0.524358	True
19	0.524358	0.499279	True
20	0.499279	0.499279	True
21	0.499279	0.501240	False
22	0.501240	0.484542	True
23	0.484542	0.484542	True
24	0.484542	0.460512	True
25	0.460512	0.460512	True
26	0.460512	0.434826	True
27	0.434826	0.434826	True
28	0.434826	0.435977	False
29	0.435977	0.435977	True
30	0.435977	0.443433	False
31	0.443433	0.432692	True
32	0.432692	0.420567	True
33	0.420567	0.411191	True
34	0.411191	0.411191	True
35	0.411191	0.405862	True
36	0.405862	0.405862	True
37	0.405862	0.397171	True
38	0.397171	0.393496	True
39	0.393496	0.369567	True
40	0.369567	0.369567	True
41	0.369567	0.375119	False
42	0.375119	0.366818	True
43	0.366818	0.366818	True
44	0.366818	0.357722	True
45	0.357722	0.357722	True
46	0.357722	0.327143	True
47	0.327143	0.327143	True
48	0.327143	0.349199	False
49	0.349199	0.349199	True

@Marius1311
Author

Would you say that counts as numerical imprecision? Would you expect the sorting to be perfect? It gets the first 20 eigenvalues right, which is something, I guess.

@msmdev
Owner

msmdev commented May 20, 2020

> Would you say that counts as numerical imprecision? Would you expect the sorting to be perfect? It gets the first 20 eigenvalues right, which is something, I guess.

This is quite different from the behavior I had with the original Brandts method in Matlab.
There I never had such issues - even when sorting a 2200x2200 Schur form completely. I remember I checked whether the sorting was correct: it was.
The current code is a quite direct translation of the original Matlab code.
Thus, the problem must lie either in some scipy/numpy function used as the analogue of a Matlab function inside sort_real_schur, or in scipy.linalg.rsf2csf.

Notably, the problem occurs (in the above example) only with complex conjugate pairs. This is interesting, but I can't reasonably explain it.
I would propose to take a close look at the source code of scipy.linalg.rsf2csf first...

@msmdev
Owner

msmdev commented May 20, 2020

Do you have comparable issues with Krylov-Schur?
Are there any issues with Krylov-Schur left so far?

@Marius1311
Author

I haven't checked this with krylov-schur yet, but will do.

@Marius1311
Author

Marius1311 commented May 22, 2020

Update from my side

method brandts

I must correct myself: sorting works. I made a mistake initially: I ran sorted_schur(T, m=15, method='brandts', z='LM'). Since it returned the full Schur decomposition, I assumed all of it must be sorted, but that's not what happens: in sort_real_schur(Q, R, z, b, inplace=False), only the first b(=m) eigenvalues are sorted. So if I set m=100, the False entries above, e.g. in row 21, disappear. I also checked sorting the whole matrix; that works too, for every single eigenvalue. So, the conclusion here: sorting using brandts works just as it should. It's also relatively fast, 2 min on my laptop for a 2.5k x 2.5k ill-conditioned matrix.

method krylov

This method also works. When I compute the first m=100 eigenvalues and the corresponding Schur vectors, they are all sorted correctly according to their magnitude.

comparing brandts and krylov

I compared the two methods both in terms of their returned eigenvalues and in terms of the real subspaces they span. On the level of eigenvalues, they agree fairly well. From krylov, I get the eigenvalues directly; for brandts, I take the sorted real Schur form, transform it into a complex Schur form using scipy, take the diagonal, and compare with krylov's eigenvalues. For the first 100, the maximum deviation in magnitude is 1e-5, the median is 1e-8. On the level of subspaces, they also agree well: if I take the subspace spanned by the first 20 Schur vectors returned by either method, then the maximum subspace angle between them is 1e-9.
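
A sketch of how this comparison can be set up (R_b, Q_b are the sorted real Schur form from brandts; evals_k, Q_k are the eigenvalues and Schur vectors from krylov - all names illustrative):

```python
import numpy as np
from scipy.linalg import rsf2csf, subspace_angles

def compare_methods(R_b, Q_b, evals_k, Q_k, n_vals=100, n_vecs=20):
    # brandts returns a sorted *real* Schur form; complexify it so the
    # eigenvalues can be read off the diagonal
    R_c, _ = rsf2csf(R_b, Q_b)
    evals_b = R_c.diagonal()

    # deviation in eigenvalue magnitude over the leading n_vals values
    dev = np.abs(np.abs(evals_b[:n_vals]) - np.abs(evals_k[:n_vals]))

    # largest principal angle between the spans of the leading Schur vectors
    angle = subspace_angles(Q_b[:, :n_vecs], Q_k[:, :n_vecs]).max()
    return dev.max(), np.median(dev), angle
```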

method scipy

This method does not work, however hard I try. I tried various sorting options (`sort='iuc'`, `'lhp'`, `lambda x: True`, `lambda x: np.abs(x) > 0.1`) and both complex and real output; I never get even the first 20 eigenvalues sorted according to their magnitude.

My conclusion is: brandts and krylov both work and produce consistent results. Using scipy for sorting does not work, and I would advocate simply removing this option. @msmdev, what's your opinion? If you agree, I will go ahead and create a corresponding PR. To keep things organized, however, I think we should merge this first and then remove scipy sorting in a separate PR.

@Marius1311
Author

I pushed a further change here: using eigenvectors rather than the backward iteration for the stationary distribution. In my case, this reduces the total computation time by a factor of 20, from 11 minutes to 30 seconds.

@Marius1311
Author

Marius1311 commented May 22, 2020

This makes the implementation practical for really large matrices. I just tried it on a 30k x 30k matrix and it ran in 30 seconds for three states and 39 seconds for five states, all on my laptop.

@msmdev
Owner

msmdev commented May 24, 2020

> Update from my side
>
> method brandts
>
> I must correct myself: sorting works. I made a mistake initially: I ran sorted_schur(T, m=15, method='brandts', z='LM'). Since it returned the full Schur decomposition, I assumed all of it must be sorted, but that's not what happens: in sort_real_schur(Q, R, z, b, inplace=False), only the first b(=m) eigenvalues are sorted. So if I set m=100, the False entries above, e.g. in row 21, disappear. I also checked sorting the whole matrix; that works too, for every single eigenvalue. So, the conclusion here: sorting using brandts works just as it should. It's also relatively fast, 2 min on my laptop for a 2.5k x 2.5k ill-conditioned matrix.

That's wonderful news!

> method krylov
>
> This method also works. When I compute the first m=100 eigenvalues and the corresponding Schur vectors, they are all sorted correctly according to their magnitude.

Perfect!

> comparing brandts and krylov
>
> I compared the two methods both in terms of their returned eigenvalues and in terms of the real subspaces they span. On the level of eigenvalues, they agree fairly well. From krylov, I get the eigenvalues directly; for brandts, I take the sorted real Schur form, transform it into a complex Schur form using scipy, take the diagonal, and compare with krylov's eigenvalues. For the first 100, the maximum deviation in magnitude is 1e-5, the median is 1e-8. On the level of subspaces, they also agree well: if I take the subspace spanned by the first 20 Schur vectors returned by either method, then the maximum subspace angle between them is 1e-9.

I think this is acceptable considering the use of scipy.linalg.rsf2csf. The more important criterion here (in my opinion) is the max subspace angle: 1e-9 is fair here.

> method scipy
>
> This method does not work, however hard I try. I tried various sorting options (`sort='iuc'`, `'lhp'`, `lambda x: True`, `lambda x: np.abs(x) > 0.1`) and both complex and real output; I never get even the first 20 eigenvalues sorted according to their magnitude.

That matches my troubles with this method. It's sad, since I had planned to eventually replace brandts completely with scipy, since it is considerably faster if one wants to sort a few thousand eigenvalues... but obviously it's not up to the task :(

> My conclusion is: brandts and krylov both work and produce consistent results. Using scipy for sorting does not work, and I would advocate simply removing this option. @msmdev, what's your opinion? If you agree, I will go ahead and create a corresponding PR. To keep things organized, however, I think we should merge this first and then remove scipy sorting in a separate PR.

I agree - let's do it that way :)

Marius1311 and others added 3 commits May 25, 2020 10:38
@Marius1311
Author

Okay cool, I will go ahead then and remove scipy for sorting.

@Marius1311
Author

These latest commits remove method='scipy' as an option. They also remove the smallest_eigenvalue function, since it's no longer needed and is unstable for ill-conditioned matrices. @msmdev, let me know whether there is anything else that needs to be done before merging.

I tested this version on an example and it works.

@Marius1311
Author

Hi @msmdev, from my side, everything is ready for merging. Is there anything else from your side?

@msmdev
Owner

msmdev commented May 26, 2020

Please excuse the delay - I'm currently quite sick (severe headaches), but I promise to tend to it this very day. :-)

Owner

@msmdev msmdev left a comment


It's all fine, except that I'm somewhat unsure about the changes in stationary_vector (see comments)...

@Marius1311
Author

The latest commit adds the three checks we discussed above.

@Marius1311
Author

I tested this already. Initially, the thresholds for the tests were too stringent so I relaxed them a bit. It works fine now.

@Marius1311
Author

The latest commit increases the tolerance for the irreducibility check and sorts the top 2 computed vectors by their real part, since I just realised in an example that scipy does not always return them in the right order.

With all of this included, I reliably get the stationary distribution of a 32k x 32k sparse, ill-conditioned matrix in under 4 s on my laptop.
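
For the record, a minimal sketch of the approach described in the last few comments (sparse computation of the top two eigenvectors of the transpose, an explicit re-sort by real part, and a simple irreducibility check; names and tolerances are illustrative, not the exact code in this PR):

```python
import numpy as np
from scipy.sparse.linalg import eigs

def stationary_distribution(P):
    # top two eigenpairs of P^T by largest real part; scipy does not
    # always return them in the right order, so re-sort explicitly
    vals, vecs = eigs(P.T, k=2, which='LR')
    order = np.argsort(-vals.real)
    vals, vecs = vals[order], vecs[:, order]

    # irreducibility check: the eigenvalue 1 must be simple
    if np.isclose(vals[1].real, 1.0, atol=1e-6):
        raise ValueError("matrix appears reducible: second eigenvalue ~ 1")

    # the stationary distribution is the normalized leading left
    # eigenvector; the division also fixes an overall sign flip
    pi = vecs[:, 0].real
    pi /= pi.sum()
    return pi
```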

@Marius1311
Author

@msmdev , anything missing or can we merge?

@msmdev msmdev merged commit 5a2c91d into msmdev:krylov_schur May 29, 2020
@Marius1311 Marius1311 deleted the krylov_schur branch May 30, 2020 11:11
@Marius1311 Marius1311 restored the krylov_schur branch May 30, 2020 11:11