Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in pinv? #214

Closed
sglyon opened this issue May 14, 2015 · 6 comments
Closed

Bug in pinv? #214

sglyon opened this issue May 14, 2015 · 6 comments

Comments

@sglyon
Copy link

sglyon commented May 14, 2015

Suppose we had this matrix:

S = [494.1013106366904 46142.52662766732 13.948592561126539 -0.5761953808074929
 46142.52662766732 4.592191854163587e6 1176.2963747371257 -42.84536709656223
 13.948592561126539 1176.2963747371257 0.4603766442896071 -0.02261134866942
 -0.5761953808074929 -42.84536709656223 -0.02261134866942 0.001308271770670055]

Then consider the following code from v0.4 (master branch, 5-13-15):

julia> S = [494.1013106366904 46142.52662766732 13.948592561126539 -0.5761953808074929
        46142.52662766732 4.592191854163587e6 1176.2963747371257 -42.84536709656223
        13.948592561126539 1176.2963747371257 0.4603766442896071 -0.02261134866942
        -0.5761953808074929 -42.84536709656223 -0.02261134866942 0.001308271770670055]
4x4 Array{Float64,2}:
   494.101     46142.5          13.9486      -0.576195
 46142.5           4.59219e6  1176.3        -42.8454
    13.9486     1176.3           0.460377    -0.0226113
    -0.576195    -42.8454       -0.0226113    0.00130827

julia> versioninfo()
Julia Version 0.4.0-dev+4817
Commit c790369* (2015-05-13 14:00 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.3.0)
  CPU: Intel(R) Core(TM) i7-3635QM CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> pinv(S)
4x4 Array{Float64,2}:
    5.27356    -0.0261815     -138.437      -927.494
   -0.0261815   0.000132159      0.673668      4.44041
 -138.437       0.673668      3733.89      25625.4
 -927.494       4.44041      25625.4           1.80587e5

Now on the latest 0.3.8 release:

julia> S = [494.1013106366904 46142.52662766732 13.948592561126539 -0.5761953808074929
        46142.52662766732 4.592191854163587e6 1176.2963747371257 -42.84536709656223
        13.948592561126539 1176.2963747371257 0.4603766442896071 -0.02261134866942
        -0.5761953808074929 -42.84536709656223 -0.02261134866942 0.001308271770670055]
4x4 Array{Float64,2}:
   494.101     46142.5          13.9486      -0.576195
 46142.5           4.59219e6  1176.3        -42.8454
    13.9486     1176.3           0.460377    -0.0226113
    -0.576195    -42.8454       -0.0226113    0.00130827

julia> versioninfo()
Julia Version 0.3.8
Commit 79599ad* (2015-04-30 23:40 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.0.0)
  CPU: Intel(R) Core(TM) i7-3635QM CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> pinv(S)
4x4 Array{Float64,2}:
  0.500222    -0.00332121   -6.62205      0.94274
 -0.00332121   2.26771e-5    0.0423813   -0.00603419
 -6.62205      0.0423813    93.8221     -13.3545
  0.94274     -0.00603419  -13.3545       1.90086

Is there a reason why pinv gives such a different answer for the exact same input matrix?

@sglyon
Copy link
Author

sglyon commented May 14, 2015

For what it is worth, I tested against scipy.linalg.pinv and scipy.linalg.pinv2 and it looks like the pinv2 function there gives the same answer as what we get on 0.3.8 whereas the pinv function gives a similar result to what we are getting on 0.4dev:

In [13]: import numpy as np

In [14]: import scipy.linalg as la

In [15]: S = np.array([[494.1013106366904, 46142.52662766732, 13.948592561126539, -0.5761953808074929],
[46142.52662766732, 4.592191854163587e6, 1176.2963747371257, -42.84536709656223],
[13.948592561126539, 1176.2963747371257, 0.4603766442896071, -0.02261134866942],
[-0.5761953808074929, -42.84536709656223, -0.02261134866942, 0.001308271770670055],
])

In [16]: np.set_printoptions(precision=4)

In [17]: la.pinv(S)
Out[17]:
array([[  5.2736e+00,  -2.6182e-02,  -1.3844e+02,  -9.2749e+02],
       [ -2.6182e-02,   1.3216e-04,   6.7367e-01,   4.4404e+00],
       [ -1.3844e+02,   6.7367e-01,   3.7339e+03,   2.5625e+04],
       [ -9.2749e+02,   4.4404e+00,   2.5625e+04,   1.8059e+05]])

In [18]: la.pinv2(S)
Out[18]:
array([[  5.0022e-01,  -3.3212e-03,  -6.6220e+00,   9.4274e-01],
       [ -3.3212e-03,   2.2677e-05,   4.2381e-02,  -6.0342e-03],
       [ -6.6220e+00,   4.2381e-02,   9.3822e+01,  -1.3354e+01],
       [  9.4274e-01,  -6.0342e-03,  -1.3354e+01,   1.9009e+00]])

the docs for these routines say that scipy.lingalg.pinv uses a least squares solver where scipy.linalg.pinv2 uses the SVD.

@jiahao
Copy link
Member

jiahao commented May 14, 2015

The change is intentional. The default inverse condition number threshold was changed in 0.4; see Issue JuliaLang/julia#8859.

julia> pinv(S, 1e-9) #On 0.4, basically reproduces the 0.3.8 answer
4x4 Array{Float64,2}:
  0.500222    -0.00332121   -6.62205      0.94274   
 -0.00332121   2.26771e-5    0.0423813   -0.00603419
 -6.62205      0.0423813    93.8221     -13.3545    
  0.94274     -0.00603419  -13.3545       1.90086   

I suspect that numpy uses different regularizations in pinv and pinv2 also.

Note that your matrix has a condition number ~1e12

julia> cond(S)
8.461070041485869e11

julia> svdvals(S)
4-element Array{Float64,1}:
  4.59266e6 
 30.6059    
  0.010396  
  5.42798e-6

and so the default in 0.3.8 produced essentially a rank 3 pseudoinverse. The default threshold in 0.4 is more conservative and produces a rank 4 pseudoinverse.

julia> svdvals(pinv(S, 1e-9))
4-element Array{Float64,1}:
 96.1906     
  0.0326735  
  2.17739e-7 
  4.10145e-16

julia> svdvals(pinv(S))
4-element Array{Float64,1}:
  1.8423e5  
 96.1906    
  0.0326735 
  2.17739e-7

@sglyon
Copy link
Author

sglyon commented May 14, 2015

Thanks for the edited comment.

Not being a numerical linear algebra expert myself, what does this mean for routines that use pinv that now give very different answers based on the version of Julia that is used? I guess my question is, should I as a user be happy with this intentional change and why?

@jiahao
Copy link
Member

jiahao commented May 14, 2015

There is no general answer to your question, unfortunately. It depends on what you want to use the pseudoinverse for.

@andreasnoack
Copy link
Member

The pseudo-inverse is not continuous when the rank of the matrix changes. The reason your result changes so much is that the rank estimate for your matrix changes between 0.3 and 0.4. To summarize the discussion that @jiahao links to, there exist two kinds of numerical rank deficient matrices: the first kind has a very well determined rank, say k, where the magnitude of the sorted singular drops dramatically at k, the second kind is often related to so called ill-posed problems and where the magnitude of the sorted singular values drops regularly such that the numerical rank cannot be easily be determined, but at the same time spanning a range so wide that the matrix is ill conditioned. The tolerance of the pinv function has to balance these two cases.

@sglyon
Copy link
Author

sglyon commented May 14, 2015

OK great, thank you both for the tips.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants