-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory fault during scaling of singular matrix #205
Conversation
@chrhansk unfortunately this change seems to break the main SSIDS test on all platforms. |
Function Line 571 in 077bd63
So I would suggest not to change the behaviour of Alternatively one could tackle Lines 10 to 13 in 077bd63
|
I was not aware of the problem. I tries to zero out the problematic entries manually in the postprocessing function. There still seems to be a problem in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is certainly an improvement over the current behavior that causes a segfault. However I am not sure the solution of doing the negative to zero conversion in the postprocessing function match_postproc
is ideal:
- It gives unnecessary responsibility to
match_postproc
and requires changing thematch
argument tointent(inout)
which to my mind makes the calling contract less clear - When using the auction method, the conversion in
match_postproc
is done redundantly for a second time afterLines 1487 to 1488 in 662c7ac
! We expect unmatched columns to have match(col) = 0 where(match(:) .eq. -1) match(:) = 0
Instead, I think it would be better to do the conversion in hungarian_wrapper
before calling match_postproc
. This is pretty minor and invisible to users though, so perhaps not relevant enough on its own.
More relevant though is that the way unmatched entries are returned from hungarian_scale_sym
and hungarian_scale_unsym
is now inconsistent. The unsymmetric version will now return 0 while the symmetric one continues to return negative entries, since the singular symmetric case does not call match_postproc
as seen here:
Lines 679 to 686 in 662c7ac
if ((.not. sym) .or. (inform%matched .eq. n)) then ! Unsymmetric or symmetric and full rank | |
! Note that in this case m=n | |
rscaling(1:m) = dualu(1:m) | |
cscaling(1:n) = dualv(1:n) - cmax(1:n) | |
call match_postproc(m, n, ptr, row, val, rscaling, cscaling, & | |
inform%matched, match, inform%flag, inform%stat) | |
return | |
end if |
hungarian_scale_sym
and hungarian_scale_unsym
), perhaps copy-pasted from the description of the auction method for which it is correct. Options I can think of:
- Do the conversion from negative to zero on a temporary copy of
match
. That way, bothhungarian_scale_sym
andhungarian_scale_unsym
continue to return negative entries. With this option, the incorrect documentation for both cases should be fixed (perhaps in a separate issue). - Accept this inconsistency. With this option, the incorrect documentation for the symmetric case should be fixed (perhaps in a separate issue)
- Make the symmetric case work with and return zeros for unmatched entries too. The necessary changes should be limited to
hungarian_wrapper
and should work well with the changes for the unsymmetric case (when done inhungarian_wrapper
, which would add a major reason for doing so to the above). Because of that it might make sense to change both at once instead of in a seperate issue.
The latter two options would break potential users who are relying on the negative entries (despite the wrong documentation) or would like to do so in the future, but it would introduce consistency with how the auction methods returns the matching and with the documented behavior. Not sure what's the best call here.
tests/scaling.f90
Outdated
|
||
allocate(a%ptr(n+1)) | ||
allocate(a%row(nz), a%val(nz)) | ||
allocate(rscaling(m), cscaling(n), match(n)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be match(m)
, not match(n)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about that. The docs state that it should be n
(similarly for all matching algorithms). I am not exactly sure why to be honest though, but calling it with m
in the unit test causes a segfault on my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm you are right, I do get invalid write with valgrind for match(m)
. But when doing match(n)
, the last two entries in the example of that test are left uninitialized which surely is not intended either? Unless the idea is to use the info struct to obtain until where the values are initialized. Though the signature for hungarian_scale_unsym
does use m
:
Line 188 in 077bd63
integer, dimension(m), optional, intent(out) :: match |
There seems to be another (unrelated?) issue here... :(
But the existing random unsymmetric tests also use an overallocated match(maxn)
, i.e. not leading to invalid writes but only uninitialized return, so agree with doing the same here. We can deal with that in a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #207
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that match(m)
is correct but that this happened to reveal a secondary bug which should probably be fixed before applying the changes proposed in this PR, see #200 (comment).
src/scaling.f90
Outdated
@@ -1619,7 +1619,7 @@ subroutine match_postproc(m, n, ptr, row, val, rscaling, cscaling, nmatch, & | |||
real(wp), dimension(m), intent(inout) :: rscaling | |||
real(wp), dimension(n), intent(inout) :: cscaling | |||
integer, intent(in) :: nmatch | |||
integer, dimension(m), intent(in) :: match | |||
integer, dimension(m), intent(inout) :: match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be avoided by doing the conversion from negative to zero entries at the callsite instead of here, see detailed comment
tests/scaling.f90
Outdated
a%n = n | ||
a%m = m | ||
|
||
a%ptr(1:n+1) = (/ 1, 3, 5, 5, 5, 7 /) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 1, 3, 5, 6, 6, 7
as in #200? As it stands, this matrix would have a duplicate entry.
Looking back at this: In particular in regards to #205: What is the proposed solution that you are converging on? As I understand #200 (comment) The lines setting the unmatched rows to negative values should be commented out, which to me implies that the values of unmatched rows should be set to zero instead. Am I correct in this regard? Such a change would necessitate corresponding changes in the implementation of SSIDS, as mentioned here: #205 (comment) Should I make those changes to the scaling and within SSIDS? |
a5e5bb2
to
8b27c63
Compare
Yes we have come to the conclusion that whilst negative row indices (to signal which of the rows are unmatched) make sense for square matrices, this does not make sense for general rectangular matrices. As far as we can tell SSIDS does not make use of the negative row indices themselves, but merely checks if a row is unmatched, so in theory changing the values of unmatched rows to zero should work fine provided we update SSIDS to check for zero rather than negative values. |
Many thanks @chrhansk, the suggestion has been to comment this problematic section out as it should no longer be required if we now return zero for unmatched entries. I guess it's a case of trying that and seeing if anything breaks? |
Basically the suggestion was this mjacobse@b815fac and indeed, it seems to be all that's needed to fix all issues at once. Personally I would like to see randomly generated singular tests to confirm, since all the random tests right now are nonsingular. |
Indeed many thanks! @chrhansk I suggest we apply mjacobse@b815fac to this PR and add a randomly generated singular matrix test to verify that things don't break. We think this should be all that is required now. |
Great, I appreciate your effort. |
cd86051
to
4205d0c
Compare
@mjacobse would you be able to re-review this PR at some point? I think this is now more or less ready to go in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure, let's merge it. I had tried to allow singular cases within the functions that do the random tests for scaling, but it turned out to not be so easy so I kinda lost track of it, sorry about that. Perhaps at some point we can look into more extensive tests, but for now let's go with this, should be a useful fix.
No worries, generating random singular matrices is not so easy. I'll add your suggestions and merge. |
Should fix #200