btl/vader: handle unexpected short read/write in process_vm_{read,write}v #4832
As I commented on issue #4829, this fix does not match the expected behavior of these functions according to their man pages.
@bosilca I also read
Since our |
I concur. I ran on 2 kernel versions and saw the same behavior, and I did not have the time to look further. |
If this behavior is contrary to system man pages, I think you should definitely add a comment in both locations explaining why this code is necessary (and cite the specific system/OS/distro/version/whatever where the non-manpage-conforming behavior was seen). This won't matter in the next few months, but 2 years from now, someone will look at this code and say "That code is unnecessary because the man page guarantees that the unit of granularity of transfer is a single iovec!" -- opening the door to a possible regression if they delete this extra logic. |
Per 2018-02-20 webex:
|
@ggouaillardet Can you add the comments that @jsquyres mentioned? I would like to get this into master. |
373bc9c to 3d6f714
…te}v

Important note: According to the man page,

"On success, process_vm_readv() returns the number of bytes read and process_vm_writev() returns the number of bytes written. This return value may be less than the total number of requested bytes, if a partial read/write occurred. (Partial transfers apply at the granularity of iovec elements. These system calls won't perform a partial transfer that splits a single iovec element.)"

Since we use a single iovec element, the returned size should be either 0 or size, and the do loop should not be needed here.

We tried various Linux kernels with size > 2 GB and, surprisingly, the returned value is always 0x7ffff000 (FWIW, this is the size in bytes of the largest whole number of pages that fits in a signed 32-bit integer). We do not know whether this is a bug in the kernel, the libc, or even the man page, but for the time being we act as if process_vm_readv() could return any value.

Thanks Heiko Bauke for the bug report.

Refs. open-mpi#4829

Signed-off-by: Gilles Gouaillardet <[email protected]>
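For readers who want to see the shape of the retry loop the commit message describes, here is a minimal sketch in C. It is not the code from this PR: `read_fully`, `vm_readv_fn`, and `capped_reader` are hypothetical names introduced for illustration. The loop takes the reader as a function pointer with the same signature as process_vm_readv(2), so the short-read handling can be exercised with a stand-in that caps each transfer, without needing a second process or a buffer larger than 2 GB.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Same signature as process_vm_readv(2), so a stand-in can be swapped in.
 * The typedef name is an assumption made for this sketch. */
typedef ssize_t (*vm_readv_fn)(pid_t, const struct iovec *, unsigned long,
                               const struct iovec *, unsigned long,
                               unsigned long);

/* Hypothetical helper: copy `size` bytes from `remote` (an address in
 * process `pid`) into `local`, retrying until everything has arrived.
 * Returns 0 on success, -1 if the reader fails or makes no progress. */
static int read_fully(vm_readv_fn reader, pid_t pid,
                      void *local, const void *remote, size_t size)
{
    char *dst = local;
    const char *src = remote;

    assert(reader != NULL);

    while (size > 0) {
        /* A single iovec element on each side, as in the commit message. */
        struct iovec liov = { .iov_base = dst, .iov_len = size };
        struct iovec riov = { .iov_base = (void *)src, .iov_len = size };

        ssize_t n = reader(pid, &liov, 1, &riov, 1, 0);
        if (n <= 0) {
            return -1; /* error, or zero bytes moved: stop instead of spinning */
        }

        /* Treat any positive count as valid and advance past it. */
        dst += n;
        src += n;
        size -= (size_t)n;
    }
    return 0;
}

/* Stand-in reader for experiments: copies within the calling process and
 * moves at most 7 bytes per call, imitating a capped transfer size. */
static ssize_t capped_reader(pid_t pid,
                             const struct iovec *liov, unsigned long liovcnt,
                             const struct iovec *riov, unsigned long riovcnt,
                             unsigned long flags)
{
    (void)pid; (void)liovcnt; (void)riovcnt; (void)flags;
    size_t n = liov->iov_len < 7 ? liov->iov_len : 7;
    memcpy(liov->iov_base, riov->iov_base, n);
    return (ssize_t)n;
}
```

On Linux with glibc, passing `process_vm_readv` itself as the `reader` argument should give the real cross-process copy; a write-side variant would presumably mirror this with `process_vm_writev`. Stopping on a zero return is a choice made for this sketch so the loop cannot spin forever.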
3d6f714 to 9fedf28
@hjelmn I made the changes and will merge when CI completes |
Thanks Heiko Bauke for the bug report.
Refs. #4829
Signed-off-by: Gilles Gouaillardet [email protected]