Buffer overflow in MPI-IO with external32 data representation #5885
Comments
@ashwinraghu This sounds familiar; it is likely fixed in #5163. Has that PR made it into the branch you are working with? If not, give it a try.
Yeah, I suspected it was not easy to pick. Let me give it a try (porting to 3.4.x).
@ashwinraghu Give #5888 a try.
@ashwinraghu Maybe I missed it, but could you provide a reproducer program?
Do you have the test noncontig_coll.c?
@ashwinraghu Got you.
Meanwhile, with the user's benchmark I am able to reproduce the problem using branch 2203_pack_ext built with dataloop instead of yaksa.
Note that a similar build of 4.0.1 that uses dataloop works fine. Building the branch 2203_pack_ext with the yaksa data engine fails with compilation errors, so I'll probably do a clean build over the weekend. The errors look like this: src/mpi/datatype/typerep/src/typerep_yaksa_pack.c:34:14: error: too few arguments to function ‘yaksa_ipack’
@ashwinraghu Okay, I think the bug is a different one from what I thought. Could you try PR #5907?
A user reported a crash in an MPI-IO benchmark when using the "external32" data representation.
A sample stack trace of a process aborting due to heap corruption looks like this:
free(): invalid size
double free or corruption (out)
#5 0x00007f060c73925a in MPIR_Typerep_unpack ()
#6 0x00007f060af8092f in PMPI_Unpack ()
#7 0x00007f060cf02e6e in MPIU_write_external32_conversion_fn ()
#8 0x00007f060cf03179 in MPIU_external32_buffer_setup ()
#9 0x00007f060cefdc0a in MPIOI_File_write_all ()
I ran valgrind with memcheck on the test noncontig_coll.c after making this change:
int main(int argc, char **argv)
...
MPI_File_set_view(fh, 0, MPI_INT, newtype, "external32", MPI_INFO_NULL);
...
MPI_File_set_view(fh, 0, MPI_INT, newtype, "external32", MPI_INFO_NULL);
The valgrind report has these interesting traces of buffer overflows (read and write):
Invalid read of size 4
at yaksuri_seqi_pack_resized_blkhindx_hvector_blklen_1_int32_t (in ./anl-noncontigcoll)
by yaksuri_seq_ipack (in ./anl-noncontigcoll)
by ipup (in ./anl-noncontigcoll)
by yaksi_ipack_backend (in ./anl-noncontigcoll)
by yaksi_ipack (in ./anl-noncontigcoll)
by yaksa_ipack (in ./anl-noncontigcoll)
by MPIR_Typerep_pack (in ./anl-noncontigcoll)
by PMPI_Pack (in ./anl-noncontigcoll)
by MPIU_read_external32_conversion_fn (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Invalid write of size 4
at yaksuri_seqi_unpack_resized_blkhindx_hvector_blklen_1_int32_t (in ./anl-noncontigcoll)
by yaksuri_seq_iunpack (in ./anl-noncontigcoll)
by ipup (in ./anl-noncontigcoll)
by yaksi_iunpack_backend (in ./anl-noncontigcoll)
by yaksi_iunpack (in ./anl-noncontigcoll)
by yaksa_iunpack (in ./anl-noncontigcoll)
by MPIR_Typerep_unpack (in ./anl-noncontigcoll)
by PMPI_Unpack (in ./anl-noncontigcoll)
by MPIU_write_external32_conversion_fn (in ./anl-noncontigcoll)
by MPIU_external32_buffer_setup (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIU_external32_buffer_setup (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Invalid write of size 4
at ??? (in ./anl-noncontigcoll)
by ADIOI_Fill_user_buffer (in ./anl-noncontigcoll)
by ADIOI_R_Exchange_data (in ./anl-noncontigcoll)
by ADIOI_GEN_ReadStridedColl (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Invalid read of size 4
at ??? (in ./anl-noncontigcoll)
by ADIOI_Fill_send_buffer (in ./anl-noncontigcoll)
by ADIOI_W_Exchange_data (in ./anl-noncontigcoll)
by ADIOI_GEN_WriteStridedColl (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIU_external32_buffer_setup (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
The valgrind report for the original benchmark, linked against another MPI implementation, shows a similar overflow with a call stack that goes through the dataloop abstraction:
Invalid write of size 8
at vector_m2m (in ./benchio)
by MPII_Segment_manipulate (in ./benchio)
by MPIR_Segment_unpack (in ./benchio)
by MPIR_Typerep_unpack (in ./benchio)
by PMPI_Unpack (in ./benchio)
by MPIU_write_external32_conversion_fn (in ./benchio)
by MPIU_external32_buffer_setup (in ./benchio)
by MPIOI_File_write_all (in ./benchio)
by PMPI_File_write_all (in ./benchio)
by MPI_FILE_WRITE_ALL (in ./benchio)
by mpiiowrite$mpiio_ (in ./benchio)
by main (in ./benchio)
Address is 0 bytes after a block of size 136,318,928 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./benchio)
by MPIU_external32_buffer_setup (in ./benchio)
by MPIOI_File_write_all (in ./benchio)
by PMPI_File_write_all (in ./benchio)
by MPI_FILE_WRITE_ALL (in ./benchio)
by mpiiowrite$mpiio_ (in ./benchio)
by main (in ./benchio)
At this point I am not sure where the actual problem is: whether there is an under-allocation in MPIU_external32_buffer_setup() or whether the pack/unpack routines themselves are at fault.
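As a reading aid for the valgrind reports above, the arithmetic behind "Address is 0 bytes after a block of size 4,194,300" can be sketched in C. The helpers `elems_that_fit` and `bytes_past_end` are hypothetical names introduced only for illustration; they are not MPICH, ROMIO, or valgrind functions. The sketch only restates what the report says (an access that starts exactly at the end of the allocation) and does not decide whether the allocation or the pack/unpack code is wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of MPICH, ROMIO, or valgrind.
 * Number of whole elements of elem_size bytes that fit in a block. */
size_t elems_that_fit(size_t block_size, size_t elem_size)
{
    assert(elem_size > 0);
    return block_size / elem_size;
}

/* Hypothetical helper: number of bytes of the access
 * [offset, offset + len) that lie beyond the end of a block
 * of block_size bytes. */
size_t bytes_past_end(size_t block_size, size_t offset, size_t len)
{
    size_t end = offset + len;
    if (end <= block_size)
        return 0;               /* access is fully in bounds */
    if (offset >= block_size)
        return len;             /* access starts at or after the end */
    return end - block_size;    /* access straddles the end */
}
```

With the numbers from the first report, `elems_that_fit(4194300, sizeof(int32_t))` is 1,048,575 and `bytes_past_end(4194300, 4194300, 4)` is 4, so the faulting 4-byte access touches the element immediately after the last one that fits. The block is also 4 bytes short of 4 MiB (4,194,304), which may or may not be significant.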