Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer overflow in MPI-IO with external32 data representation #5885

Closed
ashwinraghu opened this issue Mar 7, 2022 · 9 comments · Fixed by #5907
Closed

Buffer overflow in MPI-IO with external32 data representation #5885

ashwinraghu opened this issue Mar 7, 2022 · 9 comments · Fixed by #5907

Comments

@ashwinraghu
Copy link

ashwinraghu commented Mar 7, 2022

A user reported a crash in an MPI-IO benchmark when using the "external32" data representation.
A sample stack trace of a process aborting due to heap corruption looks like this:
free(): invalid size
double free or corruption (out)
#5 0x00007f060c73925a in MPIR_Typerep_unpack ()
#6 0x00007f060af8092f in PMPI_Unpack ()
#7 0x00007f060cf02e6e in MPIU_write_external32_conversion_fn ()
#8 0x00007f060cf03179 in MPIU_external32_buffer_setup ()
#9 0x00007f060cefdc0a in MPIOI_File_write_all ()

I ran valgrind with memcheck on the test noncontig_coll.c after making this change:

int main(int argc, char **argv)
...
MPI_File_set_view(fh, 0, MPI_INT, newtype, "external32", MPI_INFO_NULL);
...
MPI_File_set_view(fh, 0, MPI_INT, newtype, "external32", MPI_INFO_NULL);

The valgrind report has these interesting traces of buffer overflows (read and write):

Invalid read of size 4
at yaksuri_seqi_pack_resized_blkhindx_hvector_blklen_1_int32_t (in ./anl-noncontigcoll)
by yaksuri_seq_ipack (in ./anl-noncontigcoll)
by ipup (in ./anl-noncontigcoll)
by yaksi_ipack_backend (in ./anl-noncontigcoll)
by yaksi_ipack (in ./anl-noncontigcoll)
by yaksa_ipack (in ./anl-noncontigcoll)
by MPIR_Typerep_pack (in ./anl-noncontigcoll)
by PMPI_Pack (in ./anl-noncontigcoll)
by MPIU_read_external32_conversion_fn (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)

Invalid write of size 4
at yaksuri_seqi_unpack_resized_blkhindx_hvector_blklen_1_int32_t (in ./anl-noncontigcoll)
by yaksuri_seq_iunpack (in ./anl-noncontigcoll)
by ipup (in ./anl-noncontigcoll)
by yaksi_iunpack_backend (in ./anl-noncontigcoll)
by yaksi_iunpack (in ./anl-noncontigcoll)
by yaksa_iunpack (in ./anl-noncontigcoll)
by MPIR_Typerep_unpack (in ./anl-noncontigcoll)
by PMPI_Unpack (in ./anl-noncontigcoll)
by MPIU_write_external32_conversion_fn (in ./anl-noncontigcoll)
by MPIU_external32_buffer_setup (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIU_external32_buffer_setup (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)

Invalid write of size 4
at ??? (in ./anl-noncontigcoll)
by ADIOI_Fill_user_buffer (in ./anl-noncontigcoll)
by ADIOI_R_Exchange_data (in ./anl-noncontigcoll)
by ADIOI_GEN_ReadStridedColl (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIOI_File_read_all (in ./anl-noncontigcoll)
by PMPI_File_read_at_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)

Invalid read of size 4
at ??? (in ./anl-noncontigcoll)
by ADIOI_Fill_send_buffer (in ./anl-noncontigcoll)
by ADIOI_W_Exchange_data (in ./anl-noncontigcoll)
by ADIOI_GEN_WriteStridedColl (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)
Address is 0 bytes after a block of size 4,194,300 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./anl-noncontigcoll)
by MPIU_external32_buffer_setup (in ./anl-noncontigcoll)
by MPIOI_File_write_all (in ./anl-noncontigcoll)
by PMPI_File_write_all (in ./anl-noncontigcoll)
by main (in ./anl-noncontigcoll)

A call stack that uses the dataloop abstraction is evident in the valgrind report on the original benchmark linked against another MPI implementation.

Invalid write of size 8
at vector_m2m (in ./benchio)
by MPII_Segment_manipulate (in ./benchio)
by MPIR_Segment_unpack (in ./benchio)
by MPIR_Typerep_unpack (in ./benchio)
by PMPI_Unpack (in ./benchio)
by MPIU_write_external32_conversion_fn (in ./benchio)
by MPIU_external32_buffer_setup (in ./benchio)
by MPIOI_File_write_all (in ./benchio)
by PMPI_File_write_all (in ./benchio)
by MPI_FILE_WRITE_ALL (in ./benchio)
by mpiiowrite$mpiio_ (in ./benchio)
by main (in ./benchio)
Address is 0 bytes after a block of size 136,318,928 alloc'd
at malloc (in vg_replace_malloc.c:306)
by ADIOI_Malloc_fn (in ./benchio)
by MPIU_external32_buffer_setup (in ./benchio)
by MPIOI_File_write_all (in ./benchio)
by PMPI_File_write_all (in ./benchio)
by MPI_FILE_WRITE_ALL (in ./benchio)
by mpiiowrite$mpiio_ (in ./benchio)
by main (in ./benchio)

Note that I am not sure at this point as to where the actual problem is; whether there's an under-allocation in MPIU_external32_buffer_setup() or if the actual pack/unpack routines have a problem.

@hzhou
Copy link
Contributor

hzhou commented Mar 7, 2022

@ashwinraghu This sounds familiar, likely it is fixed in #5163. Has that PR been made into the branch you are working with? If not, give it a try.

@ashwinraghu
Copy link
Author

@hzhou PR #5163 has not made it to the branch I am on. I had a look at the changes and they don't port over easily to our branch which is based off of 3.4.x. Any chance the fix can be ported to 3.4.x?

@hzhou
Copy link
Contributor

hzhou commented Mar 8, 2022

Yeah, I suspected it was not easy to pick. Let me give it try (porting to 3.4.x)

@hzhou
Copy link
Contributor

hzhou commented Mar 9, 2022

@ashwinraghu Give #5888 a try.

@hzhou
Copy link
Contributor

hzhou commented Mar 24, 2022

@ashwinraghu Maybe I missed it, could you provide a reproducer program?

@ashwinraghu
Copy link
Author

Do you have the test noncontig_coll.c ?
You'll need to modify the invocations of MPI_File_set_view() to use external32 like this for instance:
...
MPI_File_set_view(fh, 0, MPI_INT, newtype, "external32", MPI_INFO_NULL);
...

@hzhou
Copy link
Contributor

hzhou commented Mar 25, 2022

@ashwinraghu Got you.

@ashwinraghu
Copy link
Author

ashwinraghu commented Mar 25, 2022

Meanwhile, with the user's benchmark I am able to reproduce the problem using branch 2203_pack_ext built with dataloop instead of yaksa.
On running valgrind against this build I see several instances like this:

Invalid write of size 8
  at ??? (in ./benchio)
  by index_m2m (in looputil.c:644)
  by MPII_Segment_manipulate (in segment.c:664)
  by MPIR_Segment_unpack (in looputil.c:427)
  by MPIR_Typerep_unpack (in typerep_dataloop_pack.c:125)
  by PMPI_Unpack (in unpack.c:124)
  by MPIU_write_external32_conversion_fn (in mpiu_external32.c:58)
  by MPIU_external32_buffer_setup (in mpiu_external32.c:148)
  by MPIOI_File_write_all (in write_all.c:106)
  by PMPI_File_write_all (in write_all.c:54)
  by mpi_file_write_all (in file_write_allf.c:276)
  by main (in ./benchio)
Address is 0 bytes after a block of size 136,318,928 alloc'd
  at malloc (in vg_replace_malloc.c:306)
  by MPL_malloc (in mpl_trmem.h:272)
  by ADIOI_Malloc_fn (in malloc.c:41)
  by MPIU_external32_buffer_setup (in mpiu_external32.c:146)
  by MPIOI_File_write_all (in write_all.c:106)
  by PMPI_File_write_all (in write_all.c:54)
  by mpi_file_write_all (in file_write_allf.c:276)
  by main (in ./benchio)

Note that a similar build of 4.0.1 that uses dataloop works fine.

Trying to build the branch 2203_pack_ext with yaksa data engine seems to have compilation errors. I'll probably do a clean build over the weekend. Errors are like these:
CC src/mpi/datatype/typerep/src/lib_libmpi_la-typerep_yaksa_pack.lo
src/mpi/datatype/typerep/src/typerep_yaksa_pack.c: In function ‘MPIR_Typerep_copy’:
src/mpi/datatype/typerep/src/typerep_yaksa_pack.c:35:52: warning: passing argument 9 of ‘yaksa_ipack’ makes integer from pointer without a cast [-Wint-conver
sion]
&actual_pack_bytes, NULL, &request);
^
In file included from src/mpi/datatype/typerep/src/typerep_yaksa_pack.c:7:0:
/home/ashwinr/anlmpich/huiext32/mpich/modules/yaksa/src/frontend/include/yaksa.h:484:5: note: expected ‘yaksa_op_t {aka long unsigned int}’ but argument is o
f type ‘yaksa_request_t * {aka long unsigned int *}’
int yaksa_ipack(const void *inbuf, uintptr_t incount, yaksa_type_t type, uintptr_t inoffset,
^~~~~~~~~~~

src/mpi/datatype/typerep/src/typerep_yaksa_pack.c:34:14: error: too few arguments to function ‘yaksa_ipack’
rc = yaksa_ipack(inbuf, num_bytes, YAKSA_TYPE__BYTE, 0, outbuf, num_bytes,
^~~~~~~~~~~
In file included from src/mpi/datatype/typerep/src/typerep_yaksa_pack.c:7:0:
/home/ashwinr/anlmpich/huiext32/mpich/modules/yaksa/src/frontend/include/yaksa.h:484:5: note: declared here
int yaksa_ipack(const void *inbuf, uintptr_t incount, yaksa_type_t type, uintptr_t inoffset,
^~~~~~~~~~~

@hzhou
Copy link
Contributor

hzhou commented Mar 25, 2022

@ashwinraghu Okay, I think the bug is a different one from what I thought. Could you try this PR #5907?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants