
vader fails IMB-EXT Unidir_Get test #3821

Closed
mathbird opened this issue Jul 6, 2017 · 34 comments · Fixed by #3844

@mathbird

mathbird commented Jul 6, 2017

Hi, I am testing OMPI with the Intel IMB test “IMB-EXT Unidir_Get” as shown below. With “vader”, it gives an error. Without “vader”, or with openib added as “-mca btl openib,vader,tcp,self”, the test passes. Does “vader” need a special configure option to build? Does it need other options to make it work?

Thanks,

Dahai

mpirun -n 2 \
 -mca btl vader,tcp,self \
 imb_src/IMB-EXT Unidir_Get
#bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.04         0.00
            4         1000         0.72         5.31
            8         1000         0.89         8.54
           16         1000         0.88        17.39
           32         1000         1.10        27.76
           64         1000         1.15        53.29
          128         1000         0.91       134.17
          256         1000         0.94       260.82
 Read -1, expected 18446744073709547536, errno = 22
 Read -1, expected 18446744073709547536, errno = 22
 Read -1, expected 18446744073709550864, errno = 22
          512         1000         0.95       516.29
         1024         1000         0.56      1731.87
         2048         1000         0.68      2892.14
@jjhursey changed the title from "about openmpi vader" to "vader fails IMB-EXT Unidir_Get test" on Jul 6, 2017
@jjhursey added the bug label on Jul 6, 2017
@jjhursey
Member

jjhursey commented Jul 6, 2017

We are also digging into this issue, but any insight/help would be appreciated. We have tracked it down to a handful of commits and are currently bisecting further to narrow it down to one. But it looks like the rcache / mpool rework might be the source.

The current thinking is that it's memory corruption. Adding some additional context to the error reported in vader here shows the size parameter as negative, hence the errno of 22 (Invalid argument).

@jjhursey
Member

jjhursey commented Jul 6, 2017

I'm using master from yesterday, configured with --enable-debug, using the IMB 4.1 benchmark. Running on a single machine with -np 2 reproduces the issue. We have been able to reproduce on both ppc64le and x86_64 platforms.

  • Pass: -mca btl sm,self
  • Pass: -mca btl tcp,self
  • Fail: -mca btl vader,self

Here is the full trace of a failed run:

shell$ cd imb-4.1/src/
shell$  mpirun -np 2 -mca pml ob1 -mca btl vader,self ./IMB-EXT Unidir_get
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1, MPI-2 part    
#------------------------------------------------------------
# Date                  : Thu Jul  6 10:39:28 2017
# Machine               : ppc64le
# System                : Linux
# Release               : 3.10.0-510.el7.ppc64le
# Version               : #1 SMP Wed Sep 21 14:46:20 EDT 2016
# MPI Version           : 3.1
# MPI Thread Environment: 

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down 
# dynamically when a certain run time (per message size sample) 
# is expected to be exceeded. Time limit is defined by variable 
# "SECS_PER_SAMPLE" (=> IMB_settings.h) 
# or through the flag => -time 
  


# Calling sequence was: 

# ./IMB-EXT Unidir_get

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Unidir_Get

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.22         0.00
            4         1000         3.25         1.17
            8         1000         2.87         2.66
           16         1000         2.79         5.46
           32         1000         2.59        11.80
           64         1000         2.55        23.92
          128         1000         2.53        48.26
          256         1000         2.58        94.66
          512         1000         2.56       190.48
[c712f6n06:131799] Read -1, expected 18446744073709547536, errno = 22
[c712f6n06:131799] Read -1, expected 18446744073709547536, errno = 22
[c712f6n06:131799] Read -1, expected 18446744073709550864, errno = 22
         1024         1000         1.82       535.80
         2048         1000         2.03       963.12

@mathbird
Author

mathbird commented Jul 6, 2017

We found that Unidir_Put works well with the same options, so how does the vader code handle get and put differently?

@bwbarrett
Member

@jjhursey, are the patches in question also on the v3.0.x branch? I assume yes, but I'm hoping no...

@jjhursey
Member

jjhursey commented Jul 6, 2017

@bwbarrett Yeah, it impacts the v3.0.x branch. Here is the breakdown for the release branches (as of a build from yesterday):

  • Pass v2.0.x
  • Fail v2.x
  • Fail v3.0.x
  • Fail master

The leading suspect commit is the mpool/rcache rework (which is not in the v2.0.x branch):

Valgrind shows some warnings in the MPI_Alloc_mem and MPI_Free_mem paths that might be related (best I can tell the key field is not initialized):

==85568== Conditional jump or move depends on uninitialised value(s)
==85568==    at 0x47AD480: mca_mpool_base_tree_node_compare (mpool_base_tree.c:62)
==85568==    by 0x46BD513: btree_insert (opal_rb_tree.c:342)
==85568==    by 0x46BCC8F: opal_rb_tree_insert (opal_rb_tree.c:137)
==85568==    by 0x47AD9AF: mca_mpool_base_tree_insert (mpool_base_tree.c:110)
==85568==    by 0x47AC76B: mca_mpool_base_alloc (mpool_base_alloc.c:87)
==85568==    by 0x41599AB: PMPI_Alloc_mem (palloc_mem.c:85)
==85568==    by 0x100055A7: IMB_set_buf (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x1000631B: IMB_init_buffers_iter (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x10002123: main (in /home/me/imb-4.1/src/IMB-EXT)
==85568== 

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.22         0.00
==85568== Conditional jump or move depends on uninitialised value(s)
==85568==    at 0x47AD468: mca_mpool_base_tree_node_compare (mpool_base_tree.c:58)
==85568==    by 0x46BCF1B: opal_rb_tree_find_with (opal_rb_tree.c:191)
==85568==    by 0x47AD41B: opal_rb_tree_find (opal_rb_tree.h:156)
==85568==    by 0x47ADB17: mca_mpool_base_tree_find (mpool_base_tree.c:143)
==85568==    by 0x47AC7DB: mca_mpool_base_free (mpool_base_alloc.c:110)
==85568==    by 0x41812CB: PMPI_Free_mem (pfree_mem.c:53)
==85568==    by 0x1000554B: IMB_set_buf (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x1000631B: IMB_init_buffers_iter (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x10002123: main (in /home/me/imb-4.1/src/IMB-EXT)

@hjelmn
Member

hjelmn commented Jul 6, 2017

Yeah, looks like the item key is not being set.

@hjelmn
Member

hjelmn commented Jul 6, 2017

Not sure if it is the cause of this bug though.

@hjelmn
Member

hjelmn commented Jul 6, 2017

Maybe try adding this @ mpool_base_alloc.c:78

        mpool_tree_item->key = mem;

@jjhursey
Member

jjhursey commented Jul 6, 2017

@hjelmn No love on that change. Though it did fix the valgrind complaints.

@hjelmn
Member

hjelmn commented Jul 6, 2017

Ok, so that is a required fix, but there is still something else going on. Is this with CMA, KNEM, or XPMEM?

@jjhursey
Member

jjhursey commented Jul 6, 2017

CMA seems to be the only one active:

opal_config.h:#define OPAL_BTL_VADER_HAVE_CMA 1
opal_config.h:#define OPAL_BTL_VADER_HAVE_KNEM 0
opal_config.h:#define OPAL_BTL_VADER_HAVE_XPMEM 0

@jjhursey
Member

jjhursey commented Jul 6, 2017

Here is something. Adding -mca btl_vader_segment_size 8388608 allows it to pass (default is 4194304 or 4MB for my configuration). So maybe something in the fragmentation mechanism?

@hjelmn
Member

hjelmn commented Jul 6, 2017

Could be. Maybe a bug in the fragment allocator?

@hjelmn
Member

hjelmn commented Jul 6, 2017

Was able to reproduce the issue on my mac with both put and get. Digging into it now.

@mathbird
Author

mathbird commented Jul 6, 2017

I think IMB_ones_mget and IMB_ones_mput in IMB_ones_unidir.c trigger the issue. There is no MPI_Win_fence between the individual MPI_Get/MPI_Put calls.

@jjhursey
Member

jjhursey commented Jul 6, 2017

A breadcrumb...

So size is overflowing: here in ob1, prev_sent can be greater than bytes_remaining. Since both are size_t, the subtraction wraps around, resulting in a very large number. I added a check at this location printing these values before bytes_remaining is decremented. You can see it would go negative, and thus the next call issues the errno = 22 at the read location.

[c712f6n06:19207] [mca_pml_ob1_recv_request_progress_rget:764] Warning: bytes_remaining [16], prev_sent [4096]
[c712f6n06:19207] [mca_pml_ob1_recv_request_progress_rget:764] Warning: bytes_remaining [16], prev_sent [4096]
[c712f6n06:19207] [mca_pml_ob1_recv_request_progress_rget:764] Warning: bytes_remaining [3344], prev_sent [4096]

The next question is why we are getting a frag->rdma_length greater than bytes_remaining.

I have to stop for today. But @wlepera and @mathbird will continue with this tomorrow. @hjelmn let us know if you turn up anything. Thanks!

@mathbird
Author

mathbird commented Jul 7, 2017

my 2 cents this morning:

The input parameter size in the function mca_btl_vader_get_cma is wrong whenever the error happens. It should equal the byte size of the data, but it showed the following:

--- i = 1024, iter = 983
--- i --- in mca_btl_vader_get_cma, size = 7440 // <--- should be 1024*4= 4096
--- in mca_btl_vader_get_cma, size = 3344
--- in mca_btl_vader_get_cma, size = -752

My next question is where size is calculated before mca_btl_vader_get_cma is called.

The following simple code can also catch the same error.


#include <mpi.h>
#include <stdio.h>

#define NMAX  (1024*1024)
#define NITER 1000

int main(int argc, char *argv[])
{
    int rank, nprocs, A[NMAX], i, k;
    MPI_Win win;
    int errs = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (nprocs != 2) {
        printf("Run this program with 2 processes\n"); fflush(stdout);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {
        for (i = 0; i < NMAX; i++) A[i] = -1;

        MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);

        for (i = 1; i <= NMAX; i *= 2) {
            printf(" --- i = %d, \n", i);
            /* NITER gets in the same access epoch, no fence in between */
            for (k = 0; k < NITER; k++) {
                MPI_Get(A, i, MPI_INT, 1, 0, i, MPI_INT, win);
            }
            MPI_Win_fence(0, win);
            for (k = 0; k < i; k++) {
                if (A[k] != 115 + k) printf(" A[%d]=%d \n", k, A[k]);
            }
        }
        MPI_Win_free(&win);
    } else {
        /* rank 1 exposes its array and just matches the fences */
        for (i = 0; i < NMAX; i++) A[i] = 115 + i;
        MPI_Win_create(A, NMAX * sizeof(int), sizeof(int), MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);

        for (i = 1; i <= NMAX; i *= 2) {
            MPI_Win_fence(0, win);
        }
        MPI_Win_free(&win);
    }

    MPI_Finalize();
    return errs;
}



@wlepera
Contributor

wlepera commented Jul 7, 2017

I think I've narrowed this down a bit more. In pml_ob1_recvreq.c, frag->rdma_length is set to bytes_remaining, with a value of 8208. Next, a successful call to mca_pml_ob1_recv_request_get_frag(frag) is made. After this, frag->rdma_length has changed to 4096, which ultimately causes the error in the calculation referenced by @jjhursey. I've traced this as far as the call to mca_bml_base_get, which calls into btl->btl_get, passing a pointer to cbdata (the frag struct). I suspect the value is being changed in this function, though I have not yet been able to confirm this.

I'm attaching the output of the debug run, which includes file names, line numbers, and the values of the bytes_remaining and frag->rdma_length variables. The failed get is logged starting at line 1763 in the output file. The value flips from 8208 to 4096 between lines 1776 and 1778.

out.0.txt

@bosilca
Member

bosilca commented Jul 8, 2017

I think we have a conceptual flaw in the RGET implementation. Let's ignore for a minute the comment in the mca_pml_ob1_recv_request_progress_rget function that talks about fragmentation. The loop in mca_pml_ob1_recv_request_progress_rget assumes that the fragment itself is available to the PML for meddling with upon return from the mca_pml_ob1_recv_request_get_frag function.
This assumption is not valid, because the chain of functions that starts with mca_pml_ob1_recv_request_get_frag leads (at least in the case of vader) to calling mca_btl_vader_get_cma, which triggers the completion callback (mca_pml_ob1_rget_completion), and this callback releases the fragment. At this point the released fragment might rightfully be picked up and reused for any other purpose (such as another pending operation that was stuck in the pending list). Thus, upon return, the fragment's rdma_length has been set in the context of another request to a value valid for that request, but totally irrelevant for the request the PML is currently working on in mca_pml_ob1_recv_request_progress_rget.

The solution is to completely get rid of the fragmentation in the mca_pml_ob1_recv_request_progress_rget function (fragmentation that cannot work anyway, because there is no feedback mechanism from the BTL back into the PML about how much data has been retrieved). The current fragmentation implementation is a M.A.J.O.R flaw, with a drastic impact on the performance of Open MPI. As long as the fragment can be reused in another context (which is the case as long as we have any pending requests or fragments behind MCA_PML_OB1_PROGRESS_PENDING), the current implementation sends far more data than needed (because it sees an rdma_length that is not related to the current operation but rather to another ongoing operation). This can easily be seen using the following patch: when the assert triggers, ii is always larger than 0, and this should never be the case.

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index ddd60f263c..9aa7783f3c 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -718,7 +718,12 @@ void mca_pml_ob1_recv_request_progress_rget( mca_pml_ob1_recv_request_t* recvreq
      * get fragmentation internally. This is a reasonable solution since some btls do not
      * need any fragmentation (sm, vader, self, etc). Remove this loop if this ends up
      * being the case. */
+    int ii = 0;
+    size_t saved_offset[10] = {0}, saved_remaining[10] = {0};
+    size_t saved_length_before[10] = {0}, saved_length_after[10] = {0};
     while (bytes_remaining > 0) {
+        saved_offset[ii] = offset;
+        saved_remaining[ii] = bytes_remaining;
         /* allocate/initialize a fragment */
         MCA_PML_OB1_RDMA_FRAG_ALLOC(frag);
         if (OPAL_UNLIKELY(NULL == frag)) {
@@ -752,16 +757,18 @@ void mca_pml_ob1_recv_request_progress_rget( mca_pml_ob1_recv_request_t* recvreq
         } else {
             frag->rdma_length = bytes_remaining;
         }
-
+        saved_length_before[ii] = frag->rdma_length;
         /* NTH: TODO -- handle error conditions gracefully */
         rc = mca_pml_ob1_recv_request_get_frag(frag);
         if (OMPI_SUCCESS != rc) {
             break;
         }
-
+        saved_length_after[ii] = frag->rdma_length;
         prev_sent = frag->rdma_length;
+        assert(prev_sent <= bytes_remaining);
         bytes_remaining -= prev_sent;
         offset += prev_sent;
+        ii++;
     }
 }

As a side note, unlike all the other supported protocols the current implementation of the RGET completely ignores any pipelining setting provided to the OB1 PML.

@jsquyres
Member

Per discussion at Chicago July 2017 meeting, @hjelmn will look at this in the immediate future.

@jjhursey
Member

@hjelmn In case it helps, Aboorva found that if we disable CMA (-mca btl_vader_single_copy_mechanism none) then this test passes cleanly.

@hjelmn
Member

hjelmn commented Jul 11, 2017

What's funny is this fails miserably for me on my mac :-/

@hjelmn
Member

hjelmn commented Jul 11, 2017

No vader RDMA there.

@hjelmn
Member

hjelmn commented Jul 11, 2017

Setting btl_vader_single_copy_mechanism=none gives the same behavior on Linux as on my mac.

@hjelmn
Member

hjelmn commented Jul 11, 2017

My thinking is this is either a pml/ob1 or osc/pt2pt bug.

@hjelmn
Member

hjelmn commented Jul 11, 2017

Though it does pass with tcp....

@bosilca
Member

bosilca commented Jul 11, 2017

It works with SM. How do you get it to fail on your laptop?

@hjelmn
Member

hjelmn commented Jul 11, 2017

This looks like an old ob1 issue. I can provide a quick workaround in vader and work on a bigger fix for the ob1 issue later.

hjelmn added a commit to hjelmn/ompi that referenced this issue Jul 11, 2017
This commit fixes a bug that occurs when the btl callback happens before
the rget returns. In this case the fragment has been returned and is no
longer valid. This commit saves the size before calling rget. This is
valid since the BTL is not allowed to change the read size.

Fixes open-mpi#3821

Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn added a commit that referenced this issue Jul 11, 2017
This commit fixes a bug that occurs when the btl callback happens before
the rget returns. In this case the fragment has been returned and is no
longer valid. This commit saves the size before calling rget. This is
valid since the BTL is not allowed to change the read size.

Fixes #3821

Signed-off-by: Nathan Hjelm <[email protected]>
@nmorey
Contributor

nmorey commented Jul 12, 2017

This should be backported to the v2.x branch too

@nmorey
Contributor

nmorey commented Jul 12, 2017

I don't know about 3.x, but I backported the patch to 2.1.1 (the only conflict was the copyright in the header) and it's still broken:

[(master) nmorey@portia:openmpi]$ mpirun -np 2 --mca mtl psm2 --mca btl sm,self /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT  Unidir_Get
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017 update 2, MPI-2 part    
#------------------------------------------------------------
# Date                  : Wed Jul 12 14:53:14 2017
# Machine               : x86_64
# System                : Linux
# Release               : 4.4.72-18.12-default
# Version               : #1 SMP Mon Jun 19 14:11:41 UTC 2017 (9c03296)
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT Unidir_Get

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Unidir_Get

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.06         0.00
            4         1000         0.62         6.44
            8         1000         0.57        13.96
           16         1000         0.57        28.26
           32         1000         0.56        57.29
           64         1000         0.56       113.99
          128         1000         0.69       184.81
          256         1000         0.58       442.15
          512         1000         0.63       810.94
         1024         1000         0.64      1605.01
         2048         1000         0.78      2638.19
         4096         1000         1.56      2624.61
         8192         1000         3.07      2665.04
        16384         1000         4.52      3622.42
        32768         1000         7.98      4106.21
        65536          640        12.99      5046.38
       131072          320        45.78      2862.79
       262144          160        50.59      5182.11
       524288           80        83.54      6275.61
      1048576           40       136.04      7707.79
      2097152           20       248.62      8435.32
      4194304           10       547.99      7653.95

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: NON-AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0          100         1.26         0.00
            4          100         4.88         0.82
            8          100         1.96         4.08
           16          100         2.00         8.01
           32          100         1.96        16.33
           64          100         1.93        33.08
          128          100         2.00        63.99
          256          100         2.42       105.91
          512          100         2.03       252.53
         1024          100         2.06       496.17
         2048          100         2.63       778.22
         4096          100         3.42      1199.08
         8192          100         6.13      1335.92
        16384          100         5.56      2946.34
        32768          100         7.12      4603.82
        65536          100        12.19      5376.75
       131072          100        21.41      6120.61
       262144          100        37.71      6951.39
       524288           80        72.40      7241.67
      1048576           40       137.73      7613.30
      2097152           20       308.24      6803.73
      4194304           10       752.77      5571.79


# All processes entering MPI_Finalize

[(master) nmorey@portia:openmpi]$ mpirun -np 2 --mca mtl psm2 --mca btl vader,self /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT  Unidir_Get
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017 update 2, MPI-2 part    
#------------------------------------------------------------
# Date                  : Wed Jul 12 14:53:21 2017
# Machine               : x86_64
# System                : Linux
# Release               : 4.4.72-18.12-default
# Version               : #1 SMP Mon Jun 19 14:11:41 UTC 2017 (9c03296)
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT Unidir_Get

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Unidir_Get

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.08         0.00
            4         1000         1.34         2.98
            8         1000         1.34         5.99
           16         1000         1.38        11.62
           32         1000         1.01        31.70
           64         1000         1.31        48.82
          128         1000         9.58        13.36
          256         1000         1.46       175.63
          512         1000         1.30       394.64
         1024         1000         0.55      1858.98
         2048         1000        12.22       167.53
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709550864, errno = 22
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709550864, errno = 22
         4096         1000        10.30       397.49
[portia:19304] Read -1, expected 18446744073709543440, errno = 22
[portia:19304] Read -1, expected 18446744073709543440, errno = 22
[portia:19304] Read -1, expected 18446744073709550864, errno = 22

@wlepera
Contributor

wlepera commented Jul 12, 2017

@nmorey, did you take both #3844 and #3846? I think both are needed. Our tests with both allow the test case to pass.

@jjhursey
Member

I'm setting up some builds this morning to verify for the release branches. It should impact v3.0.x, v2.x but not v2.0.x from my previous testing.

@nmorey
Contributor

nmorey commented Jul 12, 2017

@wlepera I only took #3844, as #3846 was not referenced in this bug. I'll try with both fixes.

Anyway, both will need to be backported to v2.x.

hjelmn added a commit to hjelmn/ompi that referenced this issue Jul 12, 2017
This commit fixes a bug that occurs when the btl callback happens before
the rget returns. In this case the fragment has been returned and is no
longer valid. This commit saves the size before calling rget. This is
valid since the BTL is not allowed to change the read size.

Fixes open-mpi#3821

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit e73ab93)
Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn added a commit to hjelmn/ompi that referenced this issue Jul 12, 2017
This commit fixes a bug that occurs when the btl callback happens before
the rget returns. In this case the fragment has been returned and is no
longer valid. This commit saves the size before calling rget. This is
valid since the BTL is not allowed to change the read size.

Fixes open-mpi#3821

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit e73ab93)
Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn added a commit to hjelmn/ompi that referenced this issue Jul 12, 2017
This commit fixes a bug that occurs when the btl callback happens before
the rget returns. In this case the fragment has been returned and is no
longer valid. This commit saves the size before calling rget. This is
valid since the BTL is not allowed to change the read size.

Fixes open-mpi#3821

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit e73ab93)
Signed-off-by: Nathan Hjelm <[email protected]>
@jjhursey
Member

@nmorey There is a comment here about the pair of commits: #3845 (comment)

Nathan has filed PRs to push this fix to the release branches:
