vader fails IMB-EXT Unidir_Get test #3821

mathbird · 2017-07-06T14:05:09Z

Hi, I am testing OMPI with the Intel IMB test “IMB-EXT Unidir_Get” as follows. With “vader”, it reports errors. Without “vader”, or with openib added as “-mca btl openib,vader,tcp,self”, the test passes. Does “vader” need a special configure option to build? Does it need other options to make it work?

Thanks,

Dahai

mpirun -n 2 \
 -mca btl vader,tcp,self \
 imb_src/IMB-EXT Unidir_Get

#bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.04         0.00
            4         1000         0.72         5.31
            8         1000         0.89         8.54
           16         1000         0.88        17.39
           32         1000         1.10        27.76
           64         1000         1.15        53.29
          128         1000         0.91       134.17
          256         1000         0.94       260.82
 Read -1, expected 18446744073709547536, errno = 22
 Read -1, expected 18446744073709547536, errno = 22
 Read -1, expected 18446744073709550864, errno = 22
          512         1000         0.95       516.29
         1024         1000         0.56      1731.87
         2048         1000         0.68      2892.14

jjhursey · 2017-07-06T14:25:16Z

We are also digging into this issue, but any insight/help would be appreciated. We have tracked it down to a handful of commits and are currently bisecting further to narrow it down to one. But it looks like the rcache / mpool rework might be the source.

The current thinking is that it's memory corruption. Adding some additional context to the error reported in vader here shows the size parameter as negative, hence the errno of 22 (Invalid argument).

jjhursey · 2017-07-06T14:44:01Z

I'm using master from yesterday configured with --enable-debug using the IMB 4.1 benchmark. Running on a single machine with -np 2 can reproduce. We have been able to reproduce on both ppc64le and x86_64 platforms.

Pass: -mca btl sm,self
Pass: -mca btl tcp,self
Fail: -mca btl vader,self

Here is the full trace of a failed run:

shell$ cd imb-4.1/src/
shell$  mpirun -np 2 -mca pml ob1 -mca btl vader,self ./IMB-EXT Unidir_get
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1, MPI-2 part    
#------------------------------------------------------------
# Date                  : Thu Jul  6 10:39:28 2017
# Machine               : ppc64le
# System                : Linux
# Release               : 3.10.0-510.el7.ppc64le
# Version               : #1 SMP Wed Sep 21 14:46:20 EDT 2016
# MPI Version           : 3.1
# MPI Thread Environment: 

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down 
# dynamically when a certain run time (per message size sample) 
# is expected to be exceeded. Time limit is defined by variable 
# "SECS_PER_SAMPLE" (=> IMB_settings.h) 
# or through the flag => -time 
  


# Calling sequence was: 

# ./IMB-EXT Unidir_get

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Unidir_Get

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.22         0.00
            4         1000         3.25         1.17
            8         1000         2.87         2.66
           16         1000         2.79         5.46
           32         1000         2.59        11.80
           64         1000         2.55        23.92
          128         1000         2.53        48.26
          256         1000         2.58        94.66
          512         1000         2.56       190.48
[c712f6n06:131799] Read -1, expected 18446744073709547536, errno = 22
[c712f6n06:131799] Read -1, expected 18446744073709547536, errno = 22
[c712f6n06:131799] Read -1, expected 18446744073709550864, errno = 22
         1024         1000         1.82       535.80
         2048         1000         2.03       963.12

mathbird · 2017-07-06T15:23:44Z

We found that Unidir_Put works well with the different options. So how does the vader code handle get and put differently?

bwbarrett · 2017-07-06T16:41:09Z

@jjhursey, are the patches in question also on the v3.0.x branch? I assume yes, but hoping no...

jjhursey · 2017-07-06T16:54:09Z

@bwbarrett yeah it impacts the v3.0.x branch. Here is the breakdown for the release branches (as of a build from yesterday):

Pass v2.0.x
Fail v2.x
Fail v3.0.x
Fail master

The leading suspect commit is the mpool/rcache rework (which is not in the v2.0.x branch):

d4afb16

Valgrind shows some warnings in the MPI_Alloc_mem and MPI_Free_mem paths that might be related (best I can tell the key field is not initialized):

==85568== Conditional jump or move depends on uninitialised value(s)
==85568==    at 0x47AD480: mca_mpool_base_tree_node_compare (mpool_base_tree.c:62)
==85568==    by 0x46BD513: btree_insert (opal_rb_tree.c:342)
==85568==    by 0x46BCC8F: opal_rb_tree_insert (opal_rb_tree.c:137)
==85568==    by 0x47AD9AF: mca_mpool_base_tree_insert (mpool_base_tree.c:110)
==85568==    by 0x47AC76B: mca_mpool_base_alloc (mpool_base_alloc.c:87)
==85568==    by 0x41599AB: PMPI_Alloc_mem (palloc_mem.c:85)
==85568==    by 0x100055A7: IMB_set_buf (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x1000631B: IMB_init_buffers_iter (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x10002123: main (in /home/me/imb-4.1/src/IMB-EXT)
==85568== 

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.22         0.00
==85568== Conditional jump or move depends on uninitialised value(s)
==85568==    at 0x47AD468: mca_mpool_base_tree_node_compare (mpool_base_tree.c:58)
==85568==    by 0x46BCF1B: opal_rb_tree_find_with (opal_rb_tree.c:191)
==85568==    by 0x47AD41B: opal_rb_tree_find (opal_rb_tree.h:156)
==85568==    by 0x47ADB17: mca_mpool_base_tree_find (mpool_base_tree.c:143)
==85568==    by 0x47AC7DB: mca_mpool_base_free (mpool_base_alloc.c:110)
==85568==    by 0x41812CB: PMPI_Free_mem (pfree_mem.c:53)
==85568==    by 0x1000554B: IMB_set_buf (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x1000631B: IMB_init_buffers_iter (in /home/me/imb-4.1/src/IMB-EXT)
==85568==    by 0x10002123: main (in /home/me/imb-4.1/src/IMB-EXT)

hjelmn · 2017-07-06T17:30:02Z

Yeah, looks like the item key is not being set.

hjelmn · 2017-07-06T17:30:25Z

Not sure if it is the cause of this bug though.

hjelmn · 2017-07-06T17:31:24Z

Maybe try adding this @ mpool_base_alloc.c:78

        mpool_tree_item->key = mem;

jjhursey · 2017-07-06T17:42:17Z

@hjelmn No love on that change. Though it did fix the valgrind complaints.

hjelmn · 2017-07-06T17:44:26Z

Ok, so that is a required fix but there is still something else going on. Is this with CMA, KNEM, or XPMEM?

jjhursey · 2017-07-06T18:26:16Z

CMA seems to be the only one active:

opal_config.h:#define OPAL_BTL_VADER_HAVE_CMA 1
opal_config.h:#define OPAL_BTL_VADER_HAVE_KNEM 0
opal_config.h:#define OPAL_BTL_VADER_HAVE_XPMEM 0

jjhursey · 2017-07-06T18:41:38Z

Here is something. Adding -mca btl_vader_segment_size 8388608 allows it to pass (default is 4194304 or 4MB for my configuration). So maybe something in the fragmentation mechanism?

hjelmn · 2017-07-06T18:49:23Z

Could be. Maybe a bug in the fragment allocator?

hjelmn · 2017-07-06T20:19:08Z

Was able to reproduce the issue on my mac with both put and get. Digging into it now.

mathbird · 2017-07-06T21:13:24Z

I think IMB_ones_mget and IMB_ones_mput in IMB_ones_unidir.c triggered the issue. There is no MPI_Win_fence after each MPI_Get/MPI_Put call.

jjhursey · 2017-07-06T21:55:50Z

A breadcrumb...

So size is wrapping around because here in ob1 the prev_sent can be greater than bytes_remaining. Since they are both size_t, the subtraction wraps around, resulting in a very large number. I added a check at this location printing out these values before bytes_remaining is decremented. You can see it would go negative, and thus the next call would issue the errno = 22 at the read location.

[c712f6n06:19207] [mca_pml_ob1_recv_request_progress_rget:764] Warning: bytes_remaining [16], prev_sent [4096]
[c712f6n06:19207] [mca_pml_ob1_recv_request_progress_rget:764] Warning: bytes_remaining [16], prev_sent [4096]
[c712f6n06:19207] [mca_pml_ob1_recv_request_progress_rget:764] Warning: bytes_remaining [3344], prev_sent [4096]
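
The wraparound described above can be checked by hand. The following is a minimal sketch, not code from ob1: the function names are hypothetical, and uint64_t stands in for size_t on the 64-bit platforms used in this thread. It shows that subtracting a prev_sent of 4096 from a bytes_remaining of 16 or 3344 yields exactly the "expected" values printed next to errno = 22.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the unguarded update "bytes_remaining -= prev_sent".
 * Unsigned subtraction wraps modulo 2^64 instead of going negative. */
uint64_t remaining_after(uint64_t bytes_remaining, uint64_t prev_sent)
{
    return bytes_remaining - prev_sent;
}

/* Hypothetical helper: the same update, guarded by an assertion so that a
 * prev_sent larger than bytes_remaining is caught before it wraps. */
uint64_t remaining_after_checked(uint64_t bytes_remaining, uint64_t prev_sent)
{
    assert(prev_sent <= bytes_remaining);
    return bytes_remaining - prev_sent;
}
```

With the values from the warnings, remaining_after(16, 4096) is 2^64 - 4080 = 18446744073709547536 and remaining_after(3344, 4096) is 2^64 - 752 = 18446744073709550864, which are the two sizes that appear in the "Read -1, expected ..." messages.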

The next question is why are we getting a frag->rdma_length greater than bytes_remaining?

I have to stop for today. But @wlepera and @mathbird will continue with this tomorrow. @hjelmn let us know if you turn up anything. Thanks!

mathbird · 2017-07-07T13:58:09Z

my 2 cents this morning:

The input parameter size in the function mca_btl_vader_get_cma is wrong whenever the error happens. It should equal the size of the data in bytes, but it shows up as follows:

--- i = 1024, iter = 983
--- i --- in mca_btl_vader_get_cma, size = 7440 // <--- should be 1024*4= 4096
--- in mca_btl_vader_get_cma, size = 3344
--- in mca_btl_vader_get_cma, size = -752

My next question is where size is calculated before mca_btl_vader_get_cma is called?

The following simple code can also catch the same error.


#include <mpi.h>
#include <stdio.h>

#define NMAX  (1024 * 1024)
#define NITER 1000

/* static: a 4 MiB array is too large to place safely on the stack */
static int A[NMAX];

int main(int argc, char *argv[])
{
    int rank, nprocs, i, k;
    MPI_Win win;
    int errs = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (nprocs != 2) {
        printf("Run this program with 2 processes\n");
        fflush(stdout);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0)
    {
        /* origin: exposes no memory and reads from rank 1 */
        for (i = 0; i < NMAX; i++) A[i] = -1;

        MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);

        for (i = 1; i <= NMAX; i *= 2) {
            printf(" --- i = %d\n", i);
            /* NITER gets back to back, with a single fence afterwards */
            for (k = 0; k < NITER; k++) {
                MPI_Get(A, i, MPI_INT, 1, 0, i, MPI_INT, win);
            }
            MPI_Win_fence(0, win);
            /* verify the data against what rank 1 wrote */
            for (k = 0; k < i; k++) {
                if (A[k] != 115 + k) {
                    printf(" A[%d]=%d\n", k, A[k]);
                    errs++;
                }
            }
        }
        MPI_Win_free(&win);
    }
    else if (rank == 1)
    {
        /* target: exposes A and only takes part in the fences */
        for (i = 0; i < NMAX; i++) A[i] = 115 + i;
        MPI_Win_create(A, NMAX * sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);

        for (i = 1; i <= NMAX; i *= 2) {
            MPI_Win_fence(0, win);
        }
        MPI_Win_free(&win);
    }

    MPI_Finalize();
    return errs != 0;
}

wlepera · 2017-07-07T22:35:26Z

I think I've narrowed this down a bit more. In pml_ob1_recvreq.c, the value for frag->rdma_length is set to bytes_remaining, with a value of 8208. Next, a successful call to mca_pml_ob1_recv_request_get_frag(frag) is made. After this, frag->rdma_length has changed to 4096, which ultimately causes the error in the calculation referenced by @jjhursey. I've traced this as far as the call to mca_bml_base_get, which makes a call into btl->btl_get, passing a pointer to cbdata (which is the frag struct). I suspect the value is being changed in this function, though I have not been able to confirm this yet.

I'm attaching the output of the debug run, which includes filenames, line numbers, and values of the bytes_remaining, frag->rdma_length variables. The failed get is logged starting at line 1763 in the output file. The value flips from 8208 to 4096 between lines 1776 and 1778

out.0.txt

bosilca · 2017-07-08T22:10:06Z

I think we have a conceptual flaw in the RGET implementation. Let's ignore for a minute the comment in the mca_pml_ob1_recv_request_progress_rget function that talks about fragmentation. The loop in mca_pml_ob1_recv_request_progress_rget assumes that the fragment itself is still available to the PML to meddle with upon return from the mca_pml_ob1_recv_request_get_frag function.
This assumption is not valid, because the entire function chain that starts with mca_pml_ob1_recv_request_get_frag leads (at least in the case of vader) to calling mca_btl_vader_get_cma, which triggers the completion callback (mca_pml_ob1_rget_completion), and this callback releases the fragment. At this point the released fragment might rightfully be picked up and reused for any other purpose (such as another pending operation that was stuck in the pending list). Thus, upon return the fragment's rdma_length has been set in the context of another request to a value valid for that request, but totally irrelevant to the request the PML is currently working on in the context of mca_pml_ob1_recv_request_progress_rget.

The solution is to completely get rid of the fragmentation in the mca_pml_ob1_recv_request_progress_rget function (fragmentation that cannot work anyway because there is no feedback mechanism from the BTL back into the PML about how much data has been retrieved). The current fragmentation implementation is a M.A.J.O.R flaw, with a drastic impact on the performance of Open MPI. As long as the fragment can be reused in another context (which is the case as long as we have any pending requests or fragments behind MCA_PML_OB1_PROGRESS_PENDING), the current implementation sends way more data than needed (because it will see an rdma_length that is not related to the current operation but rather to another ongoing operation). This can easily be seen using the following patch. When the assert triggers, ii is always larger than 0, and this should never be the case.

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index ddd60f263c..9aa7783f3c 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -718,7 +718,12 @@ void mca_pml_ob1_recv_request_progress_rget( mca_pml_ob1_recv_request_t* recvreq
      * get fragmentation internally. This is a reasonable solution since some btls do not
      * need any fragmentation (sm, vader, self, etc). Remove this loop if this ends up
      * being the case. */
+    int ii = 0;
+    size_t saved_offset[10] = {0}, saved_remaining[10] = {0};
+    size_t saved_length_before[10] = {0}, saved_length_after[10] = {0};
     while (bytes_remaining > 0) {
+        saved_offset[ii] = offset;
+        saved_remaining[ii] = bytes_remaining;
         /* allocate/initialize a fragment */
         MCA_PML_OB1_RDMA_FRAG_ALLOC(frag);
         if (OPAL_UNLIKELY(NULL == frag)) {
@@ -752,16 +757,18 @@ void mca_pml_ob1_recv_request_progress_rget( mca_pml_ob1_recv_request_t* recvreq
         } else {
             frag->rdma_length = bytes_remaining;
         }
-
+        saved_length_before[ii] = frag->rdma_length;
         /* NTH: TODO -- handle error conditions gracefully */
         rc = mca_pml_ob1_recv_request_get_frag(frag);
         if (OMPI_SUCCESS != rc) {
             break;
         }
-
+        saved_length_after[ii] = frag->rdma_length;
         prev_sent = frag->rdma_length;
+        assert(prev_sent <= bytes_remaining);
         bytes_remaining -= prev_sent;
         offset += prev_sent;
+        ii++;
     }
 }

As a side note, unlike all the other supported protocols the current implementation of the RGET completely ignores any pipelining setting provided to the OB1 PML.

jsquyres · 2017-07-11T14:39:01Z

Per discussion at Chicago July 2017 meeting, @hjelmn will look at this in the immediate future.

jjhursey · 2017-07-11T18:20:44Z

@hjelmn In case it helps, Aboorva found that if we disable the cma with (-mca btl_vader_single_copy_mechanism none) then this test passes clean.

hjelmn · 2017-07-11T18:24:47Z

What's funny is this fails miserably for me on my mac :-/

hjelmn · 2017-07-11T18:24:56Z

No vader RDMA there.

hjelmn · 2017-07-11T18:26:15Z

setting btl_vader_single_copy_mechanism=none gives the same behavior on linux as on my mac.

hjelmn · 2017-07-11T18:32:55Z

My thinking is this is either a pml/ob1 or osc/pt2pt bug.

hjelmn · 2017-07-11T18:33:52Z

Though it does pass with tcp....

bosilca · 2017-07-11T18:39:35Z

It works with SM. How do you get it to fail on your laptop?

hjelmn · 2017-07-11T18:58:30Z

This looks like an old ob1 issue. I can provide a quick workaround in vader and work on a bigger fix for the ob1 issue later.

This commit fixes a bug that occurs when the btl callback happens before the rget returns. In this case the fragment has been returned and is no longer valid. This commit saves the size before calling rget. This is valid since the BTL is not allowed to change the read size. Fixes open-mpi#3821 Signed-off-by: Nathan Hjelm <[email protected]>
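
The failure mode and the save-before-call fix described in this commit message can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual ob1 patch: frag_t, the one-slot pool, get_frag, and both progress functions are hypothetical stand-ins, and the value 4096 written by the "other" request is taken from the debug output earlier in the thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for an RDMA fragment; only the length matters here. */
typedef struct {
    size_t rdma_length;
    int    in_use;
} frag_t;

/* One-slot free list: a released fragment is the next one handed out. */
static frag_t pool_slot;

static frag_t *frag_alloc(void)      { pool_slot.in_use = 1; return &pool_slot; }
static void frag_release(frag_t *f)  { f->in_use = 0; }

/* Simulated get that completes inline: the completion callback releases the
 * fragment, and a pending request immediately reuses it with its own length
 * before this call returns. */
static int get_frag(frag_t *frag)
{
    frag_release(frag);               /* completion callback runs here */
    frag_t *other = frag_alloc();     /* pending request picks up the same frag */
    other->rdma_length = 4096;        /* valid for that request only */
    return 0;
}

/* Unsafe: reads frag->rdma_length after the call, so it may see a value that
 * belongs to another request. Returns the updated bytes_remaining. */
size_t progress_unsafe(size_t bytes_remaining)
{
    frag_t *frag = frag_alloc();
    frag->rdma_length = bytes_remaining;
    get_frag(frag);
    size_t prev_sent = frag->rdma_length;   /* stale: fragment was reused */
    return bytes_remaining - prev_sent;
}

/* Safe: saves the length before the call and never touches frag afterwards. */
size_t progress_safe(size_t bytes_remaining)
{
    frag_t *frag = frag_alloc();
    frag->rdma_length = bytes_remaining;
    size_t prev_sent = frag->rdma_length;   /* saved while frag is still ours */
    get_frag(frag);
    return bytes_remaining - prev_sent;
}
```

In this simulation progress_safe(16) returns 0, while progress_unsafe(16) subtracts the foreign 4096 and wraps around, which is the pattern seen in the bytes_remaining / prev_sent warnings. Saving the length first relies on the point made in the commit message that the BTL is not allowed to change the read size.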

nmorey · 2017-07-12T08:02:13Z

This should be backported to the v2.x branch too

nmorey · 2017-07-12T12:54:11Z

I don't know about 3.x but I backported the patch to 2.1.1 (only conflict is the copyright in header) and it's still broken:

[(master) nmorey@portia:openmpi]$ mpirun -np 2 --mca mtl psm2 --mca btl sm,self /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT  Unidir_Get
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017 update 2, MPI-2 part    
#------------------------------------------------------------
# Date                  : Wed Jul 12 14:53:14 2017
# Machine               : x86_64
# System                : Linux
# Release               : 4.4.72-18.12-default
# Version               : #1 SMP Mon Jun 19 14:11:41 UTC 2017 (9c03296)
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT Unidir_Get

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Unidir_Get

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.06         0.00
            4         1000         0.62         6.44
            8         1000         0.57        13.96
           16         1000         0.57        28.26
           32         1000         0.56        57.29
           64         1000         0.56       113.99
          128         1000         0.69       184.81
          256         1000         0.58       442.15
          512         1000         0.63       810.94
         1024         1000         0.64      1605.01
         2048         1000         0.78      2638.19
         4096         1000         1.56      2624.61
         8192         1000         3.07      2665.04
        16384         1000         4.52      3622.42
        32768         1000         7.98      4106.21
        65536          640        12.99      5046.38
       131072          320        45.78      2862.79
       262144          160        50.59      5182.11
       524288           80        83.54      6275.61
      1048576           40       136.04      7707.79
      2097152           20       248.62      8435.32
      4194304           10       547.99      7653.95

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: NON-AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0          100         1.26         0.00
            4          100         4.88         0.82
            8          100         1.96         4.08
           16          100         2.00         8.01
           32          100         1.96        16.33
           64          100         1.93        33.08
          128          100         2.00        63.99
          256          100         2.42       105.91
          512          100         2.03       252.53
         1024          100         2.06       496.17
         2048          100         2.63       778.22
         4096          100         3.42      1199.08
         8192          100         6.13      1335.92
        16384          100         5.56      2946.34
        32768          100         7.12      4603.82
        65536          100        12.19      5376.75
       131072          100        21.41      6120.61
       262144          100        37.71      6951.39
       524288           80        72.40      7241.67
      1048576           40       137.73      7613.30
      2097152           20       308.24      6803.73
      4194304           10       752.77      5571.79


# All processes entering MPI_Finalize

[(master) nmorey@portia:openmpi]$ mpirun -np 2 --mca mtl psm2 --mca btl vader,self /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT  Unidir_Get
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017 update 2, MPI-2 part    
#------------------------------------------------------------
# Date                  : Wed Jul 12 14:53:21 2017
# Machine               : x86_64
# System                : Linux
# Release               : 4.4.72-18.12-default
# Version               : #1 SMP Mon Jun 19 14:11:41 UTC 2017 (9c03296)
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/mpi/gcc/openmpi2/tests/IMB/IMB-EXT Unidir_Get

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Unidir_Get

#---------------------------------------------------
# Benchmarking Unidir_Get 
# #processes = 2 
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.08         0.00
            4         1000         1.34         2.98
            8         1000         1.34         5.99
           16         1000         1.38        11.62
           32         1000         1.01        31.70
           64         1000         1.31        48.82
          128         1000         9.58        13.36
          256         1000         1.46       175.63
          512         1000         1.30       394.64
         1024         1000         0.55      1858.98
         2048         1000        12.22       167.53
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709550864, errno = 22
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709547536, errno = 22
[portia:19304] Read -1, expected 18446744073709550864, errno = 22
         4096         1000        10.30       397.49
[portia:19304] Read -1, expected 18446744073709543440, errno = 22
[portia:19304] Read -1, expected 18446744073709543440, errno = 22
[portia:19304] Read -1, expected 18446744073709550864, errno = 22

wlepera · 2017-07-12T13:28:01Z

@nmorey, did you take both #3844 and #3846? I think both are needed. Our tests with both allow the testcase to pass.

jjhursey · 2017-07-12T13:54:04Z

I'm setting up some builds this morning to verify for the release branches. It should impact v3.0.x, v2.x but not v2.0.x from my previous testing.

nmorey · 2017-07-12T13:59:15Z

@wlepera I only took #3844 as #3846 was not referenced in this bug.
I'll try with both fixes.

Anyway both will need to be backported to v2.x

This commit fixes a bug that occurs when the btl callback happens before the rget returns. In this case the fragment has been returned and is no longer valid. This commit saves the size before calling rget. This is valid since the BTL is not allowed to change the read size. Fixes open-mpi#3821 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit e73ab93) Signed-off-by: Nathan Hjelm <[email protected]>

jjhursey · 2017-07-12T15:10:13Z

@nmorey There is a comment here about the pair of commits: #3845 (comment)

Nathan has filed PRs to push this fix to the release branches:

Fix v3.0.x ob1 issues #3852
Fix v2.x ob1 issues #3850
Fix v2.0.x ob1 issues #3851 (this one might not be needed)

jjhursey changed the title ~~about openmpi vader~~ vader fails IMB-EXT Unidir_Get test Jul 6, 2017

jjhursey added the bug label Jul 6, 2017

jjhursey assigned hjelmn, jjhursey, nysal, mathbird and wlepera Jul 6, 2017

bwbarrett added Severity: blocker Target: v3.0.x labels Jul 6, 2017

bwbarrett added this to the v3.0.0 milestone Jul 6, 2017

bwbarrett added the Target: v2.x label Jul 6, 2017

bwbarrett modified the milestones: v2.1.2, v3.0.0 Jul 6, 2017

jsquyres added Target: v2.0.x Target: v3.1.x labels Jul 11, 2017

hjelmn mentioned this issue Jul 11, 2017

pml/ob1: do not access fragment after calling btl rget #3844

Merged

hjelmn closed this as completed in #3844 Jul 11, 2017
