mpirun hangs when networkingMode=mirrored #11215

Closed
1 of 2 tasks
dhtzs opened this issue Feb 27, 2024 · 9 comments

@dhtzs

dhtzs commented Feb 27, 2024

Windows Version

Microsoft Windows [Version 10.0.22631.3155]

WSL Version

2.1.3.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

5.15.146.1-2

Distro Version

Ubuntu 23.04

Other Software

mpirun (Open MPI) 4.1.4

Repro Steps

  1. Create .wslconfig file with the following content:
[wsl2]
dnsTunneling=true
networkingMode=mirrored
firewall=true
autoProxy=true

[experimental]
autoMemoryReclaim=dropcache
  2. Terminate WSL using wsl.exe --shutdown and start a new WSL instance.

  3. Compile the following C program using mpicc mpi.c -o mpi:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define ARRAY_SIZE 16

int main(int argc, char *argv[]) {
    int rank, size;
    int A[ARRAY_SIZE] = {2,5,6,1,6,7,3,9,5,0,6,8,3,5,6,1};
    int chunk_size, remainder, local_chunk_size, local_start, local_sum, total_sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    chunk_size = ARRAY_SIZE / size;
    remainder = ARRAY_SIZE % size;

    if (rank < remainder) {
        local_chunk_size = chunk_size + 1;
        local_start = rank * local_chunk_size;
    } else {
        local_chunk_size = chunk_size;
        local_start = rank * chunk_size + remainder;
    }

    local_sum = 0;
    for (int i = local_start; i < local_start + local_chunk_size; i++) {
        local_sum += A[i];
    }

    MPI_Reduce(&local_sum, &total_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Total sum: %d\n", total_sum);
    }

    MPI_Finalize();
    return 0;
}
  4. Execute the program using mpirun -np 2 ./mpi and notice that mpirun hangs.

Expected Behavior

screenshot01

Actual Behavior

screenshot02

Diagnostic Logs

No response

@sadaisystems

Same problem here

@anak1st

anak1st commented May 15, 2024

Same problem

@chanpreetdhanjal

Hi. Can you please collect networking logs by following the instructions below?
https://github.com/microsoft/WSL/blob/master/CONTRIBUTING.md#collect-wsl-logs-for-networking-issues

@dhtzs
Author

dhtzs commented Jun 19, 2024

Hi @chanpreetdhanjal, thanks for the heads up. I reproduced the problem and collected the networking logs following the provided instructions. I hope they can help resolve the issue.

WslNetworkingLogs-2024-06-19_12-15-44.zip


Diagnostic information
.wslconfig found
Detected appx version: 2.2.4.0
optional-components.txt not found

@CatalinFetoiu
Collaborator

Hello @dhtzs, thanks for reporting the issue and sending the logs! I was able to repro the issue.

This is the same symptom as #10855. MPI appears to connect to 127.0.0.1:6001 and, because of the RST issue mentioned in the other issue, the connection hangs.

I will resolve it as a duplicate of that issue.

@CatalinFetoiu
Collaborator

/dupe 10855

@wang1zhen

Though it might be a dupe, it's still an existing problem, and I feel it might be better to leave it open?

@BoqianBIT

Same problem

8 participants