Improve performance of the transposition of CSR matrices #798

Merged

Conversation

GuilloteauQ
Contributor

Related Issue: #794

Description

This PR implements the transposition of CSR matrices as done by SciPy: https://github.com/scipy/scipy/blob/8a64c938ddf1ae4c02a08d2c5e38daeb8d061d38/scipy/sparse/sparsetools/csr.h#L419-L462

The objective is to improve the performance of the transpose operation for sparse (CSR) matrices.
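The linked SciPy routine follows the classic count / prefix-sum / scatter scheme. The actual kernel in this PR is C++ (src/runtime/local/kernels/Transpose.h); purely as an illustration of the scheme, and not the PR's code, here is a minimal NumPy-based sketch:

import numpy as np

def csr_transpose(n_rows, n_cols, row_ptr, col_idx, vals):
    # Transpose an n_rows x n_cols CSR matrix; returns the CSR arrays
    # (row pointers, column indices, values) of the transpose.
    nnz = row_ptr[n_rows]

    # 1) Count the non-zeros in each column of A, i.e. each row of t(A).
    t_ptr = np.zeros(n_cols + 1, dtype=np.int64)
    for j in col_idx[:nnz]:
        t_ptr[j + 1] += 1

    # 2) Prefix-sum the counts to obtain the row pointers of t(A).
    np.cumsum(t_ptr, out=t_ptr)

    # 3) Scatter: walk A once and drop every entry into the next free
    #    slot of its destination row in t(A).
    t_idx = np.empty(nnz, dtype=np.int64)
    t_vals = np.empty(nnz, dtype=vals.dtype)
    nxt = t_ptr[:-1].copy()  # next insertion offset per row of t(A)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            t_idx[nxt[j]] = i
            t_vals[nxt[j]] = vals[k]
            nxt[j] += 1
    return t_ptr, t_idx, t_vals

This runs in O(nnz + rows + cols) time: each non-zero of A is visited once for counting and once for scattering.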

Performance

  • current is the existing implementation of the CSR transpose
  • new is this SciPy-based implementation
  • I tried two scheduling schemes (STATIC and MFSC)
  • several matrix sizes (1_000 x 1_000 to 1_000_000 x 1_000_000)
  • several sparsity values ($10^{-4}$ to $10^{-10}$)

[Figure: execution times of the current vs. new implementation across matrix sizes, sparsity values, and scheduling schemes]

Reproduce

The experiments were executed on a cluster with Slurm.
The workflow is managed by Snakemake (installable via pip install snakemake).
The software environment is managed by Singularity and the daphne-dev Docker image.

After copying the following files to your cluster, making sbatch.bash executable, and adapting it to your cluster, you can start the workflow:

snakemake --latency-wait 60 -c <NUMBER_OF_TASKS_IN_PARALLEL>

where NUMBER_OF_TASKS_IN_PARALLEL is the maximum number of Snakemake tasks to run in parallel (and thus of Slurm jobs, because the workflow submits Slurm jobs).
The --latency-wait flag guards against latency on the cluster's NFS.

At the completion of the workflow, the file data/all.csv contains all the experimental data with the following structure:

implem   scheme  sparsity  matrix_size  exec_time
current  STATIC  0.0001    1000         0.0137305
...      ...     ...       ...          ...
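To load and aggregate these results, assuming the column layout above (the data/all.csv produced by the workflow has no header row and comma-separated fields), a minimal pandas sketch:

import pandas as pd

cols = ["implem", "scheme", "sparsity", "matrix_size", "exec_time"]
df = pd.read_csv("data/all.csv", header=None, names=cols, skipinitialspace=True)

# mean execution time over the repetitions of each configuration
summary = (df.groupby(["implem", "scheme", "sparsity", "matrix_size"])["exec_time"]
             .mean()
             .reset_index())
print(summary)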

Snakefile

MATRIX_SIZES = [
      1_000,
     10_000,
    100_000,
  1_000_000,
]

SPARSITY = [
  "0.0001",
  "0.00001",
  "0.000001",
  "0.0000001",
  "0.00000001",
  "0.000000001",
  "0.0000000001",
]

SCHEMES = [
  "STATIC",
  "MFSC"
]

LAYOUT = "CENTRALIZED"

MAX_ITER = 3
ITERS = range(0, MAX_ITER)

IMPLEMS = [
  "current",
  "new"
]

COMMITS = {
  "current": "4e96943453c635a12898a1af55ca3efd819b81d9",
  "new": "47162f344834bc63b1d10acabf29b588092d7a22"
}

# Default target: the aggregated CSV and every individual measurement file
rule all:
  input:
    "data/all.csv",
    expand("data/{implem}/{scheme}/{sparsity}/{matrix_size}/{iter}.out",\
           implem=IMPLEMS,\
           scheme=SCHEMES,\
           sparsity=SPARSITY,\
           matrix_size=MATRIX_SIZES,\
           iter=ITERS)

# Run one benchmark configuration as a Slurm job (sbatch blocks until completion due to --wait)
rule run_expe:
  input:
    bin="daphne-src/{implem}/bin/daphne",
    script="bench.daph",
    sbatch="sbatch.bash",
  output:
    "data/{implem}/{scheme}/{sparsity}/{matrix_size}/{iter}.out"
  shell:
    "sbatch {input.sbatch} {wildcards.implem} {wildcards.scheme} {wildcards.sparsity} {wildcards.matrix_size} {input.bin} {input.script} {output}"


# Clone DAPHNE, check out the commit of the given implementation, and build it inside the container
rule download_and_compile:
  input:
    singularity="daphne-dev.sif"
  output:
    "daphne-src/{implem}/bin/daphne"
  params:
    commit = lambda w: COMMITS[w.implem]
  shell:
    """
      rm -rf daphne-src/{wildcards.implem}
      mkdir -p daphne-src
      git clone https://github.com/GuilloteauQ/daphne daphne-src/{wildcards.implem}
      cd daphne-src/{wildcards.implem}
      git checkout {params.commit}
      cd ../..
      singularity exec {input.singularity} bash -c "cd daphne-src/{wildcards.implem}; ./build.sh --no-deps --installPrefix /usr/local"
    """

# Build the Singularity image from the daphne-dev Docker image
rule build_container:
  output:
    "daphne-dev.sif"
  shell:
    """
    singularity build {output} docker://daphneeu/daphne-dev:v0.3-rc0_X86-64_BASE_ubuntu20.04
    """

# Prefix each measurement of one configuration with its parameters, yielding a CSV fragment
rule generate_group_csv:
  input:
    expand("data/{{implem}}/{{scheme}}/{{sparsity}}/{{matrix_size}}/{iter}.out",\
           iter=ITERS)
  output:
    "data/all/{implem}_{scheme}_{sparsity}_{matrix_size}.csv"
  shell:
    """
    cat {input} | awk '{{print "{wildcards.implem}, {wildcards.scheme}, {wildcards.sparsity}, {wildcards.matrix_size}, " $1}}' > {output}
    """

# Concatenate all fragments into data/all.csv
rule generate_csv:
  input:
    expand("data/all/{implem}_{scheme}_{sparsity}_{matrix_size}.csv",\
           implem=IMPLEMS,\
           scheme=SCHEMES,\
           sparsity=SPARSITY,\
           matrix_size=MATRIX_SIZES)
  output:
    "data/all.csv"
  shell:
    "cat {input} > {output}"

bench.daph

// size and sparsity are passed from sbatch.bash via --args
size = $size;
sparsity = $sparsity;
// generate a random size x size matrix with the given sparsity
A = rand(size, size, 1.0, 1.0, sparsity, -1);
// time only the transpose
start = now();
B = t(A);
end = now();
// print the elapsed time in seconds
print((end - start) / 1000000000.0);
// use B so that the transpose is actually computed
x = mean(B);

sbatch.bash

#!/bin/bash

#SBATCH --job-name=daphne-csr-transpose
#SBATCH --nodes=1
#SBATCH --partition=xeon
#SBATCH --exclude=cl-node[001-004]
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#              d-hh:mm:ss
#SBATCH --time=0-00:30:00
#SBATCH --wait

set -ex

IMPLEM=$1
SCHEME=$2
SPARSITY=$3
MATRIX_SIZE=$4
BIN=$5
SCRIPT=$6
OUTPUT=$7

# On SIGTERM (e.g., Slurm timeout), write -1 as the result so the output file still exists
function signal_handler {
  echo -1 > ${SLURM_SUBMIT_DIR}/${OUTPUT}
  exit
}

trap signal_handler TERM

srun --cpus-per-task=20 singularity exec ${SLURM_SUBMIT_DIR}/daphne-dev.sif ${BIN} --vec \
    --num-threads=20 \
    --select-matrix-repr \
    --partitioning=${SCHEME} \
    --queue_layout=CENTRALIZED \
    --pin-workers \
    --args size=${MATRIX_SIZE} \
    --args sparsity=${SPARSITY} \
    ${SLURM_SUBMIT_DIR}/${SCRIPT} > ${SLURM_SUBMIT_DIR}/${OUTPUT}

exit 0

GuilloteauQ added a commit to GuilloteauQ/daphne that referenced this pull request Jul 30, 2024
Collaborator

@corepointer corepointer left a comment


Thx for this improvement @GuilloteauQ and the extended presentation of how to reproduce!
The code LGTM. I'll merge it later today if there's nothing more to add.

src/runtime/local/kernels/Transpose.h (review discussion, resolved)
@corepointer corepointer linked an issue Aug 1, 2024 that may be closed by this pull request
@corepointer corepointer added the performance label Aug 1, 2024
@corepointer corepointer added this to the v0.3 milestone Aug 1, 2024
@corepointer
Collaborator

Maybe we should add a note where you ported that implementation from?

@corepointer corepointer force-pushed the 794_csr_transpose_performance branch from f56d582 to d260761 on August 7, 2024 16:52
@corepointer corepointer merged commit d260761 into daphne-eu:main Aug 7, 2024
1 of 2 checks passed
@corepointer
Collaborator

Thx for addressing the minor items discussed. I tested, squashed and merged 👍
