Improve performance of the transposition of CSR matrices #798

Merged

Conversation

GuilloteauQ
Contributor

Related Issue: #794

Description

This PR implements the transposition of CSR matrices as done by SciPy: https://github.com/scipy/scipy/blob/8a64c938ddf1ae4c02a08d2c5e38daeb8d061d38/scipy/sparse/sparsetools/csr.h#L419-L462

The objective is to improve the performance of the transpose operation for sparse (CSR) matrices.
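The linked SciPy routine follows the classic count / prefix-sum / scatter scheme. The actual kernel in this PR is C++ (src/runtime/local/kernels/Transpose.h); purely as an illustration of the scheme, and not the PR's code, here is a minimal NumPy-based sketch:

import numpy as np

def csr_transpose(n_rows, n_cols, row_ptr, col_idx, vals):
    # Transpose an n_rows x n_cols CSR matrix; returns the CSR arrays
    # (row pointers, column indices, values) of the transpose.
    nnz = row_ptr[n_rows]

    # 1) Count the non-zeros in each column of A, i.e. each row of t(A).
    t_ptr = np.zeros(n_cols + 1, dtype=np.int64)
    for j in col_idx[:nnz]:
        t_ptr[j + 1] += 1

    # 2) Prefix-sum the counts to obtain the row pointers of t(A).
    np.cumsum(t_ptr, out=t_ptr)

    # 3) Scatter: walk A once and drop every entry into the next free
    #    slot of its destination row in t(A).
    t_idx = np.empty(nnz, dtype=np.int64)
    t_vals = np.empty(nnz, dtype=vals.dtype)
    nxt = t_ptr[:-1].copy()  # next insertion offset per row of t(A)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            t_idx[nxt[j]] = i
            t_vals[nxt[j]] = vals[k]
            nxt[j] += 1
    return t_ptr, t_idx, t_vals

This runs in O(nnz + rows + cols) time: each non-zero of A is visited once for counting and once for scattering.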

Performance

  • current is the existing implementation of the CSR transpose
  • new is this SciPy-based implementation
  • I tried two scheduling schemes (STATIC and MFSC)
  • several matrix sizes (1_000 x 1_000 to 1_000_000 x 1_000_000)
  • several sparsity values ($10^{-4}$ to $10^{-10}$)

[Figure: execution times of the current vs. new implementation across matrix sizes, sparsity values, and scheduling schemes]

Reproduce

The experiments were executed on a cluster with Slurm.
The workflow is managed by Snakemake (installable via pip install snakemake).
The software environment is managed by Singularity and the daphne-dev Docker image.

After copying the following files to your cluster, making sbatch.bash executable, and adapting it to your cluster, you can start the workflow:

snakemake --latency-wait 60 -c <NUMBER_OF_TASKS_IN_PARALLEL>

where NUMBER_OF_TASKS_IN_PARALLEL is the maximum number of Snakemake tasks to run in parallel (and thus of Slurm jobs, because the workflow submits Slurm jobs).
The --latency-wait flag guards against latency on the cluster's NFS.

At the completion of the workflow, the file data/all.csv contains all the experimental data with the following structure:

implem   scheme  sparsity  matrix_size  exec_time
current  STATIC  0.0001    1000         0.0137305
...      ...     ...       ...          ...
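To load and aggregate these results, assuming the column layout above (the data/all.csv produced by the workflow has no header row and comma-separated fields), a minimal pandas sketch:

import pandas as pd

cols = ["implem", "scheme", "sparsity", "matrix_size", "exec_time"]
df = pd.read_csv("data/all.csv", header=None, names=cols, skipinitialspace=True)

# mean execution time over the repetitions of each configuration
summary = (df.groupby(["implem", "scheme", "sparsity", "matrix_size"])["exec_time"]
             .mean()
             .reset_index())
print(summary)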

Snakefile

MATRIX_SIZES = [
      1_000,
     10_000,
    100_000,
  1_000_000,
]

SPARSITY = [
  "0.0001",
  "0.00001",
  "0.000001",
  "0.0000001",
  "0.00000001",
  "0.000000001",
  "0.0000000001",
]

SCHEMES = [
  "STATIC",
  "MFSC"
]

LAYOUT = "CENTRALIZED"

MAX_ITER = 3
ITERS = range(0, MAX_ITER)

IMPLEMS = [
  "current",
  "new"
]

COMMITS = {
  "current": "4e96943453c635a12898a1af55ca3efd819b81d9",
  "new": "47162f344834bc63b1d10acabf29b588092d7a22"
}

# Default target: the aggregated CSV and every individual measurement file
rule all:
  input:
    "data/all.csv",
    expand("data/{implem}/{scheme}/{sparsity}/{matrix_size}/{iter}.out",\
           implem=IMPLEMS,\
           scheme=SCHEMES,\
           sparsity=SPARSITY,\
           matrix_size=MATRIX_SIZES,\
           iter=ITERS)

# Run one benchmark configuration as a Slurm job (sbatch blocks until completion due to --wait)
rule run_expe:
  input:
    bin="daphne-src/{implem}/bin/daphne",
    script="bench.daph",
    sbatch="sbatch.bash",
  output:
    "data/{implem}/{scheme}/{sparsity}/{matrix_size}/{iter}.out"
  shell:
    "sbatch {input.sbatch} {wildcards.implem} {wildcards.scheme} {wildcards.sparsity} {wildcards.matrix_size} {input.bin} {input.script} {output}"


# Clone DAPHNE, check out the commit of the given implementation, and build it inside the container
rule download_and_compile:
  input:
    singularity="daphne-dev.sif"
  output:
    "daphne-src/{implem}/bin/daphne"
  params:
    commit = lambda w: COMMITS[w.implem]
  shell:
    """
      rm -rf daphne-src/{wildcards.implem}
      mkdir -p daphne-src
      git clone https://github.com/GuilloteauQ/daphne daphne-src/{wildcards.implem}
      cd daphne-src/{wildcards.implem}
      git checkout {params.commit}
      cd ../..
      singularity exec {input.singularity} bash -c "cd daphne-src/{wildcards.implem}; ./build.sh --no-deps --installPrefix /usr/local"
    """

# Build the Singularity image from the daphne-dev Docker image
rule build_container:
  output:
    "daphne-dev.sif"
  shell:
    """
    singularity build {output} docker://daphneeu/daphne-dev:v0.3-rc0_X86-64_BASE_ubuntu20.04
    """

# Prefix each measurement of one configuration with its parameters, yielding a CSV fragment
rule generate_group_csv:
  input:
    expand("data/{{implem}}/{{scheme}}/{{sparsity}}/{{matrix_size}}/{iter}.out",\
           iter=ITERS)
  output:
    "data/all/{implem}_{scheme}_{sparsity}_{matrix_size}.csv"
  shell:
    """
    cat {input} | awk '{{print "{wildcards.implem}, {wildcards.scheme}, {wildcards.sparsity}, {wildcards.matrix_size}, " $1}}' > {output}
    """

# Concatenate all fragments into data/all.csv
rule generate_csv:
  input:
    expand("data/all/{implem}_{scheme}_{sparsity}_{matrix_size}.csv",\
           implem=IMPLEMS,\
           scheme=SCHEMES,\
           sparsity=SPARSITY,\
           matrix_size=MATRIX_SIZES)
  output:
    "data/all.csv"
  shell:
    "cat {input} > {output}"

bench.daph

// size and sparsity are passed from sbatch.bash via --args
size = $size;
sparsity = $sparsity;
// generate a random size x size matrix with the given sparsity
A = rand(size, size, 1.0, 1.0, sparsity, -1);
// time only the transpose
start = now();
B = t(A);
end = now();
// print the elapsed time in seconds
print((end - start) / 1000000000.0);
// use B so that the transpose is actually computed
x = mean(B);

sbatch.bash

#!/bin/bash

#SBATCH --job-name=daphne-csr-transpose
#SBATCH --nodes=1
#SBATCH --partition=xeon
#SBATCH --exclude=cl-node[001-004]
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#              d-hh:mm:ss
#SBATCH --time=0-00:30:00
#SBATCH --wait

set -ex

IMPLEM=$1
SCHEME=$2
SPARSITY=$3
MATRIX_SIZE=$4
BIN=$5
SCRIPT=$6
OUTPUT=$7

# On SIGTERM (e.g., Slurm timeout), write -1 as the result so the output file still exists
function signal_handler {
  echo -1 > ${SLURM_SUBMIT_DIR}/${OUTPUT}
  exit
}

trap signal_handler TERM

srun --cpus-per-task=20 singularity exec ${SLURM_SUBMIT_DIR}/daphne-dev.sif ${BIN} --vec \
    --num-threads=20 \
    --select-matrix-repr \
    --partitioning=${SCHEME} \
    --queue_layout=CENTRALIZED \
    --pin-workers \
    --args size=${MATRIX_SIZE} \
    --args sparsity=${SPARSITY} \
    ${SLURM_SUBMIT_DIR}/${SCRIPT} > ${SLURM_SUBMIT_DIR}/${OUTPUT}

exit 0

GuilloteauQ added a commit to GuilloteauQ/daphne that referenced this pull request Jul 30, 2024
Collaborator

@corepointer corepointer left a comment


Thx for this improvement @GuilloteauQ and the extended presentation of how to reproduce!
The code LGTM. I'll merge it later today if there's nothing more to add.

src/runtime/local/kernels/Transpose.h (review discussion, resolved)
@corepointer corepointer linked an issue Aug 1, 2024 that may be closed by this pull request
@corepointer corepointer added the performance label Aug 1, 2024
@corepointer corepointer added this to the v0.3 milestone Aug 1, 2024
@corepointer
Collaborator

Maybe we should add a note where you ported that implementation from?

@corepointer corepointer force-pushed the 794_csr_transpose_performance branch from f56d582 to d260761 on August 7, 2024 16:52
@corepointer corepointer merged commit d260761 into daphne-eu:main Aug 7, 2024
1 of 2 checks passed
@corepointer
Collaborator

Thx for addressing the minor items discussed. I tested, squashed and merged 👍
