Merge pull request #18500 from Flamefire/20230808155858_new_pr_PyTorch1131

add patches to fix PyTorch 1.13.1 w/ foss/2022a on POWER + fix flaky `test_jit_legacy` test
Showing 4 changed files with 102 additions and 0 deletions.
easybuild/easyconfigs/p/PyTorch/PyTorch-1.11.0_fix-fp16-quantization-without-fbgemm.patch (25 additions, 0 deletions)
Fix use-after-free leading to random failures in nn/test_embedding
on e.g. POWER platforms where FBGEMM isn't used

From https://github.com/pytorch/pytorch/pull/84750

Author: Alexander Grund (TU Dresden)

diff --git a/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp b/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp
index 224a66f8abf..f4d018007bf 100644
--- a/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp
+++ b/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp
@@ -252,9 +252,10 @@ Tensor& qembeddingbag_byte_prepack_out(Tensor& output, const Tensor& weight) {
   }

 #else
-  const auto weight_data = weight_contig->scalar_type() == at::ScalarType::Half
-      ? weight_contig->to(at::ScalarType::Float).data_ptr<float>()
-      : weight_contig->data_ptr<float>();
+  const Tensor& float_weight = weight_contig->scalar_type() == at::ScalarType::Half
+      ? weight_contig->to(at::ScalarType::Float)
+      : *weight_contig;
+  const auto weight_data = float_weight.data_ptr<float>();
   constexpr float kEpsilon = 1e-8f;
   for (auto row : c10::irange(embedding_rows)) {
     const float* input_row = weight_data + row * embedding_cols;
easybuild/easyconfigs/p/PyTorch/PyTorch-1.12.0_fix-EmbeddingBag-without-fbgemm.patch (48 additions, 0 deletions)
There is a bug in the fallback path for the case where FBGEMM isn't available (e.g. on POWER)
which leads to a race condition:
Data is "copied" for the full buffer while it is processed in chunks by different threads.
This a) duplicates the work and b) might write incomplete/wrong data to the output.

Found in failing test_embedding_bag_half_cpu_* of nn/test_embedding:
ERROR: test_embedding_bag_half_cpu_int32_int32 (__main__.TestEmbeddingNNDeviceTypeCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/dev/shm/s3248973-EasyBuild/PyTorch/1.13.1/foss-2022a/pytorch-v1.13.1/test/nn/test_embedding.py", line 936, in _test_EmbeddingBag_vs_Embedding
    self.assertEqual(output, ref_output, atol=dtype2prec_DONTUSE[wdtype], rtol=0)
  File "/tmp/eb-tmp-2022a/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2470, in assertEqual
    assert_equal(
  File "/tmp/eb-tmp-2022a/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1093, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 4 (25.0%)
Greatest absolute difference: 1.18359375 at index (1, 1) (up to 0.01 allowed)
Greatest relative difference: 1.0 at index (1, 1) (up to 0 allowed)

Introduced by https://github.com/pytorch/pytorch/pull/74844

Author: Alexander Grund (TU Dresden)

diff --git a/aten/src/ATen/native/EmbeddingBag.cpp b/aten/src/ATen/native/EmbeddingBag.cpp
index 6d8cea26f52..604ea16bace 100644
--- a/aten/src/ATen/native/EmbeddingBag.cpp
+++ b/aten/src/ATen/native/EmbeddingBag.cpp
@@ -246,7 +246,7 @@ index_select_add(const Tensor &select_indices,
           /*scale_bias=*/nullptr,
           /*normalize_by_lengths=*/false,
           /*out=*/output_data_fp32 + start_idx * ddim);
-      for (const auto i : c10::irange(output_size)) {
+      for (const auto i : c10::irange(start_idx, end_idx)) {
         // Convert FP32 intermediate buffer result back to FP16 for output dtype
         for (const auto d : c10::irange(ddim)) {
           (output_data + i * ddim)[d] = static_cast<at::Half>((output_data_fp32 + ddim * i)[d]);
@@ -590,7 +590,7 @@ index_select_scale_add(const Tensor &select_indices,
           /*scale_bias=*/nullptr,
           /*normalize_by_lengths=*/false,
           /*out=*/output_data_fp32 + start_idx * ddim);
-      for (const auto i : c10::irange(output_size)) {
+      for (const auto i : c10::irange(start_idx, end_idx)) {
         // Convert FP32 intermediate buffer result back to FP16 for output dtype
         for (const auto d : c10::irange(ddim)) {
           (output_data + i * ddim)[d] = static_cast<at::Half>((output_data_fp32 + ddim * i)[d]);
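To make the race concrete: each worker thread owns the half-open chunk [start_idx, end_idx) of output rows, but the faulty conversion loop iterated over all of output_size in every thread, so threads wrote rows owned by other threads, possibly before those rows were computed. A minimal sketch using plain std::thread (illustrative names and fixed-size data, not PyTorch's actual parallel_for machinery):

#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

int main() {
  const std::size_t rows = 8, ddim = 4, num_threads = 2;
  std::vector<float> fp32_buf(rows * ddim, 0.f);
  std::vector<std::uint16_t> out(rows * ddim);  // stand-in for at::Half output

  // Each worker owns rows [start_idx, end_idx).
  auto worker = [&](std::size_t start_idx, std::size_t end_idx) {
    for (std::size_t i = start_idx; i < end_idx; ++i)
      for (std::size_t d = 0; d < ddim; ++d)
        fp32_buf[i * ddim + d] = static_cast<float>(i);  // "compute" own chunk

    // BUGGY version looped i over [0, rows), touching other threads' rows
    // (duplicated work, and could copy rows not yet computed).
    // FIXED version (as in the patch) stays within this thread's chunk:
    for (std::size_t i = start_idx; i < end_idx; ++i)
      for (std::size_t d = 0; d < ddim; ++d)
        out[i * ddim + d] = static_cast<std::uint16_t>(fp32_buf[i * ddim + d]);
  };

  std::vector<std::thread> pool;
  const std::size_t chunk = rows / num_threads;
  for (std::size_t t = 0; t < num_threads; ++t)
    pool.emplace_back(worker, t * chunk, (t + 1) * chunk);
  for (auto& th : pool) th.join();

  std::cout << "out[last] = " << out[rows * ddim - 1] << '\n';  // expect 7
  return 0;
}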
easybuild/easyconfigs/p/PyTorch/PyTorch-1.13.1_fix-flaky-jit-test.patch (21 additions, 0 deletions)
Especially `test_jit_legacy` seems to be flaky.
https://github.com/pytorch/pytorch/commit/316ba9e6fc9e2c309c1b3785e35393b4a727b918
makes the JIT tests run serially, avoiding potential races.
So backport that commit.

Author: Alexander Grund (TU Dresden)

diff --git a/test/run_test.py b/test/run_test.py
index f7c80f3f0a6..9d9b0c553e9 100755
--- a/test/run_test.py
+++ b/test/run_test.py
@@ -994,7 +994,8 @@ def must_serial(file: str) -> bool:
         "distributed" in file or
         file in CUSTOM_HANDLERS or
         file in RUN_PARALLEL_BLOCKLIST or
-        file in CI_SERIAL_LIST
+        file in CI_SERIAL_LIST or
+        file in JIT_EXECUTOR_TESTS
     )