Develop upstream sync 20240513 #2543
Merged
i-chaochen merged 1,259 commits into develop-upstream from develop-upstream-sync-20240513 on May 20, 2024
Conversation
i-chaochen commented May 13, 2024 (edited)
- updated rocm_solvers.cc: [ROCm] fixed rocm_solvers build (tensorflow/tensorflow#68249)
- two CPU unit tests due to Python version (e0aef0f)
- ClipByValue on DT_INT64 (https://github.com/ROCm/frameworks-internal/issues/8064)
PiperOrigin-RevId: 631864481
PiperOrigin-RevId: 631870090
PiperOrigin-RevId: 631871121
Field inplace_operator wasn't being initialized, most likely an omission. PiperOrigin-RevId: 631875486
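A tiny sketch of the class of bug described (the struct and field here are hypothetical stand-ins, not the actual code): a member without an initializer holds an indeterminate value, and an in-class default fixes it.

// Hypothetical illustration of an uninitialized-field bug and its fix.
struct OpMetadata {
  // Before: `bool inplace_operator;` with no initializer, so reading the
  // field before any assignment yields an indeterminate value.
  bool inplace_operator = false;  // after: a well-defined default
};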
…mizing-move]" error when compiling gpu_fused_mha_test with clang. PiperOrigin-RevId: 631876424
Updates LLVM usage to match [3a8316216807](llvm/llvm-project@3a8316216807) PiperOrigin-RevId: 631878634
To prevent too much parallelism for non-parallel computations, we add an enqueue event to make sure the next computation won't be enqueued until the last one is done, per user thread. In `~PyArray_Storage()`, we now release the Python GIL and then destroy the underlying buffer, to prevent a deadlock caused by interactions between argument donations and host callbacks on the CPU backend. PiperOrigin-RevId: 631887633
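A minimal sketch of the GIL-release pattern described above, assuming a pybind11-style binding; the Buffer type and surrounding structure are hypothetical, not the actual code.

#include <pybind11/pybind11.h>
#include <memory>

namespace py = pybind11;

struct Buffer {};  // stand-in for the device buffer; its dtor may block

struct PyArrayStorage {
  std::shared_ptr<Buffer> buffer;
  ~PyArrayStorage() {
    // Release the GIL before tearing down the buffer: destruction can wait
    // on work (e.g. host callbacks) that itself needs the GIL, so holding
    // it here could deadlock.
    py::gil_scoped_release release;
    buffer.reset();
  }
};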
PiperOrigin-RevId: 631894583
PiperOrigin-RevId: 631897585
PiperOrigin-RevId: 631903263
PiperOrigin-RevId: 631915115
PiperOrigin-RevId: 631920259
…call registration instead of always linking it. PiperOrigin-RevId: 631928144
…ffle_dataset_op.cc to ensure the iterator can generate the same result after restoring PiperOrigin-RevId: 631932883
PiperOrigin-RevId: 631935435
…edSymbol. PiperOrigin-RevId: 631936160
PiperOrigin-RevId: 631940651
Fixes keras-team/tf-keras#777 PiperOrigin-RevId: 631941146
PiperOrigin-RevId: 631947318
PiperOrigin-RevId: 631952018
because the hlo->sharding() and operand->sharding() will have conflicting values for IsManualSubgroup(). PiperOrigin-RevId: 631952777
PiperOrigin-RevId: 631959980
PiperOrigin-RevId: 631967585
…e propagation PiperOrigin-RevId: 631967931
PiperOrigin-RevId: 631968244
PiperOrigin-RevId: 631973755
…:Span` by passing as a `std::vector` instead PiperOrigin-RevId: 631977962
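The truncated message above appears to replace a span parameter with an owning std::vector; a generic sketch of why that matters (all names illustrative, not XLA code):

#include <vector>

// A span-style view (e.g. absl::Span) does not own its elements: if it is
// kept past the lifetime of the storage it points into, such as a temporary
// vector, it dangles. Owning a std::vector keeps the data alive.
struct Holder {
  // Before (dangles when constructed from a temporary):
  //   absl::Span<const int> values;
  std::vector<int> values;  // after: owns its elements
};

Holder Make() {
  return Holder{{1, 2, 3}};  // safe: the vector is moved into the struct
}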
PiperOrigin-RevId: 631979198
PiperOrigin-RevId: 633100257
I thought I could get away with not doing this, but I finally gave up. PiperOrigin-RevId: 633120834
This just matches on index switches with only unused results. Ideally this would become a canonicalization pattern instead. index_switch will be used in the reduction group lowering. PiperOrigin-RevId: 633121827
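A hedged sketch of what such a pattern could look like in MLIR; this is illustrative only, not the actual XLA pattern.

#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/IR/PatternMatch.h"

namespace {
// Erases an scf.index_switch whose results are all unused. A production
// pattern would also verify the op has no side effects before erasing it.
struct EraseUnusedIndexSwitch
    : public mlir::OpRewritePattern<mlir::scf::IndexSwitchOp> {
  using OpRewritePattern::OpRewritePattern;

  mlir::LogicalResult matchAndRewrite(
      mlir::scf::IndexSwitchOp op,
      mlir::PatternRewriter& rewriter) const override {
    if (!op->use_empty()) return mlir::failure();
    rewriter.eraseOp(op);
    return mlir::success();
  }
};
}  // namespace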
PiperOrigin-RevId: 633125572
…ump_hlo_as_* are set. Imported from GitHub PR openxla/xla#12253. Without this change, when `--xla_dump_hlo_as_proto=true --xla_dump_to=/some/real/path` is set, the autotuner dumps PTX to stdout. Copybara import of the project: -- 9149682143373bd17d4dc57635ba34b146e3b358 by Olli Lupton <[email protected]>: Do not dump autotuning PTX to stdout when --xla_dump_hlo_as_* are set. Merging this change closes tensorflow#12253. PiperOrigin-RevId: 633126913
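A schematic of the behavioral fix described; the function and parameter names are placeholders, not the real XLA dumping code.

#include <fstream>
#include <iostream>
#include <string>

// Write PTX into the configured dump directory when one is set; fall back
// to stdout only when the user did not request file dumps.
void DumpPtx(const std::string& ptx, const std::string& xla_dump_to) {
  if (!xla_dump_to.empty()) {
    std::ofstream(xla_dump_to + "/autotuned.ptx") << ptx;
  } else {
    std::cout << ptx;  // before the fix, PTX went to stdout unconditionally
  }
}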
This should make Scatter with BF16 type faster. PiperOrigin-RevId: 633131419
retest Ubuntu-GPU-single please
hsharsha approved these changes May 17, 2024
Force-pushed from 965d9e6 to bbdb15b
Force-pushed from bbdb15b to 27d0e2e
i-chaochen commented May 19, 2024
bool IsRocm() {
  return std::holds_alternative<se::RocmComputeCapability>(Capability());
}
This is an unnecessary function and we need to remove it in the next weekly-sync.
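For context, a minimal self-contained illustration (with hypothetical stand-in types) of how a caller can inline the variant check instead of keeping the one-line helper:

#include <variant>

struct CudaComputeCapability {};
struct RocmComputeCapability {};
using GpuComputeCapability =
    std::variant<CudaComputeCapability, RocmComputeCapability>;

bool HandleGpu(const GpuComputeCapability& cap) {
  // Equivalent to the IsRocm() helper, written inline at the call site.
  return std::holds_alternative<RocmComputeCapability>(cap);
}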
retest Ubuntu-GPU-multi please
2 similar comments
retest Ubuntu-GPU-multi please
retest Ubuntu-GPU-multi please