Develop upstream sync 20240513 #2543
Merged
i-chaochen merged 1,259 commits into develop-upstream from develop-upstream-sync-20240513 on May 20, 2024
Conversation
i-chaochen commented May 13, 2024 (edited)
- updated rocm_solvers.cc: [ROCm] fixed rocm_solvers build (tensorflow/tensorflow#68249)
- two CPU unit tests due to Python version (e0aef0f)
- ClipByValue on DT_INT64 (https://github.com/ROCm/frameworks-internal/issues/8064)
PiperOrigin-RevId: 631864481
PiperOrigin-RevId: 631870090
PiperOrigin-RevId: 631871121
Field inplace_operator wasn't being initialized, most likely an omission. PiperOrigin-RevId: 631875486
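A tiny sketch of the class of bug described (the struct and field here are hypothetical stand-ins, not the actual code): a member without an initializer holds an indeterminate value, and an in-class default fixes it.

// Hypothetical illustration of an uninitialized-field bug and its fix.
struct OpMetadata {
  // Before: `bool inplace_operator;` with no initializer, so reading the
  // field before any assignment yields an indeterminate value.
  bool inplace_operator = false;  // after: a well-defined default
};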
…mizing-move]" error when compiling gpu_fused_mha_test with clang. PiperOrigin-RevId: 631876424
Updates LLVM usage to match [3a8316216807](llvm/llvm-project@3a8316216807) PiperOrigin-RevId: 631878634
To prevent too much parallelism for non-parallel computations, we add an enqueue event to make sure the next computation won't be enqueued until the last one is done, per user thread. In `~PyArray_Storage()`, we now release the Python GIL and then destroy the underlying buffer, to prevent a deadlock caused by interactions between argument donations and host callbacks on the CPU backend. PiperOrigin-RevId: 631887633
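A minimal sketch of the GIL-release pattern described above, assuming a pybind11-style binding; the Buffer type and surrounding structure are hypothetical, not the actual code.

#include <pybind11/pybind11.h>
#include <memory>

namespace py = pybind11;

struct Buffer {};  // stand-in for the device buffer; its dtor may block

struct PyArrayStorage {
  std::shared_ptr<Buffer> buffer;
  ~PyArrayStorage() {
    // Release the GIL before tearing down the buffer: destruction can wait
    // on work (e.g. host callbacks) that itself needs the GIL, so holding
    // it here could deadlock.
    py::gil_scoped_release release;
    buffer.reset();
  }
};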
PiperOrigin-RevId: 631894583
PiperOrigin-RevId: 631897585
PiperOrigin-RevId: 631903263
PiperOrigin-RevId: 631915115
PiperOrigin-RevId: 631920259
…call registration instead of always linking it. PiperOrigin-RevId: 631928144
…ffle_dataset_op.cc to ensure the iterator can generate the same result after restoring PiperOrigin-RevId: 631932883
PiperOrigin-RevId: 631935435
…edSymbol. PiperOrigin-RevId: 631936160
PiperOrigin-RevId: 631940651
Fixes keras-team/tf-keras#777 PiperOrigin-RevId: 631941146
PiperOrigin-RevId: 631947318
PiperOrigin-RevId: 631952018
because the hlo->sharding() and operand->sharding() will have conflicting values for IsManualSubgroup(). PiperOrigin-RevId: 631952777
PiperOrigin-RevId: 631959980
PiperOrigin-RevId: 631967585
…e propagation PiperOrigin-RevId: 631967931
PiperOrigin-RevId: 631968244
PiperOrigin-RevId: 631973755
…:Span` by passing as a `std::vector` instead PiperOrigin-RevId: 631977962
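The truncated message above appears to replace a span parameter with an owning std::vector; a generic sketch of why that matters (all names illustrative, not XLA code):

#include <vector>

// A span-style view (e.g. absl::Span) does not own its elements: if it is
// kept past the lifetime of the storage it points into, such as a temporary
// vector, it dangles. Owning a std::vector keeps the data alive.
struct Holder {
  // Before (dangles when constructed from a temporary):
  //   absl::Span<const int> values;
  std::vector<int> values;  // after: owns its elements
};

Holder Make() {
  return Holder{{1, 2, 3}};  // safe: the vector is moved into the struct
}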
PiperOrigin-RevId: 631979198
PiperOrigin-RevId: 633100257
I thought I could get away with not doing this, but I finally gave up. PiperOrigin-RevId: 633120834
This just matches on index switches with only unused results. Ideally this would become a canonicalization pattern instead. index_switch will be used in the reduction group lowering. PiperOrigin-RevId: 633121827
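A hedged sketch of what such a pattern could look like in MLIR; this is illustrative only, not the actual XLA pattern.

#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/IR/PatternMatch.h"

namespace {
// Erases an scf.index_switch whose results are all unused. A production
// pattern would also verify the op has no side effects before erasing it.
struct EraseUnusedIndexSwitch
    : public mlir::OpRewritePattern<mlir::scf::IndexSwitchOp> {
  using OpRewritePattern::OpRewritePattern;

  mlir::LogicalResult matchAndRewrite(
      mlir::scf::IndexSwitchOp op,
      mlir::PatternRewriter& rewriter) const override {
    if (!op->use_empty()) return mlir::failure();
    rewriter.eraseOp(op);
    return mlir::success();
  }
};
}  // namespace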
PiperOrigin-RevId: 633125572
…ump_hlo_as_* are set. Imported from GitHub PR openxla/xla#12253. Without this change, when `--xla_dump_hlo_as_proto=true --xla_dump_to=/some/real/path` is set, the autotuner dumps PTX to stdout. Copybara import of the project: -- 9149682143373bd17d4dc57635ba34b146e3b358 by Olli Lupton <[email protected]>: Do not dump autotuning PTX to stdout when --xla_dump_hlo_as_* are set. Merging this change closes tensorflow#12253. PiperOrigin-RevId: 633126913
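A schematic of the behavioral fix described; the function and parameter names are placeholders, not the real XLA dumping code.

#include <fstream>
#include <iostream>
#include <string>

// Write PTX into the configured dump directory when one is set; fall back
// to stdout only when the user did not request file dumps.
void DumpPtx(const std::string& ptx, const std::string& xla_dump_to) {
  if (!xla_dump_to.empty()) {
    std::ofstream(xla_dump_to + "/autotuned.ptx") << ptx;
  } else {
    std::cout << ptx;  // before the fix, PTX went to stdout unconditionally
  }
}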
This should make Scatter with BF16 type faster. PiperOrigin-RevId: 633131419
retest Ubuntu-GPU-single please
hsharsha approved these changes May 17, 2024
Force-pushed from 965d9e6 to bbdb15b
Force-pushed from bbdb15b to 27d0e2e
i-chaochen commented May 19, 2024
bool IsRocm() {
  return std::holds_alternative<se::RocmComputeCapability>(Capability());
}
This is an unnecessary function and we need to remove it in the next weekly-sync.
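For context, a minimal self-contained illustration (with hypothetical stand-in types) of how a caller can inline the variant check instead of keeping the one-line helper:

#include <variant>

struct CudaComputeCapability {};
struct RocmComputeCapability {};
using GpuComputeCapability =
    std::variant<CudaComputeCapability, RocmComputeCapability>;

bool HandleGpu(const GpuComputeCapability& cap) {
  // Equivalent to the IsRocm() helper, written inline at the call site.
  return std::holds_alternative<RocmComputeCapability>(cap);
}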
retest Ubuntu-GPU-multi please
2 similar comments
retest Ubuntu-GPU-multi please
retest Ubuntu-GPU-multi please