[TRANSFORM] Pipeline (triton-lang#11)
Jokeren authored Jul 20, 2023
1 parent f5aebb5 commit 5802f75
Showing 3 changed files with 840 additions and 662 deletions.
11 changes: 10 additions & 1 deletion TODO.md
@@ -15,6 +15,13 @@ https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1
https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Dialect/Triton/IR/Traits.cpp#L10
* Don't call arith to LLVM conversion from MLIR
* The linearize/delinearize helpers have been duplicated (most likely due to layering problems); they should be merged
* Try `scf.if` in the pipeline to replace `remui` when indices are at the boundary
* Clean up `waitIdx`, `phase`, and other indices. The pipelined loop currently carries far too many variables.
https://github.com/openai/triton-hopper/blob/9453151688804ebaf8bebca38a62ada5bb343d3c/lib/Dialect/TritonGPU/Transforms/Pipeline.cpp#L166
* Get rid of the hacky `mode` variable in the pipeline
https://github.com/openai/triton-hopper/blob/9453151688804ebaf8bebca38a62ada5bb343d3c/lib/Dialect/TritonGPU/Transforms/Pipeline.cpp#L180
* The pipeline shouldn't have special handling for Hopper; ideally it is agnostic to the target architecture
https://github.com/openai/triton-hopper/blob/9453151688804ebaf8bebca38a62ada5bb343d3c/lib/Dialect/TritonGPU/Transforms/Pipeline.cpp#L226
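
The `scf.if`-instead-of-`remui` idea above can be illustrated outside of MLIR. A minimal Python sketch (all names and the stage count are hypothetical, not from the codebase) of replacing a per-iteration integer remainder with a boundary check, which also gives a natural place to flip a barrier phase when the buffer index wraps:

```python
NUM_STAGES = 3  # hypothetical pipeline depth

def advance_remui(idx):
    # Current style: an integer remainder (remui) on every iteration.
    return (idx + 1) % NUM_STAGES

def advance_select(idx, phase):
    # Proposed style: a compare at the boundary (what scf.if / select
    # would express), avoiding the remainder and flipping the phase
    # exactly when the index wraps around.
    nxt = idx + 1
    if nxt == NUM_STAGES:
        return 0, phase ^ 1
    return nxt, phase

# Both formulations visit the buffers in the same order.
idx_a = idx_b = phase = 0
for _ in range(10):
    idx_a = advance_remui(idx_a)
    idx_b, phase = advance_select(idx_b, phase)
    assert idx_a == idx_b
```

The branchy form trades one integer division for a compare-and-select per iteration, which is generally the cheaper operation on GPUs.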

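One possible reading of the architecture-agnostic item above: keep the pipeline scheduling generic and push target-specific load emission behind a small interface. A hypothetical Python sketch of that split — none of these class or function names exist in the codebase:

```python
class LoadEmitter:
    """Target-specific part: how a pipelined load is materialized."""
    def emit_load(self, stage):
        raise NotImplementedError

class AmpereEmitter(LoadEmitter):
    def emit_load(self, stage):
        return f"cp.async for stage {stage}"

class HopperEmitter(LoadEmitter):
    def emit_load(self, stage):
        return f"TMA copy for stage {stage}"

def pipeline_prologue(num_stages, emitter):
    # Generic part: the prologue structure is the same for every
    # target; only the emitted load differs.
    return [emitter.emit_load(s) for s in range(num_stages)]
```

With this shape, the pass itself never branches on the architecture; adding a new target means adding a new emitter, not editing the scheduler.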
## bug fixes

@@ -25,4 +32,6 @@ https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1
* The IR printed for `make_tensor_ptr` is wrong: `!tt.ptr` is dropped from the output.
https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/include/triton/Dialect/Triton/IR/TritonOps.td#L562
* We currently rely on the `cuda-python` package, which prevents us from building Triton on any node without CUDA installed. We should invoke TMA-related functions through our thin CUDA wrapper instead.
https://github.com/openai/triton-hopper/blob/b6a6b32b0ee79e93247d20c95f15fd75039a40b9/python/triton/compiler/utils.py#L3
* Pipeline doesn't handle block ptrs correctly
* Pipeline doesn't handle TMAs correctly
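
The thin-wrapper item above amounts to binding the CUDA driver lazily, so that importing (and building) Triton never requires CUDA to be present. A hedged sketch of that idea — the class name and structure are hypothetical, and this makes no claim about the eventual wrapper API:

```python
import ctypes
import ctypes.util

class ThinCudaDriver:
    """Lazily binds libcuda; importing this module touches no CUDA."""

    def __init__(self):
        self._lib = None  # nothing loaded at import/build time

    def _load(self):
        # Resolve the driver library only on first real use.
        if self._lib is None:
            path = ctypes.util.find_library("cuda")
            if path is None:
                raise RuntimeError(
                    "libcuda not found: CUDA is needed at run time, "
                    "not at build time")
            self._lib = ctypes.CDLL(path)
        return self._lib

    def cu_init(self):
        # cuInit is a real driver entry point; it is only resolved
        # and called here, never during import.
        return self._load().cuInit(0)

driver = ThinCudaDriver()  # safe on machines without CUDA installed
```

Because construction defers everything, a CUDA-less build box can import the module; only calling a driver function at run time requires `libcuda`.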
