
Dmitriim/rebase main #217 (Draft)

Wants to merge 165 commits into base: dev-upstream_main

Changes from all commits (165 commits)
38ada4a
Short preamble for the README, explaining why this clone exists
bertmaher Apr 17, 2024
8600c20
OSS Automated Fix: Addition of Code of Conduct (#1)
facebook-github-bot May 1, 2024
875b15f
[BACKEND][CPU] Initial plumbing for cpu backend (#2)
minjang May 2, 2024
015e50d
[BACKEND][CPU] Create TritonCPU and conversion dialects (#3)
minjang May 6, 2024
3df2a1d
Update README.md
minjang May 6, 2024
d8a8211
Convert tt.func and tt.return (#4)
minjang May 13, 2024
2752fa8
[BACKEND][CPU] Convert tt.get_program_id and tt.print (Hello World) (#1)
minjang May 14, 2024
1f61335
Quick patches to make it work after rebasing (#3)
minjang May 16, 2024
7b56e5a
Support basic lowering through vector dialect in CPU backend.
ienkovich May 2, 2024
e603b00
Revert unreviewed changes. (#5)
shanenay May 17, 2024
31ad8a1
Add a workaround for LLVM bug in codegen for bf16 vector cast. (#4)
ienkovich May 17, 2024
9ec6fa8
Prototype of the Triton CPU backend with basic compilation and execut…
ienkovich May 24, 2024
e1dd10e
Add support for tl.cat operation. (#9)
ienkovich May 28, 2024
7b183cf
[BACKEND][CPU] Make it buildable and runnable in a different environm…
minjang May 28, 2024
d6df9c1
Add support for simple reductions. (#10)
ienkovich May 29, 2024
ad823a3
Support tl.histogram for CPU. (#12)
ienkovich May 29, 2024
61a99a0
Fix merge and compile errors (#13)
minjang May 30, 2024
1c0986c
[CPU] Support flexible active driver + update vector-add tutorial (#11)
minjang May 31, 2024
0db8651
Added a simple workflow to run on self-hosted intel runner (#16)
gshimansky Jun 7, 2024
bdc9462
Fixed build and test workflow for intel self-hosted runner (#17)
gshimansky Jun 9, 2024
7922469
[CPU] Add an OpenMP-based CPU launcher (#15)
minjang Jun 10, 2024
508dff5
Support generic reduction and scan cases. (#14)
ienkovich Jun 10, 2024
e93ef5a
[CPU] Dump human-readable asm code in TRITON_CACHE_DIR (#19)
minjang Jun 11, 2024
ff40f16
Added g++ installation after switching to ubuntu-22.04 (#21)
gshimansky Jun 11, 2024
dbc68ed
Support atomic ops for CPU. (#20)
ienkovich Jun 11, 2024
2af366d
[TUTORIAL] Add unmasked matrix multiply example to triton-cpu (#23)
Kuigesi Jun 14, 2024
b8334a4
Update matrix-multiplication-cpu tutorial, use preallocated output bu…
Kuigesi Jun 15, 2024
68c9780
Fixes for x86 CI workflow (#26)
ienkovich Jun 18, 2024
45d02ad
Use static compilation for kernels. (#29)
ienkovich Jun 20, 2024
f768deb
Move byte manipulation ops from elwise ops conversion. (#28)
ienkovich Jun 20, 2024
33e4a0b
[TUTORIAL] Add the non-persistent softmax and make it for CPU (#22)
minjang Jun 20, 2024
054f1f3
Enable few more core tests for CPU. (#31)
ienkovich Jun 20, 2024
78851cb
Support tt.split for CPU. (#30)
ienkovich Jun 20, 2024
f13afde
[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1.…
Kuigesi Jun 25, 2024
6122eaf
[CPU] Add conversion for unsupported BF16 ops via target-specific sta…
ienkovich Jun 25, 2024
bc568a4
Enabled simple build&test workflow, disabled old Integration Tests wo…
gshimansky Jun 25, 2024
57bce46
[BACKEND][CPU] Specify CPU target to native for GNU/Linux Arm (#34)
Kuigesi Jun 25, 2024
b0ef7b9
Add conversions for mixed precision matmuls. (#32)
ienkovich Jul 2, 2024
ad60606
[Op support] Support 'get_num_programs' (#39)
Devjiu Jul 3, 2024
6534a26
Add fast-math option: allow fp reduction reassociation
Kuigesi Jul 8, 2024
e7bb5dc
Change the lowering option for vector.multi_reduction from InnerParal…
Kuigesi Jul 8, 2024
d713fad
Fix: TrapUnreachable is not controled by fast-math, we set it uncondi…
Kuigesi Jul 9, 2024
6a323ab
[so] Compile asm to .so as part of staged lowering (#53)
int3 Jul 17, 2024
0992b4d
Add libdevice for CPU. (#52)
ienkovich Jul 17, 2024
caf43d0
[Op support] Dot3D support (#43)
Devjiu Jul 17, 2024
848e43e
Support FP8 conversions for CPU. (#40)
ienkovich Jul 17, 2024
f578a97
[CPU] Support device_print for scalar types first (#54)
minjang Jul 18, 2024
e576346
[TUTORIAL] Add matrix vector multiplication tutorial (#46)
Kuigesi Jul 19, 2024
d845272
Fix FuncOp lowering. (#61)
ienkovich Jul 19, 2024
a587403
[CPU] Easy: remove the old initial boilerplate code (#59)
minjang Jul 19, 2024
75142b0
[Scf If types] Support conversion of types for scf::if (#45)
Devjiu Jul 22, 2024
1f0ca4a
[WA for fp16 torch.matmul] Replace torch.matmul with np.matmul (#44)
Devjiu Jul 23, 2024
dd5e3e2
[cpu] Have MulhiUI lowering support scalars (#64)
int3 Jul 23, 2024
c4ea761
[cpu] Fix formatting (#65)
int3 Jul 23, 2024
fe9f0cd
[cpu] Support tl.load(..., padding="nan") (#69)
int3 Jul 23, 2024
60690e9
[cpu] Use helpers from OptCommon.h to simplify code (#67)
int3 Jul 23, 2024
bd0ea60
[cpu] Follow up to #69 (#70)
int3 Jul 23, 2024
9508a2f
[cpu] Add runtime library for CPU kernels (#73)
int3 Jul 25, 2024
cf21e44
[FP8 tests] Enable several fp8 tests (#49)
Devjiu Jul 25, 2024
34cd5d4
[cpu] Make runtime library build on Linux too (#75)
int3 Jul 25, 2024
ffc885a
[cpu] Get more of test_random.py working (#77)
int3 Jul 26, 2024
bb57572
[FIX Pytest] Resolve 'importlib' issue (#78)
Devjiu Jul 29, 2024
679a88b
Fix importlib issues (#80)
int3 Jul 29, 2024
b3e6597
[cpu] Add test_annotations.py to CI (#81)
int3 Jul 30, 2024
21ab56b
Reduce/disable some tests on CPU for faster CI runs. (#83)
ienkovich Aug 1, 2024
af0fc06
[cpu] Don't reuse shuffle dummies (#88)
int3 Aug 5, 2024
c8b43fe
Utilize vector math functions from libmvec. (#55)
ienkovich Aug 5, 2024
e35936d
Make tl.debug_barrier() a no-op on CPU (#89)
int3 Aug 6, 2024
ee88d7e
ConvertMemoryOps should not use cf dialect (#91)
int3 Aug 6, 2024
a95b8eb
Remove registered pipelines in favor of explicit lists in python. (#93)
ienkovich Aug 7, 2024
dcc69d2
Don't use cf dialect in ConvertAtomicOps (#94)
int3 Aug 7, 2024
9aa8757
atomic_rmw ops should return original value (#95)
int3 Aug 7, 2024
9d4200e
Compute a scalar pointer for vector load instead of extracting it fro…
ienkovich Aug 7, 2024
7729074
Add pass to optimize masked loads and stores. (#96)
ienkovich Aug 7, 2024
76d9e65
Fix incorrect casts in mask optimization. (#101)
ienkovich Aug 8, 2024
668866f
Add conversion for scf.while (#103)
int3 Aug 8, 2024
cf99e44
[TUTORIAL] Add bf16 matrix vector multiplication tutorial (#90)
Kuigesi Aug 8, 2024
2c6453b
Add an option to use sleef instead of libmvec. (#104)
ienkovich Aug 9, 2024
f727802
Enable fast math by default. (#108)
ienkovich Aug 9, 2024
b6009eb
Add more libdevice lowerings (#97)
int3 Aug 9, 2024
97341f0
Enable rsqrt and floor for BF16. (#109)
ienkovich Aug 9, 2024
9e94a21
Remove specific dwarf version from -g option. (#110)
ienkovich Aug 9, 2024
29b9fdb
Enable `min_dot_size`
Devjiu Aug 13, 2024
8ff792f
[Formatting] Apply formating
Devjiu Aug 13, 2024
8e0d331
Remove is_cpu arg from do_bench. (#113)
ienkovich Aug 13, 2024
0e5fb8f
Enable few more tutorials for CPU (#114)
ienkovich Aug 13, 2024
869b5ab
Pass device type to do_bench in autotuner. (#115)
ienkovich Aug 13, 2024
96fc92e
Fix indices extraction from block pointer. (#116)
ienkovich Aug 14, 2024
d1e748c
[cpu] Rework device_print with triton_cpu.print and 1D vector printin…
minjang Aug 14, 2024
1f97f14
[Pytests] Add several suits (#106)
Devjiu Aug 14, 2024
89d1b42
Identify dot product pattern (mul followed by a sum) for bf16, and co…
Kuigesi Aug 15, 2024
e2247f2
Add optional packing for converting bf16 dot product. (#118)
Kuigesi Aug 16, 2024
5ae0b70
Add load/store scalarization through loops. (#119)
ienkovich Aug 21, 2024
b46f085
Fix typo. (#122)
ienkovich Aug 21, 2024
f83cece
Add lit tests for load/store scalarization. (#121)
ienkovich Aug 22, 2024
544ccf2
[cpu][easy] Fix compiler error on clang (#120)
minjang Aug 22, 2024
439867b
Offload a part of masks optimization to the canonicalizer.
ienkovich Aug 22, 2024
25d5d3e
Implement get_module_map for cpu backend
int3 Aug 29, 2024
eb759af
Make CPU runtime lib lookup work for Python 3.8 (#129)
int3 Aug 29, 2024
ff4b347
Implement device_assert (#126)
int3 Aug 30, 2024
75f512e
Implement isnan, isinf, signbit (#127)
int3 Aug 30, 2024
58c7d51
Vendor sleef as a submodule (#130)
int3 Aug 31, 2024
ebada24
Add test_debug_dump.py to CI (#131)
int3 Sep 1, 2024
538ed7f
Refactor MathToLibmvec pass (#135)
int3 Sep 4, 2024
24d4baf
[CPU] Add unit test for print with isSigned and several fixes (#132)
minjang Sep 4, 2024
050a5ea
Refactor math tests + select vector lib backend via kernel option (#136)
int3 Sep 4, 2024
bc99529
Vectorize expm1, sqrt, and floor using sleef (#137)
int3 Sep 4, 2024
ae0810a
Fix infinite optimization loop for mask optimization. (#138)
ienkovich Sep 4, 2024
a3ef9d4
Implement libdevice.trunc (#140)
int3 Sep 5, 2024
9a68cf0
Remove old LLVM bug workaround. (#141)
ienkovich Sep 6, 2024
6609482
Add kernel execution time measurement using hooks for do_bench (#139)
ienkovich Sep 9, 2024
9606a34
Use llvm_unreachable in cpu_runtime.cpp (#145)
minjang Sep 9, 2024
77b2d2e
Fix undefined symbole error in libTritonCPURuntime.so (#146)
minjang Sep 9, 2024
f1e3f68
[Dot3D test] Enable with lower block size (#117)
Devjiu Sep 18, 2024
ab1f1aa
Add an option to choose between default reduction lowering and our ow…
ienkovich Sep 20, 2024
d3f1b31
Fix regressions due to rebasing to the latest upstream
minjang Sep 21, 2024
819ea43
Update build-test.yml for pybind11
minjang Sep 22, 2024
73e909d
[FP8 support] Enable Float8 tests failed after rebase (#151)
Devjiu Sep 23, 2024
539e6be
Use 1-D vector reduction op to convert reduce op (#152)
Sep 27, 2024
e4588ea
[Keep materialization] Turn on meterialization (#154)
Devjiu Sep 27, 2024
1ef9001
[Scalarization/Loops generation] Refactor and new pass/interfaces int…
Devjiu Sep 30, 2024
358d3af
Lower memory ops with vector gather and scatter (#158)
Oct 14, 2024
d80f30a
Introduce DotOp lowering to AMX (#157)
ienkovich Oct 17, 2024
9fac57a
Implement more libdevice functions using extern_elementwise (#161)
int3 Oct 21, 2024
175d629
Fix compilation when ARCH_REQ_XCOMP_PERM isn't defined (#163)
int3 Oct 21, 2024
d466759
[CPU] Drop MLIR prefix in ScalarizeInterface (#164)
minjang Oct 21, 2024
efa03d9
Pad size 2 vectors to size 4 when lowering extern_elementwise ops (#162)
int3 Oct 22, 2024
4b83250
Rebase onto upstream triton ff306da26b and fix regressions
minjang Oct 22, 2024
6cf9ff0
Simple fixes to build on MacOSx (#165)
digantdesai Oct 23, 2024
e24a63a
Fix trailing null char in ulpSuffix (#166)
digantdesai Oct 23, 2024
b2f8c99
Rebase onto upstream triton 4a5431159a and fix regressions
minjang Oct 24, 2024
76b3225
[Test][Autotuner] Skip use_cuda_graph for non cuda devices (#169)
Devjiu Oct 25, 2024
c8c4bce
Add num_threads option to control threading per kernel invocation. (#…
ienkovich Oct 28, 2024
8573886
[TTC Print Memref] Simplify further multidimensional tensor printing …
Devjiu Oct 28, 2024
3d528f7
Small fixes for autotuner on CPU (#172)
ienkovich Oct 30, 2024
4a778e6
Small fixes for clang + macosx (#173)
digantdesai Oct 30, 2024
3073466
Support multi-dimensional tensor prints in CPU runtime. (#174)
ienkovich Oct 30, 2024
fbdcbfc
Fix linux-aarch64 build (#176)
desertfire Nov 11, 2024
8f5b245
Fix math tests for armv8 (#178)
digantdesai Nov 27, 2024
c0cbf97
Allow using local omp with Apple clang (#181)
digantdesai Dec 4, 2024
217591b
Add pytest.mark.cpu to two more already-passing tests (#183)
int3 Dec 6, 2024
f1a54c4
Move libdevice to third_party (#182)
int3 Dec 6, 2024
682cc03
Introduce triton_cpu.DotOp.
ienkovich Nov 22, 2024
74a3488
Fixes to use the latest LLVM.
ienkovich Oct 2, 2024
ad46864
Fix pybind11 build issue for TritonCPU.
ienkovich Dec 6, 2024
8ca15da
Use mlir::amx::TileType.
ienkovich Oct 2, 2024
fcada66
Fix formatting
ienkovich Dec 6, 2024
f2d3208
Fix test_tl_range.
ienkovich Dec 6, 2024
4361d34
Fix test_conversions.
ienkovich Dec 6, 2024
dccfd79
Disable test_block_copy with lower bound check.
ienkovich Dec 6, 2024
a509dd9
Fix isSigned and add float16 in PrintOp (#191)
minjang Dec 9, 2024
df38430
Add TritonCPU canonicalizer. (#192)
ienkovich Dec 10, 2024
90908d1
Introduce FMA lowering for DotOp. (#193)
ienkovich Dec 12, 2024
ee1bdc9
AMX lowering improvements (#194)
ienkovich Dec 12, 2024
220b95a
Fix extra-store in matmul tutorial. (#198)
ienkovich Dec 17, 2024
485d709
Remove unnecessary bounds checks. (#199)
ienkovich Dec 19, 2024
561c962
Enable armv8 CI (#195)
digantdesai Dec 21, 2024
5b430ee
Fix isSigned usage for scalar prints. (#201)
ienkovich Dec 23, 2024
5846858
Support VNNI pre-encoded input in AMX lowering. (#210)
ienkovich Jan 13, 2025
b812067
Update default target selection logic (#212)
digantdesai Jan 24, 2025
b0015d3
[OneDNN] Ukernel Backend interface (#197)
Devjiu Feb 18, 2025
ac48b1f
Add missing headers toruntime (#215)
Devjiu Feb 19, 2025
f46cf95
Allign with new LLVM version and remove deprecated calls.
Devjiu Feb 20, 2025
5f23d7b
rebase fixes
Devjiu Feb 21, 2025
95e6cbe
rebase issues
Devjiu Feb 21, 2025
158 changes: 158 additions & 0 deletions .github/workflows/build-test.yml
@@ -0,0 +1,158 @@
name: Build and test
run-name: ${{ inputs.run_name }}

on:
workflow_dispatch:
pull_request:
branches:
- main
# You can name your branch dev-foo to get CI runs.
- 'dev-**'
push:
branches:
- main

jobs:
pre-commit:
name: Pre-commit checks
runs-on:
- glados
- intel
- x86
steps:
- name: Print inputs
run: |
echo "${{ toJSON(github.event.inputs) }}"
echo INSTALL_IPEX=${{ env.INSTALL_IPEX }}

- name: Checkout repository
uses: actions/checkout@v4

- name: Install Python 3.11
uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Run pre-commit checks
run: |
pip install --upgrade pre-commit

# TODO: ignore the first yapf failure until https://github.com/google/yapf/issues/1164 is fixed
python3 -m pre_commit run --all-files --verbose yapf &> /dev/null || true
# If first run of yapf worked and made changes reset the tree to the original state
git reset --hard

python3 -m pre_commit run --show-diff-on-failure --color=always --all-files --verbose

build-test:
name: Build and test on ${{ matrix.config.runner }}
runs-on: ${{ matrix.config.runs_on }}
strategy:
matrix:
python: ['3.11']
config:
- {runner: 'Ubuntu Intel x86', runs_on: ['glados', 'intel', 'x86'], target-os: 'ubuntu', arch: 'x86'}
- {runner: 'MacOS-latest ARM64', runs_on: ['macos-latest'], target-os: 'macos', arch: 'arm64'}
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: recursive

- name: Install Python ${{ matrix.python }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python }}

- name: Install pip and apt dependencies
env:
RUNNER_TARGET_OS: ${{ matrix.config.target-os }}
run: |
echo "RUNNER_TARGET_OS: ${RUNNER_TARGET_OS}"
python3 -m pip install --upgrade pip
python3 -m pip install wheel cmake==3.24 ninja pytest-xdist lit pybind11
if [[ "${RUNNER_TARGET_OS}" == "ubuntu" ]]; then
sudo apt-get update
sudo apt-get install -y zlib1g-dev g++
fi
pip install torch==2.1.2

- name: Install Triton
run: |
echo "PATH is '$PATH'"
cd python
python3 -m pip install --no-build-isolation -vvv '.[tests]'

- name: Run python unit tests for MacOS Arm64
if: matrix.config.target-os == 'macos'
run: |
export CC=$(which clang)
export TRITON_DISABLE_OPENMP=1 # temporary
export TRITON_CPU_BACKEND=1

# Document some versions/flags
echo "xcode-select:"; xcode-select -p
echo "CC: ${CC}"
clang --version
echo "TRITON_DISABLE_OPENMP=${TRITON_DISABLE_OPENMP}"
echo "TRITON_CPU_BACKEND=${TRITON_CPU_BACKEND}"

# Skip bfloat16 tests for now
# We are generating bfcvt for bfloat16 tests when converting to fp32.
# This is only for Clang15, works OK for Clang16
# TODO - fix this using driver flags.
python -m pytest -s -n 32 --device cpu \
python/test/unit/language/test_core.py -m cpu -k "not bfloat16"
python -m pytest -s -n 32 --device cpu \
python/test/unit/cpu/test_math.py \
python/test/unit/cpu/test_opt.py \
python/test/unit/language/test_annotations.py \
python/test/unit/language/test_block_pointer.py \
python/test/unit/language/test_compile_errors.py \
python/test/unit/language/test_conversions.py \
python/test/unit/language/test_decorator.py \
python/test/unit/language/test_pipeliner.py \
python/test/unit/language/test_random.py \
python/test/unit/language/test_standard.py \
python/test/unit/runtime/test_autotuner.py \
python/test/unit/runtime/test_bindings.py \
python/test/unit/runtime/test_cache.py \
python/test/unit/runtime/test_driver.py \
python/test/unit/runtime/test_jit.py \
python/test/unit/runtime/test_launch.py \
python/test/unit/runtime/test_subproc.py \
python/test/unit/test_debug_dump.py \
-k "not bfloat16"

- name: Run python unit tests for Intel
if: matrix.config.target-os == 'ubuntu'
run: |
python -m pytest -s -n 32 --device cpu python/test/unit/language/test_core.py -m cpu
python -m pytest -s -n 32 --device cpu \
python/test/unit/cpu/test_math.py \
python/test/unit/cpu/test_opt.py \
python/test/unit/language/test_annotations.py \
python/test/unit/language/test_block_pointer.py \
python/test/unit/language/test_compile_errors.py \
python/test/unit/language/test_conversions.py \
python/test/unit/language/test_decorator.py \
python/test/unit/language/test_pipeliner.py \
python/test/unit/language/test_random.py \
python/test/unit/language/test_standard.py \
python/test/unit/runtime/test_autotuner.py \
python/test/unit/runtime/test_bindings.py \
python/test/unit/runtime/test_cache.py \
python/test/unit/runtime/test_driver.py \
python/test/unit/runtime/test_jit.py \
python/test/unit/runtime/test_launch.py \
python/test/unit/runtime/test_subproc.py \
python/test/unit/test_debug_dump.py

- name: Run lit tests
run: |
cd python
LIT_TEST_DIR="build/$(ls build | grep -i cmake)/test"
if [ ! -d "${LIT_TEST_DIR}" ]; then
echo "Could not find '${LIT_TEST_DIR}'" ; exit -1
fi
lit -v "${LIT_TEST_DIR}/TritonCPU"
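The lit step above derives the test directory from the CMake build folder's name. A standalone sketch of that discovery logic, using a made-up build-directory name (`cmake.linux-x86_64-3.11` is illustrative only, not a path from this repository):

```shell
# Recreate the layout the workflow expects: build/<cmake-*>/test.
rm -rf /tmp/lit-demo
mkdir -p /tmp/lit-demo/build/cmake.linux-x86_64-3.11/test/TritonCPU
cd /tmp/lit-demo

# Same discovery as the workflow: the build subdirectory name contains "cmake".
LIT_TEST_DIR="build/$(ls build | grep -i cmake)/test"
if [ ! -d "${LIT_TEST_DIR}" ]; then
  echo "Could not find '${LIT_TEST_DIR}'"; exit 1
fi
echo "found: ${LIT_TEST_DIR}"
```

In the real workflow, `lit -v "${LIT_TEST_DIR}/TritonCPU"` then runs the FileCheck-style tests. Note that the workflow's `exit -1` is non-portable shell (exit statuses are 0–255), so the sketch uses `exit 1` instead.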
17 changes: 10 additions & 7 deletions .github/workflows/integration-tests.yml
@@ -9,13 +9,16 @@
name: Integration Tests
on:
workflow_dispatch:
pull_request:
branches-ignore: ['llvm-**']
merge_group:
branches: [main, 'dev-**']
types: [checks_requested]
push:
branches: [main]
# Disabled automatic triggers because tests in this workflow fail to run.
# pull_request:
# # You can name your branch dev-foo to get CI runs.
# branches-ignore: ['llvm-**']
# merge_group:
# branches: [main, 'dev-**']
# types: [checks_requested]
# push:
# branches: [main]

concurrency:
group: ${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
16 changes: 9 additions & 7 deletions .github/workflows/integration-tests.yml.in
@@ -8,13 +8,15 @@ name: Integration Tests

on:
workflow_dispatch:
pull_request:
branches-ignore: ['llvm-**']
merge_group:
branches: [main, 'dev-**']
types: [checks_requested]
push:
branches: [main]
# Disabled automatic triggers because tests in this workflow fail to run.
# pull_request:
# # You can name your branch dev-foo to get CI runs.
# branches-ignore: ['llvm-**']
# merge_group:
# branches: [main, 'dev-**']
# types: [checks_requested]
# push:
# branches: [main]

concurrency:
group: ${{ github.ref }}
1 change: 1 addition & 0 deletions .gitignore
@@ -9,6 +9,7 @@ python/triton*.egg-info/

python/triton/_C/*.pyd
python/triton/_C/*.so
python/triton/_C/*.so.*
python/triton/_C/*.dylib
python/triton/_C/*.pdb
python/triton/_C/*.exe
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "sleef"]
path = third_party/sleef
url = https://github.com/shibatch/sleef
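The entry above is consumed by `git submodule` (see the "Vendor sleef as a submodule (#130)" commit). A runnable sketch of how such an entry is created and what lands in `.gitmodules` — it uses a throwaway local repository as a stand-in for the real sleef URL so it works offline:

```shell
rm -rf /tmp/submodule-demo
mkdir -p /tmp/submodule-demo && cd /tmp/submodule-demo

# A local stand-in for the sleef repository, with one empty commit.
git init -q sub
git -C sub -c user.email=demo@example -c user.name=demo \
    commit -q --allow-empty -m "init"

# The superproject: adding the submodule writes the .gitmodules entry.
git init -q main && cd main
git -c protocol.file.allow=always \
    submodule add -q /tmp/submodule-demo/sub third_party/sleef
cat .gitmodules
```

For the real repository, consumers fetch the pinned sleef sources with `git clone --recursive` or `git submodule update --init` (the CI checkout above uses `submodules: recursive` for the same effect).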
80 changes: 80 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,80 @@
# Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a
reasonable belief that an individual's behavior may have a negative impact on
the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <[email protected]>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
25 changes: 25 additions & 0 deletions README.md
@@ -1,3 +1,28 @@
# Triton-CPU

A long-lived development branch to build an experimental CPU backend for [Triton](https://github.com/openai/triton).

This repository clones the main Triton repository, but we intend to minimize
divergences in the core (and ideally upstream anything that needs to change and
isn't too CPU-specific). Most of the CPU work should be in a backend
subdirectory (similar to how GPU vendors are supported today). We're starting
with a clone to give ourselves maximum development flexibility as this project
gets off the ground!

# How to use it?

Build it the same way as upstream Triton, then set `TRITON_CPU_BACKEND=1` to select the CPU backend instead of any available GPU backend:

```shell
TRITON_CPU_BACKEND=1 python3 tutorials/01-vector-add.py
```

**NOTE: This is still a work in progress.**
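Under the hood, the switch is just an environment-variable check made when the active driver is chosen (see the "[CPU] Support flexible active driver" commit). A simplified, hypothetical sketch — the function and backend names below are illustrative, not Triton's actual API:

```python
import os

def select_backend(available=("cuda", "hip", "cpu")):
    """Hypothetical sketch: TRITON_CPU_BACKEND=1 forces the CPU backend;
    otherwise the first available GPU backend wins."""
    if os.environ.get("TRITON_CPU_BACKEND") == "1":
        return "cpu"
    for name in available:
        if name != "cpu":
            return name  # prefer a GPU backend when one exists
    return "cpu"  # CPU is the fallback when no GPU backend is available

os.environ["TRITON_CPU_BACKEND"] = "1"
print(select_backend())   # -> cpu
del os.environ["TRITON_CPU_BACKEND"]
print(select_backend())   # -> cuda
```

This is why no code changes are needed to switch backends: the variable is read at driver-selection time, so the same script runs on GPU or CPU depending only on the environment.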

---

# Upstream README

<div align="center">
<img src="https://lh5.googleusercontent.com/wzQKEsTFkrgNQO9JjhGH5wFvslJr1saLtLaJ_a6Fp_gNENpvt3VG7BmztwngU9hFJaU4CPwGiw1opQtDvTkLrxWRbO_a12Q-pdESWHgtmheIHcPbOL5ZMC4TSiJVe5ty1w=w3517" alt="Triton logo">
</div>
15 changes: 15 additions & 0 deletions bin/RegisterTritonDialects.h
@@ -4,6 +4,7 @@
#include "third_party/nvidia/include/Dialect/NVGPU/IR/Dialect.h"
#include "third_party/proton/dialect/include/Dialect/Proton/IR/Dialect.h"
#include "triton/Dialect/Triton/IR/Dialect.h"
#include "triton/Dialect/TritonCPU/IR/Dialect.h"
#include "triton/Dialect/TritonGPU/IR/Dialect.h"
#include "triton/Dialect/TritonNvidiaGPU/IR/Dialect.h"

@@ -16,12 +17,17 @@
#include "triton/Dialect/TritonGPU/Transforms/Passes.h"
#include "triton/Dialect/TritonNvidiaGPU/Transforms/Passes.h"

#include "cpu/include/ScalarizePass/ScalarizeInterfaceImpl.h"
#include "cpu/include/TritonCPUToLLVM/Passes.h"
#include "cpu/include/TritonCPUTransforms/Passes.h"
#include "cpu/include/TritonToTritonCPU/Passes.h"
#include "nvidia/include/NVGPUToLLVM/Passes.h"
#include "nvidia/include/TritonNVIDIAGPUToLLVM/Passes.h"
#include "triton/Conversion/TritonGPUToLLVM/Passes.h"
#include "triton/Conversion/TritonToTritonGPU/Passes.h"
#include "triton/Target/LLVMIR/Passes.h"

#include "mlir/Dialect/AMX/AMXDialect.h"
#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
#include "mlir/InitAllPasses.h"
@@ -68,12 +74,21 @@ inline void registerTritonDialects(mlir::DialectRegistry &registry) {
mlir::triton::registerTritonAMDGPUInsertInstructionSchedHints();
mlir::triton::registerTritonAMDGPULowerInstructionSchedHints();

// CPU passes
mlir::triton::cpu::registerTritonToTritonCPUPasses();
mlir::triton::cpu::registerTritonCPUTransformsPasses();
mlir::triton::cpu::registerTritonCPUToLLVMPasses();
mlir::triton::cpu::registerTritonOpScalarizeExternalModels(registry);

// TODO: register Triton & TritonGPU passes
registry
.insert<mlir::triton::TritonDialect, mlir::cf::ControlFlowDialect,
mlir::triton::cpu::TritonCPUDialect,
mlir::triton::nvidia_gpu::TritonNvidiaGPUDialect,
mlir::triton::gpu::TritonGPUDialect, mlir::math::MathDialect,
mlir::arith::ArithDialect, mlir::scf::SCFDialect,
mlir::memref::MemRefDialect, mlir::vector::VectorDialect,
mlir::amx::AMXDialect, mlir::tensor::TensorDialect,
mlir::gpu::GPUDialect, mlir::LLVM::LLVMDialect,
mlir::NVVM::NVVMDialect, mlir::triton::nvgpu::NVGPUDialect,
mlir::triton::amdgpu::TritonAMDGPUDialect,
1 change: 1 addition & 0 deletions include/triton/Analysis/Utility.h
@@ -5,6 +5,7 @@
#include "mlir/Analysis/SliceAnalysis.h"
#include "mlir/Support/LLVM.h"
#include "triton/Dialect/Triton/IR/Dialect.h"
#include "triton/Dialect/TritonCPU/IR/Dialect.h"
#include "triton/Dialect/TritonGPU/IR/Dialect.h"
#include "triton/Tools/LinearLayout.h"

@@ -1,3 +1,3 @@
set(LLVM_TARGET_DEFINITIONS Passes.td)
mlir_tablegen(Passes.h.inc -gen-pass-decls --name TritonToTritonGPU)
add_public_tablegen_target(TritonConversionPassIncGen)
add_public_tablegen_target(TritonConversionToGPUPassIncGen)
4 changes: 2 additions & 2 deletions include/triton/Conversion/TritonToTritonGPU/Passes.h
@@ -1,5 +1,5 @@
#ifndef TRITON_CONVERSION_PASSES_H
#define TRITON_CONVERSION_PASSES_H
#ifndef TRITON_CONVERSION_TO_GPU_PASSES_H
#define TRITON_CONVERSION_TO_GPU_PASSES_H

#include "triton/Conversion/TritonToTritonGPU/TritonToTritonGPUPass.h"
