
[Issue]: GPU Core dump when running CK-W8A8GEMM Kernel on GPU ID 1,2,3,4,5,6,7 #89

Closed
tjtanaa opened this issue Feb 5, 2025 · 3 comments
@tjtanaa

tjtanaa commented Feb 5, 2025

Problem Description

When running the kernel on inputs placed on a non-zero GPU ID (e.g. 1, 2, 3, 4, 5, 6, or 7), it throws the following error:

Memory access fault by GPU node-2 (Agent handle: 0x9b15d70) on address 0x7ee42d200000. Reason: Unknown.
tensor(False, device='cuda:1')
GPU core dump created: gpucore.10171
Aborted
root@tw024:/app# python ex.py 
Memory access fault by GPU node-2 (Agent handle: 0xa5f71a0) on address 0x7f532b800000. Reason: Unknown.
GPU core dump created: gpucore.10255
Aborted

Operating System

Ubuntu 22.04.4 LTS (Jammy Jellyfish)

CPU

AMD EPYC 9654 96-Core Processor

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.3.1

ROCm Component

composable_kernel

Steps to Reproduce

  1. Install aiter from the main branch.
  2. Run the following script:
from aiter.ops.gemm_op_a8w8 import gemm_a8w8_CK

import torch

# (M, N, K) problem sizes to exercise
SIZE_LIST = [
    (3840, 16384, 16384),
    (56, 8192, 7392),
]


def main():
    for M, N, K in SIZE_LIST:
        # Place every input on a non-zero GPU ID to trigger the fault
        A = torch.rand(size=(M, K), device="cuda:1").to(torch.int8)
        B = torch.rand(size=(K, N), device="cuda:1").to(torch.int8)
        scale_a = torch.ones((M, 1), device="cuda:1").to(torch.int32)
        scale_b = torch.ones((N, 1), device="cuda:1").to(torch.int32)
        result = gemm_a8w8_CK(A, B.t(), scale_a, scale_b, dtype=torch.bfloat16)


if __name__ == "__main__":
    main()

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@mawong-amd
Contributor

A workaround for now is to call torch.cuda.set_device("cuda:1") before calling gemm_a8w8_CK.

It seems a proper fix would be to add device guards, as done here.
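
For illustration, a minimal Python sketch of that workaround applied to the reproduction script above (torch.cuda.set_device and the torch.cuda.device context manager are standard PyTorch APIs; the wrapper name is hypothetical):

import torch
from aiter.ops.gemm_op_a8w8 import gemm_a8w8_CK

def gemm_a8w8_on_input_device(A, B, scale_a, scale_b):
    # Workaround sketch: make the inputs' GPU the current CUDA device before
    # calling into the extension, so the kernel launch targets the same GPU
    # that holds the tensors instead of the default device 0.
    torch.cuda.set_device(A.device)  # e.g. "cuda:1"
    return gemm_a8w8_CK(A, B.t(), scale_a, scale_b, dtype=torch.bfloat16)

# Alternatively, scope the change so the global current device is restored
# afterwards:
#     with torch.cuda.device(A.device):
#         result = gemm_a8w8_CK(A, B.t(), scale_a, scale_b, dtype=torch.bfloat16)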

@valarLip
Collaborator

valarLip commented Feb 7, 2025

> A workaround for now is to call torch.cuda.set_device("cuda:1") before calling gemm_a8w8_CK.
>
> It seems a proper fix would be to add device guards, as done here.

Yes, this is the way I planned to fix it... thank you for doing it.

@tjtanaa
Author

tjtanaa commented Feb 7, 2025

Thank you. Let us test these fixes on our end as well.

It works seamlessly now.

tjtanaa closed this as completed Feb 8, 2025