
Fused mlp causes assertion error #179

Open · sgsdxzy opened this issue Apr 15, 2023 · 5 comments

sgsdxzy (Contributor) commented Apr 15, 2023

After commit c90adef, with fused_mlp enabled, I get the following error:

python: /opt/conda/conda-bld/torchtriton_1677881345124/work/lib/Analysis/Allocation.cpp:42: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(const mlir::Attribute&, const mlir::Attribute&): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
Aborted (core dumped)

My GPU is a 2080 Ti, which is Turing, so I think this is not the same issue as #174.

TitanSneaker commented Apr 25, 2023

Same problem:

CUDA_VISIBLE_DEVICES=0 python llama_inference.py ./llama-hf/llama-7b --load llama7b-4bit-128g.pt --text "this is llama" --wbits 4 --groupsize 128
Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache ...
100%|██████████| 12/12 [00:30<00:00,  2.52s/it]
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
  0%|          | 0/12 [00:00<?, ?it/s]
python: /project/lib/Analysis/Allocation.cpp:42: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(const mlir::Attribute&, const mlir::Attribute&): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
Aborted (core dumped)

penlu commented Apr 27, 2023

I experience the same problem (identical error message) running commit 5168950 on a 2080 Ti. Disabling fused_mlp works as a workaround for me.

929359291 commented

> I experience the same problem (identical error message) running commit 5168950 on a 2080 Ti. Disabling fused_mlp works as a workaround for me.

Hi, how do I disable fused_mlp? My system is CentOS.

ereish64 commented Jun 1, 2023

> > I experience the same problem (identical error message) running commit 5168950 on a 2080 Ti. Disabling fused_mlp works as a workaround for me.
>
> Hi, how do I disable fused_mlp? My system is CentOS.

At line 279 in llama.py, change fused_mlp=True in the load_quant call to fused_mlp=False.
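
For reference, a minimal sketch of that change. Only the fused_mlp keyword is confirmed by this thread; the other arguments shown are assumptions for illustration and may not match your checkout of llama.py exactly:

```python
# llama.py, around line 279 -- sketch only.
# Argument names other than fused_mlp are illustrative assumptions.
model = load_quant(
    args.model,       # path to the HF LLaMA model directory
    args.load,        # quantized checkpoint, e.g. llama7b-4bit-128g.pt
    args.wbits,       # quantization bit width, e.g. 4
    args.groupsize,   # quantization group size, e.g. 128
    fused_mlp=False,  # was fused_mlp=True; skips the fused Triton MLP kernel
)                     # that triggers the mma -> mma layout assertion on Turing
```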

shirley-wu commented

Same problem. Disabling fused_mlp works for me. Note: use the .pt file, not .safetensors; for some reason .safetensors still triggers the error.
