
Fixing Triton "Unexpected MMA layout version found" for pre-Volta GPUs raises new problems #174

Open · DragonLiu1995 opened this issue Apr 14, 2023 · 5 comments



DragonLiu1995 commented Apr 14, 2023

I got the same error as issue #142, AttributeError: module 'triton.compiler' has no attribute 'OutOfResources', after applying @geekypathak21's solution (see PR 1505) to work around the matmul problem on pre-Volta NVIDIA GPUs with the Triton library.


clxyder commented Apr 18, 2023

Can you provide steps to reproduce the issue?

fcolecumberri commented:

I got the same issue; my log trace is:

INFO:Found the following quantized model: models/Aitrepreneur_stable-vicuna-13B-GPTQ-4bit-128g/stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
INFO:Using the following device map for the quantized model:
INFO:Loaded the model in 2.55 seconds.
/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/generation/utils.py:1405: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 72, in _bench
    return triton.testing.do_bench(kernel_call, percentiles=(0.5, 0.2, 0.8), rep=40)
TypeError: do_bench() got an unexpected keyword argument 'percentiles'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USER/git_projects/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/USER/git_projects/text-generation-webui/modules/text_generation.py", line 251, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 375, in forward
    out = QuantLinearFunction.apply(x.reshape(-1, x.shape[-1]), self.qweight, self.scales, self.qzeros, self.g_idx, self.bits, self.maxq)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 287, in forward
    output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq)
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 267, in matmul248
    matmul_248_kernel[grid](input, qweight, output, scales, qzeros, g_idx, input.shape[0], qweight.shape[1], input.shape[1], bits, maxq, input.stride(0), input.stride(1), qweight.stride(0),
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 90, in run
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 90, in <dictcomp>
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 73, in _bench
    except triton.compiler.OutOfResources:
AttributeError: module 'triton.compiler' has no attribute 'OutOfResources'

I gave the model 3584 MiB of VRAM and 32768 MiB of RAM.
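
The stack above points at an API mismatch between GPTQ-for-LLaMa's custom_autotune.py and the installed Triton rather than an actual memory problem: do_bench no longer accepts a percentiles keyword (newer releases renamed it to quantiles), and while handling that TypeError the except clause itself fails because OutOfResources is no longer exposed as triton.compiler.OutOfResources. A minimal compatibility sketch for the _bench call, assuming those two renames (the triton.runtime.errors import path is an assumption about newer Triton releases, not the upstream fix):

# Compatibility shim for the _bench call in custom_autotune.py (line 72
# of the traceback above); a sketch, not the upstream fix.
import triton
import triton.testing

try:
    from triton.compiler import OutOfResources           # Triton 2.0.0 location
except ImportError:
    from triton.runtime.errors import OutOfResources     # assumed location in newer Triton

def _bench(kernel_call):
    try:
        try:
            # Triton 2.0.0 keyword, as called in custom_autotune.py.
            return triton.testing.do_bench(kernel_call, percentiles=(0.5, 0.2, 0.8), rep=40)
        except TypeError:
            # Newer releases renamed the keyword to quantiles.
            return triton.testing.do_bench(kernel_call, quantiles=(0.5, 0.2, 0.8), rep=40)
    except OutOfResources:
        # Treat configs that exceed the GPU's resources as infinitely slow.
        return float("inf")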


Ph0rk0z commented May 13, 2023

Same here, and I have 24 GB of VRAM and 96 GB of system RAM. I am not out of memory.


psinger commented Jun 20, 2023

Did anyone find a solution to this?

AttributeError: module 'triton.compiler' has no attribute 'OutOfResources'


yds1024 commented Jul 19, 2023

I reinstalled triton==2.0.0, which solved the problem.
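
For anyone verifying their environment before and after the reinstall, a quick check (a sketch; the expected values follow from the behavior described in this thread):

import triton
import triton.compiler

print(triton.__version__)                          # "2.0.0" after the reinstall
print(hasattr(triton.compiler, "OutOfResources"))  # True on 2.0.0; False on the versions that raise the AttributeError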
