
Fixing Triton "Unexpected MMA layout version found" for pre-Volta GPUs raises new problems #174

Open · DragonLiu1995 opened this issue Apr 14, 2023 · 5 comments



DragonLiu1995 commented Apr 14, 2023

I got the same error as issue #142, AttributeError: module 'triton.compiler' has no attribute 'OutOfResources', after applying @geekypathak21's solution (see PR 1505) to work around the matmul problem on pre-Volta NVIDIA GPUs with the Triton library.


clxyder commented Apr 18, 2023

Can you provide steps to reproduce the issue?

fcolecumberri commented:

I got the same issue; my log trace is:

INFO:Found the following quantized model: models/Aitrepreneur_stable-vicuna-13B-GPTQ-4bit-128g/stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
INFO:Using the following device map for the quantized model:
INFO:Loaded the model in 2.55 seconds.
/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/generation/utils.py:1405: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 72, in _bench
    return triton.testing.do_bench(kernel_call, percentiles=(0.5, 0.2, 0.8), rep=40)
TypeError: do_bench() got an unexpected keyword argument 'percentiles'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USER/git_projects/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/USER/git_projects/text-generation-webui/modules/text_generation.py", line 251, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 375, in forward
    out = QuantLinearFunction.apply(x.reshape(-1, x.shape[-1]), self.qweight, self.scales, self.qzeros, self.g_idx, self.bits, self.maxq)
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/USER/git_projects/text-generation-webui/venv/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 287, in forward
    output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq)
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 267, in matmul248
    matmul_248_kernel[grid](input, qweight, output, scales, qzeros, g_idx, input.shape[0], qweight.shape[1], input.shape[1], bits, maxq, input.stride(0), input.stride(1), qweight.stride(0),
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 90, in run
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 90, in <dictcomp>
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/USER/git_projects/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 73, in _bench
    except triton.compiler.OutOfResources:
AttributeError: module 'triton.compiler' has no attribute 'OutOfResources'

I gave the model 3584 MiB of VRAM and 32768 MiB of RAM.
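
The stack above points at an API mismatch between GPTQ-for-LLaMa's custom_autotune.py and the installed Triton rather than an actual memory problem: do_bench no longer accepts a percentiles keyword (newer releases renamed it to quantiles), and while handling that TypeError the except clause itself fails because OutOfResources is no longer exposed as triton.compiler.OutOfResources. A minimal compatibility sketch for the _bench call, assuming those two renames (the triton.runtime.errors import path is an assumption about newer Triton releases, not the upstream fix):

# Compatibility shim for the _bench call in custom_autotune.py (line 72
# of the traceback above); a sketch, not the upstream fix.
import triton
import triton.testing

try:
    from triton.compiler import OutOfResources           # Triton 2.0.0 location
except ImportError:
    from triton.runtime.errors import OutOfResources     # assumed location in newer Triton

def _bench(kernel_call):
    try:
        try:
            # Triton 2.0.0 keyword, as called in custom_autotune.py.
            return triton.testing.do_bench(kernel_call, percentiles=(0.5, 0.2, 0.8), rep=40)
        except TypeError:
            # Newer releases renamed the keyword to quantiles.
            return triton.testing.do_bench(kernel_call, quantiles=(0.5, 0.2, 0.8), rep=40)
    except OutOfResources:
        # Treat configs that exceed the GPU's resources as infinitely slow.
        return float("inf")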


Ph0rk0z commented May 13, 2023

Same here, and I have 24 GB of VRAM and 96 GB of system RAM. I am not out of memory.


psinger commented Jun 20, 2023

Did anyone find a solution to this?

AttributeError: module 'triton.compiler' has no attribute 'OutOfResources'


yds1024 commented Jul 19, 2023

I reinstalled triton==2.0.0, which solved the problem.
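
For anyone verifying their environment before and after the reinstall, a quick check (a sketch; the expected values follow from the behavior described in this thread):

import triton
import triton.compiler

print(triton.__version__)                          # "2.0.0" after the reinstall
print(hasattr(triton.compiler, "OutOfResources"))  # True on 2.0.0; False on the versions that raise the AttributeError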
