
Cannot generate text on GPU #199

Open
congson1293 opened this issue Jan 2, 2024 · 3 comments

@congson1293

I load the model onto the GPU like this:

llm = AutoModelForCausalLM.from_pretrained("LLM-model", 
                                            model_file="vinallama-7b-chat_q5_0.gguf",
                                            config=config, torch_dtype=torch.float16, hf=True,
                                            gpu_layers = 100, device_map='cuda')

and generate text like this:

generated_ids = llm.generate(**model_inputs,
                              max_new_tokens=4096,
                              # early_stopping=True,
                              repetition_penalty=1.1,
                              # no_repeat_ngram_size=2,
                              temperature=0.6,
                              do_sample=True,
                              # top_k=5,
                              top_p=0.9,
                              eos_token_id=tokenizer.eos_token_id,
                              pad_token_id=tokenizer.pad_token_id,
                              use_cache=True)

When I ran this code, the model took about 6.6 GB of GPU memory, but calling the generate method raised this exception: "You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cuda, whereas the model is on cpu."
Does anyone know how to fix it?

@Michielo1

What device is model_inputs on?
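For example, a quick check along these lines (a hypothetical snippet, not part of the original comment; model_inputs and llm are the objects from the post above) would show where each side lives:

print(model_inputs["input_ids"].device)  # likely cuda:0 if .to("cuda") was applied to the inputs
print(llm.device)                        # the hf=True wrapper typically reports cpu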

@congson1293 (Author)

I'm using a T4 GPU on Google Colab.

@davidearlyoung

Double-check your code against the gold standard examples at: https://github.com/marella/ctransformers?tab=readme-ov-file#classmethod-automodelforcausallmfrom_pretrained

Do the same for the generate method of your llm class. Here is the gold standard reference for that as well: https://github.com/marella/ctransformers?tab=readme-ov-file#classmethod-automodelforcausallmfrom_pretrained

It looks like you are passing a lot of unnecessary arguments mimicked from other libraries, especially in the from_pretrained method call.
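For comparison, here is a minimal sketch modelled on the README's from_pretrained example (the repo path and GGUF file name are just the ones from the post above, gpu_layers=50 is an arbitrary value, and this is an untested sketch rather than a verified fix):

from ctransformers import AutoModelForCausalLM

# Minimal sketch: only the arguments the ctransformers README documents.
llm = AutoModelForCausalLM.from_pretrained(
    "LLM-model",                               # local path or Hugging Face repo id
    model_file="vinallama-7b-chat_q5_0.gguf",  # GGUF file to load
    hf=True,                                   # wrap the model for use with transformers tokenizers
    gpu_layers=50,                             # layers offloaded to the GPU by ctransformers itself
)

# Note: with hf=True the wrapper typically still reports "cpu" as its device, so keeping the
# tokenized inputs on the CPU (no .to("cuda")) avoids the device-mismatch error; GPU offloading
# is handled internally through gpu_layers.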

Hope that this helps.
