TinyLlama compilation failed with tensor parallel degree 1 for neuron sdk 2.21 #1100

Open
sindhuvahinis opened this issue Jan 30, 2025 · 4 comments

Comments

@sindhuvahinis

Compiling the TinyLlama model with tp_degree 1 fails with the error below after upgrading to Neuron SDK 2.21. The same code worked with Neuron SDK 2.20.

The same code also compiles successfully with tp_degree 2.

from transformers_neuronx.llama.model import LlamaForSampling
from transformers_neuronx import NeuronAutoModelForCausalLM
from transformers_neuronx.config import NeuronConfig, ContinuousBatchingConfig

model_path = "/opt/ml/model/test_neuron_vllm/local-models/TinyLlama/TinyLlama-1.1B-Chat-v1.0"
neuron_config = NeuronConfig(
    on_device_embedding=True,
    continuous_batching=ContinuousBatchingConfig(batch_size_for_shared_caches=32),
)
# tp_degree=1 fails on Neuron SDK 2.21; tp_degree=2 compiles
model_kwargs = {"batch_size": 32, "tp_degree": 1, "n_positions": [1024], "context_length_estimate": [128, 1024]}
model = NeuronAutoModelForCausalLM.from_pretrained(model_path, neuron_config=neuron_config, **model_kwargs)
model.to_neuron()  # compilation is triggered here
model.save(f"{model_path}/neuron-compiled-artifacts", sharded_weights=True)

Error I got:

  warnings.warn(SyntaxWarning(
2025-01-30 22:02:45.000040:  209  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/no-user/neuroncc_compile_workdir/cc926b2e-1267-4379-8ad1-15f19a1b08c8/model.MODULE_1b9040045ec9af3b543d+54293761.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/cc926b2e-1267-4379-8ad1-15f19a1b08c8/model.MODULE_1b9040045ec9af3b543d+54293761.neff --target=trn1 --logfile /tmp/compile.log --temp-dir=/tmp --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
2025-01-30 22:02:45.000040:  210  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.neff --target=trn1 --logfile /tmp/compile.log --temp-dir=/tmp --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:196: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2025-01-30 22:02:45.000045:  208  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/no-user/neuroncc_compile_workdir/0e4840b3-092e-424b-99b5-9b1673035637/model.MODULE_16876035ac486243dd8b+54293761.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/0e4840b3-092e-424b-99b5-9b1673035637/model.MODULE_16876035ac486243dd8b+54293761.neff --target=trn1 --logfile /tmp/compile.log --temp-dir=/tmp --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
.........root = neuronxcc/starfish/penguin/targets/codegen/BirCodeGenLoop.py
root = neuronxcc/starfish/penguin/targets/codegen
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

2025-01-30 22:03:38.000507:  210  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.neff', '--target=trn1', '--logfile', '/tmp/compile.log', '--temp-dir=/tmp', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2025-01-30T22:03:38Z [TEN404] (_multiply.314) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[32],+,0}[2],+,32}[4],+,0}[2] - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

2025-01-30 22:03:38.000507:  210  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.hlo_module.pb after 0 retries.
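The error message points at the `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` environment variables for more detail. Below is a minimal sketch of enabling them before rerunning the script above. It assumes that the value `"1"` turns each flag on (the convention used in the PyTorch/XLA troubleshooting guide) and that the variables need to be set before the Neuron libraries are imported. The helper name `enable_xla_debug` is hypothetical and not part of any Neuron package.

```python
import os


def enable_xla_debug(env=os.environ):
    """Set the two debug variables named in the compiler error message."""
    # Assumption: "1" enables each flag.
    for name in ("XLA_IR_DEBUG", "XLA_HLO_DEBUG"):
        env[name] = "1"
    return env


# Call this before importing transformers_neuronx so the flags are already
# in the environment when the model is traced and compiled.
enable_xla_debug()
```

With the flags set, the HLO modules handed to neuronx-cc should carry extra source metadata, which may help map `_multiply.314` back to the originating operation.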
@aws-rishyraj
Contributor

Hi @sindhuvahinis,

Thanks for filing the issue. We will take a look and get back to you.

@aws-rishyraj
Contributor

We were able to reproduce the issue, and the compiler team is investigating further.

@aws-rishyraj
Contributor

Hi @sindhuvahinis,

The compiler team has a fix for this issue, which will be included in a future release.
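Until a release with the fix is available, one possible stopgap follows from the original report, which notes that the same script compiles with tp_degree 2. The sketch below tries a list of tp_degree values in order and keeps the first that compiles. It is illustrative only: `compile_with_fallback` and the `build` callback are hypothetical names, and it assumes a failed compilation surfaces as a Python exception.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def compile_with_fallback(
    build: Callable[[int], T], tp_degrees: Iterable[int]
) -> Tuple[int, T]:
    """Try each tp_degree in order and return the first that compiles."""
    errors = []
    for tp in tp_degrees:
        try:
            return tp, build(tp)
        except Exception as exc:  # assumption: compiler failures raise
            errors.append(f"tp_degree={tp}: {exc}")
    raise RuntimeError("all tp_degrees failed:\n" + "\n".join(errors))
```

Here `build` would wrap the `from_pretrained(...)` and `to_neuron()` calls from the report, taking the tp_degree as its only argument, for example `compile_with_fallback(build, [1, 2])`. A higher tp_degree uses more NeuronCores, so this only makes sense where that cost is acceptable.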

@sindhuvahinis
Author

Thank you!
