Compiling the TinyLlama model with tp_degree 1 fails with the error below after upgrading to Neuron SDK 2.21. The same code worked with Neuron SDK 2.20.
On SDK 2.21 the same code still works with tp_degree 2.
from transformers_neuronx.llama.model import LlamaForSampling
from transformers_neuronx import NeuronAutoModelForCausalLM
from transformers_neuronx.config import NeuronConfig, ContinuousBatchingConfig
# Neuron config: on-device embedding plus continuous batching (shared caches for batch size 32).
neuron_config_dict = {"on_device_embedding": True, "continuous_batching": ContinuousBatchingConfig(batch_size_for_shared_caches=32)}
neuron_config = NeuronConfig(**neuron_config_dict)
# tp_degree=1 fails to compile on SDK 2.21; tp_degree=2 compiles.
model_kwargs = {'batch_size': 32, 'tp_degree': 1, 'n_positions': [1024], 'context_length_estimate': [128, 1024]}
model = NeuronAutoModelForCausalLM.from_pretrained("/opt/ml/model/test_neuron_vllm/local-models/TinyLlama/TinyLlama-1.1B-Chat-v1.0", neuron_config=neuron_config, **model_kwargs)
# Compile and load onto the Neuron device (this is the step that fails), then save the compiled artifacts.
model.to_neuron()
model.save("/opt/ml/model/test_neuron_vllm/local-models/TinyLlama/TinyLlama-1.1B-Chat-v1.0/neuron-compiled-artifacts", sharded_weights=True)
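Since this looks like a regression between SDK 2.20 and 2.21, it may help to record the exact package versions in both environments. Below is a minimal sketch that does this with the standard library only. The package list is an assumption about which pip distributions are relevant here (it is not taken from the failing environment), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Assumed list of pip distributions worth comparing between the two SDK installs.
PACKAGES = ["neuronx-cc", "transformers-neuronx", "libneuronxla", "torch-neuronx"]


def installed_versions(packages=PACKAGES):
    """Return {package name: version string}, or "not installed" if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Running this once in the 2.20 environment and once in the 2.21 environment gives two listings that can be compared side by side.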
Error I got:
warnings.warn(SyntaxWarning(
2025-01-30 22:02:45.000040: 209 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/no-user/neuroncc_compile_workdir/cc926b2e-1267-4379-8ad1-15f19a1b08c8/model.MODULE_1b9040045ec9af3b543d+54293761.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/cc926b2e-1267-4379-8ad1-15f19a1b08c8/model.MODULE_1b9040045ec9af3b543d+54293761.neff --target=trn1 --logfile /tmp/compile.log --temp-dir=/tmp --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
2025-01-30 22:02:45.000040: 210 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.neff --target=trn1 --logfile /tmp/compile.log --temp-dir=/tmp --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:196: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2025-01-30 22:02:45.000045: 208 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/no-user/neuroncc_compile_workdir/0e4840b3-092e-424b-99b5-9b1673035637/model.MODULE_16876035ac486243dd8b+54293761.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/0e4840b3-092e-424b-99b5-9b1673035637/model.MODULE_16876035ac486243dd8b+54293761.neff --target=trn1 --logfile /tmp/compile.log --temp-dir=/tmp --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
.........root = neuronxcc/starfish/penguin/targets/codegen/BirCodeGenLoop.py
root = neuronxcc/starfish/penguin/targets/codegen
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish
2025-01-30 22:03:38.000507: 210 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.neff', '--target=trn1', '--logfile', '/tmp/compile.log', '--temp-dir=/tmp', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2025-01-30T22:03:38Z [TEN404] (_multiply.314) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[32],+,0}[2],+,32}[4],+,0}[2] - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
2025-01-30 22:03:38.000507: 210 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/3641f671-574b-4ae8-82a3-5c3c1c65d2d1/model.MODULE_fddffb1728e66550d0a6+54293761.hlo_module.pb after 0 retries.
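The compiler message above mentions the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables. A minimal sketch for turning them on before re-running the reproduction is shown here. It assumes that the value "1" enables each variable and that they need to be set before the Neuron/XLA modules are imported; `enable_xla_debug` is a hypothetical helper, not part of any Neuron package.

```python
import os


def enable_xla_debug(env=None):
    """Set the two debug variables named in the compiler error message.

    Assumption: "1" is the value that enables them. Pass a dict to
    build an environment without touching the current process.
    """
    env = os.environ if env is None else env
    env["XLA_IR_DEBUG"] = "1"
    env["XLA_HLO_DEBUG"] = "1"
    return env


if __name__ == "__main__":
    # Call this before importing transformers_neuronx, then run the
    # reproduction code from this issue in the same process.
    enable_xla_debug()
    print({k: os.environ[k] for k in ("XLA_IR_DEBUG", "XLA_HLO_DEBUG")})
```

Equivalently, the two variables can be exported in the shell before launching the script.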