Bug Description
A single TTS request through Auralis (XTTSv2 served via vLLM) on an RTX 4090 runs far slower than real time: the performance log reports 0.03 req/s, 3.1 tokens/s, and 6875 ms of compute per second of audio generated, and the script takes 18.08 seconds end to end. On exit, PyTorch 2.4 additionally warns that the NCCL process group was never destroyed.
Minimal Reproducible Example
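The original 1.py is not attached. Judging from the initialization logs (the engine loads AstraMindAI/xtts2-gpt through vLLM), a minimal script along the lines of the Auralis README would look roughly like the sketch below; the text and reference.wav speaker file are placeholders, not the actual inputs from the failing run:

from auralis import TTS, TTSRequest

# Load the XTTSv2 pipeline; gpt_model is the checkpoint the vLLM logs show being served.
tts = TTS().from_pretrained("AstraMindAI/xttsv2", gpt_model="AstraMindAI/xtts2-gpt")

# Build a single synthesis request from a short text and a reference voice clip.
request = TTSRequest(
    text="Hello, this is a test.",      # placeholder text
    speaker_files=["reference.wav"],    # placeholder reference audio
)

# Generate and save the audio; this is the request the two_phase_scheduler logs track.
output = tts.generate_speech(request)
output.save("output.wav")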
Expected Behavior
Generation at or near real time on an RTX 4090 (well under 1000 ms of compute per second of audio generated), and a clean process exit without the NCCL warning.
Actual Behavior
root@0dc92bcb0092:/workspace/Auralis# python 1.py
⚠️ WARNING | To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
08:17:40.755 | XTTSv2.py:75 | ℹ️ INFO | Initializing XTTSv2Engine...
08:17:42.205 | XTTSv2.py:229 | ℹ️ INFO | Initializing VLLM engine with args: AsyncEngineArgs(model='AstraMindAI/xtts2-gpt', served_model_name=None, tokenizer='AstraMindAI/xtts2-gpt', task='auto', skip_tokenizer_init=False, tokenizer_mode='auto', chat_template_text_format='string', trust_remote_code=True, allowed_local_media_path='', download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, seed=0, max_model_len=1047, worker_use_ray=False, distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=True, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.1571125854166545, max_num_batched_tokens=10470, max_num_seqs=10, max_logprobs=20, disable_log_stats=True, revision=None, code_revision=None, rope_scaling=None, rope_theta=None, hf_overrides=None, tokenizer_revision=None, quantization=None, enforce_eager=True, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt={'audio': 1}, mm_processor_kwargs=None, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, fully_sharded_loras=False, lora_extra_vocab_size=256, long_lora_scaling_factors=None, lora_dtype='auto', max_cpu_loras=None, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config=None, ignore_patterns=None, preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, guided_decoding_backend='outlines', speculative_model=None, speculative_model_quantization=None, speculative_draft_tensor_parallel_size=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, qlora_adapter_name_or_path=None, disable_logprobs_during_spec_decoding=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, disable_log_requests=False)
08:17:43.615 | logger.py:89 | ℹ️ INFO | Downcasting torch.float32 to torch.float16.
08:17:43.616 | logger.py:89 |
08:17:43.616 | logger.py:89 | ℹ️ INFO | Initializing an LLM engine (v0.6.4.post1) with config: model='AstraMindAI/xtts2-gpt', speculative_config=None, tokenizer='AstraMindAI/xtts2-gpt', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=1047, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=AstraMindAI/xtts2-gpt, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=False, use_cached_outputs=False, chat_template_text_format=string, mm_processor_kwargs=None, pooler_config=None)
08:17:44.640 | logger.py:89 | ℹ️ INFO | Using Flash Attention backend.
08:17:44.868 | logger.py:89 | ℹ️ INFO | Starting to load model AstraMindAI/xtts2-gpt...
08:17:45.139 | logger.py:89 | ℹ️ INFO | Using model weights format ['*.safetensors']
08:17:45.366 | logger.py:89 | ℹ️ INFO | No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.27s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.27s/it]
08:17:47.942 | logger.py:89 | ℹ️ INFO | Loading model weights took 0.7099 GB
08:17:48.407 | logger.py:89 | ℹ️ INFO | Memory profiling results: total_gpu_memory=23.64GiB initial_memory_usage=3.06GiB peak_torch_memory=1.00GiB memory_usage_post_profile=3.09GiB non_torch_memory=2.38GiB kv_cache_size=0.34GiB gpu_memory_utilization=0.16
08:17:48.625 | logger.py:89 | ℹ️ INFO | # GPU blocks: 186, # CPU blocks: 2184
08:17:48.625 | logger.py:89 | ℹ️ INFO | Maximum concurrency for 1047 tokens per request: 2.84x
08:17:56.146 | two_phase_scheduler.py:131 | ℹ️ INFO | Starting request 6c8e757e69c446f9b8589d77404d5056
08:18:00.478 | logger.py:89 | ℹ️ INFO | Added request 6c8e757e69c446f9b8589d77404d5056_0.
08:18:00.478 | logger.py:89 | ℹ️ INFO | Added request 6c8e757e69c446f9b8589d77404d5056_1.
08:18:02.471 | logger.py:89 | ℹ️ INFO | Finished request 6c8e757e69c446f9b8589d77404d5056_1.
08:18:02.480 | logger.py:89 | ℹ️ INFO | Added request 6c8e757e69c446f9b8589d77404d5056_1_logits.
08:18:02.525 | logger.py:89 | ℹ️ INFO | Finished request 6c8e757e69c446f9b8589d77404d5056_1_logits.
08:18:09.060 | performance.py:142 | ℹ️ INFO | Generation metrics | Throughput: 0.03 req/s | 3.1 tokens/s | Latency: 6875ms per second of audio generated
08:18:12.433 | logger.py:89 | ℹ️ INFO | Finished request 6c8e757e69c446f9b8589d77404d5056_0.
08:18:12.434 | logger.py:89 | ℹ️ INFO | Added request 6c8e757e69c446f9b8589d77404d5056_0_logits.
08:18:12.482 | logger.py:89 | ℹ️ INFO | Finished request 6c8e757e69c446f9b8589d77404d5056_0_logits.
08:18:12.711 | two_phase_scheduler.py:140 | ℹ️ INFO | Request 6c8e757e69c446f9b8589d77404d5056 completed
Execution Time: 18.08 seconds
[rank0]:[W1222 08:18:13.491462637 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Error Logs
No DEBUG-level logs were captured for this run; the INFO-level console output is shown under Actual Behavior above.
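If DEBUG output is needed, one untested way to try to enable it is through the standard library logging tree; this assumes Auralis's custom logger (the logger.py in the traces above) propagates to stdlib logging, which has not been verified:

import logging

# Assumption: Auralis's logger hands records to the standard logging module;
# if it does not, this is a no-op.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("auralis").setLevel(logging.DEBUG)  # hypothetical logger name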
Environment
RunPod Pytorch 2.4.0
ID: 2obsncspfqngvl
1 x RTX 4090
9 vCPU 50 GB RAM
runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
On-Demand - Community Cloud
# OS Information
uname -a
# Python version
python --version
# Installed Python packages
pip list
# GPU Information (if applicable)
nvidia-smi
# CUDA version (if applicable)
nvcc --version
Possible Solutions
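For the NCCL shutdown warning, the warning text itself names the fix: call destroy_process_group before exit. Below is a user-side workaround sketch, assuming the cleanup can be registered from the calling script rather than inside Auralis/vLLM (not verified); note it only addresses the warning, not the slow generation:

import atexit
import torch.distributed as dist

def shutdown_nccl():
    # Explicitly tear down the default process group, as the PyTorch 2.4
    # warning recommends, so ProcessGroupNCCL is not destructed while alive.
    if dist.is_initialized():
        dist.destroy_process_group()

atexit.register(shutdown_nccl)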
Additional Information
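As a consistency check on the memory profile: with block_size=16, the 186 GPU blocks hold 186 × 16 = 2976 KV-cache token slots, and 2976 / 1047 ≈ 2.84, which matches the reported "Maximum concurrency for 1047 tokens per request: 2.84x". The small 0.34 GiB KV cache follows from the low gpu_memory_utilization (≈0.157) in the engine args, so KV-cache capacity does not appear to be the bottleneck for a single request.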