
[Bug]: Unrecognized keys in rope_scaling for 'rope_type'='linear': {'type'} #6897

Closed
boxiaowave opened this issue Jul 29, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@boxiaowave

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

When using vLLM 0.5.3 or 0.5.3.post1 to deploy DeepSeek Coder 6.7B, which has a rope_scaling configuration in its config.json, the message "Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}" appears. I'm not sure which script raises it.
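
For reference, the rope_scaling entry can be inspected straight from the checkpoint's config.json; the values shown in the comment match what later appears in the server log (the path below is a placeholder):

```python
import json

# Placeholder path to the local DeepSeek Coder 6.7B checkpoint directory.
with open("/path/to/deepseek-coder-6.7b/config.json") as f:
    config = json.load(f)

# Expected to print something like {'type': 'linear', 'factor': 4.0},
# i.e. the legacy key name 'type' rather than the newer 'rope_type'.
print(config["rope_scaling"])
```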

serve code:

```bash
/usr/bin/python3 -m vllm.entrypoints.openai.api_server \
    --host :: \
    --port "${PORT0}" \
    --model $SERVER_PATH/"${MODEL_NAME}" \
    --served-model-name $SERVED_MODEL_NAME \
    --tensor-parallel-size "${GPU_NUM}" \
    --tokenizer $SERVER_PATH/"${MODEL_NAME}" \
    --max-model-len $MAX_LENGTH \
    --gpu-memory-utilization 0.9 \
    --speculative-model "[ngram]" \
    --ngram-prompt-lookup-max 3 \
    --ngram-prompt-lookup-min 1 \
    --num-speculative-tokens 5 \
    --use-v2-block-manager \
    --enable-prefix-caching \
    --trust-remote-code \
    --dtype auto
```

@boxiaowave boxiaowave added the bug Something isn't working label Jul 29, 2024
@DarkLight1337
Member

Please update your version of transformers.
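
(A quick way to confirm which transformers version the serving environment actually resolves:)

```python
import transformers

# The newer rope_scaling handling discussed below is present in recent
# versions; the reporter is on 4.43.3.
print(transformers.__version__)
```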

@boxiaowave
Author

Please update your version of transformers.

I'm already using the latest version, 4.43.3.

@DarkLight1337
Member

Can you post the full stack trace so we can investigate?

@boxiaowave
Author

boxiaowave commented Jul 30, 2024

Can you post the full stack trace so we can investigate?

According to the log, rope_scaling has three keys: type, factor, and rope_type; however, I'm not sure which code checks it.

INFO 07-30 12:08:56 api_server.py:219] vLLM API server version 0.5.3
INFO 07-30 12:08:56 api_server.py:220] args: Namespace(host='::', port=9377, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model=‘xx’, tokenizer=‘xx’, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=6000, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=True, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling={'type': 'linear', 'factor': 4.0}, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model='[ngram]', num_speculative_tokens=5, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=3, ngram_prompt_lookup_min=1, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['ds_model'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}
INFO 07-30 12:08:56 config.py:68] Updating rope_scaling from {'factor': 4.0, 'type': 'linear', 'rope_type': 'linear'} to {'type': 'linear', 'factor': 4.0}
INFO 07-30 12:08:56 gptq_marlin.py:87] The model is convertible to gptq_marlin during runtime. Using gptq_marlin kernel.
INFO 07-30 12:08:56 llm_engine.py:176] Initializing an LLM engine (v0.5.3) with config: model=‘xx’, speculative_config=SpeculativeConfig(draft_model='[ngram]', num_spec_tokens=5), tokenizer=‘xx’, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling={'type': 'linear', 'factor': 4.0}, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=6000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq_marlin, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=ds_model, use_v2_block_manager=True, enable_prefix_caching=True)
Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}
INFO 07-30 12:08:57 spec_decode_worker.py:153] Configuring SpecDecodeWorker with proposer=<class 'vllm.spec_decode.ngram_worker.NGramWorker'>
INFO 07-30 12:08:57 spec_decode_worker.py:167] Configuring SpecDecodeWorker with sampler=<class 'vllm.model_executor.layers.rejection_sampler.RejectionSampler'>

@DarkLight1337
Member

I think this is a warning message that can be safely ignored (see huggingface/transformers#32182). The model still works, right?
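
Roughly, the check that emits this message compares the keys in rope_scaling against the set expected for the given rope_type. As the "Updating rope_scaling from ..." log line above shows, the config ends up carrying both the legacy 'type' key and the newer 'rope_type' key, so 'type' gets reported as unrecognized. A minimal sketch of that behavior (simplified, not the actual transformers source):

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

def check_rope_scaling_keys(rope_scaling: dict) -> None:
    # Simplified stand-in for the validation: for 'rope_type' == 'linear',
    # only these keys are expected.
    expected_keys = {"rope_type", "factor"}
    unrecognized = set(rope_scaling) - expected_keys
    if unrecognized:
        logger.warning(
            "Unrecognized keys in `rope_scaling` for 'rope_type'='%s': %s",
            rope_scaling.get("rope_type"),
            unrecognized,
        )

# The config carries the legacy 'type' key alongside 'rope_type', so the
# check flags it (matching the message in the log above):
check_rope_scaling_keys({"factor": 4.0, "type": "linear", "rope_type": "linear"})
```

Since the values that actually matter ('rope_type' and 'factor') are valid, the message is cosmetic.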

@boxiaowave
Author

I think this is a warning message that can be safely ignored (see huggingface/transformers#32182). The model still works, right?

OK, I can now confirm that the rope scaling is applied correctly. The model works well, thanks for your help.
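
For anyone else hitting this, one way to double-check what the loaded config resolves rope_scaling to (the path is a placeholder for the local checkpoint directory):

```python
from transformers import AutoConfig

# Placeholder path; point this at the same directory passed to --model.
config = AutoConfig.from_pretrained("/path/to/deepseek-coder-6.7b")

# With transformers >= 4.43 this should show the standardized form,
# e.g. {'type': 'linear', 'factor': 4.0, 'rope_type': 'linear'}.
print(config.rope_scaling)
```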
