
[Bug]: Unrecognized keys in rope_scaling for 'rope_type'='linear': {'type'} #6897

Closed
boxiaowave opened this issue Jul 29, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@boxiaowave

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

When using vLLM 0.5.3 or 0.5.3.post1 to deploy DeepSeek Coder 6.7B, which has a rope_scaling configuration in its config.json, the message "Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}" appears. I'm not sure which script raises it.
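
For reference, the rope_scaling entry can be inspected straight from the checkpoint's config.json; the values shown in the comment match what later appears in the server log (the path below is a placeholder):

```python
import json

# Placeholder path to the local DeepSeek Coder 6.7B checkpoint directory.
with open("/path/to/deepseek-coder-6.7b/config.json") as f:
    config = json.load(f)

# Expected to print something like {'type': 'linear', 'factor': 4.0},
# i.e. the legacy key name 'type' rather than the newer 'rope_type'.
print(config["rope_scaling"])
```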

serve code:

```bash
/usr/bin/python3 -m vllm.entrypoints.openai.api_server \
    --host :: \
    --port "${PORT0}" \
    --model $SERVER_PATH/"${MODEL_NAME}" \
    --served-model-name $SERVED_MODEL_NAME \
    --tensor-parallel-size "${GPU_NUM}" \
    --tokenizer $SERVER_PATH/"${MODEL_NAME}" \
    --max-model-len $MAX_LENGTH \
    --gpu-memory-utilization 0.9 \
    --speculative-model "[ngram]" \
    --ngram-prompt-lookup-max 3 \
    --ngram-prompt-lookup-min 1 \
    --num-speculative-tokens 5 \
    --use-v2-block-manager \
    --enable-prefix-caching \
    --trust-remote-code \
    --dtype auto
```

@boxiaowave boxiaowave added the bug Something isn't working label Jul 29, 2024
@DarkLight1337
Member

Please update your version of transformers.
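
(A quick way to confirm which transformers version the serving environment actually resolves:)

```python
import transformers

# The newer rope_scaling handling discussed below is present in recent
# versions; the reporter is on 4.43.3.
print(transformers.__version__)
```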

@boxiaowave
Author

Please update your version of transformers.

I'm already using the latest version, 4.43.3.

@DarkLight1337
Member

Can you post the full stack trace so we can investigate?

@boxiaowave
Author

boxiaowave commented Jul 30, 2024

Can you post the full stack trace so we can investigate?

According to the log, rope_scaling has three keys: type, factor, and rope_type; however, I'm not sure which code checks it.

INFO 07-30 12:08:56 api_server.py:219] vLLM API server version 0.5.3
INFO 07-30 12:08:56 api_server.py:220] args: Namespace(host='::', port=9377, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model=‘xx’, tokenizer=‘xx’, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=6000, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=True, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling={'type': 'linear', 'factor': 4.0}, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model='[ngram]', num_speculative_tokens=5, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=3, ngram_prompt_lookup_min=1, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['ds_model'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}
INFO 07-30 12:08:56 config.py:68] Updating rope_scaling from {'factor': 4.0, 'type': 'linear', 'rope_type': 'linear'} to {'type': 'linear', 'factor': 4.0}
INFO 07-30 12:08:56 gptq_marlin.py:87] The model is convertible to gptq_marlin during runtime. Using gptq_marlin kernel.
INFO 07-30 12:08:56 llm_engine.py:176] Initializing an LLM engine (v0.5.3) with config: model=‘xx’, speculative_config=SpeculativeConfig(draft_model='[ngram]', num_spec_tokens=5), tokenizer=‘xx’, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling={'type': 'linear', 'factor': 4.0}, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=6000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq_marlin, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=ds_model, use_v2_block_manager=True, enable_prefix_caching=True)
Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}
INFO 07-30 12:08:57 spec_decode_worker.py:153] Configuring SpecDecodeWorker with proposer=<class 'vllm.spec_decode.ngram_worker.NGramWorker'>
INFO 07-30 12:08:57 spec_decode_worker.py:167] Configuring SpecDecodeWorker with sampler=<class 'vllm.model_executor.layers.rejection_sampler.RejectionSampler'>

@DarkLight1337
Member

I think this is a warning message that can be safely ignored (see huggingface/transformers#32182). The model still works, right?
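
Roughly, the check that emits this message compares the keys in rope_scaling against the set expected for the given rope_type. As the "Updating rope_scaling from ..." log line above shows, the config ends up carrying both the legacy 'type' key and the newer 'rope_type' key, so 'type' gets reported as unrecognized. A minimal sketch of that behavior (simplified, not the actual transformers source):

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

def check_rope_scaling_keys(rope_scaling: dict) -> None:
    # Simplified stand-in for the validation: for 'rope_type' == 'linear',
    # only these keys are expected.
    expected_keys = {"rope_type", "factor"}
    unrecognized = set(rope_scaling) - expected_keys
    if unrecognized:
        logger.warning(
            "Unrecognized keys in `rope_scaling` for 'rope_type'='%s': %s",
            rope_scaling.get("rope_type"),
            unrecognized,
        )

# The config carries the legacy 'type' key alongside 'rope_type', so the
# check flags it (matching the message in the log above):
check_rope_scaling_keys({"factor": 4.0, "type": "linear", "rope_type": "linear"})
```

Since the values that actually matter ('rope_type' and 'factor') are valid, the message is cosmetic.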

@boxiaowave
Author

I think this is a warning message that can be safely ignored (see huggingface/transformers#32182). The model still works, right?

OK, I can now confirm that the rope scaling is applied correctly. The model works well, thanks for your help.
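
For anyone else hitting this, one way to double-check what the loaded config resolves rope_scaling to (the path is a placeholder for the local checkpoint directory):

```python
from transformers import AutoConfig

# Placeholder path; point this at the same directory passed to --model.
config = AutoConfig.from_pretrained("/path/to/deepseek-coder-6.7b")

# With transformers >= 4.43 this should show the standardized form,
# e.g. {'type': 'linear', 'factor': 4.0, 'rope_type': 'linear'}.
print(config.rope_scaling)
```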
