ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1060 3GB (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
llm_load_tensors: ggml ctx size =    0.30 MiB
ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(1431568384)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +1.33 GiB device at 0x1e48b38af00. Total device: 1.33 GiB, total host: 0 B
llm_load_tensors: offloading 8 repeating layers to GPU
llm_load_tensors: offloaded 8/33 layers to GPU
llm_load_tensors: NVIDIA GeForce GTX 1060 3GB buffer size =  1365.25 MiB
llm_load_tensors:        CPU buffer size =  6282.97 MiB
ggml_vulkan memory: ggml_vk_ensure_sync_staging_buffer(16384)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +16.00 KiB host at 0x1e48b38a600. Total device: 1.33 GiB, total host: 16.00 KiB
ggml_vulkan memory: ggml_vk_ensure_sync_staging_buffer(13762560)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: -16.00 KiB host at 0x1e48b38a600. Total device: 1.33 GiB, total host: 0 B
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +13.12 MiB host at 0x1e48b38aa80. Total device: 1.33 GiB, total host: 13.12 MiB
ggml_vulkan memory: ggml_vk_ensure_sync_staging_buffer(48168960)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: -13.12 MiB host at 0x1e48b38aa80. Total device: 1.33 GiB, total host: 0 B
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +45.94 MiB host at 0x1e48b38a300. Total device: 1.33 GiB, total host: 45.94 MiB
.........................................................................................
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: n_batch    = 4096
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(268435456)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +256.00 MiB device at 0x1e48b38aa80. Total device: 1.58 GiB, total host: 45.94 MiB
llama_kv_cache_init: NVIDIA GeForce GTX 1060 3GB KV buffer size =   256.00 MiB
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(805306368)
ggml_vulkan memory: ggml_vk_host_malloc(805306400)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +768.00 MiB host at 0x1e48b38a900. Total device: 1.58 GiB, total host: 813.94 MiB
llama_kv_cache_init: Vulkan_Host KV buffer size =   768.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(513024)
ggml_vulkan memory: ggml_vk_host_malloc(513056)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +501.03 KiB host at 0x1e48b38a180. Total device: 1.58 GiB, total host: 814.43 MiB
llama_new_context_with_model: Vulkan_Host  output buffer size =     0.49 MiB
ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(701997056)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +669.48 MiB device at 0x1e48b38a600. Total device: 2.24 GiB, total host: 814.43 MiB
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(58726400)
ggml_vulkan memory: ggml_vk_host_malloc(58726432)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +56.01 MiB host at 0x1e48b38a480. Total device: 2.24 GiB, total host: 870.43 MiB
llama_new_context_with_model: NVIDIA GeForce GTX 1060 3GB compute buffer size =   669.48 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size =    56.01 MiB
llama_new_context_with_model: graph nodes  = 903
llama_new_context_with_model: graph splits = 262
warming up the model with an empty run