ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1060 3GB (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
llm_load_tensors: ggml ctx size = 0.30 MiB
ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(1431568384)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +1.33 GiB device at 0x1e48b38af00. Total device: 1.33 GiB, total host: 0 B
llm_load_tensors: offloading 8 repeating layers to GPU
llm_load_tensors: offloaded 8/33 layers to GPU
llm_load_tensors: NVIDIA GeForce GTX 1060 3GB buffer size = 1365.25 MiB
llm_load_tensors: CPU buffer size = 6282.97 MiB
ggml_vulkan memory: ggml_vk_ensure_sync_staging_buffer(16384)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +16.00 KiB host at 0x1e48b38a600. Total device: 1.33 GiB, total host: 16.00 KiB
ggml_vulkan memory: ggml_vk_ensure_sync_staging_buffer(13762560)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: -16.00 KiB host at 0x1e48b38a600. Total device: 1.33 GiB, total host: 0 B
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +13.12 MiB host at 0x1e48b38aa80. Total device: 1.33 GiB, total host: 13.12 MiB
ggml_vulkan memory: ggml_vk_ensure_sync_staging_buffer(48168960)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: -13.12 MiB host at 0x1e48b38aa80. Total device: 1.33 GiB, total host: 0 B
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +45.94 MiB host at 0x1e48b38a300. Total device: 1.33 GiB, total host: 45.94 MiB
.........................................................................................
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 4096
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(268435456)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +256.00 MiB device at 0x1e48b38aa80. Total device: 1.58 GiB, total host: 45.94 MiB
llama_kv_cache_init: NVIDIA GeForce GTX 1060 3GB KV buffer size = 256.00 MiB
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(805306368)
ggml_vulkan memory: ggml_vk_host_malloc(805306400)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +768.00 MiB host at 0x1e48b38a900. Total device: 1.58 GiB, total host: 813.94 MiB
llama_kv_cache_init: Vulkan_Host KV buffer size = 768.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(513024)
ggml_vulkan memory: ggml_vk_host_malloc(513056)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +501.03 KiB host at 0x1e48b38a180. Total device: 1.58 GiB, total host: 814.43 MiB
llama_new_context_with_model: Vulkan_Host output buffer size = 0.49 MiB
ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(701997056)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +669.48 MiB device at 0x1e48b38a600. Total device: 2.24 GiB, total host: 814.43 MiB
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(58726400)
ggml_vulkan memory: ggml_vk_host_malloc(58726432)
ggml_vulkan memory: NVIDIA GeForce GTX 1060 3GB: +56.01 MiB host at 0x1e48b38a480. Total device: 2.24 GiB, total host: 870.43 MiB
llama_new_context_with_model: NVIDIA GeForce GTX 1060 3GB compute buffer size = 669.48 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 56.01 MiB
llama_new_context_with_model: graph nodes = 903
llama_new_context_with_model: graph splits = 262
warming up the model with an empty run