-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unload drivers with no physical devices + Refactor instance level objects to not use icd_index #1471
Unload drivers with no physical devices + Refactor instance level objects to not use icd_index #1471
Conversation
CI Vulkan-Loader build queued with queue ID 167885. |
CI Vulkan-Loader build # 2526 running. |
CI Vulkan-Loader build # 2526 passed. |
Adds a simple executable that times how long vkEnumerateInstanceExtensionProperties takes over and over to see how well the ICD preloading functions.
The loader did not unload any ICD's which contained zero physical devices, which could cause premature exhaustion of memory in some circumstances, like 32 bit applications. While the policy of the loader has been to keep things open for the duration of the instance, these ICD's don't meaningfully participate in anything due to the lack of VkPhysicalDevices. This change adds a check after vkEnumeratePhysicalDevices where pPhysicalDevices is not NULL such that all loader_icd_terms which reported zero physical devices have its vkDestroyInstance called, and removed from the loader_instance's icd_term linked list.
58a8668
to
434898f
Compare
CI Vulkan-Loader build queued with queue ID 173426. |
CI Vulkan-Loader build # 2536 running. |
CI Vulkan-Loader build queued with queue ID 173438. |
CI Vulkan-Loader build # 2537 running. |
The index must be referenced against the loader's internal index of each ICD. Instead, we should print the lib_name in the error message, making it more clear which driver the error is coming from.
Drivers resize after 32 elements, so to test that path we need to loop over instance level handle creation (surface, debug messenger, debug report). The driver unloading tests needed to create a debug report callback, so that functionality was added to the test framework, modifying VulkanFunctions with a new init function and to make InstWrapper call it when creating an instance. Modify how test_icd_version_7 operates so that by default the functions are exported which is the 'assumed' codepath. This results in a bit of duplication between version 6 & 7, but was kept so as to not modify every test. This also clarifies how a test should enable querying of the functions through vkGetInstanceProcAddr versus exporting those functions (it was combined before).
The previous way per-ICD instance level objects were accessed was using the ICD's index into an array that was allocated with the object. This solution worked while the indexes were static, but with the recent change to remove unused ICD's that is no longer the case. This commit replaces an array per object with object arrays, one for each type (surface, debug messenger, & debug report) and per ICD. That flips where the index comes from, with the instance storing an array indication which indices are used and which are free. Whenever an instance level object is created, the loader checks if there is a free index available, reusing it if available. Otherwise it resizes its own store as well as each ICD's array for that object.
2633b3f
to
dafddb6
Compare
CI Vulkan-Loader build queued with queue ID 173450. |
CI Vulkan-Loader build # 2538 running. |
CI Vulkan-Loader build # 2538 failed. |
Necessary for any functions called across dll boundaries on 32 bit windows.
CI Vulkan-Loader build queued with queue ID 174024. |
CI Vulkan-Loader build # 2539 running. |
CI Vulkan-Loader build # 2539 passed. |
@ivyl Just a notice that the revised version has been merged - the original version has critical defects requiring reverting. The current version is more complicated (and thus more liable to be buggy) but has significantly better testing to go along side it. |
Note that this PR still causes crashes on some systems. (I bisected the issue below to this PR). On a fairly vanilla Debian testing on a lenovo thinkpad with an Intel GPU I'm getting the following crash:
Is the flow for the unloading of drivers in that path the same flow as on vkDestroyInstance? |
Yes, because the drivers are being unloaded during EnumPhysDevs, the loader does the expected vkDestroyInstance tear down for that driver. That doesn't that this teardown is bug free so I will investigate this. |
Found the problem, code was doing 2 bad things.
|
Thank you for the quick investigation! I can try a fixup commit when you have one to confirm if it fixes the issue! |
Yes, PR is up. Would be great if you could try it. #1481 |
Even after the fix above, ANGLE tests still have problem on Linux Intel: |
I was able to repo with Here is a better stack: This is the line where it crashes: I attach VK_LOADER_DEBUG log: It looks suspicious to me that there are 2 lines of: Does the loader remove the driver twice from the list? |
No, that appears to be from multiple instances being created - as in the line appears in separate calls to EnumeratePhysicalDevices. That said, it could be that there are multiple createInstance calls which cause the preloaded ICD's array to be partially filled, leading to nullptr dereferences. |
Ah, so I've identified the 'bug' and why it doesn't crash locally on my machine - strcmp is UB when the passed in pointers are NULL. Which includes "not crashing", and apparently glibc for ubuntu 18 differs. Thanks for reporting, I'll have a fix up shortly. |
PR is up - would be swell if you can double check but I'm going to merge when it passes CI regardless. |
That worked! |
The loader did not unload any ICD's which contained zero physical devices, which
could cause premature exhaustion of memory in some circumstances, like 32 bit
applications. While the policy of the loader has been to keep things open for
the duration of the instance, these ICD's don't meaningfully participate in
anything due to the lack of VkPhysicalDevices.
This change adds a check after vkEnumeratePhysicalDevices where pPhysicalDevices
is not NULL such that all loader_icd_terms which reported zero physical devices
have its vkDestroyInstance called, and removed from the loader_instance's
icd_term linked list.
The previous way per-ICD instance level objects were accessed was
using the ICD's index into an array that was allocated with the object.
This solution worked while the indexes were static, but with the
recent change to remove unused ICD's that is no longer the case.
This commit replaces an array per object with object arrays, one for each
type (surface, debug messenger, & debug report) and per ICD. That flips
where the index comes from, with the instance storing an array indication
which indices are used and which are free.
Whenever an instance level object is created, the loader checks if there
is a free index available, reusing it if available. Otherwise it resizes
its own store as well as each ICD's array for that object.