Bug #63380: Use PHP's allocator for libxml #223
Closed
Memory allocated by libxml does not go through PHP's per-request allocator, so it is not counted by memory_get_usage() and is not subject to memory_limit.
At Wikimedia we use libxml DOM trees to store wikitext parse trees, because they are more compact in memory than the equivalent pure-PHP data structures. When these parse trees are cached, the memory requirements can become excessive, and the memory is typically not returned to the system after request termination. Using xmlMemSetup() to install hook functions that call PHP's per-request allocation functions will let us monitor and limit libxml's memory use in production more effectively.
The task is complicated somewhat by the fact that libxml uses the same hooks for both persistent and request-local allocations. We therefore use the persistent allocator during MINIT, and take extra care to ensure that all libxml globals are initialised during MINIT, rather than lazily on the first call to the relevant module.
If a global pointer were initialised during request execution, the typical consequence would be a dangling pointer after request deactivation, which could be exploited for memory corruption during subsequent requests. I used gdb's "info variables" command to audit all global variables in libxml, as configured in the Ubuntu package.
I tested the code with make test, and with server-tests.php against Apache with the "worker" MPM. I also tested it with and without LIBXML_THREAD_ALLOC_ENABLED. A bug in libxml2 prevented it from working with LIBXML_THREAD_ALLOC_ENABLED, so I submitted a patch:
https://bugzilla.gnome.org/show_bug.cgi?id=687084
That configuration option appears to be rarely used.