Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug #63380: Use PHP's allocator for libxml #223

Closed
wants to merge 1 commit into from

Conversation

tstarling
Copy link
Contributor

Allocation via libxml does not use PHP's per-request allocator. So any memory used by libxml will not be accounted against memory_get_usage() or memory_limit.

At Wikimedia we use libxml DOM trees to store wikitext parse trees, because they are more compact in memory than the equivalent pure-PHP data structures. When these parse trees are cached, the memory requirements can become excessive, and the memory is typically not returned to the system after request termination. Using xmlMemSetup() to set hook functions which use PHP's per-request allocation functions will allow us to more effectively monitor and limit the use of libxml in production.

The task is somewhat complicated by the fact that libxml does not use a different hook for persistent and request-local allocations. So we use the persistent allocator during MINIT, and take extra care to ensure all libxml globals are initialised during MINIT, rather than lazy-initialised on the first call to the relevant module.

The typical consequences of the initialisation of a global pointer during request execution would be a dangling pointer after deactivate, which is exploitable for memory corruption during subsequent requests. I used gdb's "info variables" command to audit all global variables in libxml, as configured in the Ubuntu package.

I tested the code with make test, and with server-tests.php against Apache with "worker" MPM. I also tested it with and without LIBXML_THREAD_ALLOC_ENABLED. There was a bug in libxml2 which prevented it from working with LIBXML_THREAD_ALLOC_ENABLED. I submitted a patch:

https://bugzilla.gnome.org/show_bug.cgi?id=687084

That configuration option appears to be rarely used.

The task is somewhat complicated by the fact that libxml does not use a
different hook for persistent and request-local allocations. So we use
the persistent allocator during MINIT, and take extra care to ensure
all libxml globals are initialised during MINIT, rather than
lazy-initialised on the first call to the relevant module.

The typical consequences of the initialisation of a global pointer
during request execution would be a dangling pointer after deactivate,
which is exploitable for memory corruption during subsequent requests. I
used gdb's "info variables" command to audit all global variables in
libxml, as configured in the Ubuntu package.

I tested the code with make test, and with server-tests.php
against Apache with "worker" MPM. I also tested it with and without
LIBXML_THREAD_ALLOC_ENABLED. There was a bug in libxml2 which prevented
it from working with LIBXML_THREAD_ALLOC_ENABLED. I submitted a patch:

https://bugzilla.gnome.org/show_bug.cgi?id=687084

That configuration option appears to be rarely used.
@lstrojny
Copy link
Contributor

lstrojny commented Nov 4, 2012

Have you benchmarked how XML parsing/constructing performance is affected with and without the patch as I would guess that the custom allocators might slow down things a little bit?
Besides that: as your patch depends on a newer (not yet released version?) of libxml, we should definitly adjust the configure check to take that into account.

@weltling
Copy link
Contributor

Just tested your patch, looks good on linux. On windows these three tests fail for me:

ext\dom\tests\dom005.phpt
ext\xmlwriter\tests\004.phpt
ext\xmlwriter\tests\OO_004.phpt

I think it should be tested more stresful. Though there is no need to initialize HTTP and FTP clients as they're not used in the XML extensions.

@lstrojny The needed infos will be contained (or missing) in the xmlversion.h header, so configure must not to be changed.

@lstrojny
Copy link
Contributor

@tstarling ping

@lstrojny
Copy link
Contributor

@tstarling ping again

@tstarling
Copy link
Contributor Author

lstrojny: no I haven't benchmarked it. I haven't had time to work on this recently. Based on comments on the bug report, I expect this will be a configure option, off by default. That makes the performance characteristics less critical.

It doesn't require a new version of libxml, it just needs libxml to be configured without --with-thread-alloc. There's no obvious reason anyone would want to use --with-thread-alloc, and it's not enabled in the distro packages I have checked.

@smalyshev
Copy link
Contributor

While I like the idea, sounds like a lot of complications for the code that would not work in most environments due to libxml not compiled with right options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants