Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PoC] Implement some internal functions in plain PHP #18204

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

arnaud-lb
Copy link
Member

@arnaud-lb arnaud-lb commented Mar 31, 2025

This is a very quick PoC of @staabm's idea in #17536 (comment).

This PR implements array_find() and related functions in plain PHP, and removes the C implementation.

Idea

As seen in #17536, internal functions implemented in C are not necessarily faster than plain PHP ones. Therefore, it may be beneficial to implement internal functions in plain PHP code.

Aside from performance, this would have the following benefits:

  • Users can read the code of internal functions / IDEs can navigate to that code
  • Simpler implementation
  • Decrease maintenance cost
  • Less bugs
  • Reduce friction when changing the engine or internal APIs
  • Reduce internal recursion

Implementation

This PR is mostly a mean to demonstrate the concept and open the discussion, and not a proper implementation. For instance, I use opcache.preload to load array.php.

Although a proper implementation would use a similar mechanism.

The .php files could be either embed in the binary, or installed separately on the file system. Both have pros and cons.

Benchmarks

I'm using the same benchmark script as in #17536.

The plain PHP implementations is 1.08 times faster than master on my machine:

hyperfine '/tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 /tmp/bench.php' '/tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.preload=ext/standard/array.php /tmp/bench.php'
Benchmark 1: /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 /tmp/bench.php
  Time (mean ± σ):     413.3 ms ±   1.7 ms    [User: 408.0 ms, System: 4.5 ms]
  Range (min … max):   411.3 ms … 416.1 ms    10 runs
 
Benchmark 2: /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.preload=ext/standard/array.php /tmp/bench.php
  Time (mean ± σ):     382.0 ms ±   3.3 ms    [User: 376.7 ms, System: 4.5 ms]
  Range (min … max):   379.2 ms … 390.5 ms    10 runs
 
Summary
  /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.preload=ext/standard/array.php /tmp/bench.php ran
    1.08 ± 0.01 times faster than /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 /tmp/bench.php

When JIT is enabled in both branches, this branch is 2.30 times faster than master:

hyperfine '/tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m /tmp/bench.php' '/tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m -dopcache.preload=ext/standard/array.php /tmp/bench.php'
Benchmark 1: /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m /tmp/bench.php
  Time (mean ± σ):     360.8 ms ±   2.0 ms    [User: 356.2 ms, System: 3.9 ms]
  Range (min … max):   357.9 ms … 363.3 ms    10 runs
 
Benchmark 2: /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m -dopcache.preload=ext/standard/array.php /tmp/bench.php
  Time (mean ± σ):     156.0 ms ±   0.8 ms    [User: 150.2 ms, System: 5.3 ms]
  Range (min … max):   154.7 ms … 157.8 ms    19 runs
 
Summary
  /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m -dopcache.preload=ext/standard/array.php /tmp/bench.php ran
    2.31 ± 0.02 times faster than /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m /tmp/bench.php

This is because JIT is able to inline the callback into array_find() in this benchmark, eliminating the function call overhead entirely, which is not possible with the C implementation. (JIT heuristics would probably blacklist some trace entry points after array_find() is used with many different callbacks, but this is something we can tune.) In general, plain PHP code gives more optimizations opportunities to the JIT.

@iluuu1994
Copy link
Member

Interesting. I wonder if preload scales well, or whether at some point with enough code we'll have a noticeable startup delay due to code compilation. It may be possible to store them in the binary as compiled opcodes, with the file cache mechanism.

@nielsdos
Copy link
Member

I would be in favor of this.
I wonder how performance compares between native and php in the array_find test when the closure takes 2 args, on your machine (to avoid extra arg copy overhead).
Also note on Windows we don't have preloading.

@arnaud-lb
Copy link
Member Author

With a 2 args closure, native is faster (1.04 times).
With JIT this branch is 2.24 times faster.

@TimWolla
Copy link
Member

TimWolla commented Apr 1, 2025

Unsurprisingly, I'm also in favor of this for the maintenance aspect alone. However as a long-term goal I would find it important to be able to mix and match native and userland methods within a single class, similarly to Java’s native methods.

As an example, SensitiveParameterValue could be mostly implemented in userland code, but it currently has some internals-only behavior (get_properties_for) to hide the values in cases where __debugInfo() is insufficient, e.g. when casting to (array).


For the implementation I would therefore be in favor of putting the implementation right into the stub files and to indicate somehow if the function / method is a stub or whether it contains the implementation. That would also allow downstream consumers to use the stub-files as-is.

@bukka
Copy link
Member

bukka commented Apr 2, 2025

We have got issue with preloading and FPM using multiple pools with different user / groups and then using permissions for some operations so this would need to be properly tested. There is currently an ugly hack for single user but if there are multiple different users, then it might not work - I haven't tested it for some time but there were issues around it.

@bukka
Copy link
Member

bukka commented Apr 2, 2025

Well this setup has already issues with opcache (clearing locks), just wondering if basically required preloading could make it even worse.

@arnaud-lb
Copy link
Member Author

@bukka @nielsdos yes, I used opcache.preload for demonstration purposes here, but a proper implementation would not use opcache.preload or depend on Opcache. There are a few possibilities, such as compiling during startup (without changing uid) and moving op arrays to the native heap, or compiling at build time as @iluuu1994 suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants