add BLAKE3 hash #13194

Open: wants to merge 34 commits into base: master

divinity76
Contributor

@divinity76 divinity76 commented Jan 19, 2024

BLAKE3 is a very fast cryptographically secure hash. It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2).
RFC: https://wiki.php.net/rfc/blake3
Check this speed chart:

and this /ext/hash/bench.php run where BLAKE3 is much faster than every other secure hash, and beats several non-cryptographic hashes (AMD Ryzen 9 7950x):

$ sapi/cli/php ext/hash/bench.php
crc32b       0.001195
crc32c       0.001202
crc32        0.001202
xxh3         0.001234
xxh128       0.001289
xxh64        0.001475
xxh32        0.002235
murmur3f     0.002459
murmur3c     0.003681
murmur3a     0.004289
adler32      0.007718
blake3       0.007752
md4          0.013109
fnv132       0.015075
fnv164       0.015109
fnv1a64      0.015116
fnv1a32      0.015251
joaat        0.018858
md5          0.019797
sha1         0.020472
tiger160,3   0.021290
tiger192,3   0.021363
tiger128,3   0.021366
tiger128,4   0.027518
tiger160,4   0.027743
tiger192,4   0.027870
ripemd128    0.029190
ripemd256    0.029378
sha3-224     0.029787
sha3-256     0.031518
haval256,3   0.038328
haval224,3   0.038479
haval128,3   0.038483
sha3-384     0.038559
haval192,3   0.038564
haval160,3   0.039068
sha512/256   0.039302
sha512       0.039307
sha512/224   0.039472
sha384       0.039508
ripemd160    0.042287
ripemd320    0.043036
sha3-512     0.054044
haval192,4   0.055699
haval224,4   0.055902
haval160,4   0.055925
haval256,4   0.055948
haval128,4   0.056165
sha256       0.057846
sha224       0.058139
haval128,5   0.070442
haval224,5   0.070503
haval256,5   0.070569
haval192,5   0.070576
haval160,5   0.071109
whirlpool    0.086256
gost         0.200251
gost-crypto  0.200709
snefru256    0.449650
snefru       0.451111
md2          1.237880

Quoting Google Bard:

In summary, BLAKE3 is a modern and highly efficient cryptographic hash function that offers security, flexibility, and ease of use. Its impressive speed and adaptability make it a promising choice for a wide range of applications in the digital age.

  • on x86_64 BLAKE3 is slightly faster than KangarooTwelve, but on ARM BLAKE3 is significantly faster than KangarooTwelve.
  • AFAIK the PHP project doesn't like git submodules (for example Dmitry Stogov wanted the new JIT engine to be a submodule but the PHP developers refused), so instead I made ext/hash/blake3/fetch_upstream_blake3.sh
  • I have not added SSE2/AVX2/AVX512/etc optimized builds to MSVC because I don't have an MSVC system to test on, so I just added the bare-minimum portable implementation to MSVC builds. (feel free to fix it, I don't want to.)

Speed chart: https://raw.githubusercontent.com/BLAKE3-team/BLAKE3/master/media/speed.svg

@devnexen
Member

That's quite an addition ... worth discussing.

.. don't know what I'm supposed to do with it, but compiler complains about it missing.
@iluuu1994
Member

This is indeed a lot of code. I think this should be discussed on the list, and possibly voted on.

@divinity76
Contributor Author

divinity76 commented Jan 19, 2024

asked on php-internals mailing list: https://marc.info/?l=php-internals&m=170568974400302&w=2
(and fixed MSVC and ARM+Neon builds, the latter tested on Oracle Cloud ARM64, which is notably different from Apple's ARM)

#define PHP_BLAKE3_CTX blake3_hasher
// help: is V correct?
//#define PHP_BLAKE3_SPEC "b8b8qb64bbbbb1760"
#define PHP_BLAKE3_SPEC "L8L8Qa64CCCCL8Ca1760"
Contributor Author

@divinity76 divinity76 Jan 19, 2024

Can someone double-check this? I couldn't wrap my head around it, and this is just my best guess based on

typedef struct {
  uint32_t cv[8];
  uint64_t chunk_counter;
  uint8_t buf[BLAKE3_BLOCK_LEN];
  uint8_t buf_len;
  uint8_t blocks_compressed;
  uint8_t flags;
} blake3_chunk_state;
typedef struct {
  uint32_t key[8];
  blake3_chunk_state chunk;
  uint8_t cv_stack_len;
  // The stack size is MAX_DEPTH + 1 because we do lazy merging. For example,
  // with 7 chunks, we have 3 entries in the stack. Adding an 8th chunk
  // requires a 4th entry, rather than merging everything down to 1, because we
  // don't know whether more input is coming. This is different from how the
  // reference implementation does things.
  uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN];
} blake3_hasher;

Member

How did you come up with this? From what I see in

php-src/ext/hash/hash.c

Lines 211 to 230 in 6d4598e

/* Serialize a hash context according to a `spec` string.
Spec contents:
b[COUNT] -- serialize COUNT bytes
s[COUNT] -- serialize COUNT 16-bit integers
l[COUNT] -- serialize COUNT 32-bit integers
q[COUNT] -- serialize COUNT 64-bit integers
i[COUNT] -- serialize COUNT `int`s
B[COUNT] -- skip COUNT bytes
S[COUNT], L[COUNT], etc. -- uppercase versions skip instead of read
. (must be last character) -- assert that the hash context has exactly
this size
Example: "llllllb64l16." is the spec for an MD5 context: 6 32-bit
integers, followed by 64 bytes, then 16 32-bit integers, and that's
exactly the size of the context.
The serialization result is an array. Each integer is serialized as a
32-bit integer, except that a run of 2 or more bytes is encoded as a
string, and each 64-bit integer is serialized as two 32-bit integers, least
significant bits first. This allows 32-bit and 64-bit architectures to
interchange serialized HashContexts. */
, Q and C are not even there? This is a serialization format, so the best way to verify it is to add a test and try serialization. The code for the actual serialization is also around the place that I linked, so it should be pretty easy to debug this and figure it out.
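
To make the spec arithmetic above concrete, here is a minimal sketch that adds up the bytes a spec string describes. `spec_payload_size` is a hypothetical helper written for this illustration, not a php-src function, and it deliberately ignores any alignment handling the real parser in ext/hash/hash.c may perform. It follows only the letter and COUNT rules quoted in the comment above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of php-src): sums the bytes described by a
 * spec string using only the rules quoted above. Uppercase "skip" letters
 * still occupy space in the context, so they are counted as well.
 * Alignment is NOT modelled. Returns 0 for an unknown letter. */
static size_t spec_payload_size(const char *spec)
{
    size_t total = 0;
    while (*spec != '\0' && *spec != '.') {
        size_t width;
        switch (tolower((unsigned char)*spec)) {
            case 'b': width = 1; break;           /* bytes */
            case 's': width = 2; break;           /* 16-bit integers */
            case 'l': width = 4; break;           /* 32-bit integers */
            case 'q': width = 8; break;           /* 64-bit integers */
            case 'i': width = sizeof(int); break; /* `int`s */
            default:  return 0;                   /* letter not in the list */
        }
        spec++;
        /* Optional decimal COUNT; a missing COUNT means one element. */
        size_t count = 0;
        int has_count = 0;
        while (isdigit((unsigned char)*spec)) {
            count = count * 10 + (size_t)(*spec - '0');
            spec++;
            has_count = 1;
        }
        total += width * (has_count ? count : 1);
    }
    return total;
}
```

For the MD5 example from the comment, `spec_payload_size("llllllb64l16.")` gives 6 * 4 + 64 + 16 * 4 = 152 bytes. A spec that simply lists the BLAKE3 fields without padding, such as `"l8l8q1b64b1b1b1b1b1760."`, adds up to 1900 bytes, so the trailing `.` check can only pass if `sizeof(blake3_hasher)` is exactly that size.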

Contributor Author

Look at line 150, uppercase Q still exists:

} else if (*spec == 'q' || *spec == 'Q') {

Pretty sure uppercase C also existed with the meaning unsigned char or uint8_t when it was written, but it seems to have been removed! I'll try to re-write it again.. thanks!

Contributor Author

@divinity76 divinity76 Jan 24, 2025

Still can't get it quite right, do you see anything off here?

#define PHP_BLAKE3_SPEC /* uint32_t key[8]; */"l8" \
/* uint32_t cv[8]; */ "l8" \
/* uint64_t chunk_counter; */ "q1" \
/* uint8_t buf[BLAKE3_BLOCK_LEN];  */ "b64" \
/* uint8_t buf_len; */ "b1" \
/* uint8_t blocks_compressed */ "b1" \
/* uint8_t flags; */ "b1" \
/* uint8_t cv_stack_len; */ "b1" \
/* uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN]; */ "b1760" \
"."

It still hits

php-src/ext/hash/hash.c

Lines 1493 to 1495 in 6d4598e

serialize_failure:
zend_throw_exception_ex(NULL, 0, "HashContext for algorithm \"%s\" cannot be serialized", hash->ops->algo);
RETURN_THROWS();

Contributor Author

@divinity76 divinity76 Jan 24, 2025

I found a possible solution: it seems the compiler inserts 5 alignment padding bytes after uint8_t buf_len; and an additional 7 padding bytes after uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN];. If the spec is changed to

#define PHP_BLAKE3_SPEC /* uint32_t key[8]; */"l8" \
/* uint32_t cv[8]; */ "l8" \
/* uint64_t chunk_counter; */ "q" \
/* uint8_t buf[BLAKE3_BLOCK_LEN];  */ "b64" \
/* uint8_t buf_len; */ "b" \
/* skip 5 bytes of alignment padding in chunk */ "B5" \
/* uint8_t blocks_compressed */ "b" \
/* uint8_t flags; */ "b" \
/* uint8_t cv_stack_len; */ "b" \
/* uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN]; */ "b1760" \
    /* skip 7 trailing alignment bytes */     "B7" \
"."

serialization actually works on my system!

But this seems like the kind of thing different compilers, and even the same compiler with different optimization settings, may choose to do differently. For example, I wouldn't trust gcc -Ofast (optimize-for-speed) and gcc -Os (optimize-for-size) to do the exact same alignment padding in the exact same location 🤔 Nor would I trust the compiler to do the exact same alignment padding across 32bit and 64bit builds..

I might be wrong but.. idk!
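
One way to check where the padding actually sits is to ask the compiler with `offsetof` and `sizeof`. The sketch below copies the two struct definitions quoted earlier in this thread. The three constants are assumptions taken from upstream blake3.h (the `b64` and `b1760` counts in the spec are consistent with them), and the helper names are hypothetical. On an ABI where `uint64_t` is 8-byte aligned I would expect the 5 bytes to appear after `flags`, at the end of `blake3_chunk_state`, rather than directly after `buf_len`, because consecutive `uint8_t` members normally need no padding between them. If that holds, a `B5` placed right after `buf_len` would still satisfy the size check but would skip over `blocks_compressed` and `flags`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values from upstream blake3.h: (54 + 1) * 32 == 1760. */
#define BLAKE3_BLOCK_LEN 64
#define BLAKE3_OUT_LEN   32
#define BLAKE3_MAX_DEPTH 54

typedef struct {
    uint32_t cv[8];
    uint64_t chunk_counter;
    uint8_t buf[BLAKE3_BLOCK_LEN];
    uint8_t buf_len;
    uint8_t blocks_compressed;
    uint8_t flags;
} blake3_chunk_state;

typedef struct {
    uint32_t key[8];
    blake3_chunk_state chunk;
    uint8_t cv_stack_len;
    uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN];
} blake3_hasher;

/* Padding bytes between the end of member `a` and the start of member `b`. */
#define GAP(type, a, b) \
    (offsetof(type, b) - (offsetof(type, a) + sizeof(((type *)0)->a)))

/* Hypothetical inspection helpers; results depend on the target ABI. */
static size_t pad_after_buf_len(void)
{
    return GAP(blake3_chunk_state, buf_len, blocks_compressed);
}

static size_t pad_at_end_of_chunk(void)
{
    return sizeof(blake3_chunk_state)
         - (offsetof(blake3_chunk_state, flags) + sizeof(uint8_t));
}

static size_t pad_at_end_of_hasher(void)
{
    return sizeof(blake3_hasher)
         - (offsetof(blake3_hasher, cv_stack)
            + sizeof(((blake3_hasher *)0)->cv_stack));
}
```

Printing these three values on each CI target (32bit and 64bit, GCC, Clang, MSVC) would show directly whether a single hard-coded spec can describe all of them.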

Contributor Author

you could maybe add one uint8_t reserved field and then on 64bit reserved2 field of uint32_t type which should hopefully make it defined on all platforms.

somehow didn't see that message until after pushing 8762e32 🤔 but I like that better. I'll run some tests locally

Member

@bukka bukka Jan 24, 2025

I think 5 bytes is correct only on 64bit but it should not be correct on 32bit. Also, the compiler is free to choose how to do padding, so your choice of the padding place is basically relying on implementation-defined behaviour and it could theoretically differ between compilers. As I mentioned, the only way to be sure is to define the fields explicitly, as the compiler must not re-order them.
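
A sketch of the "define the fields explicitly" idea, under loud assumptions: the `example_*` types and the `reserved` members are hypothetical and are not the upstream BLAKE3 structs, and this is one variant of the suggestion rather than the change that was pushed to the PR. Explicit `uint8_t` filler replaces the implicit padding, and compile-time assertions reject any target where the compiler still adds bytes of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of the chunk state with explicit filler bytes. */
typedef struct {
    uint32_t cv[8];            /*  32 bytes */
    uint64_t chunk_counter;    /*   8 bytes */
    uint8_t  buf[64];          /*  64 bytes */
    uint8_t  buf_len;
    uint8_t  blocks_compressed;
    uint8_t  flags;
    uint8_t  reserved[5];      /* explicit filler: 107 + 5 == 112 */
} example_chunk_state;

/* Hypothetical variant of the hasher with explicit trailing filler. */
typedef struct {
    uint32_t key[8];                 /*   32 bytes */
    example_chunk_state chunk;       /*  112 bytes */
    uint8_t  cv_stack_len;
    uint8_t  cv_stack[55 * 32];      /* 1760 bytes */
    uint8_t  reserved[7];            /* explicit filler: 1905 + 7 == 1912 */
} example_hasher;

/* Fail the build instead of failing at serialization time. */
_Static_assert(sizeof(example_chunk_state) == 112,
               "compiler added padding to example_chunk_state");
_Static_assert(sizeof(example_hasher) == 1912,
               "compiler added padding to example_hasher");
_Static_assert(offsetof(example_hasher, cv_stack_len) == 144,
               "unexpected offset for cv_stack_len");
```

Because 112 and 1912 are multiples of both 4 and 8, the sizes should come out the same whether `uint64_t` is 4-byte or 8-byte aligned. Under the spec rules quoted earlier, a matching spec would then be `"l8l8qb64bbbB5bb1760B7."`.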

Member

From what I see, there's also option to define your own serialization function but not sure if it would make it any simpler.

Member

Well it could be cleaner because it wouldn't rely on struct memory layout...

Contributor Author

I think 5 bytes is correct only on 64bit but it should not be correct on 32bit

well 263b2f6 works on 32bit (tested locally on a debian 12 32bit docker image, and now confirmed on the 32bit github CI runner)

From what I see, there's also option to define your own serialization function but not sure if it would make it any simpler.

hmm could you link to an example? I haven't seen it. A custom serializer sounds like the safest bet across compilers/architectures/configurations
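
To illustrate why a custom serializer would not depend on struct layout, here is a minimal sketch that writes the chunk state field by field in a fixed little-endian byte order. All function names and `CHUNK_WIRE_LEN` are hypothetical, the struct is the `blake3_chunk_state` quoted earlier, and this does not use or model the hash extension's actual serialization hooks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLAKE3_BLOCK_LEN 64  /* assumed from upstream blake3.h */

typedef struct {
    uint32_t cv[8];
    uint64_t chunk_counter;
    uint8_t buf[BLAKE3_BLOCK_LEN];
    uint8_t buf_len;
    uint8_t blocks_compressed;
    uint8_t flags;
} blake3_chunk_state;

/* 8*4 + 8 + 64 + 3 = 107 bytes of real data, with no padding on the wire. */
enum { CHUNK_WIRE_LEN = 8 * 4 + 8 + BLAKE3_BLOCK_LEN + 3 };

/* Write the low `nbytes` bytes of v, least significant byte first. */
static uint8_t *put_le(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Read `nbytes` little-endian bytes into *v. */
static const uint8_t *get_le(const uint8_t *p, uint64_t *v, int nbytes)
{
    *v = 0;
    for (int i = 0; i < nbytes; i++) *v |= (uint64_t)*p++ << (8 * i);
    return p;
}

static void chunk_state_write(const blake3_chunk_state *s,
                              uint8_t out[CHUNK_WIRE_LEN])
{
    uint8_t *p = out;
    for (int i = 0; i < 8; i++) p = put_le(p, s->cv[i], 4);
    p = put_le(p, s->chunk_counter, 8);
    memcpy(p, s->buf, BLAKE3_BLOCK_LEN);
    p += BLAKE3_BLOCK_LEN;
    *p++ = s->buf_len;
    *p++ = s->blocks_compressed;
    *p++ = s->flags;
}

static void chunk_state_read(blake3_chunk_state *s,
                             const uint8_t in[CHUNK_WIRE_LEN])
{
    const uint8_t *p = in;
    uint64_t v;
    for (int i = 0; i < 8; i++) { p = get_le(p, &v, 4); s->cv[i] = (uint32_t)v; }
    p = get_le(p, &v, 8);
    s->chunk_counter = v;
    memcpy(s->buf, p, BLAKE3_BLOCK_LEN);
    p += BLAKE3_BLOCK_LEN;
    s->buf_len = *p++;
    s->blocks_compressed = *p++;
    s->flags = *p++;
}

/* Usage example: returns 1 if every field survives a write/read round trip. */
static int chunk_state_roundtrip_ok(void)
{
    blake3_chunk_state a, b;
    uint8_t wire[CHUNK_WIRE_LEN];
    memset(&a, 0, sizeof a);
    memset(&b, 0xFF, sizeof b);
    a.cv[0] = 0x6A09E667u;
    a.chunk_counter = 0x0123456789ABCDEFull;
    a.buf[0] = 'x';
    a.buf_len = 1;
    a.blocks_compressed = 2;
    a.flags = 3;
    chunk_state_write(&a, wire);
    chunk_state_read(&b, wire);
    /* Compare members, not raw struct bytes, so padding is never inspected. */
    return memcmp(a.cv, b.cv, sizeof a.cv) == 0
        && a.chunk_counter == b.chunk_counter
        && memcmp(a.buf, b.buf, sizeof a.buf) == 0
        && a.buf_len == b.buf_len
        && a.blocks_compressed == b.blocks_compressed
        && a.flags == b.flags;
}
```

The cost of this approach is more code to maintain than a one-line spec string. The benefit is that the serialized form stays the same across compilers, optimization levels, and 32bit/64bit builds, because padding bytes are never read or written.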

@divinity76 divinity76 mentioned this pull request Jan 20, 2024
@divinity76
Contributor Author

divinity76 commented Jan 25, 2024

post on the mailing list worth re-posting here:
GCC11.4, even with -march=native -mtune=native, which is not commonly used in PHP,
the compiler didn't stand a chance against the hand-optimized assembly versions

wrote some benchmarks, but the TL;DR is:
portable -O2 usually used by PHP managed 1126MB/s,
portable -O2 -march=native managed 533MB/s (wtf? gcc obviously got
something wrong here),
hand-written -O2 SSE2 managed 3144MB/s,
hand-written -O2 SSE41 managed 3332MB/s,
hand-written -O2 avx2 managed 6554MB/s,
hand-written -O2 AVX512 managed 8913MB/s,
on my AMD Ryzen 9 7950x,
benchmarking code:
https://gist.github.com/divinity76/5729472dd5d77e94cd0acb245aac2226
raw output:

array(6) {
  ["O2-portable-march"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(29295)
    ["mb_per_second"]=>
    float(533.3674688513398)
  }
  ["O2-portable"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(13876)
    ["mb_per_second"]=>
    float(1126.0449697319111)
  }
  ["O2-sse2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4969)
    ["mb_per_second"]=>
    float(3144.4958744214127)
  }
  ["O2-sse41"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4688)
    ["mb_per_second"]=>
    float(3332.977815699659)
  }
  ["O2-avx2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(2384)
    ["mb_per_second"]=>
    float(6554.1107382550335)
  }
  ["O2-avx512"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(1753)
    ["mb_per_second"]=>
    float(8913.291500285226)
  }
}

Edit:
tested ARM Neon optimizations on Oracle Cloud's cheapest ARM VPS:
VM.Standard.A1.Flex, Ubuntu 22.04, GCC11.4,
results:

-O2 portable: 596MB/s
-O2 -march=native portable: 601MB/s
-O2 ARM Neon optimized implementation: 1138MB/s

Again, even with -march=native, the compiler cannot make the portable
implementation nearly as fast as the hand-optimized cpu-specific
implementation.
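
As a cross-check of the x86 figures above, the `mb_per_second` values in the raw output are numerically consistent with 1000 × 16 KiB (16,384,000 bytes) hashed per measurement and "MB" meaning 2^20 bytes. That is an inference from the numbers, not something taken from the benchmarking gist, and the function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION inferred from the raw output, not from the gist:
 * each measurement hashed 1000 * 16 KiB, and 1 MB == 1024 * 1024 bytes. */
#define BYTES_PER_RUN (1000.0 * 16.0 * 1024.0)

/* Throughput implied by one `microseconds_for_16_kib` value. */
static double mb_per_second(uint64_t microseconds)
{
    double megabytes = BYTES_PER_RUN / (1024.0 * 1024.0); /* 15.625 */
    double seconds = (double)microseconds / 1e6;
    return megabytes / seconds;
}

/* How many times faster the second timing is than the first. */
static double speedup(uint64_t slower_us, uint64_t faster_us)
{
    return (double)slower_us / (double)faster_us;
}
```

With these assumptions, `mb_per_second(13876)` reproduces the roughly 1126 MB/s reported for the portable -O2 build, and `speedup(13876, 1753)` puts the AVX512 implementation at a little under 8 times the portable build on that machine.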

ext/hash/config.m4 (outdated, resolved)
ext/hash/config.w32 (outdated, resolved)
divinity76 and others added 2 commits January 29, 2024 20:18
Co-authored-by: Peter Kokot <[email protected]>
Co-authored-by: Peter Kokot <[email protected]>
@cypherbits

Having this would be good

@cypherbits

More than 2 years since the first attempt to get Blake3 into PHP, why is it so hard?

@divinity76
Contributor Author

divinity76 commented Oct 22, 2024

More than 2 years since the first attempt to get Blake3 into PHP, why is it so hard?

the main problem this round was that I just lost steam. Got occupied with other stuff and forgot about this. You reminded me though!

@vodnicearv

BLAKE3 is very nice, we need that algorithm in php

@leonardocustodio

Seems like this will never be merged. Such a shame, as a lot of blockchains are starting to use blake3 as a standard.

@divinity76
Contributor Author

Fwiw this should be rebased against the newest blake3 release. Iirc (on phone now, can't check) this PR currently contains 3 patches to blake3 in a dedicated patch file, but all 3 patches have been accepted upstream and are part of the newest official BLAKE3 release.

@divinity76
Contributor Author

divinity76 commented Jan 22, 2025

Scratch that, BLAKE3-team/BLAKE3#382 is still not resolved upstream,
so if we want to compile php cleanly on MacOS, we still need at least 1 patch for blake3.

the other 2 are resolved upstream tho.

Edit: also had to add BLAKE3-team/BLAKE3#443

caused
```
/Users/runner/work/php-src/php-src/ext/hash/blake3/upstream_blake3/c/blake3_dispatch.c:237:26: error: unused variable 'features' [-Werror,-Wunused-variable]
  const enum cpu_feature features = get_cpu_features();
```
on 32bit x86-32 linux builds.
@divinity76
Contributor Author

divinity76 commented Jan 22, 2025

I'm requesting re-review.

@remicollet
Member

-1 from me

I think it is a terrible idea to bundle more crypto stuff in PHP.
For such serious things, we should rely on external libraries (OpenSSL, sodium...) provided by projects specialized in crypto.
We have already bundled far too many things.

Is there an RFC for this?

@divinity76
Contributor Author

divinity76 commented Jan 24, 2025

@remicollet imo we should support BLAKE3 for the same reason we support xxHash (added 8.1.0): speed. BLAKE3 offers SHA3-256-like security at much higher speed than SHA3-256. Quoting /ext/hash/bench.php: sha3-256 0.031518 blake3 0.007752.

Exactly how we support BLAKE3 is not that important, if OpenSSL/Sodium starts offering BLAKE3, I wouldn't mind requiring OpenSSL/Sodium for BLAKE3 support. But they don't support BLAKE3 (yet?).

As for RFC, made a draft a year ago, but now I can't even find it. I'll make a new one.

@remicollet
Member

Exactly how we support BLAKE3 is not that important, if OpenSSL/Sodium starts offering BLAKE3, I wouldn't mind requiring OpenSSL/Sodium for BLAKE3 support. But they don't support BLAKE3 (yet?).

So effort should go there, not here.

@divinity76
Contributor Author

RFC draft: https://wiki.php.net/rfc/blake3

Member

@bukka bukka left a comment

Except for the missing serialization, it seems OK to me. It would also be good to add a few more tests against predefined test vectors to be sure all is good.

I think it's good already for the RFC so you can announce it.

In terms of maintenance, the hash extension already bundles quite a few algos, so this is just another one. It's taken from upstream like some other algorithms, so I don't think the amount of code matters that much. The upstream seems to be quite well maintained.

ext/hash/hash_blake3.c (outdated, resolved)

return in_array($algo, [
"xxh3",
"xxh128",
"blake3", // todo: blake3 can be seralized but it's not implemented yet
Member

Obviously this should be changed before merging...

Contributor Author

I suspect PHP_BLAKE3_SPEC is wrong, and that it breaks serialization, but I haven't been able to figure it out yet 🤔

@@ -73,6 +106,8 @@ PHP_INSTALL_HEADERS([ext/hash], m4_normalize([
php_hash_ripemd.h
php_hash_sha.h
php_hash_sha3.h
php_hash_blake3.h
blake3/upstream_blake3/c/blake3.h
Member

Why is there that extra upstream_blake3 directory? Cannot this go directly to blake3?

Contributor Author

@divinity76 divinity76 Jan 24, 2025

the directory separates code written for the php-src integration from the substantial amount of code cloned directly from https://github.com/BLAKE3-team/BLAKE3/

Cannot this go directly to blake3?

It can, but I'd prefer if it didn't.

Member

I don't really understand why it needs an extra subdirectory. The blake3 directory contains only the upstream_blake3 directory, which really doesn't make any sense to me. And you also have the c directory, which I can see is from upstream. So it should really be just blake3/c.

ext/hash/tests/hash-clone.phpt (outdated, resolved)
ext/hash/tests/hash_copy_001.phpt (outdated, resolved)
ext/hash/tests/hash_hmac_algos.phpt (outdated, resolved)
@divinity76
Contributor Author

divinity76 commented Jan 24, 2025

Right now there is a problem with https://dev.mysql.com/ causing Windows CI's to fail, specifically https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.31-winx64.zip returns HTTP 403 Forbidden.

It's not this PR's fault 🤔

edit: made a dedicated issue for it: #17561

there's padding issues, idk how portable this is...
The CI's should be interesting.
I don't have a 32bit system to test on locally, so testing on CI for now..
10 participants