Cleanup of the switch from yescrypt to yespower #17
…longer misattributed.
possibly compile without AVX enabled in the compiler, not to disable this code in the source. Doing so in the source, as was done in Koto prior to this commit, is dangerous: the resulting code is a mix of VEX-encoded compiler-generated instructions (e.g., for Salsa20) and non-VEX-encoded (legacy SSE2) instructions from the inline asm. Such a mix can be extremely slow on some CPUs: not slightly slower, but many times slower, depending on how frequent the transitions are (whether or not the compiler also introduces occasional VEX-encoded instructions between the pieces of inline asm) and on the CPU microarchitecture (some only penalize the transitions; on others the slowdown is smaller but persists past the transition point). A good explanation: https://stackoverflow.com/questions/41303780/why-is-this-sse-code-6-times-slower-without-vzeroupper-on-skylake/41349852#41349852
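The rule described above (let the compiler flags decide, and never compile legacy-encoded SSE2 inline asm into an AVX-enabled build) can be sketched with a compile-time guard. This is only an illustration: the `DEMO_USE_SSE2_INLINE_ASM` macro and the `demo_selected_path()` helper are hypothetical names, not taken from the yespower or Koto sources. The `__AVX__`, `__SSE2__`, `__x86_64__` and `__GNUC__` macros are the ones GCC and Clang predefine.

```c
/*
 * Hypothetical selector; the DEMO_* names are illustrative only.
 * If the compiler was told to use AVX, it emits VEX-encoded code, so
 * legacy-encoded SSE2 inline asm must not be compiled in alongside it.
 */
#if defined(__AVX__)
#define DEMO_USE_SSE2_INLINE_ASM 0
#elif defined(__GNUC__) && defined(__x86_64__) && defined(__SSE2__)
#define DEMO_USE_SSE2_INLINE_ASM 1
#else
#define DEMO_USE_SSE2_INLINE_ASM 0
#endif

/* Returns a label for the code path this translation unit would compile in. */
static const char *demo_selected_path(void)
{
#if DEMO_USE_SSE2_INLINE_ASM
    return "sse2-inline-asm";
#else
    return "compiler-generated";
#endif
}
```

With a guard of this shape, switching AVX on or off remains purely a build-flag decision, and the source never forces a mix of the two encodings.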
I copied these source files from cpuminer-yescrypt. For the Koto core daemon, the build script does not seem to pass -mavx or -march=native, so Koto will be compiled for plain SSE2 in most cases, because gcc for x86_64 enables SSE2 by default.
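One way to confirm which instruction sets a given build actually enables is to look at the compiler's predefined macros. The sketch below is an assumption-laden illustration, not part of Koto or cpuminer: `demo_append()` and `demo_describe_isa()` are hypothetical helpers, and only `__SSE2__`, `__AVX__`, `__AVX2__` and `__XOP__` (predefined by GCC and Clang when the corresponding instruction set is enabled) are real.

```c
#include <stdio.h>
#include <string.h>

/* Appends one name to a space-separated list, never overflowing buf. */
static void demo_append(char *buf, size_t size, const char *name)
{
    size_t used = strlen(buf);

    if (used + 1 < size)
        snprintf(buf + used, size - used, "%s%s", used ? " " : "", name);
}

/* Lists the x86 SIMD extensions the compiler was allowed to use for this build. */
static void demo_describe_isa(char *buf, size_t size)
{
    if (size == 0)
        return;
    buf[0] = '\0';
#ifdef __SSE2__
    demo_append(buf, size, "sse2");
#endif
#ifdef __AVX__
    demo_append(buf, size, "avx");
#endif
#ifdef __AVX2__
    demo_append(buf, size, "avx2");
#endif
#ifdef __XOP__
    demo_append(buf, size, "xop");
#endif
    if (buf[0] == '\0')
        demo_append(buf, size, "none");
}
```

Calling `demo_describe_isa()` on a 64-byte buffer and printing the result at startup would show whether a binary is a plain SSE2 build or one where the compiler was also free to emit AVX or AVX2 code.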
I'm sorry I sent this PR against master; apparently, it should have been against develop.

Yes, cpuminer will need similar changes, and probably more: I'd fully replace the old yescrypt code with yespower, rather than add yespower while still keeping the old yescrypt code as has been done.

Regarding the speeds, I'm aware that the SSE2 inline asm is typically the fastest; that's why I wrote and included it in there. Unfortunately, we just shouldn't use it in AVX-enabled builds, for the reason I mentioned. Koto's cpuminer should probably build for SSE2 by default, not use any "native" optimization flags.

None of your benchmarks include plain SSE2 builds. You always enable some native instruction set in the compiler, and that allows the compiler to also include AVX and even AVX2 instructions in other places, which as I explained might hurt. It'd be useful to include a plain SSE2 build in your comparison. Quite possibly it'll be the fastest on at least some of those CPUs. In
For the old yescrypt/yespower 0.5 mode, AVX and XOP matter slightly more, because the code uses Salsa20 more, and Salsa20 benefits from AVX and XOP. Especially when
These new results make sense to me as far as the relative speeds of the different builds are concerned. I'm a bit surprised the i5-6500 (3.3 GHz all-core turbo) performs so poorly - I'd expect it to be closer to the i5-4590 (3.5 GHz all-core turbo). I'm not surprised the i5-4590 performs much worse than the i7-4770K, for which I included a comparable benchmark result in
For the Ryzen 1700 system, the optimal thread count is probably 16 or 8, not the 4 you measured - or is that an error in the footnote? The speed is unrealistically high for 4 threads. In
That's right.
Also related: Koto adds
I thought I'd only drop the misattribution from `yespower.h` and fix the dangerous bug of failing to check `yespower_tls()` return value, but I ended up noticing and fixing other issues as well.

Two things I left as-is for now for not knowing their rationale:

Why does `sha256.h` have `extern "C" {` ... `}` dropped? It has those in yespower 1.0, and nested ones are meant to work OK: https://stackoverflow.com/questions/48099828/what-happens-if-you-nest-extern-c

Why does `yespower-opt.c` have `#undef unlikely` added, and why only in the GCC-specific branch? At least the latter is probably a bug: if this name was somehow a previously defined macro, then it probably needs to be undefined for non-GCC as well. But I didn't find any other code in the tree that would define this macro.

These changes are untested, but they got to work. ;-) So perhaps test them before merging. Not being even a user of Koto, I feel it'd be too much for me to also contribute testing. ;-)
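To illustrate why an unchecked return value is dangerous for a hash used in proof-of-work, here is a minimal sketch of a checked call. It deliberately does not use the real yespower API: `demo_hash()` is a hypothetical stand-in that is only assumed to follow a 0-on-success, nonzero-on-failure convention, and `demo_hash_t` and `demo_pow_hash()` are likewise invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte digest type, standing in for a real hash output. */
typedef struct { uint8_t bytes[32]; } demo_hash_t;

/*
 * Hypothetical stand-in for a hash function that can fail.
 * Assumed convention: returns 0 on success, -1 on error.
 */
static int demo_hash(const uint8_t *src, size_t srclen, demo_hash_t *dst)
{
    size_t i;

    if (src == NULL || srclen == 0)
        return -1; /* simulated failure, e.g. a failed allocation */
    memset(dst, 0, sizeof(*dst));
    for (i = 0; i < srclen; i++)
        dst->bytes[i % sizeof(dst->bytes)] ^= src[i]; /* toy mixing, not a real hash */
    return 0;
}

/*
 * Caller that checks the result. On failure it fails closed: the output is
 * set to all-ones, which would not satisfy a target of the form hash <= target
 * (unless the target itself is all-ones), instead of leaving whatever happened
 * to be in memory.
 */
static int demo_pow_hash(const uint8_t *header, size_t len, demo_hash_t *out)
{
    demo_hash_t tmp;

    if (demo_hash(header, len, &tmp) != 0) {
        memset(out, 0xff, sizeof(*out));
        return -1;
    }
    *out = tmp;
    return 0;
}
```

Without the check, a failed call could leave the output buffer uninitialized, and a later comparison against the target would then operate on arbitrary memory contents.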