Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nlog: replace concurrenthashmap to phmap::parallel_hash_map #441

Open
yokofly opened this issue Dec 22, 2023 · 9 comments
Open

nlog: replace concurrenthashmap to phmap::parallel_hash_map #441

yokofly opened this issue Dec 22, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@yokofly
Copy link
Collaborator

yokofly commented Dec 22, 2023

Describe what enhancement you'd like to have

We are currently using a basic concurrent hashmap implementation in Proton nativelog , which utilizes an internal std::shared_mutex (functioning like a read/write mutex). We propose to transition to a more advanced concurrent hashmap.

one of choice is the https://github.com/greg7mdp/parallel-hashmap a widely-used hashmap in OLAP systems such as https://github.com/apache/doris.git

This hashmap's foundational design is inspired by absl but includes additional optimizations for enhanced thread safety and performance. A notable feature is its locking mechanism at the submap level, allowing for high concurrency.

We can conduct benchmark tests and consider replacing our current implementation. For detailed design insights, refer to: https://greg7mdp.github.io/parallel-hashmap/
full benchmark can be found here(update in 2022) https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

@yokofly yokofly added the enhancement New feature or request label Dec 22, 2023
@yokofly yokofly changed the title replace concurrenthashmap to phmap::parallel_hash_map nlog: replace concurrenthashmap to phmap::parallel_hash_map Dec 22, 2023
@yokofly yokofly added the good first issue Good for newcomers label Jan 24, 2024
@varshneydevansh
Copy link
Contributor

I did not asked about this is yesterday as I thought you guys were a bit busy with work. I wanna know that do I have to create a new header file in the concurrent folder where the src/NativeLog/Base/Concurrent/ConcurrentHashMap.h is implemented? Or should I just add the phmap file from the https://github.com/greg7mdp/parallel-hashmap?tab=readme-ov-file#overview and make the modifications replacing the usage of concurrantHashMap?

@yokofly
Copy link
Collaborator Author

yokofly commented Feb 6, 2024

I did not asked about this is yesterday as I thought you guys were a bit busy with work. I wanna know that do I have to create a new header file in the concurrent folder where the src/NativeLog/Base/Concurrent/ConcurrentHashMap.h is implemented? Or should I just add the phmap file from the https://github.com/greg7mdp/parallel-hashmap?tab=readme-ov-file#overview and make the modifications replacing the usage of concurrantHashMap?

glad to see you wanna work on this issue! Have you check the map impl? we use a shared_mutex and a map right? this is a manual way to control the thread-safe. and we wanna switch to the new map.

basically, we need to add a submodule first, then replace the map impl inside the header. and ensure compile pass, and our testing pass.

extra: we can add a new benchmark to check performance. we internally host a google benchmark style.

@yokofly
Copy link
Collaborator Author

yokofly commented Feb 6, 2024

step 0: you need to ensure the current proton has been built locally(either by docker mount or bare mental build). we use clang-16 compiler , check build.md here.
step 1: add a submodule you can follow this #154 , but there shall be more cmake need change.
step 2: replace the impl, i think you need replace the interface since this map has internally ensured safety, we do not need the mutex again, but other exposed map behaviors shall same.
step 3: build/compile pass (we use cpp20, check our code conversion here)
step 4: testing woks. this is necessary. we use unit test/ smoke test. (yeah, smoke is previous you have run, the ut need building proton and in build folder run ./src/unit_tests_dbms (same as clickhouse unit test way)

further: if we can have a benchmark showing the performance upgrade would be great. we have using the google benchmark style. this require create a folder.

@yokofly
Copy link
Collaborator Author

yokofly commented Feb 6, 2024

the first time build proton will be painful, and need one hour longer. if you install c-cache it will save more time to do only incremental building. (calculate the changed file from a hash alg, and only build the changed one)

@varshneydevansh
Copy link
Contributor

OMG thank you for this much help. =) I did looked at the https://github.com/timeplus-io/proton/blob/develop/src/NativeLog/Base/Concurrent/ConcurrentHashMap.h

This is how I add the git submodule add https://github.com/greg7mdp/parallel-hashmap.git in the src/NativeLog/Base/Concurrent? more over I already have ccache installed as I build the libreoffice project in my system couple of days back that took multiple hours to build.

@yokofly
Copy link
Collaborator Author

yokofly commented Feb 6, 2024

  1. the submodule add is incorrect. I expected you to understand the commands. proton handle the submodule in proton/contrib folder, and plz take a look the above pr i mentioned. (either need manual write a new Cmakefile or use old CMakeLists but need use proton cmake config) (plz check the pr submodule change: introduce google bench #154)
  2. I am not sure about the LibreOffice build relation with proton. the dependency installation is pretty complex at first. you need to handle different env dependencies...

@varshneydevansh
Copy link
Contributor

Setting-up according to the build.md is having some problems -

I tried following the bare-metal steps 3 times https://github.com/timeplus-io/proton/blob/develop/BUILD.md#bare-metal-build but my terminal crashed/closed after running for quite a while.

Then I did the build with docker https://github.com/timeplus-io/proton/blob/develop/BUILD.md#bare-metal-build below SS has the build_docker folder -
Screenshot from 2024-02-06 19-02-08

I used this command git clone --recurse-submodules https://github.com/varshneydevansh/proton.git and it completed as in the below SS but the cmake command is failing maybe because of GNU-12? I am still ways to get it set-up properly.

Screenshot from 2024-02-06 20-52-11

and because of this unable to run the server on my local pc -

 [~/proton]
 ✘  devansh   develop  sudo ./programs server --config-file ../programs/server/config.yaml                                                              
[sudo] password for devansh: 
sudo: ./programs: command not found

@chenziliang
Copy link
Collaborator

Thanks for reporting this issue. We only supported / validated clang-16 tool chain for compiling for the latest code base.

@yokofly yokofly removed the good first issue Good for newcomers label May 17, 2024
@yokofly
Copy link
Collaborator Author

yokofly commented May 17, 2024

this is still not easy for newcomer, I remove the label, and after #660 proton now has boost::concurrent_flat_map

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants