Optimizing Speed of Hull Algorithm #903
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           master     #903      +/-   ##
==========================================
- Coverage   91.84%   88.46%   -3.38%
==========================================
  Files          37       62      +25
  Lines        4976     8685    +3709
  Branches        0     1056    +1056
==========================================
+ Hits         4570     7683    +3113
- Misses        406     1002     +596
This is just food for thought, and it's possible what I have explained isn't very clear, so let's discuss this in the next meeting.
Also @elalish, I know we discussed that we wanted to clean up the code a little, but I just wanted to verify whether I should do it in this PR itself; we can then rename it to "refactoring and optimizing". That won't be an issue, right?
@pca006132 knows more about atomics than I do, but my impression is that you mostly pay the atomic price when they collide (you're actually trying to modify the same address at the same time). I think if the number of faces is >= the number of disabled points, then atomics might not be too bad. It might be easier to open a second PR for readability improvements. What do you think?
Yeah, I was thinking the same, but the number of faces is much smaller, so I wasn't sure what we could do to optimize it in that situation; I just thought I'd share my thoughts so we can discuss them. I was thinking about opening a new PR as well; we can sync this later once we are satisfied with it. So I'll go ahead, start making changes in a new branch, and send a PR soon.
I think we can try to parallelize over the disabled points. I don't think we want to use the tbb vector; it is not as lightweight as a typical vector. Also, considering we have more than several dozen faces, it may be feasible to use a lock per face. If there is no contention, it can keep things simple while giving a nice performance improvement.
#904 (comment) @pca006132 Should I start breaking the Face struct according to this into three structs, or is there a better way to approach it?
Parallelizing things will likely be more beneficial.
I've already started work on it; I had some doubts I was hoping to ask you about in today's meeting, and I'll continue based on that.
@pca006132 I was wondering if we should use @elalish It seems to me like the bug of some verts being excluded is causing the change I made here to fail as well. I'll try to investigate the error again. But in the meanwhile, is it alright if I add
Just evaluate the two and see how much better it gets. We just care about actual performance (and correctness); the opinions I gave earlier are heuristics, not rules.
Yeah, that's why I tried to implement it, but I was getting errors despite the code seeming correct to me. Could you go through it once? I've added in comments the changes needed for
src/manifold/src/quickhull.cpp
Outdated
// return true;

// For ExecutionPolicy::Seq
pointMutex = 1;
What is this pointMutex thing?
It's just a variable I am using to check whether that particular point has been assigned to a face yet. I initially planned to have a lock for it and forgot to change the name; I'll rename it.
pointMutex = 1;

// Ensures atomic addition of point to face
f.faceMutex->lock();
if (!f.pointsOnPositiveSide) {
  f.pointsOnPositiveSide = getIndexVectorFromPool();
This call is not thread-safe.
The lock call isn't thread-safe?
I mean the getIndexVectorFromPool call.
Oh right, since the Pool is the same for all faces, I should use a lock specifically for the pool. Thanks for pointing that out! That helps a lot.
Do note that for every lock you are adding now, you will likely be thinking about how to remove them later...
Anyway, for now, the important thing is to make sure it works. We can gradually improve on performance.
Yes, I will keep that in mind. Also, the function call happens once per face, so it should not add much overhead.
src/manifold/src/quickhull.h
Outdated
@@ -126,6 +126,7 @@ class MeshBuilder {
  size_t visibilityCheckedOnIteration = 0;
  std::uint8_t isVisibleFaceOnCurrentIteration : 1;
  std::uint8_t inFaceStack : 1;
  std::recursive_mutex* faceMutex = new std::recursive_mutex();
And this probably should not be a recursive mutex.
I saw recursive_mutex being used in a lot of places, so I figured I could use it. I'll use std::mutex instead?
A recursive mutex is needed only when you may lock the same thing several times in the same thread. Yeah, you can probably just use std::mutex.
Btw, if you use
Oh, thanks for pointing that out!
@pca006132, I think I've got it working, and now I want to try to improve the performance. I will try both with and without parallelizing over the faces to see which gives better performance. Apart from that, I had a couple of questions.
Also, to get an idea of how a change affects average performance, I was thinking I could run it on the Thingi10k dataset, while to test larger cases where it should show improvement, I'll try the high-quality sphere and MengerSponge. Does that sound good to you?
Usually you either need a lock or an atomic, but ideally not both. I think of an atomic as a short hardware lock, and prefer them where possible.
It seems that the number of vertices output by the tictac hull is somewhat related to evaluation order?
@Kushal-Shah-03 Hi Kushal, do you plan to work on this anytime soon? We have some large refactoring recently so some code here may have to be changed to make things compile. |
Hello @pca006132, I'm sorry for the late reply; I have been really busy with my recovery and the end-of-semester exams coming up. I can try to modify the code so that it compiles, but I don't think I'd be able to work on it extensively for now. I can continue with the parallelization in December. We could possibly merge the other parallelization parts (the basic for loops we parallelized) for now, and I can work on the addPointToFace parallelization later in another PR.
@Kushal-Shah-03 This is fine. IIRC the parallelization implemented now is not giving much performance improvement? In that case, we probably don't want to merge it yet. I think we can close this PR for now, and you can open a new one when you revisit this later.
I've added tracy headers in some functions to observe the number of function calls and the time taken, to identify possible places to parallelize. I was wondering if there is a way to run just a specific test, or would I have to comment out the others?
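On running a single test: assuming the test suite is built with GoogleTest, its binaries accept a name filter, so there is no need to comment tests out. The binary name below is illustrative; substitute whatever the build produces.

```shell
# Run only the tests whose full Suite.Name matches the pattern.
./manifold_test --gtest_filter='Hull.*'

# List all registered test names first if unsure of the suite name.
./manifold_test --gtest_list_tests
```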