Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geometry_Engine: Update PointClustersDBSCAN to use DomainTree #1888

Merged

Conversation

kThorsager
Copy link
Contributor

Issues addressed by this PR

Closes #1881
Had some hopes about making this general for DBSCAN and making them use INode, but there were annoyances as it i not a IEnumerable, there are difficulties in changing the objects in the tree without messing with the data and the DBSCAN method would need to be refactored to look at things per Collection rather than per Item.

And just adding clusters for IGeometries or ICurves was also annoying since no general distance method existed and would add a extra layer to this method. And there doesn't seem to be any huge need either so.

But the speed up is considerable, especially for large sets of points. 😁
(40000 points, 1.3m to 3.4s in the test file)

Test files

https://burohappold.sharepoint.com/:u:/s/BHoM/Ec8Vk7lcL1dCgXq-QdewJokBfJ4s8jwT5hIATIwiA-3WWg?e=RqtVBu

Changelog

Additional comments

@kThorsager kThorsager added the type:feature New capability or enhancement label Jul 2, 2020
@kThorsager kThorsager requested a review from pawelbaran July 2, 2020 11:36
@kThorsager kThorsager self-assigned this Jul 2, 2020
Copy link
Member

@pawelbaran pawelbaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works fine, LGTM, but two questions ring in my head:

  • before the proposed change, PointClusetersDBSCAN was and example of implementation of the generic DBSCAN algorithm, while now it does not have anything to do with it any more
  • if that can work on vectors in 3-dimensional space, couldn't it be generalized a bit cheaply?

These translate into: couldn't the tree become a part of the generic DBSCAN algorithm? I have read the PR description, but still, that would be soo nice!

@kThorsager
Copy link
Contributor Author

Works fine, LGTM, but two questions ring in my head:

  • before the proposed change, PointClusetersDBSCAN was and example of implementation of the generic DBSCAN algorithm, while now it does not have anything to do with it any more

It doesn't have anything to do with that specific method in Data_Engine, but it is still using the DBSCAN algorithm

  • if that can work on vectors in 3-dimensional space, couldn't it be generalized a bit cheaply?

These translate into: couldn't the tree become a part of the generic DBSCAN algorithm? I have read the PR description, but still, that would be soo nice!

I agree that that would be super nice, so I'll give it another go and see if I can be less annoyed and/or put in the extra work to make it feasible

@kThorsager kThorsager requested a review from adecler as a code owner July 8, 2020 14:38
@kThorsager kThorsager added the status:WIP PR in progress and still in draft, not ready for formal review label Jul 8, 2020
@kThorsager
Copy link
Contributor Author

So it is still annoying.

To make use of something similar to the existing version one would have to copy the tree structure over with an other collection of data, and iterate over that tree, and make sure the queries does not remove the object reference.

First one can't be done with the interface as it is structured as it would by necessity return an interface, which breaks everything as it's recursively defined. The second one is possible but exceedingly annoying without making it an IEnumerable which would break the UI. And could at least not form a quick look make the third one work.

So to copy the one previously here where I keep the incidences in a tree. This does not work for the interface as it's kind of a special case that it can be traversed without the data present, which would need to be exchanged for ints.

So we got a method for any thing which one can construct a DomainBox from and wish to cluster based on those in some manner.

Will add documentation if this is general enough. @pawelbaran

@kThorsager kThorsager removed the status:WIP PR in progress and still in draft, not ready for formal review label Jul 8, 2020
@kThorsager kThorsager requested a review from pawelbaran July 9, 2020 14:52
Copy link
Member

@pawelbaran pawelbaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works nice and super fast, I like the way of generalization (could possibly be used to other data types than 3-dimensional goemetry - correct me if I am wrong @kThorsager).

Happy to approve once the 2 code comments are replied 👍

Data_Engine/Compute/DomainTreeClusters.cs Outdated Show resolved Hide resolved
Func<Point, Point, bool> metricFunction = (x, y) => x.SquareDistance(y) <= maxSqrDist;
return points.ClusterDBSCAN(metricFunction, minPointCount);
double sqDist = maxDist * maxDist;
Func<Point, DomainBox> toDomainBox = a => a.IBounds().DomainBox();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we simply create a DomainBox without calling a chain of methods here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, (but it's a bit ugly)

And keep track of the items to evaluate through their indecies as opposed to the items themself
@kThorsager
Copy link
Contributor Author

It works nice and super fast, I like the way of generalization (could possibly be used to other data types than 3-dimensional goemetry - correct me if I am wrong @kThorsager).

Yes, anything which one can search for in a DomainTree and which can be clustered by its abstract DomainBoxes proximity (which is in a arbitrary dimension).

The one thing is that this only supports the one Create method for the tree, DomainBoxes encompassing DomainBoxes, which should be fine. But then again, who knows how this might be used one day.

@kThorsager kThorsager requested a review from pawelbaran July 13, 2020 11:36
Copy link
Member

@pawelbaran pawelbaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, works faster than GH! :D

@pawelbaran
Copy link
Member

/azp run BHoM_Engine.CheckInstaller

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@FraserGreenroyd FraserGreenroyd merged commit bf9891b into master Jul 15, 2020
@FraserGreenroyd FraserGreenroyd deleted the Geometry_Engine-#1881-Use-DomainTree-for-PointClusters branch July 15, 2020 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:feature New capability or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Geometry_Engine: Use DomainTree for PointClusters
3 participants