
point store size fix #265

Merged: 1 commit, Aug 6, 2021

sudiptoguha
Contributor

Description of changes: The dimensions parameter was multiplied into the point store size twice.
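The effect of the bug can be illustrated with a hypothetical size estimate. The sketch assumes the point store holds `capacity` points of `dimensions` 32-bit floats each; the function names and the 4-byte float size are assumptions, not the library's actual code. Applying `dimensions` a second time inflates the estimate by a factor of `dimensions`.

```python
# Hypothetical illustration: names and formula are assumptions, not the
# actual random-cut-forest-by-aws code.
FLOAT_BYTES = 4  # assumption: each coordinate is stored as a 32-bit float


def point_store_bytes_buggy(capacity: int, dimensions: int) -> int:
    """Size estimate with the bug: dimensions is applied twice."""
    values = capacity * dimensions  # number of stored float values
    return values * dimensions * FLOAT_BYTES  # bug: dimensions multiplied again


def point_store_bytes_fixed(capacity: int, dimensions: int) -> int:
    """Size estimate with dimensions applied once."""
    return capacity * dimensions * FLOAT_BYTES


print(point_store_bytes_buggy(256, 8))  # 65536
print(point_store_bytes_fixed(256, 8))  # 8192
```

With 8 dimensions the buggy estimate is 8x too large, which matters to any caller that uses the estimate to budget memory.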

sudiptoguha requested review from jotok and wnbts August 6, 2021 20:58
wnbts merged commit 1af8860 into aws:main Aug 6, 2021
kaituo added a commit to kaituo/anomaly-detection-1 that referenced this pull request Aug 9, 2021
We split models and distribute the pieces to different nodes to avoid hosting a large model on a single node. Splitting is unnecessary after introducing compact RCF, since models are at least 4x smaller. Splitting also undoes the point store optimization shared among trees, and it complicates computing expected values. Thus, this PR disables splitting by increasing the desired model size: we won't split a model whose size is less than the desired size.

This PR adjusts max features and shingle size accordingly to avoid huge models that bring no clear benefit.

This PR also adjusts the model size formula to account for aws/random-cut-forest-by-aws#265. I will update the RCF version once RCF 2.0 is released to Maven.

Testing done:
1. Verified that single-stream models are not split after the change.
2. Updated unit tests.
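The two size levers in this commit message can be sketched as follows. The sketch assumes the RCF point dimension is the number of features times the shingle size, and that a model at or above the desired size is cut into ceil(size / desired size) pieces; both rules and all names are hypothetical illustrations, not the anomaly-detection plugin's code. Only the "never split below the desired size" rule comes from the commit message.

```python
# Hypothetical sketch: function names, parameters, and the partitioning rule
# are illustrative assumptions, not the anomaly-detection plugin's code.


def rcf_dimensions(num_features: int, shingle_size: int) -> int:
    """Assumed point dimension: shingle_size consecutive feature vectors concatenated."""
    return num_features * shingle_size


def num_partitions(estimated_model_bytes: int, desired_model_bytes: int) -> int:
    """Number of pieces a model is split into; 1 means no split.

    A model smaller than the desired size is never split (rule from the
    commit message). Ceiling division for larger models is an assumption.
    """
    if estimated_model_bytes < desired_model_bytes:
        return 1
    return -(-estimated_model_bytes // desired_model_bytes)  # integer ceiling


print(rcf_dimensions(5, 8))                   # 40
print(num_partitions(3_000_000, 1_000_000))   # 3
print(num_partitions(3_000_000, 10_000_000))  # 1
```

Capping max features and shingle size bounds the dimension, and with it the model size; raising the desired size then makes typical models fall below the threshold, so they stay whole.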
kaituo added a commit to opensearch-project/anomaly-detection that referenced this pull request Aug 11, 2021
Disable model splitting in single-stream detectors
ohltyler pushed a commit to ohltyler/anomaly-detection-2 that referenced this pull request Sep 1, 2021
…t#162)
ohltyler pushed a commit to ohltyler/anomaly-detection-2 that referenced this pull request Sep 1, 2021
…t#162)
ohltyler pushed a commit to opensearch-project/anomaly-detection that referenced this pull request Sep 1, 2021
sudiptoguha deleted the pointstorefix branch June 29, 2022 00:40
hamersu9t added a commit to hamersu9t/anomaly-detection that referenced this pull request Aug 10, 2024
3 participants