Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A Generic ML Command in PPL #971

Merged
merged 5 commits into from
Oct 31, 2022
Merged

A Generic ML Command in PPL #971

merged 5 commits into from
Oct 31, 2022

Conversation

jngz-es
Copy link
Contributor

@jngz-es jngz-es commented Oct 26, 2022

Description

A generic ml command in ppl to apply algorithms in ml-commons.

Issues Resolved

#849

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jngz-es jngz-es requested a review from a team as a code owner October 26, 2022 19:22
@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2022

Codecov Report

Merging #971 (f2bfe22) into 2.x (634e2ff) will decrease coverage by 34.84%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##                2.x     #971       +/-   ##
=============================================
- Coverage     97.60%   62.76%   -34.85%     
=============================================
  Files           308       10      -298     
  Lines          7983      658     -7325     
  Branches        520      119      -401     
=============================================
- Hits           7792      413     -7379     
- Misses          190      192        +2     
- Partials          1       53       +52     
Flag Coverage Δ
query-workbench 62.76% <ø> (?)
sql-engine ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/opensearch/sql/analysis/Analyzer.java
...a/org/opensearch/sql/analysis/TypeEnvironment.java
...ch/sql/planner/logical/LogicalPlanNodeVisitor.java
.../sql/planner/physical/PhysicalPlanNodeVisitor.java
...a/org/opensearch/sql/utils/MLCommonsConstants.java
...ecutor/protector/OpenSearchExecutionProtector.java
...rch/planner/physical/MLCommonsOperatorActions.java
...search/sql/opensearch/storage/OpenSearchIndex.java
...java/org/opensearch/sql/ppl/parser/AstBuilder.java
...l/opensearch/data/value/OpenSearchExprIpValue.java
... and 308 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@penghuo penghuo added the v2.4.0 'Issues and PRs related to version v2.4.0' label Oct 26, 2022
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running git checkout sql-jdbc/docs/img/tableau_graph.PNG restore deleted file. reference. #865 (comment)

@@ -411,6 +412,20 @@ public UnresolvedPlan visitAdCommand(AdCommandContext ctx) {
return new AD(builder.build());
}

/**
* Kmeans command.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will correct it to ml.

@@ -148,6 +148,14 @@ adParameter
| (ANOMALY_SCORE_THRESHOLD EQUAL anomaly_score_threshold=decimalLiteral)
;

mlCommand
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does ml also no arguments? e.g. source = index | ml. The syntax actaully allow this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no action or algorithm arguments, it will throw exception. I don't think we have to restrict the command content at command parsing stage, we leave all parameters validation to ml-client.

@Getter
@ToString
@EqualsAndHashCode(callSuper = true)
public class LogicalML extends LogicalPlan {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan deprecated Kmeans/AD in Logical plan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will deprecate them in the following version like 2.5, then remove them in 3.0.

public Map<String, ExprCoreType> getOutputSchema(TypeEnvironment env) {
switch (getAction()) {
case TRAIN:
env.clearAllFields();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why clean all fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ml train command will return only model/task id, and status, so remove all fields from input fields.


LogicalPlan actual = analyze(AstDSL.project(
new ML(AstDSL.relation("schema"), argumentMap), AstDSL.allFields()));
assertTrue(((LogicalProject) actual).getProjectList().size() >= 2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why >= 2, does the result non deterministic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is deterministic, just at least 2.

* @param nodeClient node client
* @return ml-commons result
*/
protected MLOutput getMLOutput(DataFrame inputDataFrame,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also deprecated getMLPredictionResult if it is not required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required for now, we can remove it when removing kmeans and ad commands.

*/
@RequiredArgsConstructor
@EqualsAndHashCode(callSuper = false)
public class MLOperator extends MLCommonsOperatorActions {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is difference with MLCommonsOperator, deptecated MLCommonsOperator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLOperator is totally new operator for all ml algorithms from ml-commons. I don't want to mix it with old operator as there is a big gap between them, and also for smoothly deprecating old operator in the future.

@dai-chen dai-chen added ml Issues related to integration with ML commons and plugin PPL Piped processing language labels Oct 27, 2022
@jngz-es
Copy link
Contributor Author

jngz-es commented Oct 28, 2022

Didn' t realized it, will recover it.

@jngz-es jngz-es requested a review from penghuo October 28, 2022 20:52
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change!
Please add ML commands related doc. And add deprecated describtion for AD and Kmeans command.

@dai-chen dai-chen merged commit c6b234c into opensearch-project:2.x Oct 31, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 31, 2022
* Add generic ml command in ppl.

Signed-off-by: Jing Zhang <[email protected]>

* Recover ml client dependency.

Signed-off-by: Jing Zhang <[email protected]>

* Address the comments I.

Signed-off-by: Jing Zhang <[email protected]>

Signed-off-by: Jing Zhang <[email protected]>
(cherry picked from commit c6b234c)
penghuo pushed a commit that referenced this pull request Nov 1, 2022
* Address the comments I.

Signed-off-by: Jing Zhang <[email protected]>

Signed-off-by: Jing Zhang <[email protected]>
(cherry picked from commit c6b234c)

Co-authored-by: Jing Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.4 ml Issues related to integration with ML commons and plugin PPL Piped processing language v2.4.0 'Issues and PRs related to version v2.4.0'
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants