disttask: add operator abstraction #45927

Closed
ywqzzy wants to merge 21 commits

ywqzzy (Contributor) commented Aug 9, 2023

What problem does this PR solve?

Issue Number: ref #41495

Problem Summary:

What is changed and how it works?

Operator

An Operator runs business logic using multiple workers.
Note that a workerPool manages all of the workers.
A worker handles a task whenever one is sent to it.
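
The worker model above can be sketched roughly as follows. This is only an illustration that assumes plain goroutines and channels instead of the workerPool used in this PR; the helper name `runWorkers` and its signature are hypothetical.

```go
package main

import "sync"

// runWorkers is a hypothetical helper, not the PR's workerpool API.
// It starts n workers (n must be at least 1). Each worker pulls tasks
// from the tasks channel as they arrive, applies handle, and sends the
// result to the returned channel. The returned channel is closed once
// tasks has been closed and every worker has finished.
func runWorkers[T, R any](n int, tasks <-chan T, handle func(T) R) <-chan R {
	results := make(chan R)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range tasks {
				results <- handle(t)
			}
		}()
	}
	// Close the results channel only after all workers have exited.
	go func() {
		wg.Wait()
		close(results)
	}()
	return results
}
```

A caller would feed tasks into the channel, close it when done, and range over the returned channel. Results arrive in no particular order because the workers run concurrently.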

Data

DataSource and DataSink are abstractions that handle the dataflow.

Pipeline

A Pipeline is a chain of Operators.

The workflow is:

  1. Add a task to the first operator, which then starts handling tasks.
  2. When a task has been handled, the operator sends a task to the next operator.

Tasks can also be added by reading from a DataSource; this will be implemented soon.
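
The chaining described in this section could look roughly like the sketch below. It assumes channels connect the operators and uses `int` tasks for brevity; `stage`, `startStage`, and `runPipeline` are hypothetical names and do not come from this PR.

```go
package main

import "sync"

// stage is a hypothetical stand-in for an operator: a task handler
// plus the number of workers that run it (must be at least 1).
type stage struct {
	workers int
	handle  func(int) int
}

// startStage runs one stage: its workers read tasks from in, handle
// them, and send the results to the returned channel, which is closed
// once in is drained and all workers have finished.
func startStage(st stage, in <-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < st.workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range in {
				out <- st.handle(v)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// runPipeline adds the inputs as tasks to the first stage and chains
// the stages so that each stage's output feeds the next stage's input.
// It returns the outputs of the last stage, in no particular order.
func runPipeline(inputs []int, stages []stage) []int {
	in := make(chan int)
	go func() {
		for _, v := range inputs {
			in <- v
		}
		close(in)
	}()
	var cur <-chan int = in
	for _, st := range stages {
		cur = startStage(st, cur)
	}
	var results []int
	for v := range cur {
		results = append(results, v)
	}
	return results
}
```

Closing the input channel is what shuts the chain down: each stage closes its output after its input is drained, so the close propagates from the first operator to the last.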

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Aug 9, 2023

ti-chi-bot bot commented Aug 9, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zanmato1984 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 9, 2023

tiprow bot commented Aug 9, 2023

Hi @ywqzzy. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

purelind (Contributor) commented Aug 9, 2023

/retest


tiprow bot commented Aug 9, 2023

@purelind: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest


codecov bot commented Aug 9, 2023

Codecov Report

Merging #45927 (c17dc02) into master (a0cbff2) will decrease coverage by 0.6708%.
Report is 59 commits behind head on master.
The diff coverage is 90.5511%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #45927        +/-   ##
================================================
- Coverage   73.3499%   72.6792%   -0.6708%     
================================================
  Files          1277       1303        +26     
  Lines        393392     399993      +6601     
================================================
+ Hits         288553     290712      +2159     
- Misses        86434      90810      +4376     
- Partials      18405      18471        +66     
Flag          Coverage Δ
integration   25.5599% <0.0000%> (?)
unit          73.3556% <90.5511%> (+0.0056%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components    Coverage Δ
dumpling      54.0444% <ø> (ø)
parser        85.0548% <ø> (-0.0055%) ⬇️
br            47.6072% <ø> (-4.4240%) ⬇️

Comment on lines 24 to 26
source DataSource
sink DataSink
pool *workerpool.WorkerPool[T] // workers running on pool
Contributor

Make them public. There's no need for those getter/setter methods, since those fields shouldn't be changed after initialization.

Contributor

This pool shouldn't be part of an operator; the pipeline that executes those operators should use it.

Contributor Author

> This pool shouldn't be part of an operator; the pipeline that executes those operators should use it.

OK, having the pipeline hold the pools will be cleaner.

Contributor Author

> This pool shouldn't be part of an operator; the pipeline that executes those operators should use it.
>
> OK, having the pipeline hold the pools will be cleaner.

I will do it when we support a better way to build a pipeline with a plan.

Comment on lines 29 to 36
source0 := impl0.getSource()
source1 := impl1.getSource()
source2 := impl2.getSource()
sink := &simpleAsyncDataSink{0, 0, sync.Mutex{}}

impl0.setSink(source1.(DataSink))
impl1.setSink(source2.(DataSink))
impl2.setSink(sink)
Contributor

If we're sure the pipeline is a chain, it would be better to provide a method on the pipeline to connect them easily.

Contributor Author

> If we're sure the pipeline is a chain, it would be better to provide a method on the pipeline to connect them easily.

Yep, I will try.

Contributor Author

> If we're sure the pipeline is a chain, it would be better to provide a method on the pipeline to connect them easily.

I will do it when plans are supported.

hawkingrei (Member) commented

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Aug 15, 2023

ti-chi-bot bot commented Aug 15, 2023

@ywqzzy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/build c17dc02 link true /test build

Full PR test history. Your PR dashboard.


tiprow bot commented Aug 15, 2023

@ywqzzy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
tiprow_fast_test c17dc02 link true /test tiprow_fast_test

Full PR test history. Your PR dashboard.

// Operator read data from DataSource, then process the read data.
type DataSource[T any] interface {
Start() error
Next() (T, error)
Contributor

How about renaming it to Read()? It should be complementary to DataSink.Write().

Contributor Author

> How about renaming it to Read()? It should be complementary to DataSink.Write().

Ok.

@ywqzzy ywqzzy closed this Aug 21, 2023
Labels
ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
5 participants