-
Notifications
You must be signed in to change notification settings - Fork 53
Conversation
Since this is a new repo, should the API version start with v1alpha1? |
Also should the repo be named xgboost-operator to be consistent with others? |
+1 to rename this repo for consistency. We also mentioned this in kubeflow/community#247 (comment). Maybe @jlewi @richardsliu or someone else with access could help change it? |
pkg/apis/xgboost/v1beta1/types.go
Outdated
// active. | ||
RunPolicy *commmonv1.RunPolicy `json:"runPolicy,omitempty"` | ||
|
||
// XGBReplicaSpec specifies the PyTorch replicas to run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytorch -> typo here
pkg/apis/xgboost/v1beta1/types.go
Outdated
RunPolicy *commmonv1.RunPolicy `json:"runPolicy,omitempty"` | ||
|
||
// XGBReplicaSpec specifies the PyTorch replicas to run. | ||
XGBReplicaSpec *commmonv1.ReplicaSpec `json:"xgbReplicaSpec"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things here:
XGBReplicaSpec
should be pluralXGBReplicaSpecs
- You might want to define the corresponding
XGBReplicaType
so thatXGBReplicaSpecs
is a map, e.g.map[XGBReplicaType]*ReplicaSpec
- Should we consider using full name XGBoost or XGB here? I'd suggest we keep it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since distributed XGBoost run in AllReduce way, users are not expected to define master spec
and worker spec
separately. So, we only need a single ReplicaSpec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw master and worker replicas in the original proposal kubeflow/community#247 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The master is selected from the cluster like the pytorch job, thus, we can leave the replica for the master like the pytorch job here. it is possible that users want to more than one replica of master. For the worker. the replica is required for fault-tolerance .
@@ -0,0 +1,84 @@ | |||
ignored = ["github.com/kubeflow/xgboost"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need all these dependencies here yet? I'd suggest we add them when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove unused dep.
pkg/apis/xgboost/v1beta1/types.go
Outdated
// Read-only. | ||
Status commmonv1.JobStatus `json:"status,omitempty"` | ||
|
||
commmonv1.CleanPodPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't CleanPodPolicy also part of commmonv1.RunPolicy? Why is it separately added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh~Its a mistake, I'll remove it.
LGTM! good start ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: terrytangyuan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* build bolierplate * add verify-codegen.sh * some modifications * fix update-codegen.sh * add register/defaults/constants for v1alpha1
I'm trying to build the skeleton of the project by following works:
XGBoostJob
andXGBoostJobSpec
XGBoostJob
andXGBoostJobSpec
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)