-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for BigQuery Models. #2039
Conversation
danielgsims
commented
Jun 12, 2019
- Updated discovery doc for bigquery
- Created model class
- Added model and models methods to BigQuery client
- Added network calls to ConnectionInterface and Rest connection
@dwsupplee @jdpedrie - Here's the first draft of ML Models from BigQuery; opening as a draft PR like we discussed today. I think one change I'd like to make is moving the fetch model functionality to dataset instead of being part of the BigQueryClient class since semantically it's a child of the model. Are there any other suggestions, feedback or advice you two have before wrapping this up for a PR in the next couple of days? |
See #1968 |
Codecov Report
@@ Coverage Diff @@
## master #2039 +/- ##
============================================
+ Coverage 92.57% 92.62% +0.04%
- Complexity 4379 4402 +23
============================================
Files 304 305 +1
Lines 13014 13078 +64
============================================
+ Hits 12048 12113 +65
+ Misses 966 965 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good so far!
BigQuery/src/Model.php
Outdated
public function update(array $metadata, array $options = []) | ||
{ | ||
$options = $this->applyEtagHeader( | ||
$options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: $options
should be the last array in a list of merged arrays. That'll prevent the user from overriding something you set specifically in the client code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me but Dataset has it written this way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the change, maybe we should think about making the change in Dataset in the future too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that's just old code. I try to fix it when I touch the methods where it's in a less than optimal order.
b3cd981
to
b8c8b8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking nice, @danielgsims. Thanks for this :). Mostly housekeeping related comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good. mostly nits and small changes here.
BigQuery/src/Dataset.php
Outdated
return new ItemIterator( | ||
new PageIterator( | ||
function (array $model) { | ||
return $this->model($model['modelReference']['modelId']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't raise this earlier, but the API call returns the model resource, it seems reasonable to inject that into the instance. That way, calls to Model::info()
wouldn't require a second service call. If we don't hydrate the instance when we have the resource handy, there's no sense having $info
as a Model
constructor argument. This would require adding an $info
arg to the Dataset::model()
method, or you could do something like what we have in pubsub with a private factory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey John, I ran into this one while developing. The data that is returned form the List call is not the full set of data that is returned when fetching the individual model resource. It seemed better to not include it so that you would consistently have the same data. In that case, should I just ditch it from the constructor and allow it to hydrate lazily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed adding info from the constructor for now since it should always be triggered via the reload method. I'll keep an eye on this if the BQ team modifies the Models return response to include the full set of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Yeah, I think we can probably remove the argument if it’s unused. It’s nothing to add it back later if the situation changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The models.list documentation says the following:
Only the following fields are populated: modelReference, modelType, creationTime, lastModifiedTime and labels.
Which leads me to believe this will be a permanent decision to only return a subset of the information on list calls. Since the information returned appears useful to users, I'd like to propose we populate the Model
object with the information when we have it, with documentation to let users know only a subset of the data is returned and to call reload
on the Model
if they would like access to all of the information.
BigQuery/src/Model.php
Outdated
public function exists(array $options = []) | ||
{ | ||
try { | ||
$this->connection->getModel($this->identity + $options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there are examples of direct connection calls in exists()
, but I prefer calling Model::reload($options)
here, and think we should make an effort to replace those eventually. Segregate connection calls as much as possible, and hydrate the instance in case this call is coming ahead of info()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that better. I can change the other parts of BigQuery in a future PR.
* Updated discovery doc for bigquery * Created model class * Added model and models methods to BigQuery\Dataset client * Added network calls to ConnectionInterface and Rest connection
f3d2b8e
to
a64d07e
Compare
@danielgsims can you please ping me when this is ready for review? |
// Fetch the model and wait for it to be trained | ||
$model = self::$dataset->model($modelId); | ||
|
||
$backoff = new ExponentialBackoff(8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this dead code then? Can we remove it please?
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the ℹ️ Googlers: Go here for more info. |
@dwsupplee addressed your latest comments; PTAL! |
A Googler has manually verified that the CLAs look good. (Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.) ℹ️ Googlers: Go here for more info. |
BigQuery/src/Model.php
Outdated
@@ -54,20 +54,26 @@ public function __construct( | |||
ConnectionInterface $connection, | |||
$id, | |||
$datasetId, | |||
$projectId | |||
$projectId, | |||
$info = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please typehint to an array and add documentation.
BigQuery/src/Dataset.php
Outdated
); | ||
} | ||
|
||
/** | ||
* Fetches all of the models in the dataset. | ||
* | ||
* Please note that Model instances created by list calls may not contain a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's please be explicit about what fields are returned.
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
BigQuery/src/Model.php
Outdated
* @param string $id The model's ID. | ||
* @param string $datasetId The dataset's ID. | ||
* @param string $projectId The project's ID. | ||
* @param array $info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more minor nit, could we please add a description?
A Googler has manually verified that the CLAs look good. (Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.) ℹ️ Googlers: Go here for more info. |