Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for BigQuery Models. #2039

Merged
merged 5 commits into from
Jul 22, 2019
Merged

feat: Add support for BigQuery Models. #2039

merged 5 commits into from
Jul 22, 2019

Conversation

danielgsims
Copy link
Contributor

  • Updated discovery doc for bigquery
  • Created model class
  • Added model and models methods to BigQuery client
  • Added network calls to ConnectionInterface and Rest connection

@danielgsims danielgsims added api: bigquery Issues related to the BigQuery API. needs work This is a pull request that needs a little love. labels Jun 12, 2019
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jun 12, 2019
@danielgsims
Copy link
Contributor Author

@dwsupplee @jdpedrie - Here's the first draft of ML Models from BigQuery; opening as a draft PR like we discussed today.

I think one change I'd like to make is moving the fetch model functionality to dataset instead of being part of the BigQueryClient class since semantically it's a child of the model.

Are there any other suggestions, feedback or advice you two have before wrapping this up for a PR in the next couple of days?

@danielgsims
Copy link
Contributor Author

See #1968

@codecov
Copy link

codecov bot commented Jun 12, 2019

Codecov Report

Merging #2039 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2039      +/-   ##
============================================
+ Coverage     92.57%   92.62%   +0.04%     
- Complexity     4379     4402      +23     
============================================
  Files           304      305       +1     
  Lines         13014    13078      +64     
============================================
+ Hits          12048    12113      +65     
+ Misses          966      965       -1
Impacted Files Coverage Δ Complexity Δ
BigQuery/src/Connection/Rest.php 100% <100%> (ø) 26 <4> (+4) ⬆️
BigQuery/src/Model.php 100% <100%> (ø) 12 <12> (?)
BigQuery/src/Dataset.php 98.94% <100%> (+0.21%) 21 <3> (+3) ⬆️
PubSub/src/Subscription.php 100% <0%> (ø) 35% <0%> (+5%) ⬆️
PubSub/src/Connection/Grpc.php 97.56% <0%> (+0.48%) 37% <0%> (-1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27be9ef...9167426. Read the comment docs.

Copy link
Contributor

@jdpedrie jdpedrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good so far!

public function update(array $metadata, array $options = [])
{
$options = $this->applyEtagHeader(
$options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: $options should be the last array in a list of merged arrays. That'll prevent the user from overriding something you set specifically in the client code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me but Dataset has it written this way!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the change, maybe we should think about making the change in Dataset in the future too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that's just old code. I try to fix it when I touch the methods where it's in a less than optimal order.

@danielgsims danielgsims removed the needs work This is a pull request that needs a little love. label Jun 18, 2019
Copy link
Contributor

@dwsupplee dwsupplee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking nice, @danielgsims. Thanks for this :). Mostly housekeeping related comments.

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Jun 19, 2019
@dwsupplee dwsupplee added needs work This is a pull request that needs a little love. and removed 🚨 This issue needs some love. labels Jun 20, 2019
@danielgsims danielgsims marked this pull request as ready for review June 25, 2019 13:11
@danielgsims danielgsims removed the needs work This is a pull request that needs a little love. label Jun 25, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Jun 25, 2019
@dwsupplee dwsupplee added needs work This is a pull request that needs a little love. and removed 🚨 This issue needs some love. labels Jun 25, 2019
Copy link
Contributor

@jdpedrie jdpedrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. mostly nits and small changes here.

return new ItemIterator(
new PageIterator(
function (array $model) {
return $this->model($model['modelReference']['modelId']);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't raise this earlier, but the API call returns the model resource, it seems reasonable to inject that into the instance. That way, calls to Model::info() wouldn't require a second service call. If we don't hydrate the instance when we have the resource handy, there's no sense having $info as a Model constructor argument. This would require adding an $info arg to the Dataset::model() method, or you could do something like what we have in pubsub with a private factory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey John, I ran into this one while developing. The data that is returned form the List call is not the full set of data that is returned when fetching the individual model resource. It seemed better to not include it so that you would consistently have the same data. In that case, should I just ditch it from the constructor and allow it to hydrate lazily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed adding info from the constructor for now since it should always be triggered via the reload method. I'll keep an eye on this if the BQ team modifies the Models return response to include the full set of data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Yeah, I think we can probably remove the argument if it’s unused. It’s nothing to add it back later if the situation changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The models.list documentation says the following:

Only the following fields are populated: modelReference, modelType, creationTime, lastModifiedTime and labels.

Which leads me to believe this will be a permanent decision to only return a subset of the information on list calls. Since the information returned appears useful to users, I'd like to propose we populate the Model object with the information when we have it, with documentation to let users know only a subset of the data is returned and to call reload on the Model if they would like access to all of the information.

public function exists(array $options = [])
{
try {
$this->connection->getModel($this->identity + $options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there are examples of direct connection calls in exists(), but I prefer calling Model::reload($options) here, and think we should make an effort to replace those eventually. Segregate connection calls as much as possible, and hydrate the instance in case this call is coming ahead of info().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that better. I can change the other parts of BigQuery in a future PR.

  * Updated discovery doc for bigquery
  * Created model class
  * Added model and models methods to BigQuery\Dataset client
  * Added network calls to ConnectionInterface and Rest connection
@jdpedrie jdpedrie changed the title BigQuery Models feat: Add support for BigQuery Models. Jun 28, 2019
@danielgsims danielgsims removed the needs work This is a pull request that needs a little love. label Jun 29, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Jun 29, 2019
@dwsupplee
Copy link
Contributor

@danielgsims can you please ping me when this is ready for review?

@yoshi-automation yoshi-automation removed the 🚨 This issue needs some love. label Jul 3, 2019
// Fetch the model and wait for it to be trained
$model = self::$dataset->model($modelId);

$backoff = new ExponentialBackoff(8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this dead code then? Can we remove it please?

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Jul 16, 2019
@jdpedrie
Copy link
Contributor

@dwsupplee addressed your latest comments; PTAL!

@jdpedrie jdpedrie added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 16, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 16, 2019
@dwsupplee dwsupplee added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Jul 19, 2019
@googlebot
Copy link

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

ℹ️ Googlers: Go here for more info.

@@ -54,20 +54,26 @@ public function __construct(
ConnectionInterface $connection,
$id,
$datasetId,
$projectId
$projectId,
$info = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please typehint to an array and add documentation.

);
}

/**
* Fetches all of the models in the dataset.
*
* Please note that Model instances created by list calls may not contain a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's please be explicit about what fields are returned.

@jdpedrie jdpedrie dismissed their stale review July 19, 2019 20:07

passed off to dws.

@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Jul 19, 2019
* @param string $id The model's ID.
* @param string $datasetId The dataset's ID.
* @param string $projectId The project's ID.
* @param array $info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more minor nit, could we please add a description?

@dwsupplee dwsupplee added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Jul 19, 2019
@googlebot
Copy link

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

ℹ️ Googlers: Go here for more info.

@jdpedrie jdpedrie merged commit e2e006f into master Jul 22, 2019
@jdpedrie jdpedrie deleted the bigquery-models branch July 22, 2019 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants