Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic types 2 effective type #15022

Merged
merged 22 commits into from
Mar 15, 2021
Merged

Conversation

dpsutton
Copy link
Contributor

@dpsutton dpsutton commented Mar 1, 2021

add reliance on effective_type in the query processor. Things get a bit dicey with annotate and inferring types. sometimes we have a proper field definition and can say what the effective type and conversion strategy is. othertimes we just guess so effective type doesn't really make sense?

haven't gone through to excise the use of base_type. In the future, we should have only a few places that actually care about base_type when emitting otherwise everything should care about the effective_type.

UI Changes

To set the coercion strategy its moved underneath the gear cog on the data model for fields:
image

image

```clojure
user> (def config (metabase.db.spec/h2 {:db (str "/Users/dan/projects/clojure/metabase/resources/sample-dataset.db"
                                                 ";UNDO_LOG=0;CACHE_SIZE=131072;QUERY_CACHE_SIZE=128;COMPRESS=TRUE;"
                                                 "MULTI_THREADED=TRUE;MVCC=TRUE;DEFRAG_ALWAYS=TRUE;MAX_COMPACT_TIME=5000;"
                                                 "ANALYZE_AUTO=100")}))
user> (jdbc/execute! config ["UPDATE _metabase_metadata
                        SET keypath = 'PEOPLE.ZIP.semantic_type'
                        WHERE keypath = 'PEOPLE.ZIP.special_type'" ])
[1]
user> (jdbc/execute! config ["UPDATE _metabase_metadata
                        SET keypath = 'REVIEWS.BODY.semantic_type'
                        WHERE keypath = 'REVIEWS.BODY.special_type'" ])
[1]
```
locally i have some failing tests that are off by 1 errors:

Fail in compile-time-interval-test

�[36m:mongo�[0m Make sure time-intervals work the way they're supposed to. [:time-interval $date -4 :month] should give us something like Oct 01 2020 - Feb 01 2021 if today is Feb 17 2021

expected: [{$match {$and [{:$expr {$gte [$date {:$dateFromString {:dateString 2020-10-01T00:00Z}}]}} {:$expr {$lt [$date {:$dateFromString {:dateString 2021-02-01T00:00Z}}]}}]}} {$group {_id {date~~~day {:$let {:vars {:parts {:$dateToParts {:date $date}}}, :in {:$dateFromParts {:year $$parts.year, :month $$parts.month, :day $$parts.day}}}}}}} {$sort {_id 1}} {$project {_id false, date~~~day $_id.date~~~day}} {$sort {date~~~day 1}} {$limit 1048576}]

  actual: [{"$match"
            {"$and"
             [{:$expr {"$gte" ["$date" {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
              {:$expr {"$lt" ["$date" {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}
           {"$group"
            {"_id"
             {"date~~~day"
              {:$let
               {:vars {:parts {:$dateToParts {:date "$date"}}},
                :in {:$dateFromParts {:year "$$parts.year", :month "$$parts.month", :day "$$parts.day"}}}}}}}
           {"$sort" {"_id" 1}}
           {"$project" {"_id" false, "date~~~day" "$_id.date~~~day"}}
           {"$sort" {"date~~~day" 1}}
           {"$limit" 1048576}]
    diff: - [{"$match"
              {"$and"
               [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-10-01T00:00Z"}}]}}
                {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-02-01T00:00Z"}}]}}]}}]
          + [{"$match"
              {"$and"
               [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
                {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}]
@codecov
Copy link

codecov bot commented Mar 10, 2021

Codecov Report

Merging #15022 (ae50d74) into master (6b4930f) will decrease coverage by 0.01%.
The diff coverage is 89.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15022      +/-   ##
==========================================
- Coverage   84.30%   84.28%   -0.02%     
==========================================
  Files         389      389              
  Lines       30660    30704      +44     
  Branches     2198     2202       +4     
==========================================
+ Hits        25848    25880      +32     
- Misses       2614     2622       +8     
- Partials     2198     2202       +4     
Impacted Files Coverage Δ
...c/metabase/query_processor/middleware/annotate.clj 81.19% <ø> (-0.06%) ⬇️
src/metabase/models/field.clj 86.71% <40.00%> (-1.29%) ⬇️
src/metabase/driver/sql/query_processor.clj 91.09% <70.00%> (ø)
src/metabase/api/field.clj 79.91% <80.00%> (-0.53%) ⬇️
shared/src/metabase/types.cljc 96.15% <87.50%> (-1.60%) ⬇️
backend/mbql/src/metabase/mbql/normalize.clj 90.48% <100.00%> (ø)
src/metabase/driver/common/parameters/values.clj 82.96% <100.00%> (ø)
...rc/metabase/driver/sql/parameters/substitution.clj 78.78% <100.00%> (-0.32%) ⬇️
src/metabase/driver/sql_jdbc.clj 93.93% <100.00%> (ø)
src/metabase/models/field_values.clj 87.50% <100.00%> (ø)
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b4930f...ae50d74. Read the comment docs.

@dpsutton dpsutton marked this pull request as ready for review March 10, 2021 16:31
@dpsutton dpsutton requested review from camsaul and robdaemon March 10, 2021 16:31
(s/optional-key :field-comment) (s/maybe su/NonBlankString)
(s/optional-key :pk?) s/Bool
(s/optional-key :nested-fields) #{(s/recursive #'TableMetadataField)}
(s/optional-key :custom) {s/Any s/Any}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: we should probably make this schema open with s/Keyword s/Any so new keys we add in the future don't break things. e.g. if a driver starts returning :effective-type it wouldn't work on Metabase < 0.39.0. Not that changing the schema in 0.39.0 would make any difference, but in the future it would be easier for driver authors to take advantage of new features and still support older MB versions

Comment on lines 32 to 35
(defn- to-field [{:keys [base_type] :as field}]
(map->FieldInstance
(merge {:coercion_strategy nil, :effective_type base_type, :semantic_type nil}
field)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use mt/derecordize instead to make FieldInstances regular maps for test purposes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also mt/object-defaults that add default values for keys for an object, so we don't have to repeat things like this all over the code or go update a bunch of tests when we add new columns. I'd use that instead

(s/optional-key :coercion-strategy) (s/maybe su/CoercionStrategy)
(s/optional-key :visibility-type) (s/maybe (apply s/enum field/visibility-types))
(s/optional-key :fk) (s/maybe su/KeywordOrString)
(s/optional-key :field-comment) (s/maybe su/NonBlankString)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non really part of this PR, but it wouldn't be a bad idea to open this schema up either

Copy link
Member

@camsaul camsaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes but overall LGTM

@dpsutton
Copy link
Contributor Author

Hoo that was fun chasing down the failing druid tests. We have a close but not perfect version of the test-data that's flattened on druid. When we're creating databases, we call add-extra-metadata! with the test-data definition. But this dataset includes a column called "date". But our druid test database doesn't include this column:
image

This has knock on effects because if you aren't adding any extra metadata there is no problem, as looking for the field is behind a delay. However, if you add metadata you blow up, preventing subsequent extra metadata being added and then downstream tests start failing.

This one never got the message that one of the fields was sensitive and therefore was showing up in some query results where it wasn't expected.

it actually has knock on effects:
- does more work now as almost every field has an update to do in
`add-extra-metadata`
- we have databases that have state that we don't create. druid for
example has stuff to mimic the dataset in tqpt/with-flattened-dbdef on
checkins but we don't actually create this. And our dbdef has a field
called "date" that is not present in the druid db, so if we attempt to
add metadata it fails and kills the rest of the metadata that we add.
- tests need this metadata to be present and the error causes field
visibilities (for example) to not be set
@dpsutton
Copy link
Contributor Author

Lol i think those tests failed because of the time change last night.

object details didn't work out well here. they added way more stuff
from the db than what is flowing through here.

```clojure
  actual: {:field
           {:name "DATE",
            :parent_id nil,
            :table_id 69,
            :base_type :type/Date,
            :effective_type :type/Date,
            :coercion_strategy nil,
            :semantic_type nil},
           :value {:type :date/all-options, :value "past5days"}}
    diff: - {:field
             {:description nil,
              :database_type "VARCHAR",
              :fingerprint_version 0,
              :has_field_values nil,
              :settings nil,
              :caveats nil,
              :fk_target_field_id nil,
              :custom_position 0,
              :active true,
              :last_analyzed nil,
              :position 1,
              :visibility_type :normal,
              :preview_display true,
              :database_position 0,
              :fingerprint nil,
              :points_of_interest nil}}
```

Object defaults adds quite a bit of stuff such that we'd be dissoc'ing
more than we are currently adding in
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants