Semantic types 2 effective type (metabase#15022)
* First pass using coercions

* Coercions

* Handle effective_type coercion_strategy in test data sets

* special-type -> semantic type in sample db

```clojure
user> (def config (metabase.db.spec/h2 {:db (str "/Users/dan/projects/clojure/metabase/resources/sample-dataset.db"
                                                 ";UNDO_LOG=0;CACHE_SIZE=131072;QUERY_CACHE_SIZE=128;COMPRESS=TRUE;"
                                                 "MULTI_THREADED=TRUE;MVCC=TRUE;DEFRAG_ALWAYS=TRUE;MAX_COMPACT_TIME=5000;"
                                                 "ANALYZE_AUTO=100")}))
user> (jdbc/execute! config ["UPDATE _metabase_metadata
                        SET keypath = 'PEOPLE.ZIP.semantic_type'
                        WHERE keypath = 'PEOPLE.ZIP.special_type'" ])
[1]
user> (jdbc/execute! config ["UPDATE _metabase_metadata
                        SET keypath = 'REVIEWS.BODY.semantic_type'
                        WHERE keypath = 'REVIEWS.BODY.special_type'" ])
[1]
```

* Correct mismatch in validation preventing sync

* fixing up alternative date tests

* More passing tests

* Tests for values, nested queries, fetch metadata

* tests

* tests passing

* Fixup mongo qp for coercions

Locally I have some failing tests with off-by-one errors:

Fail in compile-time-interval-test

:mongo Make sure time-intervals work the way they're supposed to. [:time-interval $date -4 :month] should give us something like Oct 01 2020 - Feb 01 2021 if today is Feb 17 2021

expected: [{$match {$and [{:$expr {$gte [$date {:$dateFromString {:dateString 2020-10-01T00:00Z}}]}} {:$expr {$lt [$date {:$dateFromString {:dateString 2021-02-01T00:00Z}}]}}]}} {$group {_id {date~~~day {:$let {:vars {:parts {:$dateToParts {:date $date}}}, :in {:$dateFromParts {:year $$parts.year, :month $$parts.month, :day $$parts.day}}}}}}} {$sort {_id 1}} {$project {_id false, date~~~day $_id.date~~~day}} {$sort {date~~~day 1}} {$limit 1048576}]

  actual: [{"$match"
            {"$and"
             [{:$expr {"$gte" ["$date" {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
              {:$expr {"$lt" ["$date" {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}
           {"$group"
            {"_id"
             {"date~~~day"
              {:$let
               {:vars {:parts {:$dateToParts {:date "$date"}}},
                :in {:$dateFromParts {:year "$$parts.year", :month "$$parts.month", :day "$$parts.day"}}}}}}}
           {"$sort" {"_id" 1}}
           {"$project" {"_id" false, "date~~~day" "$_id.date~~~day"}}
           {"$sort" {"date~~~day" 1}}
           {"$limit" 1048576}]
    diff: - [{"$match"
              {"$and"
               [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-10-01T00:00Z"}}]}}
                {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-02-01T00:00Z"}}]}}]}}]
          + [{"$match"
              {"$and"
               [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
                {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}]

* ee fixes

* UI to set coercion type

* Don't need to populate effective-type here

It actually has knock-on effects:
- It does more work, since almost every field now has an update to make
in `add-extra-metadata`.
- We have databases with state that we don't create. Druid, for example,
has data that mimics the dataset in tqpt/with-flattened-dbdef on
checkins, but we don't actually create it. Our dbdef has a field called
"date" that is not present in the Druid db, so attempting to add
metadata for it fails and kills the rest of the metadata that we add.
- Tests need this metadata to be present, and the error causes field
visibilities (for example) not to be set.

* Docstrings on shared lib

* Add effective and coercion to redshift expectations

* Fixup google analytics

* Derecordize instead of recordize the expectation

Object details didn't work out well here. They added far more from the
db than what flows through here.

```clojure
  actual: {:field
           {:name "DATE",
            :parent_id nil,
            :table_id 69,
            :base_type :type/Date,
            :effective_type :type/Date,
            :coercion_strategy nil,
            :semantic_type nil},
           :value {:type :date/all-options, :value "past5days"}}
    diff: - {:field
             {:description nil,
              :database_type "VARCHAR",
              :fingerprint_version 0,
              :has_field_values nil,
              :settings nil,
              :caveats nil,
              :fk_target_field_id nil,
              :custom_position 0,
              :active true,
              :last_analyzed nil,
              :position 1,
              :visibility_type :normal,
              :preview_display true,
              :database_position 0,
              :fingerprint nil,
              :points_of_interest nil}}
```

Object defaults add quite a bit of stuff, such that we'd be dissoc'ing
more than we are currently adding in.
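
A minimal sketch of the idea behind derecordizing, not the code from this commit. `FieldLike` and `derecordize` are hypothetical names that stand in for the model instance and the conversion. It assumes only that a Clojure record never compares equal to a plain map, which is why the actual value is converted instead of the expectation being built up with every default.

```clojure
;; Hypothetical stand-in for a model instance; not a Metabase type.
(defrecord FieldLike [name base_type effective_type coercion_strategy])

(defn derecordize
  "Turn a record into a plain map with the same keys and values."
  [x]
  (if (record? x) (into {} x) x))

(def actual
  {:field (->FieldLike "DATE" :type/Date :type/Date nil)
   :value {:type :date/all-options, :value "past5days"}})

(def expected
  {:field {:name              "DATE"
           :base_type         :type/Date
           :effective_type    :type/Date
           :coercion_strategy nil}
   :value {:type :date/all-options, :value "past5days"}})

;; A record carries its type into equality, so the raw comparison fails ...
(def equal-before? (= expected actual))

;; ... while the derecordized value compares equal to the plain map.
(def equal-after? (= expected (update actual :field derecordize)))
```

Converting the actual value keeps the expectation limited to the keys that flow through the code under test.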

Co-authored-by: Cam Saul <[email protected]>
dpsutton and camsaul authored Mar 15, 2021
1 parent 6b4930f commit 6b8ddc8
Showing 58 changed files with 576 additions and 270 deletions.
2 changes: 1 addition & 1 deletion backend/mbql/src/metabase/mbql/normalize.clj
@@ -250,7 +250,7 @@
   "Normalize source/results metadata for a single column."
   [metadata]
   {:pre [(map? metadata)]}
-  (-> (reduce #(m/update-existing %1 %2 keyword) metadata [:base_type :semantic_type :visibility_type :source :unit])
+  (-> (reduce #(m/update-existing %1 %2 keyword) metadata [:base_type :effective_type :semantic_type :visibility_type :source :unit])
       (m/update-existing :field_ref (comp canonicalize-mbql-clauses normalize-tokens))
       (m/update-existing :fingerprint walk/keywordize-keys)))

@@ -166,21 +166,25 @@
                          [:> $date [:absolute-datetime #t "2014-01-01T00:00Z[UTC]" :default]]
                          [:=
                           $user_id
-                          [:value 5 {:base_type :type/Integer
-                                     :semantic_type :type/FK
-                                     :database_type "INTEGER"
-                                     :name "USER_ID"}]]]
+                          [:value 5 {:base_type :type/Integer
+                                     :effective_type :type/Integer
+                                     :coercion_strategy nil
+                                     :semantic_type :type/FK
+                                     :database_type "INTEGER"
+                                     :name "USER_ID"}]]]
                 ::row-level-restrictions/gtap? true}
  :joins [{:source-query
           {:source-table $$venues
            :fields [$venues.id $venues.name $venues.category_id
                     $venues.latitude $venues.longitude $venues.price]
            :filter [:=
                     $venues.price
-                    [:value 1 {:base_type :type/Integer
-                               :semantic_type :type/Category
-                               :database_type "INTEGER"
-                               :name "PRICE"}]]
+                    [:value 1 {:base_type :type/Integer
+                               :effective_type :type/Integer
+                               :coercion_strategy nil
+                               :semantic_type :type/Category
+                               :database_type "INTEGER"
+                               :name "PRICE"}]]
            ::row-level-restrictions/gtap? true}
           :alias "v"
           :strategy :left-join
@@ -762,6 +766,8 @@
 (is (= [:=
         [:field (mt/id :products :category) {:join-alias "products"}]
         [:value "Widget" {:base_type :type/Text
+                          :effective_type :type/Text
+                          :coercion_strategy nil
                           :semantic_type (db/select-one-field :semantic_type Field
                                            :id (mt/id :products :category))
                           :database_type "VARCHAR"

32 changes: 32 additions & 0 deletions frontend/src/metabase/admin/datamodel/containers/FieldApp.jsx
@@ -25,6 +25,9 @@ import { LeftNavPane, LeftNavPaneItem } from "metabase/components/LeftNavPane";
 import Section, { SectionHeader } from "../components/Section";
 import SelectSeparator from "../components/SelectSeparator";

+import { is_coerceable, coercions_for_type } from "cljs/metabase.types";
+import { isFK } from "metabase/lib/types";
+
 import {
   FieldVisibilityPicker,
   SemanticTypeAndTargetPicker,
@@ -313,6 +316,35 @@ const FieldGeneralPane = ({
       />
     </Section>

+    {!isFK(field.semantic_type) && is_coerceable(field.base_type) && (
+      <Section>
+        <SectionHeader title={t`Coercion`} />
+        <Select
+          className="inline-block"
+          placeholder={t`Select a conversion`}
+          searchProp="name"
+          value={field.coercion_strategy}
+          onChange={({ target: { value } }) =>
+            onUpdateFieldProperties({
+              coercion_strategy: value,
+            })
+          }
+          options={[
+            ...coercions_for_type(field.base_type).map(c => ({
+              id: c,
+              name: c,
+            })),
+            {
+              id: null,
+              name: t`No coercion strategy`,
+            },
+          ]}
+          optionValueFn={field => field.id}
+          optionNameFn={field => field.name.replace("Coercion/", "")}
+          optionIconFn={field => null}
+        />
+      </Section>
+    )}
     <Section>
       <SectionHeader
         title={t`Filtering on this field`}

Binary file modified frontend/test/__runner__/test_db_fixture.db.mv.db
Binary file not shown.
@@ -39,7 +39,10 @@
   (memoize column-metadata))

 (defn- add-col-metadata [{database-id :database} col]
-  (merge col (memoized-column-metadata (u/get-id database-id) (:name col))))
+  (let [{:keys [base_type] :as metadata} (merge col (memoized-column-metadata (u/get-id database-id) (:name col)))]
+    (cond-> metadata
+      (and base_type (not (:effective_type metadata)))
+      (assoc :effective_type base_type))))

 (def ^:const ga-type->base-type
   "Map of Google Analytics field types to Metabase types."

@@ -296,21 +296,24 @@
             :status :completed
             :data {:rows [["Toucan Sighting" 1000]]
                    :native_form expected-ga-query
-                   :cols [{:description "This is ga:eventLabel"
-                           :semantic_type nil
-                           :name "ga:eventLabel"
-                           :settings nil
-                           :source :breakout
-                           :parent_id nil
-                           :visibility_type :normal
-                           :display_name "ga:eventLabel"
-                           :fingerprint nil
-                           :base_type :type/Text}
+                   :cols [{:description "This is ga:eventLabel"
+                           :semantic_type nil
+                           :name "ga:eventLabel"
+                           :settings nil
+                           :source :breakout
+                           :parent_id nil
+                           :visibility_type :normal
+                           :display_name "ga:eventLabel"
+                           :fingerprint nil
+                           :base_type :type/Text
+                           :effective_type :type/Text
+                           :coercion_strategy nil}
                           {:name "metric"
                            :display_name "ga:totalEvents"
                            :source :aggregation
                            :description "This is ga:totalEvents"
-                           :base_type :type/Text}]
+                           :base_type :type/Text
+                           :effective_type :type/Text}]
                    :results_timezone system-timezone-id}}
            (-> (tu/doall-recursive (qp query))
                (update-in [:data :cols] #(for [col %]

@@ -24,7 +24,7 @@
     t)))

 (defn- param-value->str
-  [{semantic-type :semantic_type, :as field} x]
+  [{coercion :coercion_strategy, :as field} x]
   (cond
     ;; sequences get converted to `$in`
     (sequential? x)
@@ -36,11 +36,11 @@
     (param-value->str field (u.date/parse (:s x)))

     (and (instance? Temporal x)
-         (isa? semantic-type :type/UNIXTimestampSeconds))
+         (isa? coercion :Coercion/UNIXSeconds->DateTime))
     (long (/ (t/to-millis-from-epoch (->utc-instant x)) 1000))

     (and (instance? Temporal x)
-         (isa? semantic-type :type/UNIXTimestampMilliseconds))
+         (isa? coercion :Coercion/UNIXMilliSeconds->DateTime))
     (t/to-millis-from-epoch (->utc-instant x))

     ;; convert temporal types to ISODate("2019-12-09T...") (etc.)

@@ -110,16 +110,16 @@
   x)

 (defmethod ->rvalue (class Field)
-  [{semantic-type :semantic_type, :as field}]
+  [{coercion :coercion_strategy, :as field}]
   (let [field-name (str \$ (field->name field "."))]
     (cond
-      (isa? semantic-type :type/UNIXTimestampMicroseconds)
+      (isa? coercion :Coercion/UNIXMicroSeconds->DateTime)
       {:$dateFromParts {:millisecond {$divide [field-name 1000]}, :year 1970}}

-      (isa? semantic-type :type/UNIXTimestampMilliseconds)
+      (isa? coercion :Coercion/UNIXMilliSeconds->DateTime)
       {:$dateFromParts {:millisecond field-name, :year 1970}}

-      (isa? semantic-type :type/UNIXTimestampSeconds)
+      (isa? coercion :Coercion/UNIXSeconds->DateTime)
       {:$dateFromParts {:second field-name, :year 1970}}

       :else field-name)))

4 changes: 2 additions & 2 deletions modules/drivers/oracle/src/metabase/driver/oracle.clj
@@ -201,11 +201,11 @@
   (hx/+ (hsql/raw "timestamp '1970-01-01 00:00:00 UTC'")
         (num-to-ds-interval :second field-or-value)))

-(defmethod sql.qp/cast-temporal-string [:oracle :type/ISO8601DateTimeString]
+(defmethod sql.qp/cast-temporal-string [:oracle :Coercion/ISO8601->DateTime]
   [_driver _semantic_type expr]
   (hsql/call :to_timestamp expr "YYYY-MM-DD HH:mi:SS"))

-(defmethod sql.qp/cast-temporal-string [:oracle :type/ISO8601DateString]
+(defmethod sql.qp/cast-temporal-string [:oracle :Coercion/ISO8601->Date]
   [_driver _semantic_type expr]
   (hsql/call :to_date expr "YYYY-MM-DD"))

@@ -123,7 +123,9 @@
     :id (mt/id :extsales :buyerid)
     :visibility_type :normal
     :display_name "Buyer ID"
-    :base_type :type/Integer}
+    :base_type :type/Integer
+    :effective_type :type/Integer
+    :coercion_strategy nil}
    {:description nil
     :table_id (mt/id :extsales)
     :semantic_type nil
@@ -135,7 +137,9 @@
     :id (mt/id :extsales :salesid)
     :visibility_type :normal
     :display_name "Sale Sid"
-    :base_type :type/Integer}]
+    :base_type :type/Integer
+    :effective_type :type/Integer
+    :coercion_strategy nil}]
   ; in different Redshift instances, the fingerprint on these
   ; columns is different.
   (map #(dissoc % :fingerprint)

6 changes: 3 additions & 3 deletions modules/drivers/sqlite/src/metabase/driver/sqlite.clj
@@ -216,15 +216,15 @@
   [_ _ expr]
   (->datetime expr (hx/literal "unixepoch")))

-(defmethod sql.qp/cast-temporal-string [:sqlite :type/ISO8601DateTimeString]
+(defmethod sql.qp/cast-temporal-string [:sqlite :Coercion/ISO8601->DateTime]
   [_driver _semantic_type expr]
   (->datetime expr))

-(defmethod sql.qp/cast-temporal-string [:sqlite :type/ISO8601DateString]
+(defmethod sql.qp/cast-temporal-string [:sqlite :Coercion/ISO8601->Date]
   [_driver _semantic_type expr]
   (->date expr))

-(defmethod sql.qp/cast-temporal-string [:sqlite :type/ISO8601TimeString]
+(defmethod sql.qp/cast-temporal-string [:sqlite :Coercion/ISO8601->Time]
   [_driver _semantic_type expr]
   (->time expr))

@@ -204,7 +204,7 @@
   ;; Work around this by converting the timestamps to minutes instead before calling DATEADD().
   (date-add :minute (hx// expr 60) (hx/literal "1970-01-01")))

-(defmethod sql.qp/cast-temporal-string [:sqlserver :type/ISO8601DateTimeString]
+(defmethod sql.qp/cast-temporal-string [:sqlserver :Coercion/ISO8601->DateTime]
   [_driver _semantic_type expr]
   (hx/->datetime expr))

@@ -23,7 +23,7 @@

 (mt/defdataset ^:private genetic-data
   [["genetic-data"
-    [{:field-name "gene", :base-type {:native "VARCHAR(MAX)"}}]
+    [{:field-name "gene", :base-type {:native "VARCHAR(MAX)"}, :effective-type :type/Text}]
     [[(a-gene)]]]])

 (deftest clobs-should-come-back-as-text-test

73 changes: 71 additions & 2 deletions resources/migrations/000_migrations.yaml
@@ -7983,8 +7983,77 @@ databaseChangeLog:
             sql: |
               ALTER TABLE task_history
               MODIFY ended_at timestamp(6) DEFAULT current_timestamp(6) NOT NULL;
+  - changeSet:
+      id: 284
+      author: dpsutton
+      comment: Added 0.39 - Semantic type system - add effective type
+      changes:
+        - addColumn:
+            tableName: metabase_field
+            columns:
+              - column:
+                  name: effective_type
+                  type: varchar(255)
+                  remarks: 'The effective type of the field after any coercions.'
+  - changeSet:
+      id: 285
+      author: dpsutton
+      comment: Added 0.39 - Semantic type system - add coercion column
+      changes:
+        - addColumn:
+            tableName: metabase_field
+            columns:
+              - column:
+                  name: coercion_strategy
+                  type: varchar(255)
+                  remarks: 'A strategy to coerce the base_type into the effective_type.'
+  - changeSet:
+      id: 286
+      author: dpsutton
+      comment: Added 0.39 - Semantic type system - set effective_type default
+      changes:
+        - sql:
+            sql: UPDATE metabase_field set effective_type = base_type
+  - changeSet:
+      id: 287
+      author: dpsutton
+      comment: Added 0.39 - Semantic type system - migrate ISO8601 strings
+      changes:
+        - sql:
+            sql: >-
+              UPDATE metabase_field
+              SET semantic_type = NULL, -- special type was overriden to provide coercion so no semantic type
+              effective_type = (CASE semantic_type
+              WHEN 'type/ISO8601DateTimeString' THEN 'type/DateTime'
+              WHEN 'type/ISO8601TimeString' THEN 'type/Time'
+              WHEN 'type/ISO8601DateString' THEN 'type/Date'
+              END),
+              coercion_strategy = (CASE semantic_type
+              WHEN 'type/ISO8601DateTimeString' THEN 'Coercion/ISO8601->DateTime'
+              WHEN 'type/ISO8601TimeString' THEN 'Coercion/ISO8601->Time'
+              WHEN 'type/ISO8601DateString' THEN 'Coercion/ISO8601->Date'
+              END)
+              WHERE semantic_type IN ('type/ISO8601DateTimeString',
+              'type/ISO8601TimeString',
+              'type/ISO8601DateString');
+  - changeSet:
+      id: 288
+      author: dpsutton
+      comment: Added 0.39 - Semantic type system - migrate unix timestamps
+      changes:
+        - sql:
+            sql: >-
+              UPDATE metabase_field
+              set semantic_type = null,
+              effective_type = 'type/DateTime',
+              coercion_strategy = (case semantic_type
+              WHEN 'type/UNIXTimestampSeconds' THEN 'Coercion/UNIXSeconds->DateTime'
+              WHEN 'type/UNIXTimestampMilliSeconds' THEN 'Coercion/UNIXMilliSeconds->DateTime'
+              WHEN 'type/UNIXTimestampMicroSeconds' THEN 'Coercion/UNIXMicroSeconds->DateTime'
+              END)
+              WHERE semantic_type IN ('type/UNIXTimestampSeconds',
+              'type/UNIXTimestampMilliSeconds',
+              'type/UNIXTimestampMicroSeconds')
 # >>>>>>>>>> DO NOT ADD NEW MIGRATIONS BELOW THIS LINE! ADD THEM ABOVE <<<<<<<<<<

 ########################################################################################################################

Binary file modified resources/sample-dataset.db.mv.db
Binary file not shown.
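
The intended effect of changeSets 286 to 288 on a single `metabase_field` row can be sketched in Clojure as below. This illustrates the mapping only and is not code from the commit: `legacy-semantic-type->replacement` and `migrate-field` are hypothetical names, and rows are modeled as plain maps with string values. The type strings are copied verbatim from the changeSets, including their capitalization. The driver code in this commit spells the milliseconds type `:type/UNIXTimestampMilliseconds`, so whether the `WHERE` clause matches a stored row depends on the spelling in the database.

```clojure
;; Legacy semantic type -> replacement column values, copied from
;; changeSets 287 (ISO 8601 strings) and 288 (UNIX timestamps).
(def legacy-semantic-type->replacement
  {"type/ISO8601DateTimeString"     {:effective_type    "type/DateTime"
                                     :coercion_strategy "Coercion/ISO8601->DateTime"}
   "type/ISO8601TimeString"         {:effective_type    "type/Time"
                                     :coercion_strategy "Coercion/ISO8601->Time"}
   "type/ISO8601DateString"         {:effective_type    "type/Date"
                                     :coercion_strategy "Coercion/ISO8601->Date"}
   "type/UNIXTimestampSeconds"      {:effective_type    "type/DateTime"
                                     :coercion_strategy "Coercion/UNIXSeconds->DateTime"}
   "type/UNIXTimestampMilliSeconds" {:effective_type    "type/DateTime"
                                     :coercion_strategy "Coercion/UNIXMilliSeconds->DateTime"}
   "type/UNIXTimestampMicroSeconds" {:effective_type    "type/DateTime"
                                     :coercion_strategy "Coercion/UNIXMicroSeconds->DateTime"}})

(defn migrate-field
  "Hypothetical helper: apply the three data migrations to one row map."
  [{:keys [semantic_type base_type] :as field}]
  (if-let [replacement (legacy-semantic-type->replacement semantic_type)]
    ;; 287/288: the legacy semantic type becomes a coercion, and
    ;; semantic_type is cleared.
    (merge field replacement {:semantic_type nil})
    ;; 286: every other row defaults effective_type to base_type; the new
    ;; coercion_strategy column stays NULL.
    (assoc field :effective_type base_type :coercion_strategy nil)))
```

For example, a row with `:base_type "type/Text"` and `:semantic_type "type/ISO8601DateString"` ends up with `:effective_type "type/Date"`, the `Coercion/ISO8601->Date` strategy, and no semantic type.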