diff --git a/.gitignore b/.gitignore index 24d46f36250ee..abeee60f02aa7 100644 --- a/.gitignore +++ b/.gitignore @@ -25,6 +25,7 @@ /.lein-repl-history /.sqlite.db /backend-checksums.txt +/backend-docs/uberdoc.html /build.xml /checkouts /classes @@ -32,7 +33,6 @@ /crate-* /cypress /deploy/artifacts/* -/docs/uberdoc.html /frontend/test/snapshots/* /e2e/snapshots/* /lein-plugins/*/target diff --git a/deps.edn b/deps.edn index 8755843e202d5..31dc6346258d5 100644 --- a/deps.edn +++ b/deps.edn @@ -283,8 +283,8 @@ {:extra-paths ["enterprise/backend/src"]} ;; Include EE tests. - ;; for ee dev: :dev:ee:ee-dev - ;; for ee tests: clojure -X:dev:ee:ee-dev:test + ;; for EE dev: `clojure -X:dev:ee:ee-dev` + ;; for EE tests: `clojure -X:dev:ee:ee-dev:test` :ee-dev {:extra-paths ["enterprise/backend/test"]} @@ -295,6 +295,15 @@ :oss-dev {} + ;; Generate BE documentation with + ;; clojure -M:marginalia + :marginalia + {:extra-deps + {com.github.tsmacdonald/marginalia {:mvn/version "0.9.2"}} + :main-opts ["-m" "marginalia.main" "-n" "Metabase" "-d" "backend-docs" "-D" + "The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋" + "dev" "src" "shared/src" "enterprise/backend/src"]} + ;; Find outdated versions of dependencies. Run with `clojure -M:outdated` :outdated {;; Note that it is `:deps`, not `:extra-deps` :deps {com.github.liquidz/antq {:mvn/version "RELEASE"}} diff --git a/dev/src/dev.clj b/dev/src/dev.clj index e62d5dcb543ba..98ac6d26424c4 100644 --- a/dev/src/dev.clj +++ b/dev/src/dev.clj @@ -1,3 +1,37 @@ +;; # Metabase Backend Developer Documentation +;; +;; Welcome to Metabase! Here are links to useful resources. 
+;;
+;; ## Project Management
+;;
+;; - [Engineering and Product Playbook](https://www.notion.so/metabase/Engineering-and-Product-Playbook-cd4bc1c0b8744470bebc0b979f8f5268)
+;; - [Weekly Tactical Board: how to](https://www.notion.so/metabase/Weekly-Tactical-Board-how-to-6e81f994a792493ba7ae430f2afa1673)
+;; - [The Escalations Process](https://www.notion.so/Escalating-a-bug-b876f78c801345f3bda8504d4a63ba80)
+;;
+;; ## Dev Environment
+;;
+;; - [Getting started with backend development](https://github.com/metabase/metabase/blob/master/docs/developers-guide/devenv.md#backend-development)
+;; - [Additional notes on using tools.deps](https://github.com/metabase/metabase/wiki/Migrating-from-Leiningen-to-tools.deps)
+;; - [Other tips](https://github.com/metabase/metabase/wiki/Metabase-Backend-Dev-Secrets)
+;;
+;; ## Important Parts of the Codebase
+;;
+;; - [API Endpoints](#metabase.api.common)
+;; - [Drivers](#metabase.driver)
+;; - [Permissions](#metabase.models.permissions)
+;; - [The Query Processor](#metabase.query-processor)
+;; - [Application Settings](#metabase.models.setting)
+;;
+;; ## Important Libraries
+;;
+;; - [Toucan 2](https://github.com/camsaul/toucan2/) to work with models
+;; - [Honey SQL](https://github.com/seancorfield/honeysql) (version 2) for SQL queries
+;; - [Liquibase](https://docs.liquibase.com/concepts/changelogs/changeset.html) for database migrations
+;; - [Compojure](https://github.com/weavejester/compojure) on top of [Ring](https://github.com/ring-clojure/ring) for our API
+;;
+;;
+ + (ns dev "Put everything needed for REPL development within easy reach" (:require diff --git a/enterprise/backend/src/metabase_enterprise/core.clj b/enterprise/backend/src/metabase_enterprise/core.clj index 42b0b998ca029..f96301cbfc998 100644 --- a/enterprise/backend/src/metabase_enterprise/core.clj +++ b/enterprise/backend/src/metabase_enterprise/core.clj @@ -1,2 +1,9 @@ +;; Unless otherwise noted, all files © 2023 Metabase, Inc. +;; +;; Source code in this repository is variously licensed under the GNU Affero General Public License (AGPL), or the +;; [Metabase Commercial License](https://www.metabase.com/license/commercial). +;; +;;
+ (ns metabase-enterprise.core "Empty namespace. This is here solely so we can try to require it and see whether or not EE code is on the classpath.") diff --git a/src/metabase/api/common.clj b/src/metabase/api/common.clj index 5bf8a02f18a2b..6fb874edfab13 100644 --- a/src/metabase/api/common.clj +++ b/src/metabase/api/common.clj @@ -1,3 +1,41 @@ +;; # API Endpoints at Metabase +;; +;; We use a custom macro called `defendpoint` for defining all endpoints. It's best illustrated with an example: +;; +;;

+;; (ns metabase.api.dashboard ...)
+;;
+;; (api/defendpoint GET "/"
+;;  "Get `Dashboards`. With filter option `f`..."
+;;  [f]
+;;  {f [:maybe [:enum "all" "mine" "archived"]]}
+;;  (let ...))
+;;
+;;  ; ...
+;;
+;; (api/define-routes)
+;; 
+;;
+;; As you can see, the arguments are:
+;;
+;; * **The HTTP verb.** (`GET`, `PUT`, `POST`, etc.)
+;; * **The route.** This will automatically have `api` and the namespace prefixed to it, so in this case `"/"` is
+;;   defining the route for `/api/dashboard/`.
+;; * **A docstring.** Apart from being helpful to us, this is used for API documentation for third-party devs, so
+;;   please be thorough!
+;; * **The parameters.** This uses Compojure's
+;;   [destructuring syntax](https://github.com/weavejester/compojure/wiki/Destructuring-Syntax) (a superset of
+;;   Clojure's normal destructuring syntax).
+;; * **A schema.** This is a map from parameter name to schema, written in
+;;   [Malli's vector syntax](https://github.com/metosin/malli#vector-syntax). The syntax is documented on Malli's page,
+;;   of course, but we also have some of our own schemas defined. Start by looking in
+;;   [`metabase.util.malli.schema`](#metabase.util.malli.schema).
+;; * **The actual code for the endpoint.** The returned value could be one of several types. The Right Thing (such as
+;;   converting to JSON or setting an appropriate status code) usually happens by default. You can consult
+;;   [Compojure's documentation](https://github.com/weavejester/compojure/blob/master/src/compojure/response.clj),
+;;   but it may be more instructive to look at examples in our codebase.
+;;
+;;
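As a rough illustration of the schema argument in the example above, the sketch below checks a few values against `[:maybe [:enum "all" "mine" "archived"]]` using Malli directly. It assumes Malli is on the classpath (as it would be in a Metabase REPL). `filter-schema` is a name made up for this sketch and is not part of Metabase, and `defendpoint` may well do more with the schema than plain validation.

```clojure
(require '[malli.core :as m])

;; Hypothetical name for this sketch: the schema attached to `f` in the example endpoint.
(def filter-schema
  [:maybe [:enum "all" "mine" "archived"]])

(m/validate filter-schema "mine")    ; one of the enumerated values
(m/validate filter-schema nil)       ; `:maybe` also accepts nil, e.g. when the parameter is omitted
(m/validate filter-schema "recent")  ; not in the enum, so validation fails
```

Trying a schema out like this at the REPL can be a quick way to check it before wiring it into an endpoint.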
+ (ns metabase.api.common "Dynamic variables and utility functions/macros for writing API functions." (:require diff --git a/src/metabase/driver/common/parameters/parse.clj b/src/metabase/driver/common/parameters/parse.clj index 688c1f50f7d30..68b47cf4f276c 100644 --- a/src/metabase/driver/common/parameters/parse.clj +++ b/src/metabase/driver/common/parameters/parse.clj @@ -156,7 +156,7 @@ "Attempts to parse parameters in string `s`. Parses any optional clauses or parameters found, and returns a sequence of non-parameter string fragments (possibly) interposed with `Param` or `Optional` instances. - If handle-sql-comments is true (default) then we make a best effort to ignore params in SQL comments." + If `handle-sql-comments` is true (default) then we make a best effort to ignore params in SQL comments." ([s :- s/Str] (parse s true)) ([s :- s/Str, handle-sql-comments :- s/Bool] diff --git a/src/metabase/search/scoring.clj b/src/metabase/search/scoring.clj index ddbb162b6e00b..3b7d662320b6e 100644 --- a/src/metabase/search/scoring.clj +++ b/src/metabase/search/scoring.clj @@ -1,4 +1,115 @@ +;; # How does search scoring work? +;; +;; _This was written for a success engineer, but may be helpful here, too._ +;; +;; Most of what you care about happens in the `scoring.clj` file [here](https://github.com/metabase/metabase/blob/master/src/metabase/search/scoring.clj). +;; +;; We have two sets of scorers. The first is based on the literal text matches and defined [here](https://github.com/metabase/metabase/blob/8d5f5db02c84899a053e20468986050b2034a9a4/src/metabase/search/scoring.clj#L132C1-L137): +;; +;;

+;; (def ^:private match-based-scorers
+;;   [{:scorer exact-match-scorer :name "exact-match" :weight 4}
+;;    {:scorer consecutivity-scorer :name "consecutivity" :weight 2}
+;;    {:scorer total-occurrences-scorer :name "total-occurrences" :weight 2}
+;;    {:scorer fullness-scorer :name "fullness" :weight 1}
+;;    {:scorer prefix-scorer :name "prefix" :weight 1}])
+;; 
+;;
+;; * The `exact-match-scorer` gives points for exact matches. So if you search `foo` it'll score well for
+;;   `foo collection` but not `my favorite foods`. All the other scorers count partial matches.
+;;
+;; * `consecutivity-scorer` gives points for a sequence of matching words. So if you search `four five six seven`
+;;   it'll score well for `one two three four five six seven eight` and 0 for
+;;   `eight seven six five four three two one`.
+;;
+;; * `total-occurrences-scorer` gives points for the number of tokens that show up in the search result. So if you
+;;   search for `foo bar` it'll score better for `Admiral Akbar's Food Truck` (2; note that `akbar` and `food` count
+;;   as matches even though they aren't exact) than for `foo collection` (1; being an exact match doesn't matter.
+;;   That's why we have the `exact-match-scorer`).
+;;
+;; * `fullness-scorer` is sort of the opposite of that: it gives points for how much of the result is "covered" by the
+;;   search query. So if you search `foo bar` then `Barry's Food` will have a perfect fullness score and `Barry's
+;;   Dashboard Of Favorite Bars, Restaurants, and Food Trucks` will score poorly since only 3/9 of the dashboard's
+;;   title is covered by the search query. Why 3? `bar` matches both `Barry's` and `Bars`, and `foo` matches `Food`.
+;;
+;; * `prefix-scorer` gives points for an exact prefix match. So if you search for `foo bar` then `foo collection` will
+;;   have a good prefix score (4/14: `foo ` matches), `Food trucks I love` will have a worse one (3/18), and
+;;   `top 10 foo bars` will score zero.
+;;
+;;
+;; These are all weighted: you can see that the exact-match scorer is responsible for 4/10 of the score, the
+;; consecutivity one is 2/10, etc.
+;;
+;; The second set of scorers is defined lower down,
+;; [here](https://github.com/metabase/metabase/blob/8d5f5db02c84899a053e20468986050b2034a9a4/src/metabase/search/scoring.clj#L215-L222):
+;;
+;;

+;; (defn weights-and-scores
+;;   "Default weights and scores for a given result."
+;;   [result]
+;;   [{:weight 2 :score (pinned-score result) :name "pinned"}
+;;    {:weight 2 :score (bookmarked-score result) :name "bookmarked"}
+;;    {:weight 3/2 :score (recency-score result) :name "recency"}
+;;    {:weight 1 :score (dashboard-count-score result) :name "dashboard"}
+;;    {:weight 1/2 :score (model-score result) :name "model"}])
+;; 
+;; +;; And there are two more for Enterprise +;; [here](https://github.com/metabase/metabase/blob/8d5f5db02c84899a053e20468986050b2034a9a4/enterprise/backend/src/metabase_enterprise/search/scoring.clj#L27-L33): +;; +;;

+;; (premium-features/has-feature? :official-collections)
+;;     (conj {:weight 2
+;;            :score  (official-collection-score result)
+;;            :name   "official collection score"})
+;;     (premium-features/has-feature? :content-verification)
+;;     (conj {:weight 2
+;;            :score  (verified-score result)
+;;            :name   "verified"})))
+;; 
+;;
+;; These are easier to explain: you get points if the search result is pinned (yes or no) or bookmarked (yes or no),
+;; and based on how recently it was updated (sliding value between 1 (edited just now) and 0 (edited [180+
+;; days](https://github.com/metabase/metabase/blob/8d5f5db02c84899a053e20468986050b2034a9a4/src/metabase/search/config.clj#L29-L32)
+;; ago)), how many dashboards it appears in (sliding value between 0 (zero dashboards) and 1 ([50+
+;; dashboards](https://github.com/metabase/metabase/blob/8d5f5db02c84899a053e20468986050b2034a9a4/src/metabase/search/config.clj#L34-L36))),
+;; and its type (`model-score`): the earlier a type appears in [this
+;; list](https://github.com/metabase/metabase/blob/8d5f5db02c84899a053e20468986050b2034a9a4/src/metabase/search/config.clj#L55-L58)
+;; the higher the score it gets:
+;;
+;; ["dashboard" "metric" "segment" "indexed-entity" "card" "dataset" "collection" "table" "action" "database"]
+;;
+;; On the EE side, we also give points if something's an official collection and if it's verified.
+;;
+;; Finally, what we actually search is defined in the search
+;; config [here](https://github.com/metabase/metabase/blob/8d5f5db02c84899a053e20468986050b2034a9a4/src/metabase/search/config.clj#L73-L109),
+;; but the short answer is "the name and, if there is one, the description". We used to search raw SQL queries for
+;; cards, but that got turned off recently (though I've seen chat about turning it back on).
+;;
+;; ❦
+;;
+;; So, these 12 scorers are weighted and combined together, and the grand total affects search order. If this sounds a
+;; little complicated…it is! It also means that it can be tricky to give a proper answer about why the search ranking
+;; is "wrong". Maybe you search for `monthly revenue` and are looking for a card called `monthly revenue` and are mad
+;; that a dashboard called `company stats` shows up first…but then it turns out that the dashboard's description is
+;; `Stats that everyone should be aware of, such as our order count and monthly revenue.` and the dashboard happens to
+;; be pinned, bookmarked, part of an official collection, verified, and edited a couple hours ago…whereas the card is
+;; none of those things.
+;;
+;; Also, be aware that as of October 2023 there's [a big epic under
+;; way](https://github.com/metabase/metabase/issues/27982) to add filtering to search results, which should help
+;; people find what they're looking for (and spare us from having to make the above algorithm better).
+;;
+;;
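To make the weighting concrete, here is a minimal sketch of a weighted average over the five match-based weights listed above. The scores are invented for illustration, and `weighted-average` and `example-scores` are hypothetical names rather than anything Metabase defines. How the text-match score is combined with the second set of scorers is not shown here.

```clojure
;; Hypothetical helper: sum of (weight * score) divided by the sum of the weights.
(defn weighted-average
  [weights-and-scores]
  (/ (reduce + (map (fn [{:keys [weight score]}] (* weight score)) weights-and-scores))
     (reduce + (map :weight weights-and-scores))))

;; Weights are copied from `match-based-scorers`; the scores are made up for this example.
(def example-scores
  [{:name "exact-match"       :weight 4 :score 0}
   {:name "consecutivity"     :weight 2 :score 1}
   {:name "total-occurrences" :weight 2 :score 1/2}
   {:name "fullness"          :weight 1 :score 1/3}
   {:name "prefix"            :weight 1 :score 0}])

(weighted-average example-scores)
;; => 1/3
```

Under this sketch, a result with no exact match can reach at most 6/10 from the text scorers, because the exact-match weight accounts for 4 of the 10 points.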
+ (ns metabase.search.scoring + "Computes a relevancy score for search results using the weighted average of various scorers. Scores are determined by + various ways of comparing the text of the search string and the item's title or description, as well as by + Metabase-specific features such as how many dashboards a card appears in or whether an item is pinned. + + Get the score for a result with `score-and-result`, and efficiently get the most relevant results with + `top-results`. + + Some of the scorers can be tweaked with configuration in [[metabase.search.config]]." (:require [clojure.string :as str] [java-time.api :as t]