[WIP] CQL Cache
alexanderkiel committed Aug 27, 2023
1 parent 82cf8f9 commit 9fecaf8
Showing 125 changed files with 3,326 additions and 982 deletions.
2 changes: 2 additions & 0 deletions .github/distributed-test/docker-compose.yml
@@ -81,6 +81,7 @@ services:
DB_CASSANDRA_MAX_CONCURRENT_REQUESTS: "128"
DB_RESOURCE_CACHE_SIZE: "100000"
LOG_LEVEL: "debug"
ENABLE_FRONTEND: "true"
ports:
- "8081:8081"
volumes:
@@ -111,6 +112,7 @@ services:
DB_CASSANDRA_MAX_CONCURRENT_REQUESTS: "128"
DB_RESOURCE_CACHE_SIZE: "100000"
LOG_LEVEL: "debug"
ENABLE_FRONTEND: "true"
ports:
- "8082:8081"
volumes:
9 changes: 9 additions & 0 deletions .github/scripts/check-patient-as-of-index-missing.sh
@@ -0,0 +1,9 @@
#!/bin/bash

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. "$SCRIPT_DIR/util.sh"

BASE="http://localhost:8080/fhir"
curl -s "$BASE/__admin/rocksdb/index/column-families" | jq -r '."column-families"[]' | grep -q "patient-as-of-index"

test "exit code" "$?" "1"
9 changes: 9 additions & 0 deletions .github/scripts/check-patient-as-of-index-state.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/bin/bash -e

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. "$SCRIPT_DIR/util.sh"

BASE="http://localhost:8080/fhir"
STATE="$(curl -s "$BASE/__admin/db/index/column-families/patient-as-of-index/state" | jq -r .type)"

test "state" "$STATE" "$1"
19 changes: 19 additions & 0 deletions .github/scripts/test-cql-expr-cache-metrics.sh
@@ -0,0 +1,19 @@
#!/bin/bash -e

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. "$SCRIPT_DIR/util.sh"

URL="http://localhost:8081/metrics"

num-metrics() {
NAME="$1"
FILTER="$2"
curl -s "$URL" | grep "$NAME" | grep -c "$FILTER"
}

# CQL expression cache is available
test "blaze_cache_estimated_size cql-expr-cache" "$(num-metrics "blaze_cache_estimated_size" "name=\"cql-expr-cache\"")" "1"

# other caches are still available
test "blaze_cache_estimated_size tx-cache" "$(num-metrics "blaze_cache_estimated_size" "name=\"tx-cache\"")" "1"
test "blaze_cache_estimated_size resource-cache" "$(num-metrics "blaze_cache_estimated_size" "name=\"resource-cache\"")" "1"
5 changes: 2 additions & 3 deletions .github/scripts/test-metrics.sh
@@ -15,6 +15,5 @@ test "blaze_rocksdb_block_cache_data_miss index" "$(num-metrics "blaze_rocksdb_b
test "blaze_rocksdb_block_cache_data_miss transaction" "$(num-metrics "blaze_rocksdb_block_cache_data_miss" "name=\"transaction\"")" "1"
test "blaze_rocksdb_block_cache_data_miss resource" "$(num-metrics "blaze_rocksdb_block_cache_data_miss" "name=\"resource\"")" "1"

test "blaze_rocksdb_table_reader_usage_bytes index" "$(num-metrics "blaze_rocksdb_table_reader_usage_bytes" "name=\"index\"")" "14"
test "blaze_rocksdb_table_reader_usage_bytes transaction" "$(num-metrics "blaze_rocksdb_table_reader_usage_bytes" "name=\"transaction\"")" "1"
test "blaze_rocksdb_table_reader_usage_bytes resource" "$(num-metrics "blaze_rocksdb_table_reader_usage_bytes" "name=\"resource\"")" "1"
test "blaze_cache_estimated_size tx-cache" "$(num-metrics "blaze_cache_estimated_size" "name=\"tx-cache\"")" "1"
test "blaze_cache_estimated_size resource-cache" "$(num-metrics "blaze_cache_estimated_size" "name=\"resource-cache\"")" "1"
194 changes: 192 additions & 2 deletions .github/workflows/build.yml
@@ -39,6 +39,7 @@ jobs:
- anomaly
- async
- byte-buffer
- cache-collector
- cassandra
- coll
- cql
@@ -259,6 +260,143 @@ jobs:
with:
sarif_file: trivy-results.sarif

cql-expr-cache-test:
needs: build
runs-on: ubuntu-22.04

steps:
- name: Check out Git repository
uses: actions/checkout@v3

- name: Install Blazectl
run: .github/scripts/install-blazectl.sh

- name: Download Blaze Image
uses: actions/download-artifact@v3
with:
name: blaze-image
path: /tmp

- name: Load Blaze Image
run: docker load --input /tmp/blaze.tar

- name: Run Blaze
run: docker run --name blaze -d -e JAVA_TOOL_OPTIONS=-Xmx2g -e ENABLE_FRONTEND=true -e CQL_EXPR_CACHE_SIZE=1000 -p 8080:8080 -p 8081:8081 -v blaze-data:/app/data blaze:latest

- name: Wait for Blaze
run: .github/scripts/wait-for-url.sh http://localhost:8080/health

- name: Docker Logs
run: docker logs blaze

- name: Check Capability Statement
run: .github/scripts/check-capability-statement.sh

- name: Ensure that the State of PatientAsOf Index is Current
run: .github/scripts/check-patient-as-of-index-state.sh current

- name: Load Data
run: blazectl --no-progress --server http://localhost:8080/fhir upload .github/test-data/synthea

- name: Prometheus Metrics
run: .github/scripts/test-cql-expr-cache-metrics.sh

- name: Check Total-Number of Resources are 92114
run: .github/scripts/check-total-number-of-resources.sh 92114

- name: Evaluate CQL Query 1
run: .github/scripts/evaluate-measure.sh q1 56

- name: Evaluate CQL Query 1 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q1 56

- name: Evaluate CQL Query 1 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q1 56

- name: Evaluate CQL Query 1 on Individual Patients
run: .github/scripts/evaluate-patient-q1-measure.sh

- name: Evaluate CQL Query 2
run: .github/scripts/evaluate-measure.sh q2 42

- name: Evaluate CQL Query 2 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q2 42

- name: Evaluate CQL Query 2 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q2 42

- name: Evaluate CQL Query 4
run: .github/scripts/evaluate-measure.sh q4 0

- name: Evaluate CQL Query 4 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q4 0

- name: Evaluate CQL Query 4 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q4 0

- name: Evaluate CQL Query 7
run: .github/scripts/evaluate-measure.sh q7 81

- name: Evaluate CQL Query 7 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q7 81

- name: Evaluate CQL Query 7 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q7 81

- name: Evaluate CQL Query 14
run: .github/scripts/evaluate-measure.sh q14 96

- name: Evaluate CQL Query 14 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q14 96

- name: Evaluate CQL Query 14 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q14 96

- name: Evaluate CQL Query 17
run: .github/scripts/evaluate-measure.sh q17 120

- name: Evaluate CQL Query 17 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q17 120

- name: Evaluate CQL Query 17 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q17 120

- name: Evaluate CQL Query 20 using Blazectl
run: .github/scripts/evaluate-measure-blazectl-stratifier.sh q20-stratifier-city 120

- name: Evaluate CQL Query 21 using Blazectl
run: .github/scripts/evaluate-measure-blazectl-stratifier.sh q21-stratifier-city-of-only-women 64

- name: Evaluate CQL Query 26 using Blazectl
run: .github/scripts/evaluate-measure-blazectl-stratifier.sh q26-stratifier-bmi 120

- name: Evaluate CQL Query 27 using Blazectl
run: .github/scripts/evaluate-measure-blazectl-stratifier.sh q27-stratifier-calculated-bmi 120

- name: Evaluate CQL Query 32 using Blazectl
run: .github/scripts/evaluate-measure-blazectl-stratifier.sh q32-stratifier-underweight 120

- name: Evaluate CQL Query 36
run: .github/scripts/evaluate-measure.sh q36-parameter 86

- name: Evaluate CQL Query 36 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q36-parameter 86

- name: Evaluate CQL Query 37
run: .github/scripts/evaluate-measure.sh q37-overlaps 24

- name: Evaluate CQL Query 37 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q37-overlaps 24

- name: Evaluate CQL Query 37 - Subject List
run: .github/scripts/evaluate-measure-subject-list.sh q37-overlaps 24

- name: Evaluate CQL Query 46
run: .github/scripts/evaluate-measure.sh q46-between-date 19

- name: Evaluate CQL Query 46 using Blazectl
run: .github/scripts/evaluate-measure-blazectl.sh q46-between-date 19

integration-test:
needs: build
runs-on: ubuntu-22.04
@@ -280,7 +418,7 @@ jobs:
run: docker load --input /tmp/blaze.tar

- name: Run Blaze
run: docker run --name blaze -d -e JAVA_TOOL_OPTIONS=-Xmx2g -p 8080:8080 -p 8081:8081 -v blaze-data:/app/data blaze:latest
run: docker run --name blaze -d -e JAVA_TOOL_OPTIONS=-Xmx2g -e ENABLE_FRONTEND=true -p 8080:8080 -p 8081:8081 -v blaze-data:/app/data blaze:latest

- name: Wait for Blaze
run: .github/scripts/wait-for-url.sh http://localhost:8080/health
@@ -294,6 +432,9 @@ jobs:
- name: Check Referential Integrity Enforced
run: .github/scripts/check-referential-integrity-enforced.sh

- name: Ensure that the State of PatientAsOf Index is Current
run: .github/scripts/check-patient-as-of-index-state.sh current

- name: Load Data
run: blazectl --no-progress --server http://localhost:8080/fhir upload .github/test-data/synthea

@@ -1137,6 +1278,50 @@ jobs:
- name: Fetch Patient Expecting an Error
run: .github/scripts/fetch-resource-0-with-missing-resource-content.sh

build-patient-as-of-index-test:
needs: build
runs-on: ubuntu-22.04

steps:
- name: Check out Git repository
uses: actions/checkout@v3

- name: Install Blazectl
run: .github/scripts/install-blazectl.sh

- name: Download Blaze Image
uses: actions/download-artifact@v3
with:
name: blaze-image
path: /tmp

- name: Load Blaze Image
run: docker load --input /tmp/blaze.tar

- name: Run Blaze v0.22
run: docker run --name blaze -d -e JAVA_TOOL_OPTIONS=-Xmx2g -e ENABLE_FRONTEND=true -p 8080:8080 -v blaze-data:/app/data samply/blaze:0.22

- name: Wait for Blaze
run: .github/scripts/wait-for-url.sh http://localhost:8080/health

- name: Load Data
run: blazectl --no-progress --server http://localhost:8080/fhir upload .github/test-data/synthea

- name: Ensure that the PatientAsOf Index does not exist
run: .github/scripts/check-patient-as-of-index-missing.sh

- name: Shut down Blaze
run: docker stop blaze && docker rm blaze

- name: Run Latest Blaze
run: docker run --name blaze -d -e JAVA_TOOL_OPTIONS=-Xmx2g -e ENABLE_FRONTEND=true -e LOG_LEVEL=debug -p 8080:8080 -v blaze-data:/app/data blaze:latest

- name: Wait for Blaze
run: .github/scripts/wait-for-url.sh http://localhost:8080/health

- name: Ensure that the State of PatientAsOf Index is Current
run: .github/scripts/check-patient-as-of-index-state.sh current

distributed-test:
needs: build
runs-on: ubuntu-22.04
@@ -1193,6 +1378,9 @@ jobs:
- name: Check Referential Integrity Enforced
run: .github/scripts/check-referential-integrity-enforced.sh

- name: Ensure that the State of PatientAsOf Index is Current
run: .github/scripts/check-patient-as-of-index-state.sh current

- name: Load Data
run: blazectl --no-progress --server http://localhost:8080/fhir upload .github/test-data/synthea

@@ -1613,6 +1801,7 @@ jobs:
needs:
- build
- image-scan
- cql-expr-cache-test
- integration-test
- not-enforcing-referential-integrity-test
- small-transactions-test
@@ -1623,13 +1812,14 @@ jobs:
- bundle-with-references-test
- jepsen-test
- openid-auth-test
- custom-search-parameters-test
- doc-copy-data-test
- big-binary-test
- frontend-test
- missing-resource-content-test
- build-patient-as-of-index-test
- distributed-test
- jepsen-distributed-test
- custom-search-parameters-test
runs-on: ubuntu-22.04
permissions:
packages: write
7 changes: 6 additions & 1 deletion dev/blaze/dev.clj
@@ -1,9 +1,9 @@
(ns blaze.dev
(:require
[blaze.byte-string :as bs]
[blaze.cache-collector.protocols :as ccp]
[blaze.db.api :as d]
[blaze.db.api-spec]
[blaze.db.cache-collector.protocols :as ccp]
[blaze.db.resource-cache :as resource-cache]
[blaze.db.resource-store :as rs]
[blaze.db.tx-log :as tx-log]
@@ -66,6 +66,11 @@
(resource-cache/invalidate-all! (:blaze.db/resource-cache system))
)

;; CQL Expression Cache
(comment
(str (ccp/-stats (:blaze.fhir.operation.evaluate-measure/expr-cache system)))
)

;; RocksDB Stats
(comment
(.reset (system [:blaze.db.kv.rocksdb/stats :blaze.db.index-kv-store/stats]))
1 change: 1 addition & 0 deletions docs/deployment/environment-variables.md
@@ -91,6 +91,7 @@ More information about distributed deployment are available [here](distributed.m
| ENFORCE_REFERENTIAL_INTEGRITY | true | v0.14 || Enforce referential integrity on resource create, update and delete. |
| DB_SYNC_TIMEOUT | 10000 | v0.15 || Timeout in milliseconds for all reading FHIR interactions acquiring the newest database state. |
| DB_SEARCH_PARAM_BUNDLE || v0.21 || Name of a custom search parameter bundle file. |
| CQL_EXPR_CACHE_SIZE || v0.23 || Size of the CQL expression cache. Will be disabled if not given. |

¹ Deprecated

40 changes: 40 additions & 0 deletions docs/implementation/cql.md
@@ -0,0 +1,40 @@
# CQL

## Expression Cache

* Bloom filter
* the set we'd like to build is the set of expressions returning true
* if the Bloom filter returns false, we can be sure that the expression is not in the set of expressions returning true, so it will certainly return false
* the number of expressions returning true is far smaller than the number of expressions returning false
* we'll fill the Bloom filter with the expressions that returned true
* the problem is the following
* if we haven't filled the filter with all expressions, the answer we get has no value
* we don't know whether the expression isn't in the set because we didn't put it there or because it returned false
* but we could use two filters
* one for all expressions returning true and one for all returning false
* if the expression is in neither filter, it is new, so we have to evaluate it
* we could use a Bloom filter for each expression
* then we would see whether we have a Bloom filter or not
* we would insert Patients for which the expression returns true
* the first query evaluation is only used for insertion
* after that we mark the filter as ready for query
* now we can determine whether an expression is not true for a certain Patient

* we'll use one Bloom filter per expression
* these Bloom filters will be stored in a Caffeine cache by expression hash
* each Bloom filter will be assigned the t of its creation
* the Bloom filters of all expressions of a query will be collected at the start of the query evaluation
* if a Bloom filter isn't found, its calculation will be queued and carried out asynchronously
* existing Bloom filters are immutable and will be used in query evaluation
* the Patient ID will be used to test whether this Patient isn't in the Bloom filter
* if the Patient ID wasn't found, the expression will return false
* if the Patient ID is found, the expression will be evaluated normally
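
The lookup flow described in this list can be sketched roughly as follows. This is an illustrative Python sketch, not Blaze's implementation: a plain dict stands in for the Caffeine cache, a deque stands in for the asynchronous calculation queue, and all names (`BloomFilter`, `filter_of`, `collect_filters`, `evaluate`, the example expression string) are hypothetical.

```python
import hashlib
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Iterable, List, Optional


def bit_positions(item: str, m: int, k: int) -> List[int]:
    """Derive k bit positions in [0, m) from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


@dataclass(frozen=True)
class BloomFilter:
    """Immutable filter over Patient IDs, tagged with the t of its creation."""
    t: int
    bits: int  # bit set stored in a Python int
    m: int = 1024  # number of bits (illustrative size)
    k: int = 3     # number of hash functions (illustrative)

    def might_contain(self, patient_id: str) -> bool:
        return all((self.bits >> p) & 1 for p in bit_positions(patient_id, self.m, self.k))


def filter_of(t: int, patient_ids: Iterable[str], m: int = 1024, k: int = 3) -> BloomFilter:
    """Helper for this demo only: build a filter from known Patient IDs."""
    bits = 0
    for pid in patient_ids:
        for p in bit_positions(pid, m, k):
            bits |= 1 << p
    return BloomFilter(t=t, bits=bits, m=m, k=k)


cache: Dict[str, BloomFilter] = {}  # expression hash -> filter (stand-in for Caffeine)
pending: Deque[str] = deque()       # expression hashes awaiting asynchronous calculation


def expr_hash(expr_source: str) -> str:
    return hashlib.sha256(expr_source.encode()).hexdigest()


def collect_filters(expr_sources: Iterable[str]) -> Dict[str, Optional[BloomFilter]]:
    """At the start of a query, look up one filter per expression and queue the misses."""
    found: Dict[str, Optional[BloomFilter]] = {}
    for src in expr_sources:
        h = expr_hash(src)
        found[src] = cache.get(h)
        if found[src] is None and h not in pending:
            pending.append(h)
    return found


def evaluate(bf: Optional[BloomFilter], expr_fn: Callable[[str], bool], patient_id: str) -> bool:
    if bf is not None and not bf.might_contain(patient_id):
        return False  # definite miss: the expression is false for this Patient
    return expr_fn(patient_id)  # no filter yet, or possible hit: evaluate normally


# Demo with a hypothetical expression that is true only for Patient "p1".
expr = "exists [Condition: diabetes]"
is_true = lambda pid: pid == "p1"

filters = collect_filters([expr])  # first query: no filter yet, calculation queued
first = evaluate(filters[expr], is_true, "p1")

cache[expr_hash(expr)] = filter_of(t=1, patient_ids=["p1"])  # asynchronous job finished
filters = collect_filters([expr])  # second query: filter is available
second = evaluate(filters[expr], is_true, "p2")
```

Because a Bloom filter never reports a false negative, a miss lets the evaluation return false without touching the database, while a hit still needs the normal evaluation to rule out a false positive.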

### Bloom Filter Calculation

* the Bloom filter will be calculated for a particular exists expression
* it will be calculated based on a database with a particular t
* that t will be assigned to the Bloom filter
* the calculation will evaluate the expression for each patient of the database
* the IDs of Patients for which the expression returns true will be put into the Bloom filter
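
The calculation steps above might look like the following minimal Python sketch. It assumes a database snapshot can be modeled as a mapping from Patient ID to the expression result; `ExprBloomFilter`, `calc_bloom_filter`, the filter size `m` and the hash count `k` are hypothetical choices made for illustration and are not taken from the Blaze code base.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, List


def bit_positions(item: str, m: int, k: int) -> List[int]:
    """Derive k bit positions in [0, m) from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


@dataclass(frozen=True)
class ExprBloomFilter:
    """Immutable result of one calculation, tagged with the database t."""
    t: int
    m: int
    k: int
    bits: int  # bit set stored in a Python int

    def might_contain(self, patient_id: str) -> bool:
        return all((self.bits >> p) & 1 for p in bit_positions(patient_id, self.m, self.k))


def calc_bloom_filter(
    expr_fn: Callable[[str], bool],
    patient_ids: Iterable[str],
    t: int,
    m: int = 4096,
    k: int = 4,
) -> ExprBloomFilter:
    """Evaluate the expression for every Patient and record the IDs that return true."""
    bits = 0
    for pid in patient_ids:
        if expr_fn(pid):
            for pos in bit_positions(pid, m, k):
                bits |= 1 << pos
    return ExprBloomFilter(t=t, m=m, k=k, bits=bits)


# Hypothetical snapshot at t = 42: Patient ID -> result of the exists expression.
snapshot = {"p1": True, "p2": False, "p3": True}
bf = calc_bloom_filter(snapshot.__getitem__, snapshot.keys(), t=42)
empty = calc_bloom_filter(lambda pid: False, snapshot.keys(), t=42)
```

Storing `t` alongside the bits lets a reader of the filter tell which database state it reflects.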