feature/mx-1381 rework database model #25

Merged
merged 69 commits into from
Apr 8, 2024
Changes from 13 commits
Commits
2a4badc
Some clean up
cutoffthetop Jan 16, 2024
d62dedd
Implement graph id provider
cutoffthetop Jan 16, 2024
59e584e
Remove docs
cutoffthetop Jan 16, 2024
0bcac39
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Jan 18, 2024
5af07fa
Clean up enum, add tests, bump versions
cutoffthetop Jan 19, 2024
d4b567e
Update expectation
cutoffthetop Jan 19, 2024
5aab81a
Changelog
cutoffthetop Jan 19, 2024
514d81e
Changelog
cutoffthetop Jan 19, 2024
a341a83
Poetry update
cutoffthetop Jan 19, 2024
feed73c
Stop inline nested and model as nodes instead
cutoffthetop Jan 26, 2024
3deea4c
WIP
cutoffthetop Feb 7, 2024
6678e82
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 7, 2024
ff652d1
Cruft update
cutoffthetop Feb 7, 2024
b35cf1d
Fix tests
cutoffthetop Feb 7, 2024
f9751a9
Merge branch 'feature/mx-1533-graph-id-provider' into feature/mx-1381…
cutoffthetop Feb 8, 2024
e27bddc
Fixing tests
cutoffthetop Feb 8, 2024
e73fec2
Rewrite using jinja
cutoffthetop Feb 14, 2024
61d0902
Create field lists nicer
cutoffthetop Feb 14, 2024
798524c
Add edge pruning
cutoffthetop Feb 14, 2024
55a2b48
Elevate query testing
cutoffthetop Feb 15, 2024
14165f0
Update tests
cutoffthetop Feb 15, 2024
f9376f2
Polishing and version bumps
cutoffthetop Feb 15, 2024
d4de1d1
Merge branch 'main' into feature/mx-1381-prep-rule-endpoint
cutoffthetop Feb 19, 2024
3c6809b
Update lock
cutoffthetop Feb 19, 2024
c8250c3
Set id provider for integration testing
cutoffthetop Feb 19, 2024
9fee560
Fix connector test
cutoffthetop Feb 19, 2024
1ae565b
Add query readme and docs
cutoffthetop Feb 19, 2024
dce0172
Add example to arg
cutoffthetop Feb 20, 2024
4452c6e
Fix tests and update common
cutoffthetop Feb 21, 2024
f53e4be
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 21, 2024
be9bfc8
Rename query_nodes to fetch_extracted_data
cutoffthetop Feb 21, 2024
bd11479
No need to stringify identifier
cutoffthetop Feb 21, 2024
e34f35e
Simplify merge node gc query
cutoffthetop Feb 21, 2024
83702b2
Simplify query builder teardown
cutoffthetop Feb 21, 2024
c2f2116
Update cruft and fix linting
cutoffthetop Feb 21, 2024
b7f556c
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 22, 2024
0544134
Add random pytest order
cutoffthetop Feb 22, 2024
db73b46
Remove id-provider env var
cutoffthetop Feb 22, 2024
26a4375
Rename Result.update_counters and add tests
cutoffthetop Feb 22, 2024
6167888
Simplify refs match clause
cutoffthetop Feb 22, 2024
7ead1fc
More speaking query variables
cutoffthetop Feb 22, 2024
038c69d
Add doc to merge_node query
cutoffthetop Feb 22, 2024
42a94bb
Re-create index if there were changes to searchable classes and fields
cutoffthetop Feb 22, 2024
05bb00c
Update docs, make merge_node/edges private
cutoffthetop Feb 22, 2024
c316465
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Feb 27, 2024
e0530d8
Update cruft and deps
cutoffthetop Feb 27, 2024
fd65b57
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 5, 2024
d37df78
Update cruft 12165319453990fdbe02bce39a3236337e298bc0
cutoffthetop Mar 5, 2024
d52f08d
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 15, 2024
7fbac89
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 15, 2024
489c242
Reduce diff
cutoffthetop Mar 15, 2024
71d44c2
Update docstring
cutoffthetop Mar 27, 2024
0c531b6
Update versions
cutoffthetop Mar 27, 2024
a4b2e36
Update uvicorn and neo4j
cutoffthetop Mar 27, 2024
3fe1882
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Mar 27, 2024
0b957e8
Use speaking names for gc
cutoffthetop Mar 28, 2024
e52136b
Add APOC example
cutoffthetop Apr 2, 2024
fd09bc3
Remove redundant label filter
cutoffthetop Apr 2, 2024
dbb1e51
Add annotated test case
cutoffthetop Apr 2, 2024
4a2f5ef
Remove local import
cutoffthetop Apr 2, 2024
c7e6b08
Merge branch 'main' of https://github.com/robert-koch-institut/mex-ba…
cutoffthetop Apr 2, 2024
c004b98
Update changelog
cutoffthetop Apr 2, 2024
98cce8d
update versions
cutoffthetop Apr 2, 2024
32191a4
Rename to _contains_only_types
cutoffthetop Apr 2, 2024
febd447
Fix docstring
cutoffthetop Apr 3, 2024
3b83109
Expand docstrings
cutoffthetop Apr 3, 2024
871fba3
Ensure lifespan is called
cutoffthetop Apr 3, 2024
3fb35d5
Fix test isolation and close coverage gaps
cutoffthetop Apr 3, 2024
0e4daea
Merge branch 'main' into feature/mx-1381-prep-rule-endpoint
cutoffthetop Apr 5, 2024
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -3,12 +3,12 @@ default_language_version:
python: python3.11
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.3.2
+rev: v0.3.5
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
- repo: https://github.com/psf/black
-rev: 24.2.0
+rev: 24.3.0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -11,10 +11,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changes

+- re-implemented queries as templated cql files
+- updated graph connector for new queries
+- improved isolation of neo4j dependency
+- improved documentation and code-readability
+
### Deprecated

### Removed

+- trashed hydration module
+
### Fixed

### Security
11 changes: 6 additions & 5 deletions mex/backend/fields.py
@@ -29,7 +29,7 @@ def _get_inner_types(annotation: Any) -> Generator[type, None, None]:
yield annotation


-def _has_any_type(field: FieldInfo, *types: type) -> bool:
+def _contains_only_types(field: FieldInfo, *types: type) -> bool:
"""Return whether a field is annotated as one of the given types.

Lists and unions with `NoneType` are allowed and only the non-`NoneType` annotation
@@ -71,24 +71,25 @@ def _group_fields_by_class_name(
# fields typed as merged identifiers containing references to merged items
REFERENCE_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
EXTRACTED_MODEL_CLASSES_BY_NAME,
-lambda field_info: _has_any_type(field_info, *MERGED_IDENTIFIER_CLASSES),
+lambda field_info: _contains_only_types(field_info, *MERGED_IDENTIFIER_CLASSES),
)

# nested fields that contain `Text` objects
TEXT_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
EXTRACTED_MODEL_CLASSES_BY_NAME,
-lambda field_info: _has_any_type(field_info, Text),
+lambda field_info: _contains_only_types(field_info, Text),
)

# nested fields that contain `Link` objects
LINK_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
EXTRACTED_MODEL_CLASSES_BY_NAME,
-lambda field_info: _has_any_type(field_info, Link),
+lambda field_info: _contains_only_types(field_info, Link),
)

# fields annotated as `str` type
STRING_FIELDS_BY_CLASS_NAME = _group_fields_by_class_name(
-EXTRACTED_MODEL_CLASSES_BY_NAME, lambda field_info: _has_any_type(field_info, str)
+EXTRACTED_MODEL_CLASSES_BY_NAME,
+lambda field_info: _contains_only_types(field_info, str),
)

# fields that should be indexed as searchable fields
10 changes: 4 additions & 6 deletions mex/backend/graph/connector.py
@@ -18,6 +18,7 @@
from mex.backend.graph.transform import (
expand_references_in_search_result,
)
+from mex.backend.settings import BackendSettings
from mex.backend.transform import to_primitive
from mex.common.connector import BaseConnector
from mex.common.exceptions import MExError
@@ -55,9 +56,6 @@ def __init__(self) -> None:

def _init_driver(self) -> Driver:
"""Initialize and return a database driver."""
-# break import cycle, sigh
-from mex.backend.settings import BackendSettings
-
settings = BackendSettings.get()
return GraphDatabase.driver(
settings.graph_url,
@@ -219,10 +217,10 @@ def _merge_node(self, model: AnyExtractedModel) -> Result:
All nested properties (like Text or Link) are created as their own nodes
and linked via edges. For multi-valued fields, the position of each nested
object is stored as a property on the outbound edge.
-Any nested objects that are found in the graph, gut are not present on the
+Any nested objects that are found in the graph, but are not present on the
model any more are purged.
-In addition, a merged item is created (if it does not exist yet) and linked
-to the extracted item via an edge of the label `stableTargetId`.
+In addition, a merged item is created (if it does not exist yet) and the
+extracted is linked it via an edge of the label `stableTargetId`.

Args:
model: Model to merge into the graph as a node
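
The `_merge_node` docstring describes a merge-then-prune cycle. The sketch below mimics those semantics with plain dictionaries instead of neo4j, only to make the bookkeeping concrete. `InMemoryGraph` and `merge_node` are hypothetical; the real method runs templated Cypher. Keying nested objects by edge label and position is an assumption drawn from the docstring's remark that positions are stored on the outbound edges.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryGraph:
    """Hypothetical stand-in for the graph database; no neo4j involved."""

    merged_ids: set[str] = field(default_factory=set)
    stable_target_id: dict[str, str] = field(default_factory=dict)
    # extracted id -> {(edge label, position): nested object properties}
    nested: dict[str, dict[tuple[str, int], dict[str, Any]]] = field(default_factory=dict)


def merge_node(
    graph: InMemoryGraph,
    extracted_id: str,
    merged_id: str,
    nested_fields: dict[str, list[dict[str, Any]]],
) -> int:
    """Upsert one extracted item and return how many nested objects were pruned."""
    # create the merged item if it does not exist yet and link the extracted item to it
    graph.merged_ids.add(merged_id)
    graph.stable_target_id[extracted_id] = merged_id
    # every nested object is identified by its edge label and its position in the field
    wanted = {
        (label, position): value
        for label, values in nested_fields.items()
        for position, value in enumerate(values)
    }
    previous = graph.nested.get(extracted_id, {})
    # nested objects still in the graph but no longer on the model are purged
    pruned = len(previous.keys() - wanted.keys())
    graph.nested[extracted_id] = wanted
    return pruned
```

Merging the same item twice with a shorter list prunes the leftover nested object, while a changed value at an existing position is overwritten rather than pruned.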
4 changes: 2 additions & 2 deletions mex/backend/graph/cypher/README.md
@@ -13,8 +13,8 @@ For example, a new model class or changing model fields are automatically handle
and don't require rewriting any cypher query.

Some of these use-cases could be covered by neo4j's [APOC](https://neo4j.com/labs/apoc/)
-extensions. However, APOC is not included in the official neo4j base image.
-So to keep deployment simple for now, the use of APOC was avoided.
+add-on (e.g. `expand_references_in_search_result`). However, APOC is not included in the
+official neo4j docker image. So, to keep deployment simple, the use of APOC was avoided.

Contrary to the jinja default tags that are centered around curly braces, we use
less/greater signs that do not collide with cypher syntax that often.
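
The custom tags can be reproduced with the delimiter options of jinja2's `Environment`. This is a sketch, not the repository's template loader. The delimiters are inferred from the `.cql` snippets in this diff (`<< >>` for expressions, `<% %>` for statements, `<# #>` for comments), and the body of the `ensure_prefix` filter is a guess that only matches how the templates call it.

```python
from typing import Any

from jinja2 import Environment


def ensure_prefix(value: Any, prefix: str) -> str:
    """Hypothetical filter: prepend `prefix` unless the string already starts with it."""
    string = str(value)
    return string if string.startswith(prefix) else f"{prefix}{string}"


# swap curly braces for less/greater signs so cypher maps like {key: value} pass through
env = Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
    comment_start_string="<#",
    comment_end_string="#>",
)
env.filters["ensure_prefix"] = ensure_prefix

template = env.from_string(
    "<# comments are stripped from the rendered query #>"
    'MATCH (n:<<extracted_labels|join("|")>>) '
    'RETURN [<<range(value_count)|map("ensure_prefix", "value_")|join(", ")>>]'
)
query = template.render(
    extracted_labels=["ExtractedPerson", "ExtractedResource"], value_count=3
)
```

Here `query` becomes `MATCH (n:ExtractedPerson|ExtractedResource) RETURN [value_0, value_1, value_2]`. Cypher's own braces never need escaping because jinja2 no longer treats them as tags.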
2 changes: 1 addition & 1 deletion mex/backend/graph/cypher/fetch_identities.cql
@@ -15,7 +15,7 @@ Returns:
List of identity objects.
-#>
MATCH (n:<<extracted_labels|join("|")>>)-[:stableTargetId]->(merged:<<merged_labels|join("|")>>)
-MATCH (n:<<extracted_labels|join("|")>>)-[:hadPrimarySource]->(primary_source:MergedPrimarySource)
+MATCH (n)-[:hadPrimarySource]->(primary_source:MergedPrimarySource)
<%- if filter_by_had_primary_source or filter_by_identifier_in_primary_source or filter_by_stable_target_id %>
WHERE
<%- set and_ = joiner("AND ") -%>
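
With `joiner("AND ")`, the template emits `AND` only between the filters that are actually active, and the surrounding `if` drops the `WHERE` entirely when none are. The following is a pure-Python analogue of that pattern. The three flag names come from the template, but every property and parameter name in the conditions is hypothetical, because the real conditions are not visible in this hunk.

```python
def build_where_clause(
    filter_by_had_primary_source: bool = False,
    filter_by_identifier_in_primary_source: bool = False,
    filter_by_stable_target_id: bool = False,
) -> str:
    """Join only the active conditions with AND (condition texts are hypothetical)."""
    conditions: list[str] = []
    if filter_by_had_primary_source:
        conditions.append("primary_source.identifier = $had_primary_source")
    if filter_by_identifier_in_primary_source:
        conditions.append("n.identifierInPrimarySource = $identifier_in_primary_source")
    if filter_by_stable_target_id:
        conditions.append("merged.identifier = $stable_target_id")
    # like the template's outer `if`, emit nothing when no filter is active
    if not conditions:
        return ""
    return "WHERE " + " AND ".join(conditions)
```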
8 changes: 4 additions & 4 deletions mex/backend/graph/cypher/merge_edges.cql
@@ -34,9 +34,9 @@ CALL {
WITH source, collect(edge) as edges
CALL {
WITH source, edges
-MATCH (source)-[gc]->(:<<merged_labels|join("|")>>)
-WHERE NOT gc IN edges
-DELETE gc
-RETURN count(gc) as pruned
+MATCH (source)-[outdated_edge]->(:<<merged_labels|join("|")>>)
+WHERE NOT outdated_edge IN edges
+DELETE outdated_edge
+RETURN count(outdated_edge) as pruned
}
RETURN count(edges) as merged, pruned, edges;
8 changes: 4 additions & 4 deletions mex/backend/graph/cypher/merge_node.cql
@@ -39,9 +39,9 @@ WITH extracted,
[<<range(nested_edge_labels|count)|map("ensure_prefix", "value_")|join(", ")>>] as values
CALL {
WITH extracted, values
-MATCH (extracted)-[]->(gc:<<nested_labels|join("|")>>)
-WHERE NOT gc IN values
-DETACH DELETE gc
-RETURN count(gc) as pruned
+MATCH (extracted)-[]->(outdated_node:<<nested_labels|join("|")>>)
+WHERE NOT outdated_node IN values
+DETACH DELETE outdated_node
+RETURN count(outdated_node) as pruned
}
RETURN extracted, edges, values, pruned;
4 changes: 2 additions & 2 deletions mex/backend/transform.py
@@ -3,12 +3,12 @@

from fastapi.encoders import jsonable_encoder

-from mex.common.types import Identifier, Timestamp
+from mex.common.types import Identifier, TemporalEntity

JSON_ENCODERS = {
Enum: lambda obj: obj.value,
Identifier: lambda obj: str(obj),
-Timestamp: lambda obj: str(obj),
+TemporalEntity: lambda obj: str(obj),
}
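
`JSON_ENCODERS` maps types to callables that turn instances into JSON-friendly values. Judging by its imports, the module's `to_primitive` passes these to FastAPI's `jsonable_encoder`. To stay dependency-free, the sketch below swaps in the standard library's `json.dumps(default=...)` hook, which performs the same type-based dispatch. `Identifier`, `TemporalEntity` and `Color` here are hypothetical stand-ins, not the `mex.common.types` classes.

```python
import json
from enum import Enum
from typing import Any, Callable


class Identifier:
    """Hypothetical stand-in for an identifier type that json cannot serialize."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __str__(self) -> str:
        return self.value


class TemporalEntity:
    """Hypothetical stand-in for a date/time type that renders as an ISO string."""

    def __init__(self, iso: str) -> None:
        self.iso = iso

    def __str__(self) -> str:
        return self.iso


class Color(Enum):
    """Hypothetical enum used only for the demonstration."""

    RED = "red"


JSON_ENCODERS: dict[type, Callable[[Any], Any]] = {
    Enum: lambda obj: obj.value,
    Identifier: lambda obj: str(obj),
    TemporalEntity: lambda obj: str(obj),
}


def _encode(obj: Any) -> Any:
    """Called by json for objects it cannot serialize; dispatch on the first matching type."""
    for type_, encoder in JSON_ENCODERS.items():
        if isinstance(obj, type_):
            return encoder(obj)
    raise TypeError(f"cannot encode {type(obj).__name__}")


def to_primitive(obj: Any) -> Any:
    """Round-trip through JSON so the result only contains primitive values."""
    return json.loads(json.dumps(obj, default=_encode))
```

Because the dispatch uses `isinstance`, any `Enum` subclass is handled by the single `Enum` entry. Swapping `Timestamp` for `TemporalEntity` therefore only changes which family of types the third entry covers.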

