All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- BREAKING: move ItemsContainer and PaginatedItemsContainer to mex.common.models
- BREAKING: replace post_extracted_items with ingest and allow AnyRuleSetResponses
- allow AnyRuleSetResponses as arguments to sinks
- BREAKING: sinks now yield the models they loaded, instead of just their identifiers
- update mex-model to 3.5.1
- fix regex pattern for GndIdStr in organization models
- do not wrap field types in `setValues` in mapping rules in another list
- reduce Filter classes to a single list field of `FilterField` items
- new (partially generic) classes for defining Mapping and Filter fields and rules
- BREAKING: replaced dynamic Mapping and Filter classes with static ones
- use FILTER_MODEL_CLASSES_BY_NAME instead of FILTER_MODEL_BY_EXTRACTED_CLASS_NAME
- use MAPPING_MODEL_CLASSES_BY_NAME instead of MAPPING_MODEL_BY_EXTRACTED_CLASS_NAME
- add a sink registry with `register_sink` and `get_sink` functions
- add a multi-sink implementation, akin to `mex.extractors.load`
- BREAKING: convert post_to_backend_api to BackendApiSink
- BREAKING: convert write_ndjson to NdjsonSink
- backend and ndjson sinks log progress only in batches
- increase timeout and decrease chunk size for backend API sink
- port backend identity provider implementation from editor/extractors to common
- allow backend and graph as identity provider setting to simplify setting subclasses, even though graph is not implemented in mex-common
- BREAKING: make backend api connector response models generic, to keep DRY
- skip None values when merging extracted and rule items
- merging logic to mex-common
- BREAKING: add nested models (Text, Link) to all lookups in `mex.common.fields`
- BREAKING: move `GenericFieldInfo` from `models.base.field_info` to `utils`
- BREAKING: move `get_all_fields` from `BaseModel` to `utils` to support all base models
- BREAKING: change type of distribution.title to an array of texts
- updated ldap search from name and familyname to one single attribute "displayname"
- add preview models for merged items without cardinality validation
- BREAKING: preview models are now part of all `mex.common.fields` lookups
- add `BackendApiConnector.fetch_preview_items` for fetching previews
- stop using `ExtractedData`, use `AnyExtractedModel` instead
- stop using `MergedItem`, use `AnyMergedModel` instead
- stop using `AdditiveRule`, use `AnyAdditiveRule` instead
- stop using `SubtractiveRule`, use `AnySubtractiveRule` instead
- stop using `PreventiveRule`, use `AnyPreventiveRule` instead
- stop using `BaseEntity`, use a concrete union instead
- removed deprecated `BulkInsertResponse` as alias for `IdentifiersResponse`
- removed unused module export of `mex.common.models.generate_entity_filter_schema`
- removed unused module export of `mex.common.models.generate_mapping_schema`
- drop export `models.ExtractedPrimarySourceIdentifier`, import from `types` instead
- drop export `models.MergedPrimarySourceIdentifier`, import from `types` instead
- add vocabulary and temporal unions and lookups to `mex.common.types`
- add `mex.common.fields` with field type by class name lookups
- wikidata helper now optionally accepts wikidata primary source
- set default empty rules to all of the rule-set models
- pin pydantic to sub 2.10 (for now) because of breaking changes
- switch HTTP method for preview endpoint to `POST`
- add optional values to variadic values for distribution models
- make `endpointDescription` optional for variadic access platform models
- organigram extraction checks for duplicate emails/labels in different organigram units
- upgrade mex-model dependency to version 3.2
- upgrade mex-model dependency to version 3.1
- fix typo in `repositoryURL` of bibliographic resources
- make identifier and stableTargetId of ExtractedBibliographicResource computed fields
- added new consent and bibliography reference models and vocabs
- added doi field to resource models
- helper function for primary source look up
- upgrade mex-model dependency to version 3
- make ruff linter config opt-out, instead of opt-in
- make instances of extracted data hashable
- BREAKING: Wikidata convenience function refactored and renamed to 'helper'
- wikidata helper function split between mex-common and mex-extractors
- code de-duplication: fixture extracted_primary_sources uses function-part of helper
- split up YearMonth and Year temporal types and improved patterns
- applied all changes to model fields according to model v3
- update LOINC pattern
- fix temporal entity schemas
- add pattern constants for vocabs, emails, urls and ids to types module
- add regex pattern to json schema of identifier fields
- automatically add examples and useScheme to json schema of enum fields
- BREAKING: use `identifier` instead of `stableTargetId` to get merged item from backend
- ensure identifier unions are typed to generic `Identifier` instead of the first match to signal that we don't actually know which of the union types is correct
- unify pydantic schema configuration for all types
- consistently parse emails, identifiers and temporals in models to their type, not str
- consistently serialize emails, ids and temporals in models to str, not their type
- make instances of Link type hashable, to harmonize them with Text models
- drop manual examples from enum fields, because they are autogenerated now
- BREAKING: remove `MEX_ID_PATTERN` from types, in favor of `IDENTIFIER_PATTERN`
- BREAKING: make public `MEX_ID_ALPHABET` constant from identifier module private
- BREAKING: remove `__str__` methods from Text and Link classes
- BREAKING: drop support for parsing UUIDs as Identifiers, this was unused
- BREAKING: drop support for parsing Links from markdown syntax, this was unused
- BREAKING: remove pydantic1-style `validate` methods from all type models
- BREAKING: `BackendApiConnector.post_models` in favor of `post_extracted_items`
- added methods for extracting persons by name or ID from ldap
- `contains_only_types` to check if fields are annotated as desired
- `group_fields_by_class_name` utility to simplify filtered model/field lookups
- new parameters to `get_inner_types` to customize what to unpack
- pin pytz to 2024.1, as stopgap for MX-1703
- added `BackendApiConnector` methods to cover all current (and near future) endpoints: `fetch_extracted_items`, `fetch_merged_items`, `get_merged_item`, `preview_merged_item` and `get_rule_set`
- complete the list of exported names in `models` and `types` modules
- deprecated `BackendApiConnector.post_models` in favor of `post_extracted_items`
- containerize section from release pipeline
- added the `rki/mex` user-agent to all requests of the HTTPConnector
- update cruft and loosen up pyproject dependencies
- harmonize signatures/docs of pydantic core/json schema manipulating methods
- fix schema tests not starting with diverging model names in common and mex-model
- fix serialization for temporal entity instances within pydantic models
- wikidata fixtures to pytest plugin: wikidata_organization_raw, wikidata_organization, mocked_wikidata
- convenience function `get_merged_organization_id_by_query_with_extract_transform_and_load` for getting the stableTargetId of an organization, while transforming and loading the organization using the provided load function
- models for rule-set requests and responses along with typing and lookups
- add `BaseT` models to the exported names of `mex.common.models`
- add `MEX_ID_PATTERN` to the exported names of `mex.common.types`
- move all base models and pydantic scaffolding into `mex.common.models.base` for a cleaner structure within the growing `models` module
- HTTP connector backoff for 10 retries on 403 from server
- `rki/mex` user agent is sent with query requests via wikidata connector
- changed backend api connector payload to "items"
- update wikidata search organization request query with an optional language parameter: the search can be narrowed by specifying the language, and EN is the default
- move log timestamp and coloring into the formatter
- `mex.common.logging.echo` is deprecated in favor of `logging.info`
- add missing listyness-fix support for computed-fields
- BREAKING: ability to store different settings instances at the same time. Dependent repositories now must bundle all settings in a single class.
- get count of found wikidata organizations
- add validator to base model that verifies computed fields can be set but not altered
- new class hierarchy for identifiers: ExtractedIdentifier and MergedIdentifier
- improve typing for methods using `Self`
- make local type variables private
- use json instead of pickle to calculate checksum of models
- replace `set_identifiers` validator with computed fields on each extracted model
- removed custom stringify method on base entities that included the `identifier` field
- fix typing for `__eq__` arguments
- extract multiple organizations from wikidata
- update mex-model to version 2.5
- add static class attribute `stemType` to models, containing an unprefixed entityType
- add `AnyRuleModel`, `RULE_MODEL_CLASSES`, `RULE_MODEL_CLASSES_BY_NAME` to models
- add type aliases `AnyPrimitiveType` and `LiteralStringType` to types
- add new utility function `ensure_postfix` for adding postfixes to strings
- clean-up and unify `mapping` and `filter` class generation
- fix memory identity provider seeding
- add classes for Additive, Preventive and Subtractive rules for all entity types
- add types, lists and lookups for all three rule types to `mex.common.models`
- move aux-extractor documentation from readme to `__init__` to have it in sphinx
- move `BaseModel` specific descriptions from class to model to avoid duplication
- BREAKING: move `FILTER_MODEL_BY_EXTRACTED_CLASS_NAME` to `mex.common.models`
- BREAKING: move `MAPPING_MODEL_BY_EXTRACTED_CLASS_NAME` to `mex.common.models`
- BREAKING: change `MEX_PRIMARY_SOURCE_IDENTIFIER` to end in `1`, so that it differs from `MEX_PRIMARY_SOURCE_STABLE_TARGET_ID`
- isolate settings context before first test
- add `precision` keyword to TemporalEntity constructor
- add transform function for single wikidata organization to extracted organization
- add tests for ldap.extract
- fix ldap.extract.get_merged_ids_by_email
- synchronize changes to fields in `BaseSettings` to all active settings subclasses
- added github action for renovatebot
- make memory identity provider deterministic (same input args results in same stableTargetId and Identifier)
- rework `ContextStore` into `SingletonStore` with more intuitive API
- phase out ambiguous "context" naming in favor of more descriptive "singleton store"
- rename `SettingsContext` to `SETTINGS_STORE` and allow multiple active subclasses
- rename `ConnectorContext` to `CONNECTOR_STORE` removing its context manager functions
- replace `reset_connector_context()` with more consistent `CONNECTOR_STORE.reset()`
- removed types `IdentifierT`, `SettingsType`, `ConnectorType` in favor of `typing.Self`
- remove github dependabot configuration
- return only one org from wikidata; if multiple or no orgs are found, return None
- filter quotation marks (") from requested wikidata label
- port `get_inner_types` from `mex-backend` to `mex.common.utils`
- rename Timestamp class to TemporalEntity
- added subclasses with specific resolution YearMonth, YearMonthDay, YearMonthDayTime
- modernize typing with syntactic sugar
- simplify `BaseModel._get_list_field_names` using `get_inner_types`
- switch from poetry to pdm
- use vocabulary JSON files from mex-model
- remove vocabulary JSON files
- date and time validation working and harmonized with mex-model
- add `entityType` type hint to `MExModel` (now `BaseEntity`)
- add types for `AnyBaseModel`, `AnyExtractedModel` and `AnyMergedModel`
- create more specific subclasses of `Identifier` (for extracted and merged)
- expose unions, lists and lookups for `Identifier` subclasses in `mex.common.types`
- swap `contextvars.ContextVar` for `mex.common.context.ContextStore`
- move `stableTargetId` property from base models to extracted models
- update typing of identifiers to specific subclasses
- use `Annotated[..., Field(...)]` notation for pydantic field configs
- split up `mex.common.models.base` and move out `MExModel` and `JsonSchemaGenerator`
- rename `MExModel` to `BaseEntity` with only type hints and model config
- declare `hadPrimarySource`, `identifier` and `identifierInPrimarySource` as frozen
- absorb unused `BaseExtractedData` into `ExtractedData`
- remove `stableTargetId` property from merged models
- drop support for sinks to accept merged items (now only for extracted data)
- update cruft and dev dependencies
- randomize test order by default
- remove `mex.common.public_api` module and the correlating sinks
- remove `PathWrapper.resolve` and `PathWrapper.raw` methods
- remove `pytest.mark` from fixture in `mex.common.testing.plugin`
- update cruft and minor dependencies
- date-time format validation for mapping model generation
- update cruft to apply new workflow trigger config
- update poetry and pre-commit dependencies
- fix mex mapping model name
- pytest plugins for random order and parallelized test execution
- move dynamic mapping model generation from mex-assets
- `mex.bat test` uses random order and xdist plugins by default
- cruft template link
- workflow that syncs main branch to openCoDE
- constant for MEX_PRIMARY_SOURCE_IDENTIFIER
- harmonized boilerplate
- ExtractedData raises proper ValidationError when parsing wrong base type
- add `entityType` field in all extracted and merged models
- wikidata test
- `CHANGELOG.md` documenting notable changes to this project
- a template for pull requests
- language french in language vocabulary
- tests for `mex.common.types.PathWrapper`
- method `is_relative` to `mex.common.types.PathWrapper` to check whether the path is relative
- resolve base paths of work/assets path fields in settings
- nesting of `mex.common.types.PathWrapper` on instantiation
- move `Sink` and `IdentityProvider` to `mex.common.types`
- deprecate `MExModel.get_entity_type`, use `cls.__name__` instead
- deprecate `mex.common.models.MODEL_CLASSES[_BY_ENTITY_TYPE]`, use the more precise lists or dicts like `EXTRACTED_MODEL_CLASSES_BY_NAME` instead
- use dmypy for pre-commit type checking
- fix previously undetected typing issue
- configure CI linting to install poetry
- update versions