Merge pull request #630 from ga4gh/merge-2.0.0-ballot-work
Merge 2.0.0 ballot work
larrybabb authored Feb 10, 2025
2 parents 2a9e2ea + 9496cf1 commit 61911d4
Showing 106 changed files with 936 additions and 770 deletions.
24 changes: 24 additions & 0 deletions .github/workflows/cqa.yaml
@@ -0,0 +1,24 @@
name: checks
on: [push, pull_request]
jobs:
  precommit_hooks:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        cmd:
          - "check-added-large-files"
          - "trailing-whitespace"
          - "end-of-file-fixer"
          - "mixed-line-ending"
          - "update-json-def-files"
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: 3.12

      - uses: pre-commit/[email protected]
        with:
          extra_args: ${{ matrix.cmd }} --all-files
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -6,6 +6,8 @@ repos:
- id: detect-private-key
- id: trailing-whitespace
- id: end-of-file-fixer
- id: mixed-line-ending
args: [ --fix=lf ]
- repo: local
hooks:
- id: update-json-def-files
2 changes: 1 addition & 1 deletion .readthedocs.yaml
@@ -13,4 +13,4 @@ sphinx:

python:
install:
- requirements: docs/source/requirements.txt
- requirements: docs/source/requirements.txt
2 changes: 1 addition & 1 deletion .requirements.txt
@@ -3,7 +3,7 @@ jsonschema
referencing
ipython
pyyaml
ga4gh.gks.metaschema==0.3.0
ga4gh.gks.metaschema==0.3.1
sphinx ~= 7.2
sphinx-rtd-theme ~= 1.2
jupyterlab
16 changes: 8 additions & 8 deletions CONTRIBUTING.md
@@ -1,17 +1,17 @@
# Contributing
Contributions to this repository are intended to follow the VRS
[development process](https://vrs.ga4gh.org/en/stable/appendices/development_process.html).
The additional information presented here are guidelines for issues,
branches, commits, and pull requests. Before adding documentation,
The additional information presented here are guidelines for issues,
branches, commits, and pull requests. Before adding documentation,
please also review the [docs style guide](docs/source/style.rst).

## Discussions
[Discussions](https://github.com/ga4gh/vrs/discussions) are for feature
[Discussions](https://github.com/ga4gh/vrs/discussions) are for feature
requests, release candidate discussions, and questions.

## Issues
[Issues](https://github.com/ga4gh/vrs/issues) are for bug
reports, and planned feature descriptions. When creating an issue, use
reports, and planned feature descriptions. When creating an issue, use
sentence case for the issue title and avoid the use of periods at the end
of titles.

@@ -25,12 +25,12 @@ branch for [issue 250](https://github.com/ga4gh/vrs/issues/250) could
be `250-contributing`.

## Pull Requests
[Pull Requests](https://github.com/ga4gh/vrs/pulls) (PRs) for new
features should target the `main` branch. For version
[Pull Requests](https://github.com/ga4gh/vrs/pulls) (PRs) for new
features should target the `main` branch. For version
patches, the PR should target the appropriate minor version branch.
PRs must be approved by at least one project maintainer before they may
be merged. PR titles must reflect the issue associated with the PR. For
example, the associated PR title for
example, the associated PR title for
[issue 250](https://github.com/ga4gh/vrs/issues/250) would be
`#250: Add CONTRIBUTING.md`, as seen in
`#250: Add CONTRIBUTING.md`, as seen in
[PR #253](https://github.com/ga4gh/vrs/pull/253).
2 changes: 1 addition & 1 deletion CONTRIBUTORS.md
@@ -42,7 +42,7 @@
|Brian Walsh | [[10](#10)] |
|Andrew D Yates | [[8](#8)] |

See also
See also
[VRS contributors](https://github.com/ga4gh/vrs/graphs/contributors) and
[VRS Python contributors](https://github.com/ga4gh/vrs-python/graphs/contributors).

2 changes: 1 addition & 1 deletion README.md
@@ -32,7 +32,7 @@ The VRS model is the product of the [GA4GH Variation Representation group](https
## Using the schema

The schema is available in the [schema/](./schema/) directory, in both yaml and json versions.
The schema is available in the [schema/](./schema/) directory, in both yaml and json versions.
It conforms to JSON Schema Draft 2020-12. For a list of
libraries that support JSON schema, see
[JSONSchema>Tools](https://json-schema.org/tools).
4 changes: 2 additions & 2 deletions TODO
@@ -1,5 +1,5 @@
# Docs
see doc-updates branch
* Standardize quoting: '**blah**' → ``blah``
* Investigate
https://pypi.org/project/sphinx-jsonschema/
* Investigate
https://pypi.org/project/sphinx-jsonschema/
90 changes: 90 additions & 0 deletions docs/source/appendices/design_decisions.rst
@@ -0,0 +1,90 @@
.. _design_decisions:

Design Decisions
!!!!!!!!!!!!!!!!

The following design decisions were made during the development of VRS:

GA4GH Inherent Properties over Value Objects
--------------------------------------------

In VRS 1.0 we operated under the principle that all identifiable objects in VRS (e.g., Allele, SequenceLocation)
would be *value objects*. This meant that they were immutable and contained only the required fields necessary to
uniquely identify the object. This approach simplified digest generation, because the digest could be computed over
the entire object. An exception was made for properties with a leading underscore (namely, the *_id* property), which
were removed from the object before the digest was calculated.
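A minimal Python sketch of the VRS 1.x value-object approach follows. It assumes the GA4GH ``sha512t24u`` digest
(base64url encoding of the first 24 bytes of a SHA-512 hash) computed over a compact, key-sorted JSON serialization,
and it deliberately omits the step in which nested identifiable objects are replaced by their own digests. The helper
names and the example object are hypothetical; this is an illustration, not the vrs-python implementation.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    """Base64url-encode the first 24 bytes of the SHA-512 digest of ``blob``."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


def digest_value_object(obj: dict) -> str:
    """Digest every top-level property except those with a leading underscore (e.g. ``_id``)."""
    kept = {key: value for key, value in obj.items() if not key.startswith("_")}
    # Compact, key-sorted JSON (assumed canonical form) so equal objects serialize identically.
    blob = json.dumps(kept, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return sha512t24u(blob)


# Hypothetical, simplified Allele-like value object.
a = {"_id": "local-1", "type": "Allele",
     "state": {"type": "LiteralSequenceExpression", "sequence": "T"}}
b = dict(a, _id="local-2")  # differs only in the underscore-prefixed _id

print(digest_value_object(a) == digest_value_object(b))  # True: _id is not digested
print(len(digest_value_object(a)))                        # 32
```

Because ``_id`` is dropped before serialization, two copies of the same allele that carry different local identifiers
still receive the same digest, while any change to a non-underscore property changes it.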

In VRS 2.0 we extended this principle of excluding designated attributes by explicitly defining *inherent properties*:
the properties used to compute an object's digest. This made VRS more expressive, enabling implementations to pass
common, descriptive metadata as part of identifiable objects without sacrificing the ability, carried over from
VRS 1.3, to create globally unique, federated identifiers.

As a result, we introduced a new custom schema attribute, *ga4gh.inherent*, which is described in detail in the
section on :ref:`ga4gh-inherent-properties`.
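The sketch below illustrates how a ``ga4gh.inherent`` list changes the picture: only the listed properties are
serialized, so descriptive metadata can travel with the object without affecting its digest. The schema fragment, its
``inherent`` list, and the helper names are illustrative assumptions rather than excerpts from the published VRS
schema, and nested identifiable objects are serialized inline instead of being replaced by their own digests.

```python
import base64
import hashlib
import json

# Illustrative schema fragment (assumed); consult the published VRS 2.0 schema
# for the actual ga4gh.inherent list of each class.
ALLELE_SCHEMA = {"ga4gh": {"prefix": "VA", "inherent": ["location", "state", "type"]}}


def sha512t24u(blob: bytes) -> str:
    """Base64url-encode the first 24 bytes of the SHA-512 digest of ``blob``."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


def digest_inherent(obj: dict, schema: dict) -> str:
    """Digest only the properties named in the schema's ga4gh.inherent list."""
    inherent = schema["ga4gh"]["inherent"]
    kept = {key: obj[key] for key in inherent if key in obj}
    blob = json.dumps(kept, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return sha512t24u(blob)


allele = {
    "type": "Allele",
    "location": {"type": "SequenceLocation", "start": 10, "end": 11},  # simplified
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}
# The same allele carrying descriptive metadata that is not inherent.
annotated = dict(allele, name="example variant", description="free-text note")

print(digest_inherent(allele, ALLELE_SCHEMA) == digest_inherent(annotated, ALLELE_SCHEMA))  # True
```

Editing ``name`` or ``description`` leaves the digest untouched, whereas changing ``state`` or ``location`` yields a
different digest.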

IRIs over CURIEs
----------------

In VRS 2.0 we moved away from the use of CURIEs in favor of :ref:`iriReference`. Several factors played a role in
this decision.

JSON Schema, the default data model for GKS specifications, does not allow for encoding of CURIE namespaces as is done
in other frameworks such as JSON-LD or XML. As a result, namespaces must be captured in custom data structures, API
endpoints, or documentation that may not persist as messages are exchanged between systems. To address this, references
in GKS specifications now use IRIs to reference objects explicitly.

IRI-References over IRIs
------------------------

We opted for the general use of IRI-references to provide a more flexible approach to using IRIs in most GKS message
structures. Because an IRI-reference may be a relative reference, it benefits users by allowing a compact
representation of concepts that are accessible within a system (e.g., a directory structure or web API).
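As a rough illustration of why relative references are convenient, the sketch below resolves IRI-references against a
base using Python's ``urllib.parse.urljoin``, which implements RFC 3986 reference resolution for URIs (used here as a
stand-in for IRIs). The base URL, the paths, and the ``resolve`` helper are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base of a system (e.g. a web API) that serves the referenced objects.
BASE = "https://example.org/api/v1/"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve an IRI-reference against ``base``; absolute references come back unchanged."""
    return urljoin(base, reference)


print(resolve("alleles/123"))        # https://example.org/api/v1/alleles/123
print(resolve("../v2/alleles/123"))  # https://example.org/api/v2/alleles/123
print(resolve("https://w3id.org/ga4gh/vrs/VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT"))
```

A receiving system that knows the base can expand ``alleles/123`` on its own, while a fully qualified IRI such as the
w3id URL passes through unchanged.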

VRS identifier syntax and versioning
------------------------------------

The :ref:`versioning` section describes the versioning and release naming conventions for VRS.
Approved releases are labeled with the version number alone, while Connect, ballot, and snapshot releases
also include the context term and date in addition to the target version number.

During the GA4GH Connect April 2023 meeting, the maturity model was discussed at length, and the following
proposal was presented for GKS instance and class identifiers.

.. image:: ../images/2023-connect-gks-identifier-proposal.png
   :alt: GKS Identifiers Proposal from 2023 April Connect Session
   :align: center

As an example, the JSON Schema URL (``$id``) for the VRS 2.0.0 Allele is:

.. code-block:: json

   {
     "$schema": "https://json-schema.org/draft/2020-12/schema",
     "$id": "https://w3id.org/ga4gh/schema/vrs/2.0/json/Allele",
     ...
   }

During the **release and versioning** discussion at the GA4GH Connect April 2023 meeting, participants
considered including the major version number in the VRS identifier itself. Proponents of this approach
cited concern about changes in digests (and their derived identifiers) between major versions of the same
VRS object, which would become clearly visible in the identifier itself if the major version were included.

Opponents of this approach argued that new identifiers would then be required for every type of VRS object
in every major version release. This means that even if a given type of object had no change that would
result in a new digest, a new identifier would still be required for the new major version.

After much discussion, the decision was made to NOT include the major version number in the VRS identifier
itself. Therefore, the :ref:`identifier-construction` does NOT contain the version number, resulting in
the following syntax:

**CURIE namespace resolution**

.. code-block::

   ga4gh:VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT

**URI Syntax**

.. code-block::

   https://w3id.org/ga4gh/vrs/VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT
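A small sketch of the resulting syntax follows. The mapping from the ``ga4gh`` prefix to
``https://w3id.org/ga4gh/vrs/`` is inferred from the two examples above and is an assumption of this sketch, as are
the helper names. Neither form carries a version number, and expanding the CURIE requires a prefix map that does not
travel with the identifier itself.

```python
from collections.abc import Mapping

# Assumed prefix map, inferred from the CURIE and URI examples above.
CURIE_PREFIXES = {"ga4gh": "https://w3id.org/ga4gh/vrs/"}


def ga4gh_curie(type_prefix: str, digest: str) -> str:
    """Build a CURIE such as ga4gh:VA.<digest>; no VRS version component is included."""
    return f"ga4gh:{type_prefix}.{digest}"


def curie_to_uri(curie: str, prefixes: Mapping[str, str] = CURIE_PREFIXES) -> str:
    """Expand a CURIE into a full URI using an out-of-band prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefixes[prefix] + local_id


curie = ga4gh_curie("VA", "Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT")
print(curie)                # ga4gh:VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT
print(curie_to_uri(curie))  # https://w3id.org/ga4gh/vrs/VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT
```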
20 changes: 15 additions & 5 deletions docs/source/appendices/ga4gh_identifiers.rst
@@ -66,11 +66,21 @@ reference:


.. _ga4gh-digest-keys:
.. _ga4gh-inherent-properties:

GA4GH Digest Keys
#################
When creating computed identifiers from objects, VRS uses a custom schema attribute,
*ga4gh.inherent*, that contains the property names used for computing digests. For example,
GA4GH Inherent Properties
#########################

.. admonition:: New in v2

In VRS v1, data classes were limited to only inherent properties that contained the minimum
information for describing a variant or other identifiable object. In practice, this resulted
in frequent nesting of VRS objects inside descriptive containers, a complicated pattern for
implementations. VRS 2.0 addresses this limitation with the designation of inherent properties
for use with the computed identifier algorithm.

When creating computed identifiers from objects, VRS uses a custom schema attribute,
*ga4gh.inherent*, that contains the property names used for computing digests. For example,
the Allele JSON Schema:

.. parsed-literal::
@@ -95,7 +105,7 @@ the Allele JSON Schema:
.. note::

The `ga4gh` JSON Schema namespace is aligned with the Sequence Collections effort
The `ga4gh` JSON Schema namespace is aligned with the Sequence Collections effort
(see `SeqCol#84 <https://github.com/ga4gh/refget/issues/84>`_).

GA4GH Type Prefixes
2 changes: 2 additions & 0 deletions docs/source/appendices/index.rst
@@ -7,5 +7,7 @@ Appendices
class_diagram
maturity_model
ga4gh_identifiers
resource_identifiers
truncated_digest_collision_analysis
design_decisions
glossary