Containers/units: High level APIs and expanded test cases #278

bradenmacdonald · 2025-02-14T00:33:50Z

Changes in this PR

Compared to the prototype in #251 :

Expanded the unit test cases significantly. Please review carefully
Simplified EntitityListRow: combined draft_version and published_version into just entity_version.
Added a new contains_unpublished_changes API to check if a unit's unpinned children have unpublished changes, even if the unit itself is unchanged.
Added high-level APIs: get_components_in_draft_unit and get_components_in_published_unit
Added a get_components_in_published_unit_as_of API that can get the state of the unit at any previous point in the learning package's publication history.
Removed functions related to defined_list/initial_list/frozen_list from the api for now. These are more of an internal implementation detail and we don't want the test cases tied too heavily to the internal details. I'm hoping we can remove these entirely, but we're still discussing that.

TODO: maybe combine defined_list, initial_list, and frozen_list ?

Questions

UX specs say modifying a child will mark the unit as "having unpublished changes" but ADR says otherwise. Does it make sense to implement a container_has_unpublished_changes() helper function that recursively checks for unpublished changes in a container (and its child containers, and so on...), so we can display this in the UI? See slack discussion
- I implemented this for now.
Do I understand correctly then that the version history of units should NOT be thought of as a series of snapshots of the unit (as I had been thinking) but rather a changelog of edits made to the unit itself (but ignoring most changes made its the children)?
- If so, To get a snapshot of the unit at a point in time, it's necessary to pass in a timestamp or a PublishLog PK. Should we have an API to get an old version of a unit by passing in a PublishLog PK (or timestamp)?
- I have implemented a proof of concept as get_components_in_published_unit_as_of
Is it correct that units allow you to pin the version of a component, which means you always get exactly that component, but sequences/subsections will not really support pinning in the same way, because even if you pin a unit to a specific version, the unpinned components in that unit can still be updated?
- If so, is there any value in even implementing "pinning" for containers larger than units?
Do we have any plans to implement pinned/unpinned components in the UI or is everything unpinned?
If we have unpinned published versions, why do we need the various defined_list, initial_list, frozen_list etc? So far in the test cases I've been writing I haven't found a need to even reference initial_list nor frozen_list.
Why do we specify draft_version_pks and published_version_pks when creating a new draft version of a unit? Seems wrong to say anything at all about the published version when it doesn't yet exist and may never exist.
Note to self: if soft deleting a component doesn't mark the unit as changed, we need to update the UI to show soft deleted draft components and allow publishing the deletion. Or maybe this is handled by a container_has_unpublished_changes and recursivelying pseudo-publishing the container (no changes to container but force publish the children including soft deletes).

# This value is mutable if and only if there are unpinned references in
# defined_list. In that case, frozen_list should start as None, and be
# updated to pin references when another version of this Container becomes
# the Draft version. But if this version ever becomes the Draft again
# (e.g. the user hits "discard changes" or some kind of revert happens),
# then we need to clear this back to None.

Based on the above, frozen list doesn't make a lot of sense, because the versions of the container don't capture many of the changes to the contents. e.g. container version 1 had component version 1 through 50, and container version 2 had component versions 51 through 100, what is the point of tracking whether container version 1 "should" show v1 or v50 when they're wildly different?

openedx-webhooks · 2025-02-14T00:33:54Z

Thanks for the pull request, @bradenmacdonald!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
- This process (including the steps you'll need to take) is documented here.
If it doesn't, simply proceed with the next step.

🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

Dependencies

This PR must be merged before / after / at the same time as ...
Blockers

This PR is waiting for OEP-1234 to be accepted.
Timeline information

This PR must be merged by XX date because ...
Partner information

This is for a course on edx.org.
Supporting documentation
Relevant Open edX discussion forum threads

🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

The size and impact of the changes that it introduces
The need for product review
Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

ormsbee · 2025-02-14T00:55:10Z

UX specs say modifying a child will mark the unit as "having unpublished changes" but ADR says otherwise. Does it make sense to implement a container_has_unpublished_changes() helper function that recursively checks for unpublished changes in a container (and its child containers, and so on...), so we can display this in the UI? See slack discussion

I think we'll want some bookkeeping models in the containers app that track when a new child version is created and recursively marks its ancestors "dirty". Similarly, I think we'll want to have one that tracks "this publish affected these ancestors, even if they haven't otherwise changed". I haven't worked out any details beyond the vague thoughts that:

We need to keep it really lightweight.
It might be useful to have a major/minor version numbering of that for containers, in the sense that major number tracks changes to the container's metadata, ordering, and membership; and the minor number tracks changes to the contents. So 1.0, 1.1, 1.7, 2.0, 2.1, etc.
I used to think that we had to track all containers that used a particular entity and then auto-generate new draft versions of those containers with the entity missing if the entity was soft-deleted... and then also auto-publish those containers when te entity's deletion was published. But now I'm thinking that we do that where we know it makes sense (like in the originating Unit where we're editing it), but we otherwise are tolerant of when unpinned children disappear like that.

ormsbee · 2025-02-14T01:05:50Z

Why do we specify draft_version_pks and published_version_pks when creating a new draft version of a unit? Seems wrong to say anything at all about the published version when it doesn't yet exist and may never exist.

Yeah, I feel like we had this discussion at some point and came to the conclusion that it was because my original sketch for EntityListRow had draft_version and published_version, but really it should have just had a single version field.

ormsbee · 2025-02-14T15:56:25Z

If we have unpinned published versions, why do we need the various defined_list, initial_list, frozen_list etc? So far in the test cases I've been writing I haven't found a need to even reference initial_list nor frozen_list.

I had originally thought that these would be useful for history purposes, but I think it's not necessary if we have minor version tracking.

I do wonder if the relationship between containers and versioning is basic enough where containers should be folded directly into the publishing app (with specific container-types still having their own apps).

ormsbee · 2025-02-14T16:06:48Z

Though I guess one very confusing part about having a major and minor version for containers is that it's possible to publish any particular subset of the children, so it's not like there's really a published "1.12" version–"1.12" just means "twelve changes have happened to some descendent". So it'd be more like we'd be running two minor counters for children–one for draft changes, one for publishes.

ormsbee · 2025-02-14T16:21:09Z

Also, I guess we should double-check the actual cost of encoding snapshots of the child versions given the current representation. Trying to sketch some of this out now...

bradenmacdonald · 2025-02-14T19:19:35Z

I think we'll want some bookkeeping models in the containers app that track when a new child version is created and recursively marks its ancestors "dirty".

For now, I implemented a contains_unpublished_changes() that recursively scans a container for changes; it uses a fixed 3 queries to determine if a unit has changes, regardless of how many components are in the unit. This should be enough to write the test cases for now and confirm correctness, but doesn't scale above unit size so will need to be replaced by a tracking-based approach or something like a closure table for all the containers in each learning package. (Presumably only for drafts or possibly separate closure tables for draft and published trees.) I kind of like the closure table idea more than a tracking-based approach, as long as its performant enough - it would let us optimize so many different things.

…y 3 queries

…-> entity_version

bradenmacdonald · 2025-02-15T02:41:37Z

openedx_learning/apps/authoring/units/api.py


-    Args:
-        unit_version_pk: The unit version ID.
+def get_components_in_draft_unit(


It feels a bit suboptimal to me that we have to have separate APIs for get_components_in_draft_unit and get_components_in_published_unit. This is because when things are unpinned, we need to know whether to use the latest draft or latest published version.

I don't think this is a big deal, but it does "feel" to me like it goes against the spirit of the publishing app, where you have a series of versions and draft/published are just pointers to a particular version. With components (and presumably most PublishableEntities in general other than our new containers), if you ask for "version 37", you get a specific thing regardless of whether it is/was a draft or published. Whereas in this new case, you could try to request a particular version of a unit, but unless we know whether you want it as a draft or you want it as published, we usually can't fully describe that particular version's contents.

That is also why my proposed get_components_in_published_unit API doesn't accept a version number; it can't really get anything other than the current published version. (The unit version is insufficient to specify a snapshot of the unit when things are unpinned.)

That said, there likely will still be a way to get snapshots of any historic published version you want, analogous to requesting a specific component version, but it will require asking for the state of the published unit when the entire LearningPackage was at a specific version, which is given by the PublishLog ID. Edit: I implemented an example of this in the PR as get_components_in_published_unit_as_of()

bradenmacdonald · 2025-02-16T04:40:34Z

openedx_learning/apps/authoring/containers/api.py

-            row = publishable_entities_in_rows.get(entity_pk)
-            if row and row.order_num == order_num:
-                new_entity_list.entitylistrow_set.add(row)
-                continue


This had a bug - the code was moving EntityListRow objects into the new EntityList rather than copying them into it.

openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Feb 14, 2025

bradenmacdonald added 5 commits February 14, 2025 15:13

feat: Implement higher-level APIs for getting components from units

a24a964

test: Expand test cases

fe9ccf9

feat: Add a contains_unpublished_changes API

7eaa206

perf: Optimize contains_unpublished_changes to check a unit using onl…

3ac1e11

…y 3 queries

refactor: Simplify EntityList: merge draft_version+published_version …

236f7cc

…-> entity_version

bradenmacdonald force-pushed the braden/containers-units branch from e8fecd1 to 236f7cc Compare February 14, 2025 23:14

bradenmacdonald added 2 commits February 14, 2025 15:37

refactor: Simplify create_next_unit_version() API

5f08231

test: add tests for pinned components

3545527

bradenmacdonald force-pushed the braden/containers-units branch from cac79e1 to 3545527 Compare February 15, 2025 00:39

feat: remove defined_list/initial_list/frozen_list from the API for now

275d5d0

bradenmacdonald mentioned this pull request Feb 15, 2025

docs: ADRs for modeling containers capability #251

Open

bradenmacdonald changed the title ~~Understanding containers/units: High level APIs, high level test cases, open questions~~ Containers/units: High level APIs and expanded test cases Feb 15, 2025

docs: mark all containers APIs as unstable

5750147

bradenmacdonald commented Feb 15, 2025

View reviewed changes

fix: minor corrections to units/containers API

fd83bd6

bradenmacdonald commented Feb 16, 2025

View reviewed changes

bradenmacdonald force-pushed the braden/containers-units branch from c17a738 to 5b93da6 Compare February 16, 2025 17:57

bradenmacdonald added 2 commits February 16, 2025 14:15

feat: API to get a historic published version of a unit

a061722

test: verify that units can only contain Components, not other types

74247f3

bradenmacdonald force-pushed the braden/containers-units branch from 5b93da6 to 74247f3 Compare February 16, 2025 22:16

bradenmacdonald added 2 commits February 16, 2025 14:34

test: "creating two units with the same components"

e7e6472

test: verify that components are from the same learning package

6cc36c0

bradenmacdonald added 3 commits February 18, 2025 10:54

test: implement test_publishing_shared_component based on Slack convo

c268dff

test: clean up unit tests by implying pinned+false by default

a6f21d3

chore: fix minor quality issues

735f4e4

bradenmacdonald force-pushed the braden/containers-units branch from d8f7cb6 to 696d253 Compare February 19, 2025 23:36

bradenmacdonald added 3 commits February 19, 2025 16:52

refactor: re-enable the too-many-positional-arguments pylint check

8f27192

fix: reduce query count of contains_unpublished_changes

4f50022

fix: mypy typing

bbbd33e

bradenmacdonald force-pushed the braden/containers-units branch from 696d253 to bbbd33e Compare February 20, 2025 00:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Containers/units: High level APIs and expanded test cases #278

Containers/units: High level APIs and expanded test cases #278

bradenmacdonald commented Feb 14, 2025 •

edited

Loading

openedx-webhooks commented Feb 14, 2025

ormsbee commented Feb 14, 2025 •

edited

Loading

ormsbee commented Feb 14, 2025

ormsbee commented Feb 14, 2025

ormsbee commented Feb 14, 2025

ormsbee commented Feb 14, 2025

bradenmacdonald commented Feb 14, 2025 •

edited

Loading

bradenmacdonald Feb 15, 2025 •

edited

Loading

bradenmacdonald Feb 16, 2025

Containers/units: High level APIs and expanded test cases #278

Are you sure you want to change the base?

Containers/units: High level APIs and expanded test cases #278

Conversation

bradenmacdonald commented Feb 14, 2025 • edited Loading

Changes in this PR

Questions

openedx-webhooks commented Feb 14, 2025

ormsbee commented Feb 14, 2025 • edited Loading

ormsbee commented Feb 14, 2025

ormsbee commented Feb 14, 2025

ormsbee commented Feb 14, 2025

ormsbee commented Feb 14, 2025

bradenmacdonald commented Feb 14, 2025 • edited Loading

bradenmacdonald Feb 15, 2025 • edited Loading

Choose a reason for hiding this comment

bradenmacdonald Feb 16, 2025

Choose a reason for hiding this comment

bradenmacdonald commented Feb 14, 2025 •

edited

Loading

ormsbee commented Feb 14, 2025 •

edited

Loading

bradenmacdonald commented Feb 14, 2025 •

edited

Loading

bradenmacdonald Feb 15, 2025 •

edited

Loading