Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Containers/units: High level APIs and expanded test cases #278

Open
wants to merge 20 commits into
base: braden/containers-units-ref
Choose a base branch
from

Conversation

bradenmacdonald
Copy link
Contributor

@bradenmacdonald bradenmacdonald commented Feb 14, 2025

Changes in this PR

Compared to the prototype in #251 :

  1. Expanded the unit test cases significantly. Please review carefully
  2. Simplified EntitityListRow: combined draft_version and published_version into just entity_version.
  3. Added a new contains_unpublished_changes API to check if a unit's unpinned children have unpublished changes, even if the unit itself is unchanged.
  4. Added high-level APIs: get_components_in_draft_unit and get_components_in_published_unit
  5. Added a get_components_in_published_unit_as_of API that can get the state of the unit at any previous point in the learning package's publication history.
  6. Removed functions related to defined_list/initial_list/frozen_list from the api for now. These are more of an internal implementation detail and we don't want the test cases tied too heavily to the internal details. I'm hoping we can remove these entirely, but we're still discussing that.

TODO: maybe combine defined_list, initial_list, and frozen_list ?

Questions

  1. UX specs say modifying a child will mark the unit as "having unpublished changes" but ADR says otherwise. Does it make sense to implement a container_has_unpublished_changes() helper function that recursively checks for unpublished changes in a container (and its child containers, and so on...), so we can display this in the UI? See slack discussion

    • I implemented this for now.
  2. Do I understand correctly then that the version history of units should NOT be thought of as a series of snapshots of the unit (as I had been thinking) but rather a changelog of edits made to the unit itself (but ignoring most changes made its the children)?

    • If so, To get a snapshot of the unit at a point in time, it's necessary to pass in a timestamp or a PublishLog PK. Should we have an API to get an old version of a unit by passing in a PublishLog PK (or timestamp)?
    • I have implemented a proof of concept as get_components_in_published_unit_as_of
  3. Is it correct that units allow you to pin the version of a component, which means you always get exactly that component, but sequences/subsections will not really support pinning in the same way, because even if you pin a unit to a specific version, the unpinned components in that unit can still be updated?

    • If so, is there any value in even implementing "pinning" for containers larger than units?
  4. Do we have any plans to implement pinned/unpinned components in the UI or is everything unpinned?

  5. If we have unpinned published versions, why do we need the various defined_list, initial_list, frozen_list etc? So far in the test cases I've been writing I haven't found a need to even reference initial_list nor frozen_list.

  6. Why do we specify draft_version_pks and published_version_pks when creating a new draft version of a unit? Seems wrong to say anything at all about the published version when it doesn't yet exist and may never exist.

  7. Note to self: if soft deleting a component doesn't mark the unit as changed, we need to update the UI to show soft deleted draft components and allow publishing the deletion. Or maybe this is handled by a container_has_unpublished_changes and recursivelying pseudo-publishing the container (no changes to container but force publish the children including soft deletes).

# This value is mutable if and only if there are unpinned references in
# defined_list. In that case, frozen_list should start as None, and be
# updated to pin references when another version of this Container becomes
# the Draft version. But if this version ever becomes the Draft again
# (e.g. the user hits "discard changes" or some kind of revert happens),
# then we need to clear this back to None.

  1. Based on the above, frozen list doesn't make a lot of sense, because the versions of the container don't capture many of the changes to the contents. e.g. container version 1 had component version 1 through 50, and container version 2 had component versions 51 through 100, what is the point of tracking whether container version 1 "should" show v1 or v50 when they're wildly different?

@openedx-webhooks
Copy link

Thanks for the pull request, @bradenmacdonald!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Feb 14, 2025
@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

  1. UX specs say modifying a child will mark the unit as "having unpublished changes" but ADR says otherwise. Does it make sense to implement a container_has_unpublished_changes() helper function that recursively checks for unpublished changes in a container (and its child containers, and so on...), so we can display this in the UI? See slack discussion

I think we'll want some bookkeeping models in the containers app that track when a new child version is created and recursively marks its ancestors "dirty". Similarly, I think we'll want to have one that tracks "this publish affected these ancestors, even if they haven't otherwise changed". I haven't worked out any details beyond the vague thoughts that:

  • We need to keep it really lightweight.
  • It might be useful to have a major/minor version numbering of that for containers, in the sense that major number tracks changes to the container's metadata, ordering, and membership; and the minor number tracks changes to the contents. So 1.0, 1.1, 1.7, 2.0, 2.1, etc.
  • I used to think that we had to track all containers that used a particular entity and then auto-generate new draft versions of those containers with the entity missing if the entity was soft-deleted... and then also auto-publish those containers when te entity's deletion was published. But now I'm thinking that we do that where we know it makes sense (like in the originating Unit where we're editing it), but we otherwise are tolerant of when unpinned children disappear like that.

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

  1. Why do we specify draft_version_pks and published_version_pks when creating a new draft version of a unit? Seems wrong to say anything at all about the published version when it doesn't yet exist and may never exist.

Yeah, I feel like we had this discussion at some point and came to the conclusion that it was because my original sketch for EntityListRow had draft_version and published_version, but really it should have just had a single version field.

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

  1. If we have unpinned published versions, why do we need the various defined_list, initial_list, frozen_list etc? So far in the test cases I've been writing I haven't found a need to even reference initial_list nor frozen_list.

I had originally thought that these would be useful for history purposes, but I think it's not necessary if we have minor version tracking.

I do wonder if the relationship between containers and versioning is basic enough where containers should be folded directly into the publishing app (with specific container-types still having their own apps).

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

Though I guess one very confusing part about having a major and minor version for containers is that it's possible to publish any particular subset of the children, so it's not like there's really a published "1.12" version–"1.12" just means "twelve changes have happened to some descendent". So it'd be more like we'd be running two minor counters for children–one for draft changes, one for publishes.

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

Also, I guess we should double-check the actual cost of encoding snapshots of the child versions given the current representation. Trying to sketch some of this out now...

@bradenmacdonald
Copy link
Contributor Author

bradenmacdonald commented Feb 14, 2025

I think we'll want some bookkeeping models in the containers app that track when a new child version is created and recursively marks its ancestors "dirty".

For now, I implemented a contains_unpublished_changes() that recursively scans a container for changes; it uses a fixed 3 queries to determine if a unit has changes, regardless of how many components are in the unit. This should be enough to write the test cases for now and confirm correctness, but doesn't scale above unit size so will need to be replaced by a tracking-based approach or something like a closure table for all the containers in each learning package. (Presumably only for drafts or possibly separate closure tables for draft and published trees.) I kind of like the closure table idea more than a tracking-based approach, as long as its performant enough - it would let us optimize so many different things.

@bradenmacdonald bradenmacdonald changed the title Understanding containers/units: High level APIs, high level test cases, open questions Containers/units: High level APIs and expanded test cases Feb 15, 2025

Args:
unit_version_pk: The unit version ID.
def get_components_in_draft_unit(
Copy link
Contributor Author

@bradenmacdonald bradenmacdonald Feb 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit suboptimal to me that we have to have separate APIs for get_components_in_draft_unit and get_components_in_published_unit. This is because when things are unpinned, we need to know whether to use the latest draft or latest published version.

I don't think this is a big deal, but it does "feel" to me like it goes against the spirit of the publishing app, where you have a series of versions and draft/published are just pointers to a particular version. With components (and presumably most PublishableEntities in general other than our new containers), if you ask for "version 37", you get a specific thing regardless of whether it is/was a draft or published. Whereas in this new case, you could try to request a particular version of a unit, but unless we know whether you want it as a draft or you want it as published, we usually can't fully describe that particular version's contents.


That is also why my proposed get_components_in_published_unit API doesn't accept a version number; it can't really get anything other than the current published version. (The unit version is insufficient to specify a snapshot of the unit when things are unpinned.)

That said, there likely will still be a way to get snapshots of any historic published version you want, analogous to requesting a specific component version, but it will require asking for the state of the published unit when the entire LearningPackage was at a specific version, which is given by the PublishLog ID. Edit: I implemented an example of this in the PR as get_components_in_published_unit_as_of()

Comment on lines -108 to -111
row = publishable_entities_in_rows.get(entity_pk)
if row and row.order_num == order_num:
new_entity_list.entitylistrow_set.add(row)
continue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had a bug - the code was moving EntityListRow objects into the new EntityList rather than copying them into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Status: Waiting on Author
Development

Successfully merging this pull request may close these issues.

3 participants