Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: normalize component types with ComponentType #148

Merged
merged 2 commits into from
Feb 1, 2024

Conversation

ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Jan 31, 2024

Prior to this commit, Components encoded their namespace + type info as
two columns on the Component model itself. This wasted a lot of space on
a very common index lookup–many rows of ('xblock.v1', 'video'),
('xblock.v1', 'problem'), and ('xblock.v1', 'text'). But worse, the lack
of a separate ComponentType model meant that there was no first class
entity to store per-type policy data against. For example, we might want
to say "the following types are supported by libraries, while these
other types are experimental", or "these types are enabled for these
orgs", etc.

Components are required to have a ComponentType. We're rewriting the
first migration for the components app here, since this app hasn't been
added to edx-platform yet.

Prior to this commit, Components encoded their namespace + type info as
two columns on the Component model itself. This wasted a lot of space on
a very common index lookup–many rows of ('xblock.v1', 'video'),
('xblock.v1', 'problem'), and ('xblock.v1', 'text'). But worse, the lack
of a separate ComponentType model meant that there was no first class
entity to store per-type policy data against. For example, we might want
to say "the following types are supported by libraries, while these
other types are experimental", or "these types are enabled for these
orgs", etc.

Components are required to have a ComponentType. We're rewriting the
first migration for the components app here, since this app hasn't been
added to edx-platform yet.
@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 31, 2024

Supersedes #147

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple Qs, otherwise looks great.

openedx_learning/core/components/models.py Show resolved Hide resolved


@lru_cache(maxsize=128)
def get_or_create_component_type_id(namespace: str, name: str) -> int:
Copy link
Member

@kdmccormick kdmccormick Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it returns a pk, I think I would mildly prefer get_or_create_component_type, since you're going to create a "compoment type", not a "component type id".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not great, but I decided to keep it with _id at the end because of all the places we're returning a full model.

Honestly, I'm currently mulling over how this part works in my current PR. I realized that the way things are now, the cache is vulnerable to getting corrupted when there are errors, since we might cache a value for a ComponentType that was later rolled back. So I might have to rethink this setup entirely. But I'll do that in the next PR. 😛

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Would the rolled-back id be mis-cached in the lru_cache, or somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in lru_cache.

Copy link
Contributor

@bradenmacdonald bradenmacdonald Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect that creating ComponentType instances will occur outside of create_component. So I suggest that you just put the ComponentType.objects.get_or_create( call inline (without a cache) inside create_component. Then if it gets rolled back, no problem.

If this gets simplified to @lru_cache def get_component_type_id() then you have two nice simplifications:

  1. If the ComponentType is not found, it won't be cached, due to the ComponentType.NotFound exception

  2. You can optimize lookups like this:

     Component.with_publishing_relations \
                        .get(
                            learning_package_id=learning_package_id,
                            component_type__namespace=namespace,
                            component_type__name=type_name,
                            local_key=local_key,
                        )

    becomes

     Component.with_publishing_relations \
                        .get(
                            learning_package_id=learning_package_id,
                            component_type_id=get_component_type_id(namespace, type_name),
                            local_key=local_key,
                        )

which will then cause your function to throw differentiated exceptions: Component.NotFound OR ComponentType.NotFound vs. only throwing Component.NotFound. Arguably more explicit (but maybe a pain if callers need to plan to catch both).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ Nice, I like that.

Copy link
Contributor Author

@ormsbee ormsbee Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I'm going to just kick the lru to the curb entirely for now. The worry I have is that the lru bleeds across transactions and probably even threads (?), and we don't know what combination of reads and writes of Components people are going to do in a big transaction (e.g. import). So it's possible that a process will add a bunch of stuff (including new ContentTypes), then do some reads of Components because they're grouping them somehow (thus loading things into the LRU cache). Then their process dies, and gets rolled back.

I guess the semantics I'm really looking for are transaction-local caches. Which actually sounds really cool now that I think about it... but I'm definitely not going there this week. A much simpler version of that would be a local object that proxies the requests through and goes out of scope at the end of whatever import operation happens. But things will probably be fine if I just punt on caching for a while.

@ormsbee ormsbee merged commit 8d03e22 into openedx:main Feb 1, 2024
7 checks passed
@ormsbee ormsbee deleted the component-types-4 branch February 1, 2024 20:26
@ormsbee ormsbee self-assigned this Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants