-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: normalize component types with ComponentType #148
Conversation
Prior to this commit, Components encoded their namespace + type info as two columns on the Component model itself. This wasted a lot of space on a very common index lookup–many rows of ('xblock.v1', 'video'), ('xblock.v1', 'problem'), and ('xblock.v1', 'text'). But worse, the lack of a separate ComponentType model meant that there was no first class entity to store per-type policy data against. For example, we might want to say "the following types are supported by libraries, while these other types are experimental", or "these types are enabled for these orgs", etc. Components are required to have a ComponentType. We're rewriting the first migration for the components app here, since this app hasn't been added to edx-platform yet.
Supersedes #147 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple Qs, otherwise looks great.
|
||
|
||
@lru_cache(maxsize=128) | ||
def get_or_create_component_type_id(namespace: str, name: str) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though it returns a pk, I think I would mildly prefer get_or_create_component_type
, since you're going to create a "compoment type", not a "component type id".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's not great, but I decided to keep it with _id
at the end because of all the places we're returning a full model.
Honestly, I'm currently mulling over how this part works in my current PR. I realized that the way things are now, the cache is vulnerable to getting corrupted when there are errors, since we might cache a value for a ComponentType
that was later rolled back. So I might have to rethink this setup entirely. But I'll do that in the next PR. 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Would the rolled-back id be mis-cached in the lru_cache
, or somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in lru_cache
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect that creating ComponentType
instances will occur outside of create_component
. So I suggest that you just put the ComponentType.objects.get_or_create(
call inline (without a cache) inside create_component
. Then if it gets rolled back, no problem.
If this gets simplified to @lru_cache def get_component_type_id()
then you have two nice simplifications:
-
If the
ComponentType
is not found, it won't be cached, due to theComponentType.NotFound
exception -
You can optimize lookups like this:
Component.with_publishing_relations \ .get( learning_package_id=learning_package_id, component_type__namespace=namespace, component_type__name=type_name, local_key=local_key, )
becomes
Component.with_publishing_relations \ .get( learning_package_id=learning_package_id, component_type_id=get_component_type_id(namespace, type_name), local_key=local_key, )
which will then cause your function to throw differentiated exceptions: Component.NotFound
OR ComponentType.NotFound
vs. only throwing Component.NotFound
. Arguably more explicit (but maybe a pain if callers need to plan to catch both).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ Nice, I like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that I'm going to just kick the lru
to the curb entirely for now. The worry I have is that the lru
bleeds across transactions and probably even threads (?), and we don't know what combination of reads and writes of Components people are going to do in a big transaction (e.g. import). So it's possible that a process will add a bunch of stuff (including new ContentTypes), then do some reads of Components because they're grouping them somehow (thus loading things into the LRU cache). Then their process dies, and gets rolled back.
I guess the semantics I'm really looking for are transaction-local caches. Which actually sounds really cool now that I think about it... but I'm definitely not going there this week. A much simpler version of that would be a local object that proxies the requests through and goes out of scope at the end of whatever import operation happens. But things will probably be fine if I just punt on caching for a while.
Prior to this commit, Components encoded their namespace + type info as
two columns on the Component model itself. This wasted a lot of space on
a very common index lookup–many rows of ('xblock.v1', 'video'),
('xblock.v1', 'problem'), and ('xblock.v1', 'text'). But worse, the lack
of a separate ComponentType model meant that there was no first class
entity to store per-type policy data against. For example, we might want
to say "the following types are supported by libraries, while these
other types are experimental", or "these types are enabled for these
orgs", etc.
Components are required to have a ComponentType. We're rewriting the
first migration for the components app here, since this app hasn't been
added to edx-platform yet.