-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🚚 Add ability to have multiple indexes with same PK in Dynamo layer #6034
Conversation
In the past, we used to disallow having multiple indexes with the same partition key; the reason was that we wanted to be able to determine from any `{ "input": "dict" }` immediately which index we should use. And since sort keys are optional, the simplest way we could do that was to make sure all PKs were unique: that way an index would always be unambiguously identified. This now blocks a use case we have, where we want to have 2 indexes with the same partition key but with 2 different sort keys (for the purposes of sorting the results differently). In this PR, make it possible to have multiple indexes with the same partition key, as long as the sort keys are different. This now does pose a problem if you query with only a partition key and there are two indexes that include that partition key: ```py indexes = [ Index('epoch', sort_key='created'), # Index 1 Index('epoch', sort_key='username'), # Index 2 ] results = table.get_many({ "epoch": 1 }) ``` Should this use index 1 (return users sorted by timestamp) or should this use index 2 (return users sorted by username?). I considered picking the first index in the array that applies because this would be backwards compatible when adding indexes to the end of the array... but I'm concerned that this is too implicit and could lead to unexpected results or action-at-a-distance problems: changing the order of the `indexes` array could all of a sudden make another part of the code base incorrect. So the query above is now an error. When there are multiple indexes that could apply, you need to explicitly disambiguate between them by putting a dummy condition on the sort key field you want to use: ```py results = table.get_many({ "epoch": 1, "created": UseThisIndex(), }) ``` This makes it clear which index should be used, at the expense of having to annotate all existing `get_many()` calls in the code base when you add a conflicting index. The maintainability advantages of the extra explicitness should hopefully win out in the long run.
for more information, see https://pre-commit.ci
…into no-condition-on-partition
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thank you for contributing! Your pull request is now going on the merge train (choo choo! Do not click update from main anymore, and be sure to allow changes to be pushed to your fork). |
Thank you for contributing! Your pull request is now going on the merge train (choo choo! Do not click update from main anymore, and be sure to allow changes to be pushed to your fork). |
In the past, we used to disallow having multiple indexes with the
same partition key; the reason was that we wanted to be able to
determine from any
{ "input": "dict" }
immediately which indexwe should use.
And since sort keys are optional, the simplest way we could do that was
to make sure all PKs were unique: that way an index would always be
unambiguously identified.
This now blocks a use case we have, where we want to have 2 indexes with
the same partition key but with 2 different sort keys (for the purposes of
sorting the results differently).
In this PR, make it possible to have multiple indexes with the same
partition key, as long as the sort keys are different. This now
does pose a problem if you query with only a partition key and there
are two indexes that include that partition key:
Should this use index 1 (return users sorted by timestamp) or should
this use index 2 (return users sorted by username?).
I considered picking the first index in the array that applies because
this would be backwards compatible when adding indexes to the end of
the array... but I'm concerned that this is too implicit and could lead
to unexpected results or action-at-a-distance problems: changing the
order of the
indexes
array could all of a sudden make another part ofthe code base incorrect.
So the query above is now an error. When there are multiple indexes
that could apply, you need to explicitly disambiguate between
them by putting a dummy condition on the sort key field you want
to use:
This makes it clear which index should be used, at the expense of having
to annotate all existing
get_many()
calls in the code base when youadd a conflicting index. The maintainability advantages of the extra
explicitness should hopefully win out in the long run.
How to test
No user visible changes, purely a feature change for @jpelay