Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: unique_ids has exponential running time if no ID fields #144

Merged
merged 5 commits into from
Jun 28, 2024
Merged

fix: unique_ids has exponential running time if no ID fields #144

merged 5 commits into from
Jun 28, 2024

Conversation

jpmckinney
Copy link
Contributor

@jpmckinney jpmckinney commented Feb 27, 2024

If instance has no ID fields, uniq(instance) isn't called once, but as many times as the length of instance.

Original commit d28dd01

@jpmckinney
Copy link
Contributor Author

cc @duncandewhurst for prioritization

@jpmckinney
Copy link
Contributor Author

jpmckinney commented Feb 27, 2024

Edit: I fixed the integration workflow.


You can ignore the ocds integration test failure. I assume it has something to do with how you install dependencies (it is not replicating how cove-ocds does it, which installs a single requirements file, whereas you install several which can cause different versions to be used).

        pip install -r cove-${{ matrix.cove }}/requirements.txt
        pip install -r cove-${{ matrix.cove }}/requirements_dev.txt
        pip install -r requirements_dev.txt
        # Make sure we're using local libs rather than one brought in
        # via requirements.
        pip install -e .
        pip install -e ./lib-cove-${{ matrix.cove }}/
        pip list

It's also possible you need to upgrade the workflow from Python 3.8. Our CoVEs use 3.10.

@jpmckinney
Copy link
Contributor Author

For my own learning: This performance issue was hard to pinpoint, because the process never terminated (otherwise, we could use many typical profiling tools). I used py-spy's top -p command to monitor the PID of the Django process. I saw that jsonschema was spending a lot of time in equal, which looked like it was called from uniq. I then checked our code to see where uniq was called. (I also put various raise Exception to see which code was reached before the performance issue – an actual debugger would have probably been useful here.)

…there are missing/invalid IDs and non-unique items
@jpmckinney
Copy link
Contributor Author

Note to self to switch back to libcove release in cove-ocds once this is merged and released.

@Bjwebb Bjwebb self-requested a review June 28, 2024 12:57
@Bjwebb Bjwebb merged commit 3623ecf into OpenDataServices:main Jun 28, 2024
11 checks passed
@jpmckinney jpmckinney deleted the unique-perf branch June 28, 2024 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants