Don't rely on the HTMLFile model to index and re-index files #7875

Closed
stsewd opened this issue Jan 27, 2021 · 1 comment
Labels
Accepted (Accepted issue on our roadmap), Improvement (Minor improvement to code), Priority: low (Low priority)

Comments

@stsewd
Member

stsewd commented Jan 27, 2021

So, we don't need the HTMLFile objects in the db to re-index, since we re-read all their data from storage to index/re-index.
The only benefit is to track changed files, but that is going to be removed in #7874.

Our current flow is:

  • Create html objects from all html files in storage
  • Create intersphinx data for existing html files (here we use the html file objects to check that, but it can be replaced by keeping files in memory)
  • Sync to ES
  • Delete html files from previous builds (after this, the html objects aren't used for anything other than the re-index management command)
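
As a rough illustration of "keeping files in memory" for the intersphinx check, here is a minimal sketch. It assumes a local directory stands in for build storage, and the names `list_html_files` and `filter_existing` are hypothetical, not part of the Read the Docs codebase.

```python
from __future__ import annotations

from pathlib import Path


def list_html_files(build_dir: Path) -> set[str]:
    """Collect the relative paths of all HTML files under build_dir.

    Hypothetical helper: a local directory stands in for build storage.
    """
    return {
        path.relative_to(build_dir).as_posix()
        for path in build_dir.rglob("*.html")
    }


def filter_existing(entries: dict[str, str], html_files: set[str]) -> dict[str, str]:
    """Keep only the entries whose target page is in the in-memory set.

    `entries` maps a name to a URI such as "page.html#anchor"; the anchor
    is dropped before the membership check. No db lookup is involved.
    """
    return {
        name: uri
        for name, uri in entries.items()
        if uri.split("#", 1)[0] in html_files
    }
```

The set is built once per build and discarded afterwards, so nothing needs to be stored in the db for this check.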

Benefits include not having that data in the db (more spaaaace!), and removing the django-elasticsearch-dsl package (fewer dependencies!).

Work to do:

  • Use a structure/class instead of an ORM model to pass around the data in the indexing step.
  • Change our re-indexing management command to not depend on the objects from the db, but do the re-index directly.
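
A minimal sketch of both items, assuming a plain dataclass replaces the ORM model and the re-index step receives paths straight from storage. `HTMLFileInfo`, `iter_html_files`, and `reindex` are hypothetical names, and `index_document` is a placeholder for whatever actually sends a page to ES.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class HTMLFileInfo:
    """Hypothetical in-memory replacement for the HTMLFile model."""

    project_slug: str
    version_slug: str
    path: str


def iter_html_files(
    project_slug: str, version_slug: str, storage_paths: Iterable[str]
) -> Iterator[HTMLFileInfo]:
    """Yield one HTMLFileInfo per HTML path found in storage."""
    for path in storage_paths:
        if path.endswith(".html"):
            yield HTMLFileInfo(project_slug, version_slug, path)


def reindex(
    files: Iterable[HTMLFileInfo],
    index_document: Callable[[HTMLFileInfo], None],
) -> int:
    """Index every file directly, without reading objects from the db.

    Returns the number of files passed to index_document.
    """
    count = 0
    for html_file in files:
        index_document(html_file)
        count += 1
    return count


# Usage: a list's append method stands in for the real indexing call.
indexed = []
total = reindex(
    iter_html_files("docs", "latest", ["index.html", "logo.png", "api/index.html"]),
    indexed.append,
)
```

A management command could call `reindex` with paths listed from storage, so it would not need any rows to exist beforehand.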

@stsewd added the Improvement, Accepted, and Priority: low labels Jan 27, 2021
@humitos
Member

humitos commented Sep 15, 2023

This was implemented in #10696

> removing the django-elasticsearch-dsl package (fewer dependencies!).

This is going to be done in #10730

@humitos closed this as completed Sep 15, 2023