Many locations are listed repeatedly in the search index #2413

Open
LilithHafner opened this issue Jan 20, 2024 · 2 comments
Labels
Format: HTML Related to the default HTML output help wanted Type: Bug

Comments

@LilithHafner
Contributor

Examining the search index on https://docs.julialang.org/en/v1.11-dev/#, I noticed that many items are listed multiple times under the same category, location, page, and title (though with different text).

d = documenterSearchIndex.docs; d.length
10083
all_but_text = d.map(function f(dd) {return dd.category + dd.location + dd.page + dd.title;}); all_but_text.length
10083
new Set(all_but_text).size
3854
all_incl_text = d.map(function f(dd) {return dd.category + dd.location + dd.page + dd.title + dd.text;}); all_incl_text.length
10083
new Set(all_incl_text).size
10046

I imagine that aggregation at index-creation time will improve runtime performance slightly without much alteration to result ordering.

One way to aggregate these semi-duplicates is to concatenate their texts.
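
A minimal sketch of that aggregation, assuming entries keep the `category`, `location`, `page`, `title`, and `text` fields used in the console session above. The helper name `aggregateSearchIndex`, the newline separator, and the sample entries are hypothetical; this is not Documenter's actual index-building code.

```javascript
// Hypothetical helper: merge entries that share category, location, page,
// and title by concatenating their texts (first-seen order is preserved).
function aggregateSearchIndex(docs) {
  const merged = new Map();
  for (const doc of docs) {
    // Serializing the fields as an array avoids the accidental key collisions
    // that plain string concatenation could produce.
    const key = JSON.stringify([doc.category, doc.location, doc.page, doc.title]);
    const existing = merged.get(key);
    if (existing === undefined) {
      // Shallow copy so the input entries are left untouched.
      merged.set(key, { ...doc });
    } else {
      // Assumed separator: a newline between the concatenated texts.
      existing.text = existing.text + "\n" + doc.text;
    }
  }
  return Array.from(merged.values());
}

// Made-up sample entries, for illustration only.
const sample = [
  { category: "section", location: "manual/a/#x", page: "A", title: "X", text: "first" },
  { category: "section", location: "manual/a/#x", page: "A", title: "X", text: "second" },
  { category: "page", location: "manual/b/", page: "B", title: "B", text: "other" },
];
const aggregated = aggregateSearchIndex(sample);
console.log(aggregated.length); // 2
```

Whether a plain newline is the right separator depends on how the search library tokenizes `text`, so treat that choice as a placeholder.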

@Hetarth02
Contributor

There are talks of revisiting the index-building logic to address this and more. (CC @mortenpi)

@mortenpi mortenpi added Type: Bug help wanted Format: HTML Related to the default HTML output labels Jan 22, 2024
@mortenpi
Member

Yeah, I think it would be awesome to overhaul the index generation. Not specifically a high priority item though.
