Prevent / support deep pagination #5838
Comments
There could be suggestions from the Blacklight community on pagination performance to add to this issue.
Some quick links to Blacklight discussion about deep pagination for me to look at later:
Thanks for the links! I had found some of the code linked from the Slack discussions and issue 1665, but there was some good additional info in the other threads. PR 3094 is an interesting one and does not look like it is in any v7 release, so it would be another impetus for pushing to Blacklight 8 once it supports Rails 7.2. Stanford homebrewed a similar approach 6-7 years ago, so it is neat to see comparable behavior incorporated upstream. From reading through all of the links from @cjcolvar, the most common approach at other Blacklight institutions has been to limit deep pagination.
Other site search behavior:
Blacklight does not support cursorMarks because Solr cursors only move forward, not bi-directionally, which could break paging through results. They also would not help with bots or users jumping to arbitrary pages. If we wanted to investigate them anyway, we would have to roll our own implementation, and given the limitations on the Solr side I am not confident how much real benefit there would be.

There is some discussion of sitemaps and schemas in the code4lib conversations. People seemed to be saying that these can reduce bot traffic, but that deep pagination requests are heavy enough that a single request can sometimes cripple a large enough dataset. I do not think our dataset is quite that large. Sitemaps seem beneficial in general but would not necessarily have a direct effect on this issue.

So at this time, the main way forward seems to be to limit how deep users can paginate, and maybe upgrade to Blacklight 8 to get the configurable pagination bar.
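For discussion, here is a minimal sketch of what "limit how deep users can paginate" could mean in code. It caps on the Solr offset rather than the page number, since Solr has to collect roughly `start + rows` documents to serve a page, so the same page number costs more at a larger page size. The module name, method names, and the 10,000 cap are all hypothetical and are not part of Blacklight.

```ruby
# Hypothetical helper, not part of Blacklight: decides whether a requested
# results page is "too deep" based on the Solr offset it would produce.
module DeepPaging
  # Assumed cap on start + rows; tune against real log and Solr timing data.
  MAX_WINDOW = 10_000

  module_function

  # Solr `start` offset for a 1-based page number.
  # Missing or non-positive pages are treated as page 1.
  def solr_start(page, per_page)
    ([page.to_i, 1].max - 1) * per_page.to_i
  end

  # True when Solr would have to collect more than max_window documents.
  def too_deep?(page, per_page, max_window: MAX_WINDOW)
    solr_start(page, per_page) + per_page.to_i > max_window
  end
end
```

A controller could call something like `DeepPaging.too_deep?(params[:page], per_page)` before querying Solr and respond with an error or a redirect to the last allowed page. How that hooks into Blacklight's search flow would still need to be worked out.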
Presumably a future release of Blacklight 8.* will support Rails 7.2. Current Blacklight 8 isn't there yet.
Propose discussing first at Backlog Refinement, then we can schedule more time for discussion if needed.
Looking at log data could be helpful as well to see what the requests are like in practice.
I asked Digital Collections and IUCAT folks if they are doing anything about this. Digital Collections said no. David Elyea said about IUCAT:
Putting Blacklight 8.x on the roadmap would be a good next step for that part of the investigation.

Next step: look at the logs and retrieve service statistics. Part of this is determining how much of an issue deep pagination is for us; we don't need to fully block it off if it is not a large performance problem in our case. If the logs show that the deep requests are not coming from humans, best practice is to disable deep pagination unless we can say it's not a problem. Ideally we would not disable it, but in practice real users are unlikely to regularly page deep into search results.

Others report using https://github.com/rack/rack-attack successfully. See https://github.com/mastodon/mastodon/blob/a021dee64214fcc662c0c36ad4e44dc1deaba65f/config/initializers/rack_attack.rb#L93 for a throttling setting in this library.

Also: what is the current level of throttling at the proxy level? We can check on how this is currently handled in production in our server architecture.
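As a starting point for the rack-attack idea, a rough sketch of a throttle that only counts deep-page requests might look like the following. `Rack::Attack.throttle` takes a name, a `limit:`, a `period:`, and a block that returns a discriminator (here the client IP); returning `nil` leaves the request untracked. The module name, the 100-page threshold, and the 10-requests-per-minute limit are assumptions for illustration, not values taken from our logs or from the Mastodon config.

```ruby
# Hypothetical helper: picks out requests that should count against
# a deep-pagination throttle.
module DeepPaginationThrottle
  # Assumed threshold, loosely based on the "past the 100 page mark" pattern.
  THRESHOLD = 100

  # Returns the client IP for deep-page requests, nil for everything else.
  # `req` only needs to respond to #params and #ip, as Rack::Request does.
  def self.discriminator(req)
    page = req.params['page'].to_s.to_i
    page > THRESHOLD ? req.ip : nil
  end
end

# Only registers the throttle when the rack-attack gem is loaded.
if defined?(Rack::Attack)
  # Assumed limit: 10 deep-page requests per IP per 60 seconds.
  Rack::Attack.throttle('deep_pagination/ip', limit: 10, period: 60) do |req|
    DeepPaginationThrottle.discriminator(req)
  end
end
```

Throttling only the deep pages leaves ordinary browsing untouched, which fits the goal of not disabling deep pagination outright for real users.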
@joncameron to write a new investigation issue to carry on the work here: how to investigate the real-world load from paging and what we could do to mitigate the performance issues.
Follow-on issue: #6038
Description
It is a known issue with Solr / Blacklight that deep pagination into either facet sets or results causes significant performance issues. We fairly regularly see requests in the MCO logs that go past the 100-page mark. Although these are normally one-off requests that don't exhibit paging (ex: requesting page 651, 651, 652, etc.), more targeted paging does sometimes occur and may be contributing to sudden slowdowns in Solr and CPU spikes on the Solr server.
How can we better handle these situations?
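To help tell one-off deep requests apart from targeted paging, a small script along these lines could be run over request paths pulled from the logs. It is only a sketch: it assumes the paths have already been extracted from whatever log format is in use, that the page number arrives as a `page` query parameter, and the module and method names are made up for illustration.

```ruby
# Hypothetical log-scanning helpers for spotting deep pagination.
module PageLogScan
  module_function

  # Integer value of the `page` query parameter in a request path, or nil.
  def page_of(path)
    value = path[/[?&]page=(\d+)(?:&|#|\z)/, 1]
    value && value.to_i
  end

  # Page numbers above the threshold, in the order they were requested.
  def deep_pages(paths, threshold: 100)
    paths.map { |path| page_of(path) }.compact.select { |n| n > threshold }
  end

  # Runs of consecutive page numbers (n, n+1, n+2, ...) of at least
  # min_length, which suggest a client walking through results.
  def sequential_runs(pages, min_length: 3)
    pages.slice_when { |a, b| b != a + 1 }.select { |run| run.size >= min_length }
  end
end
```

Grouping the paths by client IP before calling `sequential_runs` would give a cleaner signal, since interleaved requests from different clients break up the runs.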
Done Looks Like