SessionPool generates more sessions than needed and also does not respect "maxUsageCount" constraint. #1836
Comments
Thanks for the report. The first test looks a bit weird. Where do the errors even happen? If the page did not load localhost, the code in your request handler would not run.
Note that you are not awaiting the
There is a local server in the repo. It can be tested against any other server as well.
I've tried awaiting it as well, but this does not change the result.
@B4nan I've updated repo to await
I can reproduce this. I use Even though this helps spreading, I can see in the logs that more than 1 request is made per session.
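To make the "more than 1 request per session" observation easier to check, here is a minimal sketch that tallies requests per session from a list of logged session IDs. The function name `countRequestsPerSession`, the sample IDs, and the idea of collecting IDs into an array are all assumptions for illustration; nothing here is part of Crawlee.

```javascript
// Hypothetical helper (not a Crawlee API): counts how many times each
// session ID appears in the logs and reports the IDs used more often
// than the configured maxUsageCount.
function countRequestsPerSession(loggedSessionIds, maxUsageCount) {
    const counts = new Map();
    for (const id of loggedSessionIds) {
        counts.set(id, (counts.get(id) ?? 0) + 1);
    }
    const overLimit = [...counts]
        .filter(([, n]) => n > maxUsageCount)
        .map(([id]) => id);
    return { distinctSessions: counts.size, counts, overLimit };
}

// Made-up session IDs, as they might be collected in a request handler.
const logged = ['s1', 's2', 's1', 's3', 's1'];
const result = countRequestsPerSession(logged, 1);
console.log(result.distinctSessions, result.overLimit);
```

With `maxUsageCount` set to 1, any ID listed in `overLimit` was seen by the handler more often than the limit allows.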
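I found that every request increases the
    import { PlaywrightCrawler } from 'crawlee';

    const crawler = new PlaywrightCrawler({
        headless: true,
        maxRequestRetries: -1,
        useSessionPool: true,
        persistCookiesPerSession: true,
        async requestHandler({ page, response, request, proxyInfo, enqueueLinks, browserController, session }) {
            // Print how many times the current session has been used so far.
            console.log(session?.getState().usageCount)
        },
    })
    // `list` is the commenter's own helper (not shown in the thread); it yields the values substituted for `q`.
    await crawler.run(list(1, 100).map(i => `http://localhost:3000/mock/forCrawlee/cookie?order=10&q=${i}`))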
I experienced something similar with the Playwright crawler. Upon investigation:
This points to a bug in either:
Which package is this bug report for? If unsure which one to select, leave blank
None
Issue description
Set sessionPoolOptions.sessionOptions.maxUsageCount to 1. It can actually be any other value; it's just easier to see with 1.
Log sessionId in the default request handler.
Compare the sessionId logs with the content of the SDK_SESSION_POOL_STATE.json file.
The usageCount of 10 of the sessions equals 0, while the usageCount of 5 other sessions equals 8.
Repro is here.
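The check against SDK_SESSION_POOL_STATE.json described above can be sketched roughly as follows. This is only an illustration: the helper name `summarizeSessionPoolState` is made up, the sample data is invented, and the assumed file shape (a top-level `sessions` array whose entries carry `id`, `usageCount`, `maxUsageCount`, `errorScore` and `maxErrorScore`) is based on the fields mentioned in this report, not on a verified schema.

```javascript
// Hypothetical helper (not a Crawlee API): summarizes a parsed
// session pool state object.
// Assumed shape: { sessions: [{ id, usageCount, maxUsageCount, errorScore, maxErrorScore }] }
function summarizeSessionPoolState(state) {
    const sessions = state.sessions ?? [];
    const ids = (list) => list.map((s) => s.id);
    return {
        total: sessions.length,
        // Sessions that were created but never used.
        unused: ids(sessions.filter((s) => s.usageCount === 0)),
        // Sessions used more often than their own limit allows.
        overUsed: ids(sessions.filter((s) => s.usageCount > s.maxUsageCount)),
        // Sessions whose error score exceeds their own limit.
        overErrorScore: ids(sessions.filter((s) => s.errorScore > s.maxErrorScore)),
    };
}

// Invented sample data, loosely mirroring the numbers in this report.
const sampleState = {
    sessions: [
        { id: 'session_a', usageCount: 0, maxUsageCount: 1, errorScore: 0, maxErrorScore: 3 },
        { id: 'session_b', usageCount: 8, maxUsageCount: 1, errorScore: 4, maxErrorScore: 3 },
        { id: 'session_c', usageCount: 1, maxUsageCount: 1, errorScore: 0, maxErrorScore: 3 },
    ],
};

const summary = summarizeSessionPoolState(sampleState);
console.log(summary);
```

To run it against a real state file, the object could be loaded with `JSON.parse(fs.readFileSync(path, 'utf8'))`, where `path` points at the file in your storage directory.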
Code sample
And this is one of the sessions in SDK_SESSION_POOL_STATE.json. As you can see, usageCount is greater than maxUsageCount and errorScore is greater than maxErrorScore.
This way it at least respects maxUsageCount but still generates twice as many sessions as needed.
Package version
3.3.0
Node.js version
v18.13.0
Operating system
Ubuntu 22.04.1 LTS on WSL 2
Apify platform
I have tested this on the next release
3.3.1-beta.10
Other context
No response