
feat: session locking #1839

Closed
wants to merge 2 commits into from

Conversation

@barjin (Contributor) commented Mar 20, 2023

Adds session locking for limiting session usage concurrency.

Unlimited concurrent usage caused inconsistencies with the user-provided option values (see #1836) and (in extreme cases?) might have tripped bot detection.

fixes #627
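A minimal sketch of what "session locking" means here (an illustration of the concept, not the actual code from this PR; the class and its members are hypothetical): a session tracks how many requests currently hold it and refuses new holders past a cap.

```typescript
// Hypothetical sketch of session locking: cap how many requests may use
// a session concurrently. Not the PR's actual implementation.
class LockableSession {
    private activeUsers = 0;

    constructor(readonly id: string, private maxConcurrentUsage: number) {}

    // Returns true and takes a "lock" slot if the session is below its
    // concurrency cap; returns false otherwise.
    tryAcquire(): boolean {
        if (this.activeUsers >= this.maxConcurrentUsage) return false;
        this.activeUsers++;
        return true;
    }

    // Frees one slot once the request using the session finishes.
    release(): void {
        if (this.activeUsers > 0) this.activeUsers--;
    }
}

const session = new LockableSession('session_0', 2);
console.log(session.tryAcquire()); // true
console.log(session.tryAcquire()); // true
console.log(session.tryAcquire()); // false: concurrency limit reached
session.release();
console.log(session.tryAcquire()); // true again after a release
```

Without such a cap, nothing prevents many in-flight requests from sharing one session at the same time, which is what #627 asks to prevent.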

```diff
@@ -1013,7 +1013,8 @@ export class BasicCrawler<Context extends CrawlingContext = BasicCrawlingContext
     // reclaim session if request finishes successfully
     request.state = RequestState.DONE;
-    crawlingContext.session?.markGood();
+    session?.markGood();
```
Member
I remember we had a bug here that was resolved by using crawlingContext.session instead of just session, or something similar. I think it was about this part, so it might be irrelevant for the basic crawler:

https://github.com/apify/crawlee/blob/master/packages/browser-crawler/src/internals/browser-crawler.ts#L474-L478

Contributor Author

Right, and it was me who fixed it 🤦🏼. At least now I have a better grip on the session management in basic-/browser-crawler.

That said, I now consider session locking a much more minor issue than before. Let's scratch that and rethink the session handling in BrowserCrawler instead.

Unless stated otherwise (via useIncognitoPages or experimentalContainers), one browser instance is tied to one specific session (retrieved from the session pool in a preLaunchHook). The session constraints (maxErrorScore, maxUsageCount) are only checked at the time the session is retrieved from the pool - that is what happens in #1836 with maxUsageCount - and it is also why CheerioCrawler is not riddled with this problem. Not sure why AutoscaledPool.maxConcurrency affects this, though.
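The failure mode described above can be sketched like this (a simplified, hypothetical SessionPool, not Crawlee's real one): maxUsageCount is only enforced at retrieval time, so a browser that holds on to its session and reuses it for every page never re-triggers the check.

```typescript
// Simplified sketch: the usage constraint lives only in getSession().
class Session {
    usageCount = 0;

    constructor(readonly id: string, readonly maxUsageCount: number) {}

    get usable(): boolean {
        return this.usageCount < this.maxUsageCount;
    }

    markUsed(): void {
        this.usageCount++;
    }
}

class SessionPool {
    private sessions: Session[] = [];
    private counter = 0;

    constructor(private maxUsageCount: number) {}

    // The constraint is enforced HERE, at retrieval time only.
    getSession(): Session {
        let session = this.sessions.find((s) => s.usable);
        if (!session) {
            session = new Session(`session_${this.counter++}`, this.maxUsageCount);
            this.sessions.push(session);
        }
        session.markUsed();
        return session;
    }
}

// A browser without useIncognitoPages keeps one session for its whole
// lifetime: it calls getSession() once (in a preLaunchHook) and then
// reuses the same object for every page, bypassing the pool's check.
const pool = new SessionPool(2);
const browserSession = pool.getSession(); // usageCount becomes 1

for (let i = 0; i < 5; i++) {
    browserSession.markUsed(); // pages reuse the session directly
}

console.log(browserSession.usageCount); // 6, far beyond maxUsageCount = 2
console.log(browserSession.usable);     // false, yet the browser keeps using it
```

A per-request crawler like CheerioCrawler goes through getSession() for every request, so the constraint fires as intended; the browser crawlers skip it after launch.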

Contributor Author

Maybe this could be solved by adding a getSession(currentSessionId) call to a prePageCreate hook and - if this call returns null - give the browser a new session (by calling getSession())?

@B4nan (Member) commented Mar 23, 2023

> Maybe this could be solved by adding a getSession(currentSessionId) call to a prePageCreate hook and - if this call returns null - give the browser a new session (by calling getSession())?

Sounds good to me (as long as it works; I thought we can't just change the proxy for an existing browser, at least in PW?).

But maybe we should stop thinking about workarounds and start working on the user pool, as that should help with this exact problem.
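The prePageCreate idea discussed above could look roughly like this. All names here are assumptions reconstructed from the discussion (a simplified pool, a hypothetical hook), not actual Crawlee API:

```typescript
// Hypothetical sketch of the proposed prePageCreate re-validation.
interface Session { id: string; usable: boolean; }

class SessionPool {
    private sessions = new Map<string, Session>();
    private counter = 0;

    // With an id: return that session only if it is still usable, else null.
    // Without an id: hand out a fresh session.
    getSession(id?: string): Session | null {
        if (id !== undefined) {
            const existing = this.sessions.get(id);
            return existing && existing.usable ? existing : null;
        }
        const session = { id: `session_${this.counter++}`, usable: true };
        this.sessions.set(session.id, session);
        return session;
    }

    retire(id: string): void {
        const s = this.sessions.get(id);
        if (s) s.usable = false;
    }
}

// The hook: before each new page, re-validate the browser's current session
// and swap in a fresh one if the old one was retired in the meantime.
function prePageCreateHook(pool: SessionPool, browserState: { sessionId: string }) {
    const current = pool.getSession(browserState.sessionId);
    if (current === null) {
        const fresh = pool.getSession()!;
        browserState.sessionId = fresh.id;
    }
}

// Example: the browser's session gets retired, the hook swaps in a new one.
const demoPool = new SessionPool();
const initial = demoPool.getSession()!;
const browserState = { sessionId: initial.id };
demoPool.retire(initial.id);
prePageCreateHook(demoPool, browserState);
console.log(browserState.sessionId !== initial.id); // true
```

Note this sketch only swaps the session id; as B4nan points out, swapping the proxy tied to an already-launched browser may not be possible, which is the open question with this approach.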

@barjin (Contributor Author) commented Mar 27, 2023

Closing in favour of user pool development.

@barjin barjin closed this Mar 27, 2023
@B4nan B4nan deleted the feat/session-locking branch July 28, 2023 11:49
@mnmkng (Member) commented Feb 2, 2024

> Closing in favour of user pool development.

This did not age well 😄

Successfully merging this pull request may close these issues.

SessionPool - locking down session to prevent concurrent usage.
3 participants