Does anyone have an issue with Dataset.pushData when "maxConcurrency" is set to anything greater than 1? A few items are not written to the dataset. I can't find any topic on how best to handle this. The only way I see is to write the data after the crawler has finished running.
cc @vladfrangu, this sounds like another issue with waiting for the writes. The repro fails for me consistently, and when I put a 1s sleep after the `run` method resolves, it seems to help.
It did fail once with the 1s sleep too. IIRC we are waiting for the storages to complete the writes in `Actor.exit`, but we need the same for `crawler.run`.
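The suspected failure mode can be sketched without Crawlee at all. The following is a minimal model, not Crawlee's actual storage code: `SlowStore`, `runWithoutFlush`, and `runWithFlush` are hypothetical names, and the write latency is simulated with `setTimeout`. It only illustrates why a `run` that resolves before pending writes settle can appear to lose items, and why waiting for them first would avoid that. The model exaggerates the effect (every write is still in flight), whereas the report describes only a few missing items.

```typescript
// Hypothetical model of a store whose writes complete asynchronously.
class SlowStore {
    persisted: number[] = [];
    private pending: Promise<void>[] = [];

    // Starts a write that lands after delayMs and tracks it as pending.
    pushData(item: number, delayMs: number): Promise<void> {
        const write = new Promise<void>((resolve) => {
            setTimeout(() => {
                this.persisted.push(item);
                resolve();
            }, delayMs);
        });
        this.pending.push(write);
        return write;
    }

    // Resolves once every write started before this call has settled.
    async flush(): Promise<void> {
        const batch = this.pending;
        this.pending = [];
        await Promise.all(batch);
    }
}

// Writes are started but nothing waits for them: "run" resolves early.
async function runWithoutFlush(store: SlowStore, items: number[]): Promise<number> {
    for (const item of items) void store.pushData(item, 20);
    return store.persisted.length; // what a reader sees right after "run"
}

// Same writes, but "run" waits for pending writes before resolving.
async function runWithFlush(store: SlowStore, items: number[]): Promise<number> {
    for (const item of items) void store.pushData(item, 20);
    await store.flush();
    return store.persisted.length;
}
```

In this model, `runWithoutFlush(new SlowStore(), [1, 2, 3])` observes 0 persisted items, while `runWithFlush` observes all 3. A fixed sleep after `run` only narrows the window, which would be consistent with the 1s sleep helping most of the time but not always.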
Which package is this bug report for? If unsure which one to select, leave blank
None
Issue description
Was reported on discord by @yellott:
There's a bit more discussion there, but there's definitely nothing like a missing `await`, at least I don't see anything obvious. I tried the repro locally and it does happen randomly: never with `maxConcurrency: 1`, about every second run without it. I verified that the items are there: if you also push the items to an array in memory and save it at the end of the run, it works as expected. Another way to verify that the items are received: the dataset can miss some items, but if I push the final array (the one kept in memory) to the Key-Value store, they are all there.
Reproduction here: https://github.com/yellott/crawlee-odd-behaviour-mre
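The in-memory workaround described above can be sketched roughly as follows. This is not Crawlee's crawler: `crawlBuffered` is a hypothetical stand-in loop with bounded concurrency, and `DatasetLike` is an assumed minimal interface (Crawlee's `Dataset.pushData` does accept an array of items). The point is only the shape of the workaround: handlers append to an array, and a single awaited write happens after all handlers have finished.

```typescript
// Assumed minimal shape of a dataset, for illustration only.
interface DatasetLike {
    pushData(items: object[]): Promise<void>;
}

// Hypothetical stand-in for a crawler run: processes urls with at most
// maxConcurrency handlers in flight and buffers their results in memory.
async function crawlBuffered<T extends object>(
    urls: string[],
    handler: (url: string) => Promise<T>,
    dataset: DatasetLike,
    maxConcurrency: number,
): Promise<T[]> {
    const results: T[] = [];
    let next = 0;

    // Each worker pulls the next unprocessed url until none are left.
    const worker = async (): Promise<void> => {
        while (next < urls.length) {
            const url = urls[next++];
            results.push(await handler(url));
        }
    };

    const workers: Promise<void>[] = [];
    for (let i = 0; i < maxConcurrency; i++) workers.push(worker());
    await Promise.all(workers);

    // One awaited write after every handler has finished.
    await dataset.pushData(results);
    return results;
}
```

The trade-off is that all results are held in memory until the end of the run, so this suits small crawls better than large ones.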
Code sample
No response
Package version
3.2.2
Node.js version
18.14.2
Operating system
macOS
Apify platform
I have tested this on the `next` release
No response
Other context
No response