feat!: Increase Crawler standardization regarding Pipelines #4122
+Output Documents
+Optional file saving
+Optional Document meta about file path
@vblagoje This is about that issue. The Crawler will only save files if the user sets a parameter. Like most other Nodes, it will generate Documents.
@danielbichuetti Looking good, much better than what we had before. @masci your turn.
It appears that CI is failing to add |
We merged #4146, so I took the liberty of updating the PR with |
Hi @masci, are there any updates to be made?
No worries. I just pinged here in case it got lost among so many PRs.
Overall LGTM, just questions and nits.
@masci When I checked for a new review, GH removed @agnieszka-m from the PR automatically.
I left some docs comments while Agnieszka is out, everything else is good to go and I'll merge it asap.
Co-authored-by: Massimiliano Pippi <[email protected]>
LGTM!
Oh my, it's finally green 🎉 Merging before the CI changes its mind.
Related Issues
.crawl() does not have the return_documents option #4188
Proposed Changes:
The Crawler implementation was changed to adhere to Pipeline flows and to improve support for Agents. Its main function is now to extract Documents. It can optionally save them to files and, also optionally, record each file's path in the Document meta.
+Output Documents primarily
+Optional file save
+Optional add file path to Document meta
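The three behaviors listed above can be illustrated with a small, self-contained sketch. This is not the Crawler code from this PR: the `Document` dataclass, the `pages_to_documents` function, and the `output_dir` and `file_path_meta_field_name` parameter names are all hypothetical stand-ins chosen for illustration, and the already-fetched `pages` mapping replaces the actual crawling step.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class Document:
    """Minimal stand-in for a pipeline Document (hypothetical, not the library class)."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def pages_to_documents(
    pages: Dict[str, str],
    output_dir: Optional[Path] = None,
    file_path_meta_field_name: Optional[str] = None,
) -> List[Document]:
    """Turn already-fetched pages (url -> text) into Documents.

    Files are written only when output_dir is given, and the file path is
    stored in the Document meta only when a meta field name is also given.
    """
    if output_dir is not None:
        output_dir.mkdir(parents=True, exist_ok=True)

    documents: List[Document] = []
    for index, (url, text) in enumerate(pages.items()):
        doc = Document(content=text, meta={"url": url})
        if output_dir is not None:
            # Optional file saving: one JSON file per crawled page.
            file_path = output_dir / f"page_{index}.json"
            payload = {"content": text, "meta": {"url": url}}
            file_path.write_text(json.dumps(payload), encoding="utf-8")
            if file_path_meta_field_name is not None:
                # Optional tracking: remember where the page was saved.
                doc.meta[file_path_meta_field_name] = str(file_path)
        documents.append(doc)
    return documents


if __name__ == "__main__":
    pages = {"https://example.com/a": "Alpha", "https://example.com/b": "Beta"}

    # Default: Documents only, nothing written to disk.
    print(pages_to_documents(pages))

    # Opt in to saving files and recording their paths in the meta.
    with tempfile.TemporaryDirectory() as tmp:
        docs = pages_to_documents(
            pages,
            output_dir=Path(tmp) / "crawled",
            file_path_meta_field_name="file_path",
        )
        print(docs[0].meta)
```

The point of the design is that Documents are always returned, so the node can feed the next step of a Pipeline directly, while disk writes and path tracking remain opt-in.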
How did you test it?
Notes for the reviewer
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.