fix: support html file in knowledge base #1703

Merged: 1 commit, merged Feb 15, 2025

wenwei-lin (Contributor) commented Feb 15, 2025

Restore the HTML Loader. I previously submitted a PR adding parsing for the HTML file type (#1260), but it was overwritten by a later commit.
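One common way this kind of regression happens is an extension-to-loader table that a later commit rewrites without the HTML entries. The sketch below is purely illustrative and assumes such a table; `LoaderKind`, `LOADERS_BY_EXTENSION`, and `pickLoader` are hypothetical names, not Cherry Studio's or embedJs's actual code, and the fallback to plain text is a choice made only for this sketch.

```typescript
// Hypothetical loader kinds; a real app would map these to concrete loader classes.
type LoaderKind = "pdf" | "docx" | "html" | "text";

// Hypothetical extension-to-loader table. If a later change drops the
// ".html"/".htm" entries, HTML files quietly fall through to the default.
const LOADERS_BY_EXTENSION: Record<string, LoaderKind> = {
  ".pdf": "pdf",
  ".docx": "docx",
  ".html": "html",
  ".htm": "html",
  ".txt": "text",
  ".md": "text",
};

// Returns the lowercased extension including the dot, or "" if there is none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot).toLowerCase();
}

// Picks a loader by extension; unknown extensions fall back to plain text (sketch-only choice).
function pickLoader(fileName: string): LoaderKind {
  return LOADERS_BY_EXTENSION[extensionOf(fileName)] ?? "text";
}
```

For example, `pickLoader("Page.HTML")` yields `"html"` because the extension is lowercased before lookup.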

wenwei-lin (Contributor, Author)

I also added support for notes exported from Drafts. Drafts exports are JSON files that contain a lot of irrelevant information, which is bad for RAG.

The new import method extracts only three properties, content, tags, and modified_at, to keep the information lean.
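Extracting only those three properties amounts to projecting each draft onto a smaller object before it is embedded. The following is a minimal sketch, assuming the Drafts export is a JSON array of draft objects; `slimDraftsExport`, `DraftsExportEntry`, and `SlimDraft` are hypothetical names, and the sketch does not use embedJs's JSONLoader.

```typescript
// Shape assumed for each entry in a Drafts JSON export. Only the three fields
// the PR keeps are typed; any other fields (ids, flags, folders, ...) are ignored.
interface DraftsExportEntry {
  content?: string;
  tags?: string[];
  modified_at?: string;
  [otherField: string]: unknown;
}

// The slimmed-down record handed on for indexing.
interface SlimDraft {
  content: string;
  tags: string[];
  modified_at: string;
}

// Parses a Drafts export and keeps only content, tags, and modified_at.
function slimDraftsExport(json: string): SlimDraft[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a Drafts export: a JSON array of drafts");
  }
  return (parsed as DraftsExportEntry[]).map((d) => ({
    content: typeof d.content === "string" ? d.content : "",
    tags: Array.isArray(d.tags) ? d.tags.filter((t): t is string => typeof t === "string") : [],
    modified_at: typeof d.modified_at === "string" ? d.modified_at : "",
  }));
}
```

Missing fields default to empty values here so every output record has the same shape; whether to skip such drafts instead is left open.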

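Because this uses embedJs's JSONLoader, the displayed source name is not very user-friendly. Fixing this will probably require a PR to the upstream library (not done yet).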

wenwei-lin changed the title feat: Support json and html file types in knowledge base feat: Support json, html, and draftsExport file types in knowledge base Feb 15, 2025
kangfenmao (Collaborator)

I suggest splitting this into two PRs: first ship the fix, then add the new features.

wenwei-lin (Contributor, Author)

> I suggest splitting this into two PRs: first ship the fix, then add the new features.

OK, I'll change it later.

wenwei-lin changed the title feat: Support json, html, and draftsExport file types in knowledge base fix: support html file in knowledge base Feb 15, 2025
kangfenmao merged commit d574a09 into CherryHQ:main Feb 15, 2025