Proposal: markdown parsing for react-native-render-html #3983
Triggered auto assignment to @johncschuster.
To help me understand, can you confirm:
Thanks!
Nice, markdown-it looks like it has a very active community, which we love to see! I'm also excited about the rule customization and plugin architecture it offers.
Let me know what questions you have and how I can help! I am envisioning a phased scope-of-work that might look something like this:
Also, as @quinthar has suggested above, if you could benchmark
This is my understanding as well.
My understanding is that WASM is not officially supported at the moment, and there is an open ticket for the Hermes engine, but you can use
Yes, the goal is still to translate Markdown to a DOM! The benchmark is only a quick glimpse of each parser's capabilities, on the assumption that the second stage (converting whatever syntax tree each parser produces into HTML) would perform roughly the same for every engine. I could rewrite the benchmark for accuracy, but honestly I went for the path that was fastest to implement. To expand on the statement in the OP, the plan is to use the markdown-it tokenizer and plug it into a
The thing is, all the tested Markdown parsers claim compliance with CommonMark, and I doubt this ad-hoc ExpensiMark parser does. So to benchmark your parser, I would need to tailor test scenarios and make sure the parsers are working under equal conditions. I'd be happy to do that in a preliminary "0" step!
That sounds like a solid plan. As
Triggered auto assignment to @tgolen.
To clarify, this is somewhat true: we won't be able to use pure
Theoretically, this might net us performance results between
@robertjchen Thanks for pointing that out! I must confess that I'm proficient in neither WASM nor asm.js. After reading
I changed my mind: if it does perform worse, that will be obvious whether or not the same set of syntax rules is covered. I managed to incorporate ExpensiMark into the benchmark (in a hacky way: because it uses ES modules, I just copied the required sources). As I had anticipated, ExpensiMark turns out to be quite slow compared to the other parsers (I'm still dumbfounded by the poor performance of remark). This is probably due to its regex-based parsing, which requires vastly more passes over the input than a parser built on a finite-state machine (automaton).
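To illustrate the traversal argument, here is a toy contrast. It is not ExpensiMark's actual code: the `RULES` list, `regexConvert` and `fsmConvert` are hypothetical and only handle `` `code` `` and `*emphasis*` spans. The regex version rescans the whole string once per rule (a real regex-based converter has many more rules), while the state-machine version reads each character exactly once.

```typescript
// Hypothetical rule list in the style of a regex-based converter (not ExpensiMark's real rules).
const RULES: Array<[RegExp, string]> = [
  [/`([^`]+)`/g, "<code>$1</code>"],
  [/\*([^*]+)\*/g, "<em>$1</em>"],
];

// Multi-pass: every rule triggers a full scan of the (already rewritten) string.
function regexConvert(input: string): { html: string; passes: number } {
  let html = input;
  let passes = 0;
  for (const [pattern, replacement] of RULES) {
    html = html.replace(pattern, replacement);
    passes++;
  }
  return { html, passes };
}

type State = "text" | "code" | "em";
const DELIMITER: Record<"code" | "em", string> = { code: "`", em: "*" };

// Single pass: a tiny finite-state machine that looks at each character once.
function fsmConvert(input: string): { html: string; passes: number } {
  let state = "text" as State;
  let out = "";
  let span = ""; // characters buffered while a span is open
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (state === "text") {
      if (ch === "`") state = "code";
      else if (ch === "*") state = "em";
      else out += ch;
    } else if (ch === DELIMITER[state]) {
      // Closing delimiter: the state name doubles as the HTML tag name.
      out += `<${state}>${span}</${state}>`;
      span = "";
      state = "text";
    } else {
      span += ch;
    }
  }
  // Unterminated span: emit it as literal text, as the regex version would leave it.
  if (state !== "text") out += DELIMITER[state] + span;
  return { html: out, passes: 1 };
}

console.log(regexConvert("Use `npm` to *install* it"));
console.log(fsmConvert("Use `npm` to *install* it"));
```

On simple input like the one above, both produce the same HTML. They diverge on edge cases: for `` `a*b*` `` the regex chain also applies the emphasis rule inside the code span, while the state machine keeps the asterisks literal.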
I would expect that over time we're going to want our own parser to do nonstandard things; we want to own our own grammar. But this is a huge testament to the power of WASM, and we should absolutely be aiming for that. Can we just port ExpensiMark to C++ and compile it to WASM? No need to use regexps.
@quinthar If you can get your hands on C/C++, I'm not sure I see any advantage of using WASM over JSI. AFAIK WASM is still interpreted, while C/C++ will run as an architecture-specific native binary. Plus, JSI is the future standard for performant, low-level React Native modules. An important thing to consider if you want to follow that scheme (porting ExpensiMark to C++ based on md4c) is converting HTML back to Markdown. But there are good, fast C++ HTML parsers, so that part should be much easier.
No hurry, but please let me know if you're still interested in Markdown being supported directly in RNRH, most likely with
Hopefully at some point the whole transient render engine will be ported to C++, where we could use md4c for Markdown. In any case, we would make sure you can reasonably easily plug in your own parser implementation to generate a TRT (Transient Render Tree) and render it in RNRH.
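One way to keep the parser pluggable is to have the core depend only on a small interface, with each source format living in its own adapter. The sketch below is purely hypothetical: `DomNode`, `SourceParser`, `renderToHtml` and `paragraphParser` are illustrative names, not RNRH's API, and serializing to an HTML string stands in for building a TRT.

```typescript
// All names here are hypothetical; this is not RNRH's actual API.

// Minimal DOM shape the core could consume without knowing the source format.
type DomNode =
  | { kind: "element"; tag: string; children: DomNode[] }
  | { kind: "text"; data: string };

// The contract a pluggable parser fulfils: source text in, DOM nodes out.
interface SourceParser {
  readonly format: string;
  parse(source: string): DomNode[];
}

// Core side: depends only on the interface, never on markdown-it, md4c or ExpensiMark.
// Serializing to a string stands in for building a render tree; no escaping in this toy.
function renderToHtml(parser: SourceParser, source: string): string {
  const serialize = (node: DomNode): string =>
    node.kind === "text"
      ? node.data
      : `<${node.tag}>${node.children.map(serialize).join("")}</${node.tag}>`;
  return parser.parse(source).map(serialize).join("");
}

// Toy adapter that would live in a separate package: blank-line-separated paragraphs.
const paragraphParser: SourceParser = {
  format: "plain-paragraphs",
  parse: (source) =>
    source
      .split(/\n\s*\n/)
      .map((block) => block.trim())
      .filter((block) => block.length > 0)
      .map((block): DomNode => ({
        kind: "element",
        tag: "p",
        children: [{ kind: "text", data: block }],
      })),
};

console.log(renderToHtml(paragraphParser, "Hello\n\nWorld"));
```

A Markdown adapter wrapping markdown-it (or a native md4c binding) would implement the same interface, so swapping parsers never touches the core.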
@jsamr, this Monthly task hasn't been acted upon in 6 weeks; closing. If you disagree, feel encouraged to reopen it, but pick your least important issue to close instead.
This is a follow-up on a request from @roryabraham. I have investigated the options to integrate Markdown parsing a little further, and here are my findings:
htmlparser2 Tokenizer class to emit a DOM.
Choosing a Tokenizer
I've forked this benchmark to add micromark, which I found very well structured and solid (via remark-html). Below are my findings (Intel i7-8809G, 32 GB of RAM, Node.js 14.16.0).
(Charts: average ops per second, min/max parse time, and average throughput for each parser.)
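For anyone reproducing numbers like these, here is a minimal sketch of how the three metrics can be computed for a synchronous parser. It is not the forked benchmark's code; `bench` and `BenchResult` are hypothetical names, and `parse` is assumed to be any synchronous function taking the Markdown source.

```typescript
// Hypothetical harness; metrics mirror the three charts (ops/sec, min/max time, throughput).
interface BenchResult {
  opsPerSec: number;
  minMs: number;
  maxMs: number;
  bytesPerSec: number;
}

function bench(parse: (input: string) => unknown, input: string, iterations: number): BenchResult {
  if (iterations < 1) throw new Error("iterations must be >= 1");
  // Warm-up runs give the JIT a chance to optimize before measuring.
  for (let i = 0; i < Math.min(10, iterations); i++) parse(input);

  let minMs = Infinity;
  let maxMs = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const t0 = performance.now();
    parse(input);
    const dt = performance.now() - t0;
    if (dt < minMs) minMs = dt;
    if (dt > maxMs) maxMs = dt;
  }
  // Total time includes the per-iteration timer calls; guard against a zero duration.
  const totalSec = Math.max((performance.now() - start) / 1000, 1e-9);
  const inputBytes = new TextEncoder().encode(input).length; // UTF-8 size of the input
  return {
    opsPerSec: iterations / totalSec,
    minMs,
    maxMs,
    bytesPerSec: (iterations * inputBytes) / totalSec,
  };
}

// Usage with a stand-in "parser"; pass e.g. (src) => md.render(src) for a real one.
const result = bench((s) => s.split("\n").map((l) => l.trim()).join("\n"), "# Title\n\nSome *text*\n".repeat(50), 200);
console.log(result);
```

Feeding every parser the same input and iteration count is what keeps the comparison fair; the harness itself does not check that the parsers support the same syntax.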
Conclusion
markdown-it is the clear winner among the parsers we can actually use, since there is no official WebAssembly support in React Native. Other advantages:
Implementation Plan
Take inspiration from MarkdownIt.Renderer: consume a token tree from MarkdownIt.parse and invoke the corresponding htmlparser2 callbacks while walking the tree.
I'll also need some help to assess which features you want to enable for Expensify.cash.
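The tree walk described above could look roughly like this. It is a sketch under stated assumptions: `MdToken` only mirrors the fields of markdown-it's Token that are used here (`type`, `tag`, `nesting`, `content`, `attrs`, `children`, `hidden`), and `HtmlHandler` borrows htmlparser2's callback names (`onopentag`, `ontext`, `onclosetag`) but is declared locally rather than imported. The sample token array is hand-written to resemble a simplified `md.parse("Hello *world*", {})` result.

```typescript
// Subset of markdown-it's Token fields (declared locally, assumption).
interface MdToken {
  type: string;
  tag: string;
  nesting: 1 | 0 | -1; // 1 = opening, -1 = closing, 0 = self-contained
  content: string;
  attrs: Array<[string, string]> | null;
  children: MdToken[] | null;
  hidden: boolean;
}

// Callback names modeled on htmlparser2's handler; not imported from it.
interface HtmlHandler {
  onopentag(name: string, attribs: Record<string, string>): void;
  ontext(data: string): void;
  onclosetag(name: string): void;
}

function attrsToRecord(attrs: MdToken["attrs"]): Record<string, string> {
  const record: Record<string, string> = {};
  for (const [name, value] of attrs ?? []) record[name] = value;
  return record;
}

function walkTokens(tokens: MdToken[], handler: HtmlHandler): void {
  for (const token of tokens) {
    if (token.type === "inline") {
      walkTokens(token.children ?? [], handler); // inline content lives in children
    } else if (token.type === "text") {
      handler.ontext(token.content);
    } else if (token.hidden) {
      continue; // e.g. paragraphs in tight lists emit no tags, as in markdown-it's renderer
    } else if (token.nesting === 1) {
      handler.onopentag(token.tag, attrsToRecord(token.attrs));
    } else if (token.nesting === -1) {
      handler.onclosetag(token.tag);
    } else if (token.tag !== "") {
      // Self-contained token such as code_inline: open, text, close.
      handler.onopentag(token.tag, attrsToRecord(token.attrs));
      if (token.content !== "") handler.ontext(token.content);
      handler.onclosetag(token.tag);
    }
  }
}

// Recording handler for demonstration (no HTML escaping).
function tokensToHtml(tokens: MdToken[]): string {
  let html = "";
  walkTokens(tokens, {
    onopentag: (name, attribs) => {
      const attrText = Object.keys(attribs).map((k) => ` ${k}="${attribs[k]}"`).join("");
      html += `<${name}${attrText}>`;
    },
    ontext: (data) => { html += data; },
    onclosetag: (name) => { html += `</${name}>`; },
  });
  return html;
}

const tok = (type: string, tag: string, nesting: 1 | 0 | -1, content = "", children: MdToken[] | null = null): MdToken =>
  ({ type, tag, nesting, content, attrs: null, children, hidden: false });

const sample: MdToken[] = [
  tok("paragraph_open", "p", 1),
  tok("inline", "", 0, "Hello *world*", [
    tok("text", "", 0, "Hello "),
    tok("em_open", "em", 1),
    tok("text", "", 0, "world"),
    tok("em_close", "em", -1),
  ]),
  tok("paragraph_close", "p", -1),
];
console.log(tokensToHtml(sample));
```

This deliberately skips fences, images, soft breaks and void elements. markdown-it's default renderer has dedicated rules for those, which a real bridge would need to reproduce.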
Package Design
I need to think of a new package design, since I don't want @native-html/core to depend directly on markdown-it.
Testing Strategy
The parser will be tested against the official commonmark-spec repository.
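A spec-driven test runner can stay very small. The sketch below assumes the JSON test dump published with CommonMark releases, whose entries carry `markdown`, `html`, `example` and `section` fields (check this against the spec version you pin). `runSpec`, `normalize`, `naiveRender` and the two sample entries are hypothetical; the sample numbering is illustrative, not the spec's.

```typescript
// Shape of one spec example (assumed from the CommonMark JSON test dump).
interface SpecExample {
  markdown: string;
  html: string;
  example: number;
  section: string;
}

interface SpecReport {
  passed: number;
  failed: SpecExample[];
}

// Crude stand-in normalization; the spec repository's own test tooling is more careful.
function normalize(html: string): string {
  return html.replace(/>\s+</g, "><").trim();
}

function runSpec(examples: SpecExample[], render: (markdown: string) => string): SpecReport {
  const failed: SpecExample[] = [];
  for (const ex of examples) {
    let actual: string;
    try {
      actual = render(ex.markdown);
    } catch (e) {
      failed.push(ex); // a throwing renderer counts as a failure
      continue;
    }
    if (normalize(actual) !== normalize(ex.html)) failed.push(ex);
  }
  return { passed: examples.length - failed.length, failed };
}

// Hand-written sample entries in the same shape (illustrative numbering).
const SAMPLES: SpecExample[] = [
  { markdown: "# foo\n", html: "<h1>foo</h1>\n", example: 1, section: "ATX headings" },
  { markdown: "*foo*\n", html: "<p><em>foo</em></p>\n", example: 2, section: "Emphasis" },
];

// Toy renderer that knows headings but not emphasis, so one sample fails.
const naiveRender = (md: string): string =>
  md.startsWith("# ") ? `<h1>${md.slice(2).trim()}</h1>\n` : `<p>${md.trim()}</p>\n`;

const report = runSpec(SAMPLES, naiveRender);
console.log(`${report.passed} passed, failed examples: ${report.failed.map((e) => e.example).join(", ")}`);
```

Grouping `report.failed` by `section` gives a quick view of which syntax areas still need work.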