The fast, most optimal, and correct HTML & XML parsing library for Python written in Rust.

awolverp/markupever

MarkupEver

The fast, most optimal, and correct HTML & XML parsing library

Documentation | Releases | Benchmarks

MarkupEver is a modern, high-performance HTML and XML parsing library for Python, written in Rust.

KEY FEATURES:

  • 🚀 Fast: Very high performance (thanks to html5ever and selectors).
  • 🔥 Easy: Designed to be easy to use and learn, with completion everywhere.
  • Low-Memory: Written in Rust, with low memory usage and Rust's memory allocator. Don't worry about memory leaks.
  • 🧶 Thread-safe: Completely thread-safe.
  • 🎯 Querying: Use your CSS knowledge to select elements from an HTML or XML document.

Installation

You can install MarkupEver using pip (a virtual environment is recommended):

$ pip3 install markupever

Example

Parse

Parsing HTML content and selecting elements:

import markupever as mr

dom = mr.parse_file("file.html", mr.HtmlOptions())
# Or parse HTML content directly:
# dom = mr.parse("... content ...", mr.HtmlOptions())

for element in dom.select("div.section > p:nth-child(1)"):
    print(element.text())

Create DOM

Creating a DOM from scratch:

from markupever import dom

# Use a separate name so the tree does not shadow the imported `dom` module.
tree = dom.TreeDom()
root: dom.Document = tree.root()

root.create_doctype("html")

html = root.create_element("html", {"lang": "en"})
body = html.create_element("body")
body.create_text("Hello Everyone ...")

print(root.serialize())
# <!DOCTYPE html><html lang="en"><body>Hello Everyone ...</body></html>

TODO List

  • Rewrite TreeDom __repr__ and __str__
  • Write benchmarks
  • Write memory usage
  • Add PyPI version, test coverage, and python versions badges
  • Complete docs
  • Add prettier feature
  • Provide more control on serializer
  • Add advanced examples to docs (such as socket and http streams)