Implement lexer #339

victor-pogor · 2024-06-22T16:34:13Z

Background and Motivation

Implementing a lexer for PDF is essential for efficiently parsing and analyzing PDF documents. Unlike traditional programming languages, PDF documents have a unique structure and encoding, requiring a specialized lexer to interpret the document's syntax and content accurately.

By creating a dedicated lexer, we can ensure more precise parsing, improve performance, and enhance the maintainability of the PDF codebase, ultimately leading to a better user experience in handling and displaying PDF files.

Acceptance criteria

- The lexer should handle all PDF types and tokens, according to the ISO 32000-1:2008 specification.
- The lexer should include trivia (whitespace and comments), which is used later to build lossless syntax trees.
- The lexer should have error handling.
- Minimal performance improvements are required as part of this story.
- Unit tests
- Mutation testing
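
To make the trivia and error-handling criteria concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the project's implementation). It covers a small subset of PDF tokens, attaches whitespace and comments as leading trivia, and reports unsupported input as a `bad` token instead of aborting. The names `Token`, `lex`, and `to_bytes` are hypothetical; the whitespace and delimiter sets follow ISO 32000-1, section 7.2.2.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

WHITESPACE = b"\x00\t\n\x0c\r "      # PDF white-space characters
DELIMITERS = b"()<>[]{}/%"           # PDF delimiter characters
BOUNDARY = WHITESPACE + DELIMITERS   # anything that ends a regular token
INTEGER = re.compile(rb"[+-]?\d+")
REAL = re.compile(rb"[+-]?(\d+\.\d*|\.\d+)")
KEYWORDS = {b"true", b"false", b"null", b"obj", b"endobj", b"R"}

@dataclass
class Token:  # hypothetical shape, loosely modelled on a Roslyn-style token
    kind: str
    text: bytes
    leading_trivia: List[Tuple[str, bytes]] = field(default_factory=list)
    error: Optional[str] = None

def lex(data: bytes) -> List[Token]:
    tokens, pos, n = [], 0, len(data)
    while True:
        # 1. Collect whitespace and comments as leading trivia.
        trivia = []
        while pos < n:
            start = pos
            if data[pos] in WHITESPACE:
                while pos < n and data[pos] in WHITESPACE:
                    pos += 1
                trivia.append(("whitespace", data[start:pos]))
            elif data[pos] == ord("%"):
                while pos < n and data[pos] not in b"\r\n":
                    pos += 1
                trivia.append(("comment", data[start:pos]))
            else:
                break
        if pos >= n:
            # Trailing trivia is carried by the end-of-file token.
            tokens.append(Token("eof", b"", trivia))
            return tokens
        # 2. Lex one significant token.
        start, error = pos, None
        if data[pos:pos + 2] in (b"<<", b">>"):
            pos += 2
            kind = "dict_open" if data[start:pos] == b"<<" else "dict_close"
        elif data[pos] in b"[]":
            pos += 1
            kind = "array_open" if data[start:pos] == b"[" else "array_close"
        elif data[pos] == ord("/"):
            pos += 1
            while pos < n and data[pos] not in BOUNDARY:
                pos += 1
            kind = "name"
        elif data[pos] in DELIMITERS:
            # Strings and hex strings are out of scope for this sketch.
            pos += 1
            kind, error = "bad", "unsupported delimiter"
        else:
            while pos < n and data[pos] not in BOUNDARY:
                pos += 1
            text = data[start:pos]
            if INTEGER.fullmatch(text):
                kind = "integer"
            elif REAL.fullmatch(text):
                kind = "real"
            elif text in KEYWORDS:
                kind = "keyword"
            else:
                kind, error = "bad", "unknown token"
        tokens.append(Token(kind, data[start:pos], trivia, error))

def to_bytes(tokens: List[Token]) -> bytes:
    """Rebuild the input from tokens plus trivia (the lossless property)."""
    return b"".join(b"".join(t for _, t in tok.leading_trivia) + tok.text
                    for tok in tokens)
```

Because every input byte ends up either in a token's text or in its trivia, `to_bytes(lex(data))` reproduces `data` exactly, which is the property a lossless syntax tree relies on. Errors are recorded on the token, so lexing continues past bad input.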

Open questions

- Do we need a quick scanner like the one Roslyn has?
- How should trivia be lexed? Should it be included in tokens or kept separate in the syntax tree?
- How do different languages/libs handle the lexing phase?
  - Roslyn returns a full SyntaxToken object that includes text, value, errors, and syntax trivia.
  - The Swift lexer works in a similar way to Roslyn.
  - Rust does not attach whitespace characters as trivia to tokens, although there was a discussion about doing so. Rust Analyzer, however, is implemented like Roslyn or Swift.
  - pdf.js has a scannerless parser.
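
The trivia question above comes down to two token-stream shapes. The sketch below (Python, illustrative only; `AttachedToken`, `attach_trivia`, and the token kinds are hypothetical names, not taken from Roslyn, Swift, or rustc) starts from a flat stream in which trivia are ordinary tokens and folds it into a stream where each significant token carries its leading trivia. It assumes leading trivia only, whereas Roslyn also tracks trailing trivia.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

TRIVIA_KINDS = {"whitespace", "comment"}

@dataclass
class AttachedToken:  # hypothetical: a token that owns its leading trivia
    kind: str
    text: str
    leading: List[Tuple[str, str]] = field(default_factory=list)

def attach_trivia(flat: List[Tuple[str, str]]) -> List[AttachedToken]:
    """Fold a flat (kind, text) stream into tokens with attached trivia."""
    attached, pending = [], []
    for kind, text in flat:
        if kind in TRIVIA_KINDS:
            pending.append((kind, text))   # hold until the next real token
        else:
            attached.append(AttachedToken(kind, text, pending))
            pending = []
    # Trivia after the last real token hangs off a synthetic eof token.
    attached.append(AttachedToken("eof", "", pending))
    return attached

def flat_text(flat: List[Tuple[str, str]]) -> str:
    return "".join(text for _, text in flat)

def attached_text(tokens: List[AttachedToken]) -> str:
    return "".join("".join(t for _, t in tok.leading) + tok.text
                   for tok in tokens)

# Flat form of the PDF fragment "1 0 obj% header\n".
flat = [("integer", "1"), ("whitespace", " "), ("integer", "0"),
        ("whitespace", " "), ("keyword", "obj"),
        ("comment", "% header"), ("whitespace", "\n")]
tokens = attach_trivia(flat)
```

Both shapes reproduce the same source text, so the choice is mostly about ergonomics: the flat stream keeps the lexer simple, while attached trivia lets the parser consume only significant tokens without losing anything.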

Resources

victor-pogor mentioned this issue Jun 22, 2024

[Feature] Lexer #337

Open

github-project-automation bot added this to Off.NET Board Jun 22, 2024

github-project-automation bot moved this to 🆕 New in Off.NET Board Jun 22, 2024

victor-pogor moved this from 🆕 New to 📋 Backlog in Off.NET Board Jun 22, 2024

victor-pogor added the 🎯 user story label Jun 22, 2024

victor-pogor self-assigned this Jun 22, 2024

victor-pogor moved this from 📋 Backlog to 🏗 In progress in Off.NET Board Jun 26, 2024

victor-pogor linked a pull request Nov 19, 2024 that will close this issue

Enhanced the text cursor #363

Merged

victor-pogor closed this as completed in #363 Nov 19, 2024

github-project-automation bot moved this from 🏗 In progress to ✅ Done in Off.NET Board Nov 19, 2024
