Implement lexer #339

Closed
Tracked by #337
victor-pogor opened this issue Jun 22, 2024 · 0 comments · Fixed by #363
Labels
🎯 user story Short requirements or requests written from the perspective of an end user

victor-pogor commented Jun 22, 2024

Background and Motivation

A dedicated lexer is essential for parsing and analyzing PDF documents efficiently. Unlike source files in a traditional programming language, a PDF file mixes text-based object syntax with binary stream data, so it needs a specialized lexer to interpret the document's syntax and content accurately.

By creating a dedicated lexer, we can ensure more precise parsing, improve performance, and enhance the maintainability of the PDF codebase, ultimately leading to a better user experience in handling and displaying PDF files.

Acceptance criteria

  • The lexer should handle all PDF types and tokens defined in the ISO 32000-1:2008 specification
  • The lexer should preserve trivia (whitespace and comments), which is needed later to build lossless syntax trees
  • The lexer should handle malformed input and report errors
  • Only minimal performance optimization is required as part of this story
  • Unit tests
  • Mutation testing
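
To make the trivia and error-handling criteria concrete, here is a minimal sketch in Python (used only for illustration, it is not the project's language). It assumes a tiny subset of PDF syntax: numbers, names, keywords, `<<`, `>>`, `[`, `]`, whitespace, and comments. Literal strings, hex strings, and streams are deliberately out of scope and surface as `bad` tokens. The names `Token`, `lex`, and `to_bytes` are hypothetical and not taken from the Off.NET codebase.

```python
import re
from dataclasses import dataclass, field

WHITESPACE = b"\x00\t\n\x0c\r "   # white-space characters, ISO 32000-1 Table 1
DELIMITERS = b"()<>[]{}/%"        # delimiter characters, ISO 32000-1 Table 2
NUMBER = re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)")


@dataclass
class Token:
    kind: str                      # "name", "number", "keyword", "punct", "bad", "eof"
    text: bytes
    leading: list = field(default_factory=list)   # [(trivia_kind, bytes), ...]
    error: str = ""                # non-empty when the lexer could not handle the input


def lex(data: bytes) -> list:
    tokens, trivia, i, n = [], [], 0, len(data)

    def regular_end(j):
        # Advance over regular characters (neither whitespace nor delimiter).
        while j < n and data[j] not in WHITESPACE and data[j] not in DELIMITERS:
            j += 1
        return j

    while i < n:
        start, c = i, data[i]
        if c in WHITESPACE:
            while i < n and data[i] in WHITESPACE:
                i += 1
            trivia.append(("whitespace", data[start:i]))
            continue
        if c == ord("%"):
            # A comment runs up to, but not including, the end-of-line marker.
            while i < n and data[i] not in b"\r\n":
                i += 1
            trivia.append(("comment", data[start:i]))
            continue
        error = ""
        if c == ord("/"):
            i, kind = regular_end(i + 1), "name"
        elif data[i:i + 2] in (b"<<", b">>"):
            i, kind = i + 2, "punct"
        elif c in b"[]":
            i, kind = i + 1, "punct"
        elif c in DELIMITERS:
            # Strings, hex strings, etc. are not covered by this sketch.
            i, kind, error = i + 1, "bad", "unsupported delimiter in this sketch"
        else:
            i = regular_end(i)
            kind = "number" if NUMBER.fullmatch(data[start:i]) else "keyword"
        tokens.append(Token(kind, data[start:i], trivia, error))
        trivia = []
    # Trailing trivia is kept on a zero-width end-of-file token.
    tokens.append(Token("eof", b"", trivia))
    return tokens


def to_bytes(tokens) -> bytes:
    # Lossless round trip: concatenate every token's trivia and text.
    return b"".join(b"".join(t for _, t in tok.leading) + tok.text for tok in tokens)
```

Because every input byte ends up either in a token's text or in its leading trivia, `to_bytes(lex(data))` reproduces the input exactly, which is the property a lossless syntax tree relies on. Malformed or unsupported input becomes a `bad` token carrying an error message instead of raising.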

Open questions

  • Do we need a quick scanner like Roslyn has?
  • How should trivia be lexed? Should it be attached to tokens or kept as separate elements in the syntax tree?
  • How do different languages/libs handle the lexing phase?
    • Roslyn returns a full SyntaxToken object that includes text, value, errors, and syntax trivia
    • The Swift lexer works in a similar way to Roslyn
    • Rust does not attach whitespace characters to tokens as trivia, although there was a discussion about doing so. rust-analyzer, however, is implemented like Roslyn or Swift
    • pdf.js has a scannerless parser
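
The two trivia options in the second question can be compared with a small sketch, again in Python for illustration only. It assumes a flat stream of `(kind, text)` pairs in which trivia are ordinary entries (the "separate" option), and folds them into the following token as leading trivia (a simplified version of the "attached" option; it does not model trailing trivia). `AttachedToken`, `attach_trivia`, and `flatten` are hypothetical names.

```python
from dataclasses import dataclass, field

TRIVIA_KINDS = {"whitespace", "comment"}


@dataclass
class AttachedToken:
    kind: str
    text: str
    leading: list = field(default_factory=list)   # [(trivia_kind, text), ...]


def attach_trivia(flat):
    """Fold trivia entries of a flat stream into the next significant token."""
    tokens, pending = [], []
    for kind, text in flat:
        if kind in TRIVIA_KINDS:
            pending.append((kind, text))
        else:
            tokens.append(AttachedToken(kind, text, pending))
            pending = []
    # Trivia after the last significant token goes on a zero-width eof token.
    tokens.append(AttachedToken("eof", "", pending))
    return tokens


def flatten(tokens):
    """Inverse of attach_trivia: expand attached trivia back into a flat stream."""
    flat = []
    for tok in tokens:
        flat.extend(tok.leading)
        if tok.kind != "eof":
            flat.append((tok.kind, tok.text))
    return flat
```

Both shapes carry the same information and convert into each other without loss. The practical difference is who pays for trivia: with attached trivia a parser only ever sees significant tokens, while with a flat stream every consumer has to skip or handle trivia entries itself.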

Resources

@github-project-automation github-project-automation bot moved this to 🆕 New in Off.NET Board Jun 22, 2024
@victor-pogor victor-pogor moved this from 🆕 New to 📋 Backlog in Off.NET Board Jun 22, 2024
@victor-pogor victor-pogor added the 🎯 user story Short requirements or requests written from the perspective of an end user label Jun 22, 2024
@victor-pogor victor-pogor self-assigned this Jun 22, 2024
@victor-pogor victor-pogor moved this from 📋 Backlog to 🏗 In progress in Off.NET Board Jun 26, 2024
@victor-pogor victor-pogor linked a pull request Nov 19, 2024 that will close this issue
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Off.NET Board Nov 19, 2024
Projects
Status: Done