Releasing MEDFORD 2.0 #18

Merged
merged 34 commits into main from novel_linereader
Jun 11, 2024


infispiel
Member

PR to move all the development done on medford parser 2.0 into the main branch, so that we can start reorganizing the repository for future development.

The core logic of the medford parser has been completely redone to (hopefully) be more object-oriented and amenable to adding features. A document fully describing these changes will be coming in the near future.

Importantly, this will break all external tool support, which I believe currently means only the medford vscode extension (described here) and its associated LSP (described here). This will hopefully be fixed in the medford parser v2.1.

Creating new linereader object that appropriately creates lines objects.

Line objects wrap the lines read by linereader and automatically parse the
input line to remove inline comments, find used macros, etc.
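
As a rough illustration of what such a Line object might do, here is a minimal sketch. It is not the parser's actual code: the class layout, the `read_lines` helper, and both regexes are assumptions made for this example (in particular, `%` as the inline comment marker and `` `@name `` as the macro reference syntax are placeholders).

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed syntax, for illustration only: '%' preceded by whitespace starts an
# inline comment, and macros are referenced as `@name.
COMMENT_RE = re.compile(r"\s+%.*$")
MACRO_USE_RE = re.compile(r"`@(\w+)")


@dataclass
class Line:
    """One physical input line plus what was parsed out of it."""

    line_no: int
    raw_text: str
    content: str = field(init=False)
    comment: str = field(init=False)
    used_macro_names: List[str] = field(init=False)

    def __post_init__(self) -> None:
        match = COMMENT_RE.search(self.raw_text)
        if match:
            # Content is everything before the inline comment.
            self.content = self.raw_text[: match.start()]
            self.comment = match.group().strip()
        else:
            self.content = self.raw_text
            self.comment = ""
        # Macro names are looked up in the comment-free content only.
        self.used_macro_names = MACRO_USE_RE.findall(self.content)


def read_lines(text: str) -> List[Line]:
    """Turn raw text into Line objects, keeping 1-based line numbers."""
    return [Line(i, t) for i, t in enumerate(text.splitlines(), start=1)]
```

Doing the parsing once at construction time means later stages can ask a line for its content or its macro names without re-scanning the raw text.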
- adding details (NovelDetailLine, ContinueLine collections)
- adding blocks (detail collections)
Getting closer to a full reimplementation. Added the LineProcessor
function to start collecting Line objects into logical collections.
(e.g. NovelDetailLine + Continue Lines + NovelDetailLine form a Block)
Also started a tmp_medford to hold a sketch of the logic of a standard
run.
Also created processed_lines to start collecting Line collections. Macro
has already been moved; Block will follow.
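
The grouping described above could look roughly like the following sketch. Everything here is hypothetical and simplified: lines are plain tuples rather than the real Line classes, and the rule for when a new Block starts (an empty minor token or a change of major token) is an assumption for the example, not necessarily the rule LineProcessor uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical stand-ins for the parser's line types:
#   ("novel", major, minor, text) starts a new detail,
#   ("continue", text) extends the previous detail.
RawLine = Tuple[str, ...]


@dataclass
class Detail:
    major: str
    minor: str
    parts: List[str] = field(default_factory=list)

    def get_content(self) -> str:
        return " ".join(self.parts)


@dataclass
class Block:
    major: str
    details: List[Detail] = field(default_factory=list)


def collect(lines: List[RawLine]) -> List[Block]:
    """Group novel/continue lines into Details, and Details into Blocks."""
    blocks: List[Block] = []
    current: Optional[Detail] = None
    for line in lines:
        if line[0] == "novel":
            _, major, minor, text = line
            current = Detail(major, minor, [text])
            # Assumption: a header detail (empty minor token) or a new major
            # token opens a new block.
            if not blocks or minor == "" or blocks[-1].major != major:
                blocks.append(Block(major))
            blocks[-1].details.append(current)
        elif line[0] == "continue":
            if current is None:
                raise ValueError("continuation line before any detail")
            current.parts.append(line[1])
    return blocks
```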
Went ahead and moved Block and Detail to be beside Macro.
Made Block and Detail subclasses of LineCollection.
Detail now uses get_raw_content and get_content instead of payloads.
processed_lines has been renamed to linecollections to better represent
the purpose of these data types.
LineProcessor has been renamed to LineCollector to better represent its
purpose.
Fixed an accidental bug in LineReader that caused it to treat everything
as a Macro.
lines: renamed payload to raw_content for consistency...
linecollector: additional logic to actually create the first block.
* should go back and double-check this logic when writing linecollector
tests.
linecollections: minor bug fixes, such as:
 * correct var used for indexing
 * temporarily remove Detail.validate() until logic is written
 * set self.is_header as False when not a header Detail
 * fixed ^ use order of operations
 * Block now supports name-only blocks (e.g. placeholder blocks)
major_token was defined as a single string. This has been adjusted
back to being major_tokens and List[str] to allow for the possibility
of compounding tokens. (e.g. File-Primary, File-Remote, etc.)

Started adding Dictionizer tools that, given a list of Blocks,
converts them into the Dictionary format expected by Pydantic.

Added some simple beginning Dictionizer tests.
Began rewriting Pydantic models to take advantage of new custom classes
(e.g. Detail, Block) to store information like line #, etc.

Bugfixes include:
 - changing references to Macro dict to be Dict[str, Macro] instead of
Dict[str, str].
 - add a headDetail attribute to Block to access detail that defines
its name.
 - fix order of name, detail in Dictionizer.
 - return Detail when setting name attribute in Dictionizer.
Dictionizer's Dict[str, List[Dict[...]]] typing didn't work well with
Pydantic because many models now have the 'Block' attribute, which is
not a List of Dicts.
The typing needs to be fixed later to reflect that Dict values can be
either List[Dict[...]] or Block.
Separated "get content" from "resolve" for Macros.
To get the content of a Macro after its macros have been replaced, use
the "resolve" function.

This is because Macros keep track of whether they have already been
resolved, to save processing time.

For example, if two macros reference Macro1, the second time Macro1
has resolve() called, it should immediately return its .resolution
attribute.
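
A minimal sketch of this split between getting content and resolving, assuming a `` `@name `` reference syntax and a dictionary of macros passed in by the caller (both are assumptions for the example; the `expansions` counter exists only to make the caching visible):

```python
import re
from typing import Dict, Optional

MACRO_USE_RE = re.compile(r"`@(\w+)")  # assumed reference syntax


class Macro:
    def __init__(self, name: str, raw_content: str) -> None:
        self.name = name
        self.raw_content = raw_content
        self.resolution: Optional[str] = None
        self.expansions = 0  # illustration only: counts real expansion work

    def get_content(self) -> str:
        """Return the content as written, with macro references untouched."""
        return self.raw_content

    def resolve(self, macros: Dict[str, "Macro"]) -> str:
        """Return the content with every macro reference replaced."""
        if self.resolution is not None:
            # Already resolved once: return the cached result immediately.
            return self.resolution
        self.expansions += 1
        self.resolution = MACRO_USE_RE.sub(
            lambda m: macros[m.group(1)].resolve(macros), self.raw_content
        )
        return self.resolution
```

This sketch does not guard against circular references or unknown macro names (the latter raises `KeyError`); a real implementation would need to report both.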

Added a ludicrous amount of tests to try and make sure macro logic
works.

Removed test_obj_lineprocessor, because it's a holdover from before
I renamed it to linecollector.
Added helper function for Blocks to provide a str version of their
major token chain.
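
For illustration, a helper of this kind could be as small as the sketch below. The method name `get_str_major` and the reduced Block fields are hypothetical; only the idea of joining a List[str] of major tokens (e.g. File-Primary) comes from the commits above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    major_tokens: List[str]  # e.g. ["File", "Primary"]
    name: str

    def get_str_major(self, sep: str = "-") -> str:
        """Return the major token chain as a single string."""
        return sep.join(self.major_tokens)
```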

Started adding command line arguments for actually running MEDFORD
in the temporary new MEDFORD main file.

Adjusted LineCollector to separate named blocks by major token. Now,
two blocks with different major tokens may share a name.

Added functions to LineCollector to provide all blocks and to provide
macros, rather than the main function having to scoop them out itself.
Forgot to update LineCollector tests to use new internal representation
of LineCollector.

Fixed bug where LineCollector never instantiated sub-Dict of
self.named_blocks() when trying to add blocks.
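
One way to store named blocks per major token without forgetting to create the inner dictionary is `dict.setdefault`. The sketch below is an assumption about the shape of the data, not the actual LineCollector: `named_blocks` is a plain attribute here, blocks are treated as opaque objects, and the method names are hypothetical.

```python
from typing import Dict, List


class LineCollector:
    def __init__(self) -> None:
        # major token chain -> block name -> block
        self.named_blocks: Dict[str, Dict[str, object]] = {}

    def add_block(self, major: str, name: str, block: object) -> None:
        # setdefault creates the per-major sub-dict the first time it is needed.
        by_name = self.named_blocks.setdefault(major, {})
        if name in by_name:
            raise ValueError(f"duplicate block name {name!r} for major token {major!r}")
        by_name[name] = block

    def get_all_blocks(self) -> List[object]:
        """Return every stored block, regardless of major token."""
        return [b for by_name in self.named_blocks.values() for b in by_name.values()]
```

Because names are scoped by major token, a File block and a Paper block can both be called "reads", while two File blocks with that name are rejected.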
Figured out how to make an Enum accept different cases. Now can use any
capitalization of a known Mode for setting the -m when running MEDFORD.
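
A common way to get this behavior in Python is the `_missing_` hook on `Enum`, which is called when a value lookup fails. The sketch below shows that technique; whether the parser uses it is an assumption, and the member names and the argparse wiring are placeholders.

```python
import argparse
from enum import Enum


class Mode(Enum):
    # Placeholder members; the parser's real modes may differ.
    OTHER = "OTHER"
    BCODMO = "BCODMO"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when Mode(value) finds no exact match.
        if isinstance(value, str):
            for member in cls:
                if member.value.lower() == value.lower():
                    return member
        return None  # Enum then raises ValueError as usual


parser = argparse.ArgumentParser()
# argparse calls Mode(<string>) on the argument and reports a usage error
# if that raises ValueError.
parser.add_argument("-m", dest="mode", type=Mode, default=Mode.OTHER)
```

With this in place, `-m bcodmo`, `-m BCODMO`, and `-m BcoDmo` all select the same member.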
Strips leading and trailing \n and spaces from the output that is passed
to Pydantic.
ErrorManager is now called from the Medford script
itself, if an error is encountered while pydantic is
parsing the dictionary representation.

ErrorManager now has some concept of Pydantic errors.

Adjusted Block to no longer be a subclass of
LineCollection since it shouldn't have the same
properties (e.g. HeadLine; it should instead ask its
head detail for its head line, if necessary).
Old tests have been moved to DEPRECIATED_tests
folder because they no longer compile due to
Pydantic updating to version 2.

Began implementing @-@ capabilities. Involves:
- new Line type
- new regex recognition in LineReader
- New collection type in LineCollections
    ('AtAt')
- New validate_atat function for all
    LineCollections. Returns True except
    for the AtAt type, which actually validates.
- Dictionizer now also takes a dictionary of
    strs (names) to Blocks on initialization
- ... Which is now output from LineCollector,
    using the 'get_1lvl_blocks' function.
- Dictionizer got a new validate_atat function
    that is called from generate_dict.
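
To make the validation step concrete, here is a minimal sketch of checking references against a dictionary of names to blocks. It assumes that an @-@ line refers to another block by name and that a reference is valid when that name exists; the `AtAt` fields and the `find_broken_references` helper are hypothetical and much simpler than the real LineCollections and Dictionizer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AtAt:
    line_no: int       # where the reference appears
    target_name: str   # name of the block it points at (assumed semantics)


def find_broken_references(
    refs: List[AtAt], blocks_by_name: Dict[str, object]
) -> List[Tuple[int, str]]:
    """Return (line number, message) for each reference with no matching block."""
    errors: List[Tuple[int, str]] = []
    for ref in refs:
        if ref.target_name not in blocks_by_name:
            errors.append(
                (ref.line_no, f"no block named {ref.target_name!r}")
            )
    return errors
```

Returning line numbers alongside messages lets an error manager point the user at the offending line instead of only saying that validation failed.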

Also, additional fixes:
- Fix for Details not storing has_macro and
    used_macro_names information.
- renamed used_macros to used_macro_names for
    consistency in Block objects.

TODO: add tests that actually test the @-@
validation functionality.
LineCollection __eq__ now properly compares macro
usages.

AtAt detection regex now uses the right string
termination flag.

Getting the content of a line now defaults to
removing the inline comment.
@infispiel infispiel merged commit 3b0b44b into main Jun 11, 2024
@infispiel infispiel deleted the novel_linereader branch June 25, 2024 17:05