Releasing MEDFORD 2.0 #18

infispiel · 2024-06-11T19:41:58Z

PR to move all the development done on medford parser 2.0 into the main branch, so that we can start reorganizing the repository for future development.

The core logic of the medford parser has been completely redone to (hopefully) be more object-oriented and amenable to adding features. A document fully describing these changes will be coming in the near future.

Importantly, this will break all external tool support, which I believe currently means only the medford vscode extension (described here) and its associated LSP (described here). This will hopefully be fixed in the medford parser v2.1.

Creating a new LineReader object that creates Line objects. Each Line object wraps a line from the LineReader and automatically parses it to remove inline comments, find used macros, etc.

- adding details (NovelDetailLine, ContinueLine collections)
- adding blocks (detail collections)

Getting closer to a full reimplementation. Added the LineProcessor function to start collecting Line objects into logical collections (e.g. NovelDetailLine + ContinueLines + NovelDetailLine form a Block). Also started a tmp_medford to hold a sketch of the logic of a standard run. Also created processed_lines to start collecting Line collections. Macro has already been moved; Block will follow.

Went ahead and moved Block and Detail to be beside Macro. Made Block and Detail subclasses of LineCollection. Detail now uses get_raw_content and get_content instead of payloads. processed_lines has been renamed to linecollections to better represent the purpose of these data types. LineProcessor has been renamed to LineCollector to better represent its purpose. Fixed an accidental bug in LineReader that caused it to treat everything as a Macro.

lines: renamed payload to raw_content for consistency...

linecollector: additional logic to actually create the first block.
* should go back and double-check this logic when writing linecollector tests.

linecollections: minor bug fixes, such as:
* correct var used for indexing
* temporarily remove Detail.validate() until logic is written
* set self.is_header as False when not a header Detail
* fixed ^ use order of operations
* Block now supports name-only blocks (e.g. placeholder blocks)

major_token was defined as a single string. This has been changed back to major_tokens, typed as List[str], to allow for compound tokens (e.g. File-Primary, File-Remote, etc.). Started adding Dictionizer tools that, given a list of Blocks, convert them into the Dictionary format expected by Pydantic. Added some simple beginning Dictionizer tests.

Began rewriting Pydantic models to take advantage of new custom classes (e.g. Detail, Block) to store information like line #, etc.

Bugfixes include:
- changing references to Macro dict to be Dict[str, Macro] instead of Dict[str, str].
- add a headDetail attribute to Block to access detail that defines its name.
- fix order of name, detail in Dictionizer.
- return Detail when setting name attribute in Dictionizer.

Dictionizer's Dict[str, List[Dict[...]]] typing wasn't playing nice with Pydantic because of the fact that many models now have the 'Block' attribute, which is not a List of a Dict. Typing needs to be fixed later to properly represent the fact that Dict values can either be List[Dict[...]] or Block.

Separated "get content" from "resolve" for Macros. To get the content of a Macro after the macros it references have been replaced, use the "resolve" function. Macros keep track of whether they have already been resolved, to save processing time. For example, if two macros reference Macro1, the second time Macro1 has resolve() called, it should immediately return its .resolution attribute. Added a ludicrous amount of tests to try and make sure macro logic works. Removed test_obj_lineprocessor, because it's a holdover from before I renamed it to linecollector.
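The get-content/resolve split can be sketched roughly as follows. This is a minimal illustration and not the parser's actual code: the class name SketchMacro, the expansions counter, the resolve(macros) signature, and the backtick-plus-@ reference pattern are all assumptions made for the example. Only the idea of caching the result in a .resolution attribute comes from the description above.

```python
import re
from typing import Dict, Optional

# Assumed reference syntax for this sketch: a backtick, '@', then the macro name.
MACRO_REF = re.compile(r"`@(\w+)")


class SketchMacro:
    """Hypothetical stand-in for the parser's Macro class."""

    def __init__(self, name: str, raw_content: str) -> None:
        self.name = name
        self.raw_content = raw_content
        self.resolution: Optional[str] = None  # filled in by the first resolve()
        self.expansions = 0  # counts real substitution passes, for illustration

    def get_content(self) -> str:
        # Unresolved text: macro references are left exactly as written.
        return self.raw_content

    def resolve(self, macros: Dict[str, "SketchMacro"]) -> str:
        # Already resolved once: return the cached result immediately.
        if self.resolution is not None:
            return self.resolution
        self.expansions += 1
        # Replace each reference with the resolved text of the named macro.
        self.resolution = MACRO_REF.sub(
            lambda match: macros[match.group(1)].resolve(macros),
            self.raw_content,
        )
        return self.resolution


macros = {
    "Macro1": SketchMacro("Macro1", "Homo sapiens"),
    "Macro2": SketchMacro("Macro2", "sample from `@Macro1"),
    "Macro3": SketchMacro("Macro3", "control for `@Macro1"),
}
first = macros["Macro2"].resolve(macros)
second = macros["Macro3"].resolve(macros)
```

In this sketch Macro1 is expanded only once even though two macros reference it. Unknown macro names raise KeyError and circular references would recurse without end; a real implementation would need to report both as errors.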

Added helper function for Blocks to provide a str version of their major token chain. Started adding command line arguments for actually running MEDFORD in the temporary new MEDFORD main file. Adjusted LineCollector to separate named blocks by major token. Now, two blocks with different major tokens may share a name. Added functions to LineCollector to provide all blocks and to provide macros, rather than the main function having to scoop them out itself.

Forgot to update LineCollector tests to use new internal representation of LineCollector. Fixed bug where LineCollector never instantiated sub-Dict of self.named_blocks() when trying to add blocks.
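For illustration, the two-level layout described above (blocks grouped by major token first, then by name) might look like the sketch below. SketchBlock, SketchCollector, major_chain, add_block, and all_blocks are hypothetical names, joining major tokens with "-" is an assumption, and rejecting a repeated name under the same major token is also an assumption. The use of setdefault shows one way to make sure the inner dict exists before a block is added, which is the kind of bug mentioned above.

```python
from typing import Dict, List


class SketchBlock:
    """Hypothetical stand-in for a Block: major tokens plus a name."""

    def __init__(self, major_tokens: List[str], name: str) -> None:
        self.major_tokens = major_tokens
        self.name = name

    def major_chain(self) -> str:
        # String form of the major token chain (separator is an assumption).
        return "-".join(self.major_tokens)


class SketchCollector:
    """Hypothetical stand-in for the named-block storage in LineCollector."""

    def __init__(self) -> None:
        # major token chain -> block name -> block
        self.named_blocks: Dict[str, Dict[str, SketchBlock]] = {}

    def add_block(self, block: SketchBlock) -> None:
        # setdefault creates the inner dict the first time a chain is seen.
        by_name = self.named_blocks.setdefault(block.major_chain(), {})
        if block.name in by_name:
            # Assumed behavior: names must be unique within one major token.
            raise ValueError(f"duplicate block {block.name!r} under {block.major_chain()!r}")
        by_name[block.name] = block

    def all_blocks(self) -> List[SketchBlock]:
        # Flatten the nested dicts so callers do not walk them by hand.
        return [b for by_name in self.named_blocks.values() for b in by_name.values()]


collector = SketchCollector()
collector.add_block(SketchBlock(["File", "Primary"], "reads"))
collector.add_block(SketchBlock(["File", "Remote"], "reads"))
```

Both blocks are stored even though they share the name "reads", because they sit under different major token chains.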

Figured out how to make an Enum accept different cases. Any capitalization of a known Mode can now be used for the -m option when running MEDFORD.
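One common way to get this behavior in Python is the Enum `_missing_` hook, sketched below. The commit does not say which approach was used, so treat this as one possible technique; the member names VALIDATE and COMPILE and the long option --mode are hypothetical placeholders and not MEDFORD's real modes or flags.

```python
import argparse
from enum import Enum


class Mode(Enum):
    # Hypothetical members; the real Mode values are not listed in this PR.
    VALIDATE = "validate"
    COMPILE = "compile"

    @classmethod
    def _missing_(cls, value):
        # Called only when the normal value lookup fails; retry in lower case.
        if isinstance(value, str):
            lowered = value.lower()
            for member in cls:
                if member.value == lowered:
                    return member
        # Returning None makes Enum raise its usual ValueError.
        return None


parser = argparse.ArgumentParser()
# argparse calls Mode(<string>) on the raw argument, so any case is accepted.
parser.add_argument("-m", "--mode", type=Mode, default=Mode.VALIDATE)
args = parser.parse_args(["-m", "CoMpIlE"])
```

Because the hook returns None for unknown strings, an unrecognized mode still fails with the standard ValueError, which argparse turns into a normal usage error.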

This removes leading and trailing \n characters and spaces from the output that is passed to Pydantic.

ErrorManager is now called from the Medford script itself, if an error is encountered while pydantic is parsing the dictionary representation. ErrorManager now has some concept of Pydantic errors. Adjusted Block to no longer be a subclass of LineCollection since it shouldn't have the same properties (e.g. HeadLine, it should instead ask its head detail for its head line, if necessary).

Old tests have been moved to DEPRECIATED_tests folder because they no longer compile due to Pydantic updating to version 2.

Began implementing @-@ capabilities. Involves:
- new Line type
- new regex recognition in LineReader
- New collection type in LineCollections ('AtAt')
- New validate_atat function for all LineCollections. Returns True except for the AtAt type, which actually validates.
- Dictionizer now also takes a dictionary of strs (names) to Blocks on initialization
- ... Which is now output from LineCollector, using the 'get_1lvl_blocks' function.
- Dictionizer got a new validate_atat function that is called from generate_dict.

Also, additional fixes:
- Fix for Details not storing has_macro and used_macro_names information.
- renamed used_macros to used_macro_names for consistency in Block objects.

TODO: add tests that actually test the @-@ validation functionality.

LineCollection __eq__ now properly compares macro usages. AtAt detection regex now uses the right string termination flag. Getting the content of a line now defaults to removing the inline comment.

infispiel added 30 commits February 3, 2023 13:25

Begin adding novel line parsing logic to MEDFORD

53e3b91

Move logic for tex/comment overlap; add tests

8336f18

Creation of Line objs using Mixins; new LineReader

0363f1e

Implementing further up the object tree

09d939e

Forgot to actually remove block and detail files

c783d6f

Start testing Line Collector; add Equality funcs

f714106

Oops all broken tests! Fixed LineCollector & Tests

7b87bda

Fix more broken tests to use get_flat_blocks

aaad00e

Enabled different cases for Mode parameter

4b38c62

Start adding Pydantic models and Custom Errors

590b5f7

Remove whitespaces from Detail get_content output

0b1ca15

Bugfixes: macros, atat, comments

c6cf403

Finish AtAt implementation

472b9ae

INCOMPLETE: Add missing missing content, atat errs

60cd423

Disable old AtAt definition; adding error objs

4a84e29

Add pypi tooling to git tracking; continue linting

285b3d2

Add concept of Globals - for now just ErrorManager

da03ed2

Orbital laser to old MEDFORD implementation to clean repo

5749331

Rename ErrorManager to Validator (more accurate); some linting

e2a9e4e

More renaming

062691a

infispiel added 4 commits May 29, 2024 15:10

Add new type of Pydantic error

b73ed14

change how errors are caught; add new error types

5f2e774

resolved bug where comment in a block split the block in two

0383c2c

Merge branch 'main' into novel_linereader

2983b45

infispiel merged commit 3b0b44b into main Jun 11, 2024

infispiel deleted the novel_linereader branch June 25, 2024 17:05
