Skip to content

Allow <![CDATA[]]> to preserve whitespace in XML text content #821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

OWL-Varun
Copy link
Contributor

@OWL-Varun OWL-Varun commented Aug 9, 2022

Add a new constructor parameter (an enum XMLTextEscapeStyle with
Standard and CDATA as values) to XMLTextInfosetOutputter. When CDATA is
passed instead of Standard, wrap simple XML elements' text contents in
CDATA brackets to preserve any whitespace they contain.

DAFFODIL-2346-preserve-whitespace-in-dfdl-infoset

@OWL-Varun OWL-Varun force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch 2 times, most recently from 8908852 to 74ab456 Compare December 1, 2022 17:42
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Minor fixes to doc strings in XSD.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some questions.

@OWL-Varun OWL-Varun force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch 2 times, most recently from 0420201 to c7dabd7 Compare December 7, 2022 16:38
Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Looks good except for some inconsistent formatting. Please fix and then you're good to squash and merge.

@OWL-Varun OWL-Varun force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch from c7dabd7 to 0d813e1 Compare December 16, 2022 17:07
Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of changes suggested by Steve and me.

@OWL-Varun OWL-Varun force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch 2 times, most recently from e5b2e62 to db285db Compare December 21, 2022 18:32
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THere are a very small number of non-resolved conversations still to be fixed.

@OWL-Varun OWL-Varun force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch from db285db to c2ec22f Compare December 22, 2022 20:03
@tuxji
Copy link
Contributor

tuxji commented Dec 30, 2022

We need to update this pull request's title and description since they no longer match what this pull request actually does. We collectively decided not to create a tunable XMLOutputStyle, only pass an enum XMLTextEscapeStyle as a new constructor parameter to XMLTextInfosetOutputter. We decided not to address the question how to let a CLI option change infoset outputter settings in this pull request, even though this pull request's issue, DAFFODIL-2346, originally requested a CLI option to use <!CDATA[]]> to preserve whitespace in XML infosets' simple elements' text content.

Note that another issue, DAFFODIL-2234, also wants a CLI option to not pretty print XML or JSON infosets. We already pass a boolean "pretty" as a constructor parameter to XMLTextInfosetOutputter, but we don't have any way for the CLI to change this constructor parameter's value from true to false in the places where the codebase passes true.

If we still punt the question of how to let the CLI, or TDML Runner, change infoset outputter settings to DAFFODIL-2234 instead of this pull request, then I suggest we edit this pull request's new title and description to:

Allow <!CDATA[]]> to preserve whitespace in XML text content

Add a new constructor parameter (an enum XMLTextEscapeStyle with Standard and CDATA values) to XMLTextInfosetOutputter.  When CDATA is passed instead of the default Standard, wrap simple XML elements' text content in CDATA brackets to preserve any whitespace they contain.

DAFFODIL-2346

Also, I think we should consider adding new XMLTextInfosetOutputter methods which can change infoset outputter settings using "withXXX" calls instead of changing them in constructor parameters. Calling "withXXX" methods is a more upward compatible and extensible mechanism to change objects' settings since we can simply add new "withXXX" methods in the future. A "withXXX" method can also eliminate the need to pass a boolean or enum value since the "withXXX" method's name can describe the new setting as well, such as "withPrettyPrint" or "withCDATA".

@stevedlawrence
Copy link
Member

Agreed one the new PR title and commit message.

Also agreed on new withXXX functions for setting parameters, that's a much cleaner API. I'd suggest we do that as a separate PR though.

@OWL-Varun OWL-Varun force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch from c2ec22f to 0ecb138 Compare January 4, 2023 15:39
@OWL-Varun OWL-Varun changed the title Add support for new tunable "xmlOutputStyle" Add a "*" in comments between parameter descriptions Jan 4, 2023
@OWL-Varun OWL-Varun changed the title Add a "*" in comments between parameter descriptions Allow <![CDATA[]]> to preserve whitespace in XML text content Jan 4, 2023
Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 👍

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Yes, this pull request is ready for squashing and merging after a fix to some comments.

@OWL-Varun OWL-Varun force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch from 0ecb138 to 6656012 Compare January 5, 2023 18:03
@mbeckerle
Copy link
Contributor

Feature that can allow for this parameterizing of the infoset outputters is in #912

@stevedlawrence
Copy link
Member

I've updated this PR and I believe all issues are resolved, including the latest of only escaping with CDATA if special chars or whitespace exist. Please take one last look at this to make sure it's ready to merge.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I gave this one more quick pass over the latest changes. Looks good to me.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Also looks good to me, even though a future PR to switch from constructor parameters to withXXX() functions possibly might make the enums unnecessary depending on whether it makes sense to have two differently named no-argument withXXX() methods or still pass an enum to a withXXX() method. Let's merge anyway.

Add a new constructor parameter (an enum XMLTextEscapeStyle with
Standard and CDATA as values) to XMLTextInfosetOutputter. When CDATA is
passed instead of Standard, wrap simple XML elements' text contents in
CDATA brackets to preserve any whitespace they contain.

DAFFODIL-2346
@stevedlawrence stevedlawrence force-pushed the daffodil-2346-preserve-whitespace-in-dfdl-infoset branch from 7a7995e to 1eb1b19 Compare February 1, 2023 16:59
@stevedlawrence stevedlawrence merged commit 10c520a into apache:main Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants