replace xml.NewEncoder with xml.EscapeText #2100

artur-chopikian
Contributor

@artur-chopikian artur-chopikian commented Mar 6, 2025

PR Details

Memory allocations

Description

xml.NewEncoder uses bufio.NewWriter, which allocates a 4096-byte buffer on every call. That happens for every cell with text in the xlsx, so the total adds up quickly.

const (
	defaultBufSize = 4096
)

func NewWriter(w io.Writer) *Writer {
	return NewWriterSize(w, defaultBufSize)
}

xml.EscapeText also renders new lines properly in the xlsx file.
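
The allocation difference described above can be measured with a small standalone program. This is only a sketch: escapeWithEncoder and escapeWithEscapeText are hypothetical helpers, and the encoder path is assumed to resemble the code under discussion rather than copied from it. testing.AllocsPerRun reports the number of allocations per call, not their size, so the 4096-byte buffer shows up as one of the extra allocations.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"testing"
)

// escapeWithEncoder escapes value through a fresh xml.Encoder. Each call to
// xml.NewEncoder wraps the destination in a bufio.Writer with the default
// buffer size. Hypothetical helper, assumed to mirror the current approach.
func escapeWithEncoder(value string) string {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	_ = enc.EncodeToken(xml.CharData(value))
	_ = enc.Flush()
	return buf.String()
}

// escapeWithEscapeText escapes value by writing straight into the buffer,
// without creating an encoder. Hypothetical helper.
func escapeWithEscapeText(value string) string {
	var buf bytes.Buffer
	_ = xml.EscapeText(&buf, []byte(value))
	return buf.String()
}

func main() {
	const value = "a < b & c"
	encAllocs := testing.AllocsPerRun(1000, func() { _ = escapeWithEncoder(value) })
	escAllocs := testing.AllocsPerRun(1000, func() { _ = escapeWithEscapeText(value) })
	fmt.Printf("xml.NewEncoder: %.0f allocs per call\n", encAllocs)
	fmt.Printf("xml.EscapeText: %.0f allocs per call\n", escAllocs)
}
```

The exact counts depend on the Go version and the input, so the numbers are best read as a relative comparison.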

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@artur-chopikian
Contributor Author

artur-chopikian commented Mar 6, 2025

@xuri, please take a look at this. I hope we can roll back this change

The commit where this change was added: 9999221

codecov bot commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 99.19%. Comparing base (aef20e2) to head (1442644).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
xml.go 85.71% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2100      +/-   ##
==========================================
- Coverage   99.20%   99.19%   -0.02%     
==========================================
  Files          32       33       +1     
  Lines       30096    30142      +46     
==========================================
+ Hits        29858    29898      +40     
- Misses        158      162       +4     
- Partials       80       82       +2     
Flag Coverage Δ
unittests 99.19% <87.50%> (-0.02%) ⬇️


@xuri xuri added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 7, 2025
Member

@xuri xuri left a comment

This change will cause the xml:space="preserve" attribute of the t element to go missing, so the \n new line will not work.

Before:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text
</t>
    </is>
</c>

After this PR change:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t>text&#xA;</t>
    </is>
</c>

For example:

package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

func main() {
	f := excelize.NewFile()
	defer func() {
		if err := f.Close(); err != nil {
			fmt.Println(err)
		}
	}()
	sw, err := f.NewStreamWriter("Sheet1")
	if err != nil {
		fmt.Println(err)
		return
	}
	styleID, err := f.NewStyle(&excelize.Style{
		Alignment: &excelize.Alignment{WrapText: true},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := sw.SetRow("A1", []interface{}{excelize.Cell{Value: "text\n", StyleID: styleID}}); err != nil {
		fmt.Println(err)
		return
	}
	if err := sw.Flush(); err != nil {
		fmt.Println(err)
		return
	}
	if err = f.SaveAs("Book1.xlsx"); err != nil {
		fmt.Println(err)
	}
}

This change results in no new line after the text value in cell A1:

text

After this PR change:

text

So, I don't think we need to roll back the change 9999221.
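
The difference in newline handling shown above can be reproduced with the standard library alone. The following is a minimal sketch; viaEncoder and viaEscapeText are hypothetical helper names and are not part of excelize.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// viaEncoder writes value as character data through an xml.Encoder.
// Hypothetical helper for illustration.
func viaEncoder(value string) string {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	_ = enc.EncodeToken(xml.CharData(value))
	_ = enc.Flush()
	return buf.String()
}

// viaEscapeText writes value through xml.EscapeText.
// Hypothetical helper for illustration.
func viaEscapeText(value string) string {
	var buf bytes.Buffer
	_ = xml.EscapeText(&buf, []byte(value))
	return buf.String()
}

func main() {
	// The encoder keeps the newline as a literal character.
	fmt.Printf("%q\n", viaEncoder("text\n")) // "text\n"
	// EscapeText replaces it with a character reference.
	fmt.Printf("%q\n", viaEscapeText("text\n")) // "text&#xA;"
}
```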

@artur-chopikian
Contributor Author

@xuri Thanks, I got it! Then I don't see any other way than to copy this small method and make it work as we expect.

@artur-chopikian
Contributor Author

artur-chopikian commented Mar 7, 2025

@xuri Or what if we check for it before escaping? Can you think of any problem this could cause?

// trimCellValue provides a function to set string type to cell.
func trimCellValue(value string, escape bool) (v string, ns xml.Attr) {
	if utf8.RuneCountInString(value) > TotalCellChars {
		value = string([]rune(value)[:TotalCellChars])
	}
	if value != "" {
		prefix, suffix := value[0], value[len(value)-1]
		for _, ascii := range []byte{9, 10, 13, 32} {
			if prefix == ascii || suffix == ascii {
				ns = xml.Attr{
					Name:  xml.Name{Space: NameSpaceXML, Local: "space"},
					Value: "preserve",
				}
				break
			}
		}

		if escape {
			var buf bytes.Buffer
			_ = xml.EscapeText(&buf, []byte(value))
			value = buf.String()
		}
	}
	v = bstrMarshal(value)
	return
}

With this, we get the following output:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text&#xA;</t>
    </is>
</c>

@artur-chopikian artur-chopikian requested a review from xuri March 7, 2025 14:47
Member

@xuri xuri left a comment

Yeah, your latest change will escape \n in a different way:

Before:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text
</t>
    </is>
</c>

After this PR change:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text&#xA;</t>
    </is>
</c>

This change results in no new line after the text value in cell A1 in Office 2007 on Windows, but it works in Office 2010 on Windows and in Excel for Mac.

@artur-chopikian
Contributor Author

artur-chopikian commented Mar 7, 2025

What about other characters? I think we also have a problem with these symbols, because they will be replaced with:

\t -> &#x9;
\r -> &#xD;
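
The replacements listed above can be checked with a short program that passes each character through xml.EscapeText. This is a sketch only; escape is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escape returns the output of xml.EscapeText for s.
// Hypothetical helper for illustration.
func escape(s string) string {
	var buf bytes.Buffer
	_ = xml.EscapeText(&buf, []byte(s))
	return buf.String()
}

func main() {
	// Print how each whitespace character is written.
	for _, s := range []string{"\t", "\n", "\r"} {
		fmt.Printf("%q -> %s\n", s, escape(s))
	}
}
```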

@xuri
Member

xuri commented Mar 8, 2025

xml.EscapeText will not transform \t to &#x9;, so it should work in all versions of Excel.

The \r symbol cannot be used to add a new line in the cell, so it may not function correctly in all versions of Excel.

Therefore, I suggest maintaining the current trimCellValue code for better compatibility.

@artur-chopikian artur-chopikian requested a review from xuri March 10, 2025 21:10