I am using this library to read text out of XLSX files, and generally it works incredibly well. Typically the performance is fantastic, well, well under 1 second. However, I have one file, which sadly I cannot share, where the time to extract the text is ~10 minutes. What I have been able to do is profile that file and see that the issue is in mergeCellsParser. The exact sheet in question does have merged cells, among other things.
Steps to reproduce the issue:
Using a file I can share you can still see the issue. The file, merge_test.xlsx, has two columns with ~5,000 rows each, but these columns are merged A-B, and merged C-D.
This file takes 1m50s to extract text from.
A snippet of the code I am using, though not the full code:
	f, err := excelize.OpenFile(xlsxFile, excelOpts)
	if err != nil {
		return nil, fmt.Errorf("excelize failed to open file %s: %w", xlsxFile, err)
	}
	defer func() {
		// Close the spreadsheet.
		if err := f.Close(); err != nil {
			logger.WithError(err).Error("failed to close XLSX file")
		}
	}()
	// Get extras that we need, comments first
	comments, err := f.GetComments()
	if err != nil {
		return nil, fmt.Errorf("failed to GetComments: %w", err)
	}
	sheetRels, err := msx.findRels(arc, ExcelRelsXML, "xl/")
	if err != nil {
		return nil, err
	}
	// Now we can process the main data from the file using excelize
	for i, sheet := range f.GetSheetList() {
		rows, err := f.GetRows(sheet, excelOpts)
		if err != nil {
			return nil, fmt.Errorf("failed to GetRows for %s: %w", xlsxFile, err)
		}
		if _, err := sb.WriteString(sheet + "\n"); err != nil {
			return nil, fmt.Errorf("failed to write the sheet title %q: %w", sheet, err)
		}
		for j, row := range rows {
			for k, colCell := range row {
				txt := colCell
				// cellName e.g. "A1" is needed to get cell hyperlinks.
				cellName, err := excelize.CoordinatesToCellName(k+1, j+1) // this is not 0 based
				if err != nil {
					return nil, fmt.Errorf("failed to get cell name for %d-%d: %w", k+1, j+1, err)
				}
				// Called once per cell, so any per-call cost is multiplied by the cell count.
				hasLink, link, err := f.GetCellHyperLink(sheet, cellName)
				if err != nil {
					// log this, it is not worth aborting the whole extraction for
					logger.WithError(err).WithField("cellName", cellName).Warn("failed to get link for cell")
				}
				if hasLink {
					txt = colCell + " " + link
				}
				if _, err := sb.WriteString(txt + " "); err != nil {
					return nil, fmt.Errorf("failed to write string to builder: %w", err)
				}
			}
		}
	}
As I said, Go's profiling shows the issue is in mergeCellsParser.
This profile was generated using the worst-case file, the one that takes ~10 minutes to read.
Describe the results you received:
The text is extracted correctly, however it takes ~10 minutes.
Describe the results you expected:
I would expect it to be faster. Even if it took longer than normal, I think under 30s is reasonable.
Output of go version:
go version go1.19.4 darwin/amd64
Excelize version or commit ID:
github.com/xuri/excelize/v2 v2.7.0
Environment details (OS, Microsoft Excel™ version, physical, etc.):
This performance is particularly
nathj07 changed the title from "Slow Performance when there are lots of calculations" to "Slow Performance when there are lots of merge cells & calculations" on Jan 16, 2023