
System.IO.Compression: ZipArchive loads entire file in memory on .Dispose #1543

Open
Tracked by #62658
qmfrederik opened this issue Sep 13, 2016 · 6 comments
Labels
area-System.IO.Compression enhancement Product code improvement that does NOT require public API changes/additions help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@qmfrederik
Contributor

When you open a ZipArchive in Update mode, the entire zip file is loaded into memory when the .Dispose method is invoked.

This is because .Dispose calls .WriteFile, which:

  • Calls LoadLocalHeaderExtraFieldAndCompressedBytesIfNeeded for all entries, which loads the compressed data into memory for those entries
  • Sets the size of the .zip archive to 0, by calling _archiveStream.SetLength(0);
  • Writes out all entries one by one.

As a result:

  • A lot of memory is used, because the compressed data for each entry is loaded into memory
  • A lot of unnecessary disk I/O is performed, because all entries are written out again, even if they were not modified.

An alternative would be to update the zip archive incrementally, and only rewrite the entries that have changed.

@jakubsuchybio

This is a real pain. We use the full .NET Framework and have some really large zip files in which we need to update small files, but because of this loading into memory we get an OutOfMemoryException: we have a 32-bit application and cannot switch to 64-bit, because our dependent driver DLLs are 32-bit only.

@carlossanlop
Member

Triage:
We should scope this to a fix that only loads the entries that have changed.

@carlossanlop carlossanlop transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO.Compression untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@carlossanlop carlossanlop added enhancement Product code improvement that does NOT require public API changes/additions help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@carlossanlop carlossanlop added this to the Future milestone Jan 9, 2020
@Jlalond
Contributor

Jlalond commented Jan 17, 2020

@carlossanlop Hmm, it looks like it's only writing files on creates/updates.

But it looks like it's still iterating over them all, and I don't see any property on ZipArchiveEntry to denote whether it's been modified. I'm going to keep doing research once I get home, but I think this would be fun to pick up.

@IDisposable
Contributor

Anyone working on this? I could pick it up if not.

@ulrichb

ulrichb commented Nov 7, 2023

Just found this ticket after creating a duplicate (#94455), and wanted to cross-post our use case, where the current behavior is a big issue:

a) We're dealing with potentially large user-provided ZIP files (in the GB range),
b) we need to update entries (actually not directly but via System.IO.Packaging.ZipPackage for OPC file processing), and
c) the whole process can happen in parallel.

This means that with the current ZipArchive implementation we need to reserve dozens of GBs of virtual memory just for System.IO.Packaging.ZipPackage processing; otherwise we risk container memory limit violations and therefore OOM exceptions.

@edwardneal
Contributor

With #102704 now merged, there's been some progress toward improving this. The PR includes some infrastructure to track the type of change made to an individual ZipArchiveEntry, and alters the way that entries are written to the output stream.

The present behaviour is:

  • If an entry is added to the archive and the archive is then disposed of, only the new entry (and the new central directory) is written.
  • If an existing entry's fixed-length metadata (e.g. the last write time) is modified, the entry headers are rewritten in-place. The entry contents are not.
  • If an existing entry's dynamic-length metadata (e.g. the filename) or contents are modified, that entry is written. Every entry which follows it in the archive is also written.
  • If an existing entry is deleted, every entry which follows it in the archive is written.

Appending an entry to a ZipArchive is now faster and uses less memory, particularly when appending to large archives. If a ZipArchive was opened in Update mode and never modified, this will no longer write to the output stream at all.

Making ad-hoc modifications to entries in ZipArchives now has less consistent performance, but it should still be faster on average. Deleting or modifying the first entry in the archive is a modest improvement over .NET 9.0; doing the same to the last entry should be significantly faster than .NET 9.0, because fewer entries are rewritten to the output stream.

If we've got control over the source ZIP file, ZipArchive will perform best when the largest and the most frequently modified entries are placed at the end of the archive.

I had a few ideas while writing the PR, but most of these involve archive size/write speed/memory usage tradeoffs. I'm leaving them below in case anyone wants to develop them.

  • We currently rewrite the entry when dynamic-length metadata changes, because that metadata could otherwise overwrite the contents. This isn't guaranteed: if the total length of the dynamic-length metadata doesn't change, we could safely rewrite the header without touching the entry contents or the entries which follow.
  • When an entry's contents are modified, there's no guarantee that they'll get larger - that's just the safest assumption. If it shrinks, we could overwrite the now-free space; we wouldn't need to rewrite every entry which follows.
    • We wouldn't want to do this if the total size of the following entries is greater than the amount of now-free space.
  • When an entry's contents do become larger, they might only intersect a few entries which follow it. In this situation, we might only need to rewrite a few of the following entries. There'll be some now-free space between the end of the enlarged entry and the start of the first untouched entry which follows.
  • If someone deletes an entry in the ZipArchive, it might be more efficient to simply overwrite that space in the archive (marking it as free) and not rewrite the entries which follow.
  • If there's now free space in the ZipArchive and we create a new entry (or move an existing one), that entry might be able to fit in the free space.
