System.IO.Compression: ZipArchive loads entire file in memory on .Dispose #1543
Comments
This is a real pain. We use the full .NET Framework and we have some really large zip files in which we need to update small files, but because of this loading into memory we get an OutOfMemoryException: ours is a 32-bit application, and we cannot move to 64-bit because of dependent driver DLLs that are 32-bit only.
Triage:
@carlossanlop Hmm, looks like it's only writing files on creates/updates. But it looks like it's still iterating over them all; I don't see any property on ZipArchiveEntry to denote whether it's been modified. I'm going to keep doing research once I get home, but I think this would be fun to pick up.
Anyone working on this? I could pick it up if not.
Just found this ticket after creating a duplicate (#94455), and wanted to cross-post our use case, where the current behavior is a big issue: a) we're dealing with potentially large user-provided ZIP files (in the GB range). This means, with the current ...
With #102704 now merged, there's been some progress towards improving this. The PR includes some infrastructure to track the type of changes made to an individual ZipArchiveEntry, and alters the way that entries are written to the output stream. The present behaviour is:

- Appending an entry to a ZipArchive is now faster and uses less memory, particularly when appending to large archives (see the sketch after this comment).
- If a ZipArchive was opened in Update mode and never modified, it will no longer write to the output stream at all.
- Making ad-hoc modifications to entries in ZipArchives now has less consistent performance, but it should still be faster on average. Deleting or modifying the first entry in the archive results in performance which is a modest improvement over .NET 9.0; doing the same thing to the last entry should be significantly faster than .NET 9.0 (because fewer entries need to be rewritten to the output stream).
- If we've got control over the source ZIP file, ZipArchive will perform best when the largest and the most frequently modified entries are placed at the end of the archive.

I had a few ideas while writing the PR, but most of these involve archive size/write speed/memory usage tradeoffs. I'm leaving them below in case anyone wants to develop them.
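For context, a minimal sketch of the append scenario described above, assuming a seekable file-backed stream (the archive and entry names are placeholders):

```csharp
using System.IO;
using System.IO.Compression;

class AppendSketch
{
    static void Main()
    {
        // Open an existing (possibly large) archive in Update mode and append one small entry.
        using FileStream zipStream = new FileStream("archive.zip", FileMode.Open, FileAccess.ReadWrite);
        using ZipArchive archive = new ZipArchive(zipStream, ZipArchiveMode.Update);

        ZipArchiveEntry entry = archive.CreateEntry("notes.txt");
        using StreamWriter writer = new StreamWriter(entry.Open());
        writer.WriteLine("appended entry");
    } // archive is disposed here, which is when the change is flushed to the stream
}
```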
When you open a `ZipArchive` in Update mode, the entire zip file will be loaded into memory when the `.Dispose` method is invoked. This is because `.Dispose` calls `.WriteFile`, which:

- calls `LoadLocalHeaderExtraFieldAndCompressedBytesIfNeeded` for all entries, which loads the compressed data into memory for those entries
- calls `_archiveStream.SetLength(0);`, truncating the backing stream so that the whole archive has to be written back out

As a result, the memory needed to dispose an archive opened in Update mode grows with the size of the archive.

An alternative may be to incrementally update the zip archive, and only update the entries which changed.
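In the meantime, a minimal sketch of a user-level workaround, assuming a separate output file is acceptable (the `ReplaceEntry` helper and all file/entry names here are hypothetical, not part of the library):

```csharp
using System.IO;
using System.IO.Compression;

class ZipRewriteSketch
{
    // Copies sourceZip to destZip, replacing one entry's content along the way.
    // Read + Create modes stream entry data instead of buffering the whole
    // archive the way Update mode's Dispose does.
    static void ReplaceEntry(string sourceZip, string destZip, string entryName, byte[] newContent)
    {
        using FileStream inStream = File.OpenRead(sourceZip);
        using FileStream outStream = File.Create(destZip);
        using var source = new ZipArchive(inStream, ZipArchiveMode.Read);
        using var dest = new ZipArchive(outStream, ZipArchiveMode.Create);

        foreach (ZipArchiveEntry entry in source.Entries)
        {
            // Skip the entry being replaced; every other entry is streamed across.
            if (entry.FullName == entryName)
                continue;

            ZipArchiveEntry copy = dest.CreateEntry(entry.FullName);
            using Stream input = entry.Open();
            using Stream output = copy.Open();
            input.CopyTo(output); // entry data is never fully buffered in memory
        }

        // Write the replacement content as a fresh entry.
        ZipArchiveEntry replaced = dest.CreateEntry(entryName);
        using Stream target = replaced.Open();
        target.Write(newContent, 0, newContent.Length);
    }
}
```

The tradeoff is that every copied entry is decompressed and recompressed along the way, so this spends CPU time to avoid the memory cost.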