Add atomic save to checkpoint routine (#20011)
corwinjoy authored Jun 27, 2024
1 parent 3f69134 commit 967413a
Showing 3 changed files with 6 additions and 3 deletions.
2 changes: 1 addition & 1 deletion src/lightning/fabric/CHANGELOG.md
@@ -9,7 +9,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

--
+- Made saving non-distributed checkpoints fully atomic ([#20011](https://github.com/Lightning-AI/pytorch-lightning/pull/20011))

 -

5 changes: 4 additions & 1 deletion src/lightning/fabric/utilities/cloud_io.py
@@ -76,7 +76,10 @@ def _atomic_save(checkpoint: Dict[str, Any], filepath: Union[str, Path]) -> None
     bytesbuffer = io.BytesIO()
     log.debug(f"Saving checkpoint: {filepath}")
     torch.save(checkpoint, bytesbuffer)
-    with fsspec.open(filepath, "wb") as f:
+
+    # We use a transaction here to avoid file corruption if the save gets interrupted
+    fs, urlpath = fsspec.core.url_to_fs(str(filepath))
+    with fs.transaction, fs.open(urlpath, "wb") as f:
         f.write(bytesbuffer.getvalue())
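
For readers unfamiliar with why a transaction helps here: the goal is that an interrupted save never leaves a half-written file at the target path. The sketch below illustrates that idea with the standard library only, using the common write-to-a-temporary-file-then-rename pattern. It is not the code from this commit and not fsspec's implementation; `atomic_save_sketch` is a hypothetical name, and `pickle` stands in for `torch.save` so the example runs without PyTorch.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def atomic_save_sketch(checkpoint: Dict[str, Any], filepath: Union[str, Path]) -> None:
    """Hypothetical stand-in for an atomic checkpoint save on a local disk."""
    filepath = Path(filepath)
    # Serialize first, so a serialization error never touches the filesystem.
    payload = pickle.dumps(checkpoint)

    # Write to a temporary file in the same directory as the target, so the
    # final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(filepath.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the complete file into place in one step; readers
        # see either the old checkpoint or the new one, never a partial write.
        os.replace(tmp_name, str(filepath))
    except BaseException:
        # On any interruption, remove the temporary file and leave the
        # previous checkpoint (if any) untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Calling `atomic_save_sketch({"step": 1}, "model.ckpt")` twice in a row overwrites the earlier checkpoint only once the new bytes are fully on disk. The fsspec transaction in the diff aims at the same guarantee while remaining usable with non-local filesystems, which this local-only sketch does not attempt.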


2 changes: 1 addition & 1 deletion src/lightning/pytorch/CHANGELOG.md
@@ -9,7 +9,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

--
+- Made saving non-distributed checkpoints fully atomic ([#20011](https://github.com/Lightning-AI/pytorch-lightning/pull/20011))

 -
