Skip to content

Updating existing files in an ADF image #30

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
t-w opened this issue Jan 11, 2023 · 22 comments
Closed

Updating existing files in an ADF image #30

t-w opened this issue Jan 11, 2023 · 22 comments

Comments

@t-w
Copy link
Collaborator

t-w commented Jan 11, 2023

I have started working on write support for fuseadf and have just found out that the library does not support updating existing files in ADF image. The write support modes (for adfOpenFile()) are:

  • "w" - only creating new files
  • "a" - only appending to existing files

This is probably not the simplest feature to implement but a pretty crucial one (eg. at the moment, there is no reasonable way to implement write support in a filesystem using ADFlib) so it would be good to add it.

@lclevy
Copy link
Collaborator

lclevy commented Jan 11, 2023

Hi, nice idea!

  • Truncate mode is the hardest. Maybe implement truncate first, then it is like write at the end
  • write in middle is easier : like read but remplace data

@lclevy
Copy link
Collaborator

lclevy commented Jan 11, 2023

About cve, I do not know how it is been fixed by Debian, maybe compiler memory safety options

@kyz
Copy link
Collaborator

kyz commented Jan 12, 2023

Hi Laurent, what did you mean by Debian CVEs? There haven't been any since 2016.

@lclevy
Copy link
Collaborator

lclevy commented Jan 12, 2023

How is it fixed? Code patch or makefile patch?

@lclevy
Copy link
Collaborator

lclevy commented Jan 12, 2023

Cve Listed in readme.md

@kyz
Copy link
Collaborator

kyz commented Jan 12, 2023

  • CVE-2016-1243 and CVE-2016-1244 were fixed by 8e973d7 (code change in unadf.c)
  • I fixed arbitrary directory traversal (I didn't raise a CVE) in 4ce14b2 lines 450-455
  • the two other crashing bugs listed above don't have CVEs, but should be checked to see if all the recent code changes by t-w have fixed them

@lclevy
Copy link
Collaborator

lclevy commented Jan 12, 2023

Thanks for reminding this Stuart!

@t-w
Copy link
Collaborator Author

t-w commented Jan 12, 2023

I briefly looked at writing files, it seems to me that:

  1. the existing append functionality does not work (and there is no test for it... I have just modified the overfilling test to check it)
  2. creating a 0-lenght file (just opening for writing and closing) seems problematic and buggy:
    • if no write operation is performed, there is no data block written
    • opening such file (any file) expects possibility to read the first data block and since the file header point to block/sector 0 - it probably reads the block 0 of the volume as the first data block. Not a big problem for reading, but for writing/appending... Volume becomes corrupted immediately (block 0 is overwritten!).

So basically at the moment writing files seems to work properly only if a new file is created, and immediately (without closing and reopening) written and closed. (write support should rather be considered experimental until fixed...)

Some of these could be caused eg. by my changes in the seek code - but since there was no test how could I know...
If the things above were working for anyone before let me know - there will be something to compare with eventually.

Concerning the 2nd point - does anyone know how exactly such an empty file looks like on Amiga filesystem? I mean, if it is a 0-length file does it have any data blocks allocated (as current code seems to expect) or there is only a file header block? And if there is only file header block, how this lack of data blocks is (or should be) indicated in file header block (struct bFileHeaderBlock)? As NULL-pointer in firstData or dataBlocks[], or only 0 in dataSize?

@lclevy
Copy link
Collaborator

lclevy commented Jan 12, 2023

I totally agree about limitations of current code.
The ground truth is how Amiga is working.
So create such case and see what happens, using an emulator and csh in df0: and test adf in df1:

@lclevy
Copy link
Collaborator

lclevy commented Mar 26, 2023

Hummm, is adflib or amigados vulnerable to this? https://twitter.com/David3141593/status/1639785897657769986

@kyz
Copy link
Collaborator

kyz commented Mar 29, 2023

Almost every filesystem is "vulnerable" to you modifying a file (or even deleting it), and it being possible to recover all or part of the old file, because most filesystems do not blank out freed data blocks. It's desirable that they don't - faster, less disk wear, possible to recover files.

But no, this is not a "vulnerability" as it's expected behaviour for filesystems. Whereas if you are cropping an image and saving it, it is a vulnerability that cropped parts of the image remain in the file.

If someone wants to ensure their files are unrecoverable, even to forensic investigators, there are tools like srm which overwrite data blocks before deletion.

If someone wants to create a disk image with nothing but the expected files, they should do their editing somewhere else, format a new disk and copy the files over.

@kyz
Copy link
Collaborator

kyz commented Mar 29, 2023

The root cause of the "acropalypse" is that Google copied the C idea of fopen(fname, "w"), i.e. "file mode" string saying how you want access the file. The familiar C standard they copied the idea from is very clear that "w" is for "writing", which always truncates the file. You use "r+" if you want to read/write a file without truncating it.

Then, some idiot at Google decided in Android 10 (released 2019) that "w" should not truncate any more, and you have to write "wt" if you want the standard behaviour. "w" was standard since Android was created, and is the familiar standard in other languages like C and Python. BREAKING. API. CHANGE. And they didn't document it until 2021, when someone filed a bug to them noting the unannounced change. UNDOCUMENTED. BREAKING. API. CHANGE. Possibly every Android app, ever, is affected when running on Android 10 or later. And one such app affected is Google's flagship screenshot/cropping app! Yes, even Google didn't know that Google silently broke their API.

So perhaps to avoid this in ADFLib, write in the API documentation that adfOpenFile(vol, fname, mode) follows the familiar semantics of the C fopen() call (which means it supports modes "r", "r+", "w", "w+", "a", "a+", each with their defined semantics in the C standard)

@kyz
Copy link
Collaborator

kyz commented Mar 29, 2023

Concerning the 2nd point - does anyone know how exactly such an empty file looks like on Amiga filesystem? I mean, if it is a 0-length file does it have any data blocks allocated (as current code seems to expect) or there is only a file header block? And if there is only file header block, how this lack of data blocks is (or should be) indicated in file header block (struct bFileHeaderBlock)? As NULL-pointer in firstData or dataBlocks[], or only 0 in dataSize?

To create an empty file, there is a file header block allocated, but there are no data blocks allocated. The file header looks like this:

  • highSeq=0 (no dataBlocks[] are used)
  • firstData=0
  • all dataBlocks[] entries are zero
  • byteSize=0

When adfWriteFile() is called, it might require extending the file to use more datablocks:

  1. allocate block(s) and mark them as used in the bitmap
  2. write the data to those blocks
  3. update the file header:
    1. append to dataBlocks[] array
    2. set highSeq to the number of entries in the dataBlocks array
    3. set firstData to 0 if highSeq is 0, otherwise the first dataBlocks entry
    4. if needed: allocate/write/update file extension block, more data blocks, set extension field, repeat...
    5. update byteSize, last modified time...

I haven't check the Amiga code for extension blocks, so I don't know if it's a better strategy to allocate blocks in a row like HEADER EXTENSION EXTENSION DATA DATA DATA DATA .. or like HEADER DATA DATA DATA EXTENSION DATA DATA DATA EXTENSION DATA DATA DATA ...

When adfCloseFile() is called, let's assume that adfOpenFile(..., "w") means the standard C behaviour that you truncate the file to how many bytes were written. If what was written is shorter than the file's original length, this means truncating the file:

  1. update byteSize, last modified time...
  2. Disconnect as many complete file extension blocks as possible, by setting the extension field to 0 on the final header/extension block that is still needed
  3. On the final needed header (whether extension or normal header), update dataBlocks[] if needed, setting unused blocks to zero, and update highSeq for that block
  4. Update firstData to 0 if the file header block's highSeq is now 0
  5. update the bitmap with all the blocks you freed

@t-w
Copy link
Collaborator Author

t-w commented Apr 3, 2023

Thanks for the info.

So far any file truncation is not implemented (even in the update I am preparing). But it seems like possibly next step to do so your info will be helpful in that.

@kyz
Copy link
Collaborator

kyz commented Apr 4, 2023

* crashing bug 1: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=716419

... was that the command line unadf -s - caused a segfault in unadf up to commit 8e973d7, it was fixed by commit 4ce14b2

* crashing bug 2: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862740

... is still a crashing bug, even today in master and in https://github.com/t-w/ADFlib/tree/devel

adfGetCacheEntry() in src/adf_cache.c copies a cache block entry.

  • it does not validate that the name length is between 1 and 30
  • iit does not validate that the comment length is between 0 and (whatever the upper length of a comment is)
  • it does not validate that either of these fit in the disk block

It then passes these lengths direct to memcpy().

The example file attached to the debian bug has a cache entry with name length -117 and comment length -1, e.g. 0xFFFFFFFF when cast to size_t

$ LD_LIBRARY_PATH=./src/.libs gdb ./examples/.libs/unadf 
...
(gdb) run -lc crash.adf
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fb1ac6 in memcpy (__len=18446744073709551499, __src=0x7fffffffe0e0, __dest=0x7fffffffde35) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
29	  return __builtin___memcpy_chk (__dest, __src, __len,
(gdb) bt
#0  0x00007ffff7fb1ac6 in memcpy (__len=18446744073709551499, __src=0x7fffffffe0e0, __dest=0x7fffffffde35)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
#1  adfGetCacheEntry (dirc=dirc@entry=0x7fffffffe0b0, p=p@entry=0x7fffffffde1c, cEntry=cEntry@entry=0x7fffffffde20) at adf_cache.c:170
#2  0x00007ffff7fb213d in adfGetDirEntCache (vol=0x55555555c500, dir=<optimized out>, recurs=0) at adf_cache.c:87
...
(gdb) print *((struct CacheEntry *)0x7fffffffde20)
$2 = {header = 1146050438, size = 1146050439, protect = 1146050442, days = 17487, mins = 22410, ticks = 17487, type = 87 'W', nLen = -117 '\213', 
  cLen = -1 '\377', name = "DOW\216DOW\217DOW\216DOW\217DOW\222DOW\223DOW\222DOW", 
  comm = "\223DOW\226DOW\227DOW\226DOW\227DOW\232DOW\233DOW\232DOW\233DOW\236DOW\237DOW\236DOW\237DOW\242DOW\243DOW\242DOW\243DOW\246DOW\247DOW\246DOW"}

@t-w
Copy link
Collaborator Author

t-w commented Apr 10, 2023

@kyz, thanks for the info about the security bug you mention above (in adfGetCacheEntry()), however, it probably should go to a separated issue. I rather expect that there may be more things like this in misc. places, some code seems unfinished/experimental. Concerning dir. cache, I was not checking anything there, except the interface and eventual changes done elsewhere.

This issue is focused rather on making the write support working. So please, go ahead and create a separated issue (as for other matters, if there are such). It will be easier to check topic by topic.

@t-w
Copy link
Collaborator Author

t-w commented Apr 10, 2023

@kyz, concerning adfOpenFile() (btw. currently adfFileOpen()) operating like fopen() with r/r+/w/w+/ etc. - IMHO this would be a bit overshot for ADFlib, at least at this stage. I am wondering even whether "a" (append) makes here any sense - the only thing it is doing is seeking to the EOF. I think it could be removed from the library, leaving only "r" and "w" (which differ only that in "w" a file, besides being read, can be also (over)written). Nothing else is really needed. Seeking to the end of the file can be done in the client code.

Implementing full POSIX-like interface for opening file... seems overcomplicated. fopen() like that includes several other basic operations: seek, truncate. I think in the library they should be implemented as a separated functions and opening a file should only distinguish between "read" and "read-write", and open just on the beginning of the file. Having that, you can implement everything else on top of that, if needed (ag. seek to the EOF for append, or truncate to whatever place).

For pure filesystem interface, it may make more sense (to give client code a high level interface). For a library like ADFlib, which gives a rather low-level interface to the ADF devices/volumes/files, I would rather propose to keep it as simple as possible, at least until all needed basic things are implemented (for now, truncate is missing, so eg. "w" in POSIX way cannot be added).

@kyz
Copy link
Collaborator

kyz commented Apr 11, 2023

@kyz, thanks for the info about the security bug you mention above (in adfGetCacheEntry()), however, it probably should go to a separated issue.

I agree it's a separate issue, I was just responding to Laurent's comment

As the bug already has a Debian bug entry, I don't see the need for duplicating it in Github, so I simply added a pull request referencing the Debian bug, and with a fix: #35

@kyz
Copy link
Collaborator

kyz commented Apr 11, 2023

@kyz, concerning adfOpenFile() (btw. currently adfFileOpen()) operating like fopen() with r/r+/w/w+/ etc. - IMHO this would be a bit overshot for ADFlib, at least at this stage.

A filesystem that can "update files" should be able to truncate them, extend them, seek within them and read and write parts of them. But I also recognize that's a lot of work, and you've already done a huge amount.

It's OK if file update support is only partially implemented. APIs can return error codes where specific operations aren't implemented yet

My comments here for the future, while also wearing a security hat. What is good API design? It is when the API does what you expect and does not surprise you

The aCropalypse bug, where millions of people might have overshared sensitive information for years, is all due to a change of API design that went against expectations and surprised programmers

It's my argument that any programmer on the planet, who has never seen ADFLib, but has seen C, Python, etc., when they want to update files in an ADF image, they will see look at the docs, they will see, adfOpenFile, adfCloseFile, adfReadFile, adfWriteFile, adfFileSeek, and they will expect these functions work like fopen, fclose, fread, fwrite, fseek from C... or that they work like open, close, read, write, seek from Unix... or that they work like CreateFile, CloseHandle, ReadFile, WriteFile, SetFilePointer from Win32... or that they work like Open, Close, Read, Write, Seek from AmigaDOS...

When looking at adfOpenFile(), they will see it takes a string as an access mode. So they will immediately think that this works like fopen() in C. We shoudn't surprise them

Seeking to the end of the file can be done in the client code.

Implementing full POSIX-like interface for opening file... seems overcomplicated. fopen() like that includes several other basic operations: seek, truncate.

The library can call its own functions to do that for the client. That's much simpler than finding bugs where the programmer has misunderstood the API because it goes against expectations

I would recommend designing the API now, being happy it is coherent, and then filling in the implementation, rather than change the API to match the current state of implementation. And don't ask clients to work around missing implementation and then change the API later when it's implemented and break them again

"w" in POSIX way cannot be added.

Think about the future. If "w", which implies truncate, can't be done because the code's not there, ADFLib should allow only "r" (read-only) and "r+" (open for read and write) and return an error for any other mode. This would be the least surprising API and requires no extra code today, just coherent API design

The alternative is that "w" means "truncate" everywhere, except ADFLib where it means "no truncate", for today, and then later when ADFLib gains the ability to truncate files, either we change "w" from "no truncate" to "truncate" (which is a breaking API change), or we decide to keep ADFLib backwards-compatible with itself and remain unlike fopen() forever, perhaps adopting "wt" to mean what everyone else uses "w" to mean, and therefore the ADFLib API remains surprising and unexpected and a source of bugs due to mismatched developer expectations

@lclevy
Copy link
Collaborator

lclevy commented Apr 11, 2023

Yes, I suggest to keep adflib simple. Even for windows NTFS, you have a separate tool for data sanitisation : sdelete from sysinternals. R, w and a seems enough for adflib. Sdelete / secure delete could be offered as a separate cli tool, as well as undelete, but not via os like API

@t-w
Copy link
Collaborator Author

t-w commented Apr 11, 2023

@kyz, while you do have a point, the discussion about API design is a separated issue, while this issue is just about fixing/implementing a very basic functionality possibly within the existing API design, not about changing it.
So please, start a separated issue, put the proposition there so it can be reviewed / discussed etc.

In the meantime, please have a look at #34 (test if possible), so that we can close that part and go on.

The next things to do that I would foresee (rather than an extensive discussion about details to change in the API):

These are internals, not the external API (which can be discussed separately).

@t-w
Copy link
Collaborator Author

t-w commented Jun 26, 2023

This is done (eventually to improve, see TODO ).

@t-w t-w closed this as completed Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants