Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--equal does not work on files in hidden folders #233

Closed
hungrywolf27 opened this issue Jun 6, 2017 · 8 comments
Closed

--equal does not work on files in hidden folders #233

hungrywolf27 opened this issue Jun 6, 2017 · 8 comments
Labels

Comments

@hungrywolf27
Copy link
Contributor

$ mkdir test1
$ mkdir .test2
$ cp ~/Downloads/rmlint-2.6.0.tar.gz test1/
$ cp ~/Downloads/rmlint-2.6.0.tar.gz .test2/
$ /usr/bin/rmlint --equal test1/rmlint-2.6.0.tar.gz .test2/rmlint-2.6.0.tar.gz 
$ echo $?
1
$ mv .test2/ test2/
$ /usr/bin/rmlint --equal test1/rmlint-2.6.0.tar.gz test2/rmlint-2.6.0.tar.gz 
$ echo $?
0

This causes problems, for example, if you use rmlint --hidden and then rmlint.sh -p.

@sahib
Copy link
Owner

sahib commented Jun 6, 2017

For now --equal includes --hidden, but that's not a proper fix since this introduces a slightly worse bug in the opposite direction (imagine running without --hidden and having two directories that differ in their hidden files only). We should probably select a fitting --equal commandline based on what was passed to the first run of rmlint. This will likely also affect other options that affect the traversal like --size and so on... 😢

@sahib sahib added the bug label Jun 6, 2017
@hungrywolf27
Copy link
Contributor Author

I'm just thinking out loud here: if all that's necessary is to compare two individual files, wouldn't it be simpler and cleaner for rmlint.sh -p to call cmp (which it did before 2.6.0) instead of invoking rmlint again? rmlint.sh already relies on standard utilities like ls and rm for basic functions, so why not do so in this case?

@sahib
Copy link
Owner

sahib commented Jun 7, 2017

Sure, I'd love to outsource this work to some other tool. 😄

The reason why I changed that: cmp does not work for directories, but --equal does.
Since we have duplicate directories we need something that is able to compare directories in the same way rmlint finds them. And no, we can't just compare all files of the directory with each other: We
do not care by default what path layout the directories have (and it would be slower...).

Regarding the problem I saw yesterday: I forgot that -D / --merge-directories has it's own traversal logic, independent of --hidden. So, it will not report duplicate directories that differ in a single byte or directories that do not completely consist of duplicates (--hidden will skip some duplicates, that's why your original bug happened).

So, if there is an entry with a duplicate directory in the shell script, it should be really completely equal. Thus using --hidden always when --equal is enabled should be fine. Same for most other options.

Some specialties like -see-symlinks and --followlinks might be a problem though since those actually alter the behaviour of treemerge. I still need to invest some thought on that but overall your original issue should be already fixed and the remaining issue should be fixable.

@hungrywolf27
Copy link
Contributor Author

hungrywolf27 commented Jun 7, 2017

I've just compiled from 6bd739b on the develop branch, and the problem is still there.

I should have also mentioned before that using --hidden made no difference for this issue.

$ mkdir test1
$ mkdir .test2
$ cp ~/Downloads/rmlint-2.6.0.tar.gz test1/
$ cp ~/Downloads/rmlint-2.6.0.tar.gz .test2/
$ rmlint -kmr /home/me/test1/ // /home/me/.test2/

[...]
==> In total 2 files, whereof 1 are duplicates in 1 groups.
==> This equals 2.35 MB of duplicates which could be removed.
==> Scanning took in total 0.066s. Is that good enough?
[...]

$ ./rmlint.sh -np
 # //////////////////////////////////////////////////////////// 
 # ///  This is only a dry run; nothing will be modified!   /// 
 # //////////////////////////////////////////////////////////// 

[   0%]  Keeping:   /home/me/.test2/rmlint-2.6.0.tar.gz
[  50%]  Deleting:  /home/me/test1/rmlint-2.6.0.tar.gz
 ^^^^^^ Error: files no longer identical - cancelling..... 
[ 100%]  Done! 

$ rmlint --equal --hidden test1/rmlint-2.6.0.tar.gz .test2/rmlint-2.6.0.tar.gz 
$ echo $?
1
$ rmlint --version
version 2.6.0 compiled: Jun  7 2017 at [09:12:38] "Penetrating Pineapple" (rev 6bd739b)

@sahib
Copy link
Owner

sahib commented Jun 7, 2017

Ah, thanks for testing. Then my shot-in-the-blue-fix did not work out.
I will have a look into that again later. Sorry for the inconvenience.

@sahib
Copy link
Owner

sahib commented Jun 8, 2017

This should be fixed as of 7d9a216.
(Will leave open until I ruled out if there are any other problem cases)

sahib added a commit that referenced this issue Jun 8, 2017
@hungrywolf27
Copy link
Contributor Author

hungrywolf27 commented Jun 8, 2017

Thanks, this particular case appears to be fixed, but now see #234! 😄

@sahib sahib closed this as completed Jun 12, 2017
@sahib sahib reopened this Jun 12, 2017
@sahib
Copy link
Owner

sahib commented Jun 12, 2017

I added some special safety net for some special corner cases in 187d7b5. This should round this issue up since special cases like --followlinks / --no-followlinks, --no-hardlinked and --honour-dir-layout are handled. These options affected the way --merge-directories will find duplicate directories and thus may have an effect on --equal. To make @hungrywolf27 happy cmp is used now again for regular files since it's faster than rmlint --equal (no surprise here...).

If there are no complaints by tomorrow, I'll close this issue tomorrow.

@sahib sahib closed this as completed Jun 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants