Commit Graph

23 Commits

Author SHA1 Message Date
Saksham Khanna 4d8ef7bca7
cmd/dedupe: make largest directory primary to minimize data moved (#3648)
This change makes dedupe recursively count elements in same-named directories
and make the largest one primary. This allows to minimize the amount of data
moved (or at least the amount of API calls) when dedupe merges them.
It also adds a new fs.Object interface `ParentIDer` with function `ParentID` and
implements it for the drive and opendrive backends. This function returns
parent directory ID for objects on filesystems that allow same-named dirs.
We use it to correctly count sizes of same-named directories.

Fixes #2568

Co-authored-by: Ivan Andreev <ivandeex@gmail.com>
2021-03-11 20:40:29 +03:00
Nick Craig-Wood 86014cebd7 dedupe: add --dedupe-mode list to just list dupes, changing nothing 2020-12-02 16:52:12 +00:00
Nick Craig-Wood 507f861c67 dedupe: add --by-hash to dedupe on hash not file name - fixes #1674 2020-12-02 16:52:12 +00:00
Nick Craig-Wood 2e21c58e6a fs: deglobalise the config #4685
This is done by making fs.Config private and attaching it to the
context instead.

The Config should be obtained with fs.GetConfig and fs.AddConfig
should be used to get a new mutable config that can be changed.
2020-11-26 16:40:12 +00:00
Nick Craig-Wood 62f0bbb598 dedupe: Make it obey the --size-only flag for duplicate detection #4321 2020-07-28 11:40:37 +01:00
Nick Craig-Wood a3f6fe5287 dedupe: fix logging to be easier to understand #4321 2020-06-16 11:41:56 +01:00
Nick Craig-Wood b23cf58a41 operations: Add skip all, do all, quit operations to --interactive - fixes #3886
This also adds SkipDestructive into all the remaing places --dry-run
was used and adds documentation.
2020-06-10 12:33:53 +01:00
fishbullet ba5eb230fb operations: interactive mode -i/--interactive for destructive operations #3886 2020-06-10 12:33:53 +01:00
Nick Craig-Wood 5fa6a28f70 dedupe: Stop dedupe deleting files with identical IDs #4013
Before this change if there were two files with the same name and the
same ID in the same directory, dedupe would delete one of them but
since these are actually the same file (with the same ID) then both
files would be deleted leading to data loss.

This should never actually happen, however it did happen as part of a
bug introduced in rclone which was fixed by

dfc7215bf9 drive: fix duplicate items when using --drive-shared-with-me #4018

This change checks to see if any of the duplicates have the same ID
and if they do it refuses to delete them.
2020-03-31 17:28:26 +01:00
Nick Craig-Wood 81002747c5 dedupe: implement keep smallest too
This is to help deduping google docs and their exported versions if
they accidentally get uploaded to the source again.

See: https://forum.rclone.org/t/my-stupidity-or-a-bug/13861
2020-01-17 13:08:37 +00:00
Nick Craig-Wood 11f501bd44 operations: move interface assertion to tests to remove pflag dependency #3792 2020-01-09 13:25:37 +00:00
SezalAgrawal c3751e9a50 operations: fix dedupe continuing on errors like insufficientFilePermisson - fixes #3470
* Fix dedupe on merge continuing on errors like insufficientFilePermisson
* Sorted the directories to remove recursion logic
2019-11-26 10:58:52 +00:00
Ankur Gupta 75a6c49f87 Fix error counter - fixes #3650
For few commands, RClone counts a error multiple times. This was fixed by
creating a new error type which keeps a flag to remember if the error has
already been counted or not. The CountError function now wraps the original
error eith the above new error type and returns it.
2019-11-18 14:13:02 +00:00
SezalAgrawal 7712b780ba operations: display 'Deleted X extra copies' only if dedupe successful - fixes #3551 2019-10-08 16:35:53 +01:00
Sezal Agrawal 5c2dfeee46 operations: display 'All duplicates removed' only if dedupe successful -fixes rclone#3550 2019-10-08 16:34:13 +01:00
Nick Craig-Wood 57d5de6fba build: fix up package paths after repo move
git grep -l github.com/ncw/rclone | xargs -d'\n' perl -i~ -lpe 's|github.com/ncw/rclone|github.com/rclone/rclone|g'
goimports -w `find . -name \*.go`
2019-07-28 18:47:38 +01:00
Aleksandar Jankovic f78cd1e043 Add context propagation to rclone
- Change rclone/fs interfaces to accept context.Context
- Update interface implementations to use context.Context
- Change top level usage to propagate context to lover level functions

Context propagation is needed for stopping transfers and passing other
request-scoped values.
2019-06-19 11:59:46 +01:00
Nick Craig-Wood 14ef4437e5 dedupe: fix bug introduced when converting to use walk.ListR #2902
Before the fix we were only de-duping the ListR batches.

Afterwards we dedupe everything.

This will have the consequence that rclone uses more memory as it will
build a map of all the directory names, not just the names in a given
directory.
2019-03-17 11:01:20 +00:00
Nick Craig-Wood 4376019062 dedupe: Use walk.ListR for listing commands.
This dramatically increases the speed (7x in my tests) of the de-dupe
as google drive supports ListR directly and dedupe did not work with
`--fast-list`.

Fixes #2902
2019-03-16 17:41:12 +00:00
ssaqua 3d81b75f44 dedupe: check for existing filename before renaming a dupe file 2018-11-02 16:51:52 +00:00
Nick Craig-Wood cb5bd47e61 build: fix errors spotted by ineffassign linter
These were mostly caused by shadowing err and a good fraction of them
will have caused errors not to be propagated properly.
2018-05-05 17:32:41 +01:00
Richard Yang a81ec00a8c dedupe: Add dedupe largest functionality - fixes #2269 2018-04-26 16:21:07 +01:00
Nick Craig-Wood d97fe3b824 fs/operations: make dedupe work with mega
* factor into its own files
  * remove assumptions about having a given hash type
  * make tests work if the remote has no hash
2018-04-13 13:23:55 +01:00