s3: update docs with a Reducing Costs section - Fixes #2889

2020-11-26 15:00:10 +00:00 · 2020-11-26 15:00:10 +00:00 · 506342317b
parent 979bb07c86
commit 506342317b
1 changed files with 81 additions and 19 deletions
--- a/docs/content/s3.md
+++ b/docs/content/s3.md
@@ -248,25 +248,6 @@ d) Delete this remote
 y/e/d> 
 ```

-### --fast-list ###
-
-This remote supports `--fast-list` which allows you to use fewer
-transactions in exchange for more memory. See the [rclone
-docs](/docs/#fast-list) for more details.
-
-### --update and --use-server-modtime ###
-
-As noted below, the modified time is stored on metadata on the object. It is
-used by default for all operations that require checking the time a file was
-last updated. It allows rclone to treat the remote more like a true filesystem,
-but it is inefficient because it requires an extra API call to retrieve the
-metadata.
-
-For many operations, the time the object was last uploaded to the remote is
-sufficient to determine if it is "dirty". By using `--update` along with
-`--use-server-modtime`, you can avoid the extra API call and simply upload
-files whose local modtime is newer than the time it was last uploaded.
-
 ### Modified time ###

 The modified time is stored as metadata on the object as
@@ -280,6 +261,87 @@ storage the object will be uploaded rather than copied.
 Note that reading this from the object takes an additional `HEAD`
 request as the metadata isn't returned in object listings.

+### Reducing costs
+
+#### Avoiding HEAD requests to read the modification time
+
+By default rclone uses the modification time of objects stored in
+S3 for syncing. This is stored in object metadata, which
+unfortunately takes an extra HEAD request to read, and that can be
+expensive (in time and money).
+
+The modification time is used by default for all operations that
+require checking the time a file was last updated. It allows rclone to
+treat the remote more like a true filesystem, but it is inefficient on
+S3 because it requires an extra API call to retrieve the metadata.
+
+The extra API calls can be avoided when syncing (using `rclone sync`
+or `rclone copy`) in a few different ways, each with its own
+tradeoffs.
+
+- `--size-only`
+    - Only checks the size of files.
+    - Uses no extra transactions.
+    - If the file doesn't change size then rclone won't detect it has
+      changed.
+    - `rclone sync --size-only /path/to/source s3:bucket`
+- `--checksum`
+    - Checks the size and MD5 checksum of files.
+    - Uses no extra transactions.
+    - The most accurate detection of changes possible.
+    - Will cause the source to read an MD5 checksum which, if it is a
+      local disk, will cause lots of disk activity.
+    - If the source and destination are both S3 this is the
+      **recommended** flag to use for maximum efficiency.
+    - `rclone sync --checksum /path/to/source s3:bucket`
+- `--update --use-server-modtime`
+    - Uses no extra transactions.
+    - Modification time becomes the time the object was uploaded.
+    - For many operations this is sufficient to determine if it needs
+      uploading.
+    - Using `--update` along with `--use-server-modtime` avoids the
+      extra API call and uploads files whose local modification time
+      is newer than the time it was last uploaded.
+    - Files created with timestamps in the past will be missed by the sync.
+    - `rclone sync --update --use-server-modtime /path/to/source s3:bucket`
+
+These flags can and should be used in combination with `--fast-list` -
+see below.
+
+If using `rclone mount` or any command using the VFS (eg `rclone
+serve`) then you might want to consider using the VFS flag
+`--no-modtime` which will stop rclone reading the modification time
+for every object. You could also use `--use-server-modtime` if you are
+happy with the modification times of the objects being the time of
+upload.
+
+#### Avoiding GET requests to read directory listings
+
+Rclone's default directory traversal is to process each directory
+individually.  This takes one API call per directory.  Using the
+`--fast-list` flag will read all info about the objects into
+memory first using a smaller number of API calls (one per 1000
+objects). See the [rclone docs](/docs/#fast-list) for more details.
+
+    rclone sync --fast-list --checksum /path/to/source s3:bucket
+
+`--fast-list` reduces API transactions at the cost of memory use. As a
+rough guide rclone uses 1 kB of memory per object stored, so using
+`--fast-list` on a sync of a million objects will use roughly 1 GB of
+RAM.
+
+If you are only copying a small number of files into a big repository
+then using `--no-traverse` is a good idea. This finds objects directly
+instead of through directory listings. You can do a "top-up" sync very
+cheaply by using `--max-age` and `--no-traverse` to copy only recent
+files, eg
+
+    rclone copy --max-age 24h --no-traverse /path/to/source s3:bucket
+
+You'd then do a full `rclone sync` less often.
+
+Note that `--fast-list` isn't required in the top-up sync.
+
 ### Hashes ###

 For small objects which weren't uploaded as multipart uploads (objects