s3: update docs with a Reducing Costs section - Fixes #2889

Nick Craig-Wood 2020-11-26 15:00:10 +00:00
parent 979bb07c86
commit 506342317b
1 changed files with 81 additions and 19 deletions


@ -248,25 +248,6 @@ d) Delete this remote
y/e/d>
```
### --fast-list ###
This remote supports `--fast-list` which allows you to use fewer
transactions in exchange for more memory. See the [rclone
docs](/docs/#fast-list) for more details.
### --update and --use-server-modtime ###
As noted below, the modified time is stored as metadata on the object. It is
used by default for all operations that require checking the time a file was
last updated. It allows rclone to treat the remote more like a true filesystem,
but it is inefficient because it requires an extra API call to retrieve the
metadata.
For many operations, the time the object was last uploaded to the remote is
sufficient to determine if it is "dirty". By using `--update` along with
`--use-server-modtime`, you can avoid the extra API call and simply upload
files whose local modtime is newer than the time it was last uploaded.
### Modified time ###
The modified time is stored as metadata on the object as
@ -280,6 +261,87 @@ storage the object will be uploaded rather than copied.
Note that reading this from the object takes an additional `HEAD`
request as the metadata isn't returned in object listings.
### Reducing costs
#### Avoiding HEAD requests to read the modification time
By default rclone will use the modification time of objects stored in
S3 for syncing. This is stored in object metadata, which unfortunately
takes an extra HEAD request per object to read. That can be expensive
(in time and money).
The modification time is used by default for all operations that
require checking the time a file was last updated. It allows rclone to
treat the remote more like a true filesystem, but it is inefficient on
S3 because it requires an extra API call to retrieve the metadata.
The extra API calls can be avoided when syncing (using `rclone sync`
or `rclone copy`) in a few different ways, each with its own
tradeoffs.
- `--size-only`
    - Only checks the size of files.
    - Uses no extra transactions.
    - If the file doesn't change size then rclone won't detect it has
      changed.
    - `rclone sync --size-only /path/to/source s3:bucket`
- `--checksum`
    - Checks the size and MD5 checksum of files.
    - Uses no extra transactions.
    - The most accurate detection of changes possible.
    - Will cause the source to read an MD5 checksum which, if it is a
      local disk, will cause lots of disk activity.
    - If the source and destination are both S3 this is the
      **recommended** flag to use for maximum efficiency.
    - `rclone sync --checksum /path/to/source s3:bucket`
- `--update --use-server-modtime`
    - Uses no extra transactions.
    - Modification time becomes the time the object was uploaded.
    - For many operations this is sufficient to determine if it needs
      uploading.
    - Using `--update` along with `--use-server-modtime` avoids the
      extra API call and uploads files whose local modification time
      is newer than the time it was last uploaded.
    - Files created with timestamps in the past will be missed by the sync.
    - `rclone sync --update --use-server-modtime /path/to/source s3:bucket`
These flags can and should be used in combination with `--fast-list` -
see below.
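As a rough sketch, the three strategies above could be wrapped in a small script that always adds `--fast-list`. The `sync_cmd` helper and its strategy names (`size`, `checksum`, `server-modtime`) are hypothetical and not part of rclone; the helper only prints the command so the flags can be checked before anything is run.

```shell
# Hypothetical helper (not part of rclone): print an rclone sync command
# for one of the strategies described above, combined with --fast-list.
sync_cmd() {
  case "$1" in
    size)           flags="--size-only" ;;
    checksum)       flags="--checksum" ;;
    server-modtime) flags="--update --use-server-modtime" ;;
    *) echo "unknown strategy: $1" >&2; return 1 ;;
  esac
  # $2 is the source, $3 is the destination.
  printf 'rclone sync --fast-list %s %s %s\n' "$flags" "$2" "$3"
}

# Print the command for a checksum-based sync.
sync_cmd checksum /path/to/source s3:bucket
```

Printing rather than executing keeps the sketch safe to experiment with; the printed line can be run by hand once it looks right.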
If using `rclone mount` or any command using the VFS (eg `rclone
serve`) then you might want to consider using the VFS flag
`--no-modtime` which will stop rclone reading the modification time
for every object. You could also use `--use-server-modtime` if you are
happy with the modification times of the objects being the time of
upload.
#### Avoiding GET requests to read directory listings
Rclone's default directory traversal is to process each directory
individually. This takes one API call per directory. Using the
`--fast-list` flag will read all info about the objects into
memory first using a smaller number of API calls (one per 1000
objects). See the [rclone docs](/docs/#fast-list) for more details.
rclone sync --fast-list --checksum /path/to/source s3:bucket
`--fast-list` trades off API transactions for memory use. As a rough
guide rclone uses 1k of memory per object stored, so using
`--fast-list` on a sync of a million objects will use roughly 1 GB of
RAM.
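The rules of thumb above (one listing call per 1000 objects, roughly 1 kB of memory per object) can be turned into a quick back-of-the-envelope estimate. The function names below are made up for illustration, and the numbers are only as good as those rough figures.

```shell
# Rough estimates only, based on the rules of thumb in the text above.

# Listing calls with --fast-list: one per 1000 objects, rounded up.
fast_list_calls() {
  echo $(( ($1 + 999) / 1000 ))
}

# Approximate memory in MB (decimal), assuming about 1 kB per object.
fast_list_mem_mb() {
  echo $(( $1 / 1000 ))
}

fast_list_calls 1000000    # about 1000 listing calls
fast_list_mem_mb 1000000   # about 1000 MB, i.e. roughly 1 GB
```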
If you are only copying a small number of files into a big repository
then using `--no-traverse` is a good idea. This finds objects directly
instead of through directory listings. You can do a "top-up" sync very
cheaply by using `--max-age` and `--no-traverse` to copy only recent
files, eg
rclone copy --max-age 24h --no-traverse /path/to/source s3:bucket
You'd then do a full `rclone sync` less often.
Note that `--fast-list` isn't required in the top-up sync.
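One way to combine the two might be a scheduled script that does a cheap top-up on most days and a full sync once a week. This is only a sketch: the `backup_cmd` helper, the choice of Sunday for the full sync, and the use of `--checksum` for it are assumptions, not recommendations from rclone. The helper prints the command rather than running it.

```shell
# Hypothetical schedule (an assumption, not an rclone feature):
# full sync on Sundays, cheap top-up copy on every other day.
backup_cmd() {
  day="$1"  # ISO weekday: 1 (Monday) to 7 (Sunday), e.g. from `date +%u`
  if [ "$day" -eq 7 ]; then
    echo "rclone sync --fast-list --checksum /path/to/source s3:bucket"
  else
    echo "rclone copy --max-age 24h --no-traverse /path/to/source s3:bucket"
  fi
}

# Show the command that would be used today.
backup_cmd "$(date +%u)"
```

If the top-up runs daily, a `--max-age` slightly longer than the interval between runs leaves some margin for a late job.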
### Hashes ###
For small objects which weren't uploaded as multipart uploads (objects