s3: update docs with a Reducing Costs section - Fixes #2889
commit 506342317b (parent 979bb07c86)
y/e/d>
```

### Modified time ###

The modified time is stored as metadata on the object as
…
storage the object will be uploaded rather than copied.

Note that reading this from the object takes an additional `HEAD`
request as the metadata isn't returned in object listings.

### Reducing costs

#### Avoiding HEAD requests to read the modification time

By default rclone will use the modification time of objects stored in
S3 for syncing. This is stored in object metadata which unfortunately
takes an extra HEAD request to read, which can be expensive (in time
and money).

The modification time is used by default for all operations that
require checking the time a file was last updated. It allows rclone to
treat the remote more like a true filesystem, but it is inefficient on
S3 because it requires an extra API call to retrieve the metadata.

The extra API calls can be avoided when syncing (using `rclone sync`
or `rclone copy`) in a few different ways, each with its own
tradeoffs.

- `--size-only`
    - Only checks the size of files.
    - Uses no extra transactions.
    - If the file doesn't change size then rclone won't detect it has
      changed.
    - `rclone sync --size-only /path/to/source s3:bucket`
- `--checksum`
    - Checks the size and MD5 checksum of files.
    - Uses no extra transactions.
    - The most accurate detection of changes possible.
    - Will cause the source to read an MD5 checksum which, if it is a
      local disk, will cause lots of disk activity.
    - If the source and destination are both S3 this is the
      **recommended** flag to use for maximum efficiency.
    - `rclone sync --checksum /path/to/source s3:bucket`
- `--update --use-server-modtime`
    - Uses no extra transactions.
    - Modification time becomes the time the object was uploaded.
    - For many operations this is sufficient to determine if it needs
      uploading.
    - Using `--update` along with `--use-server-modtime` avoids the
      extra API call and uploads files whose local modification time
      is newer than the time it was last uploaded.
    - Files created with timestamps in the past will be missed by the sync.
    - `rclone sync --update --use-server-modtime /path/to/source s3:bucket`

These flags can and should be used in combination with `--fast-list` -
see below.

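For the S3-to-S3 case called out above, the `--checksum` form looks
like this sketch (the remote and bucket names are hypothetical):

```shell
# Sync between two S3 remotes; --checksum compares size and MD5 with no
# extra per-object requests. Remote/bucket names are made up for illustration.
rclone sync --checksum s3:source-bucket s3:dest-bucket
```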
If using `rclone mount` or any command using the VFS (eg `rclone
serve`) then you might want to consider using the VFS flag
`--no-modtime` which will stop rclone reading the modification time
for every object. You could also use `--use-server-modtime` if you are
happy with the modification times of the objects being the time of
upload.

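A sketch of the mount case, assuming a hypothetical bucket and mount
point:

```shell
# Hypothetical mount: --no-modtime stops the VFS issuing a HEAD request
# per object just to read modification times.
rclone mount s3:bucket /mnt/s3 --no-modtime

# Or, if upload time is acceptable as the modification time:
rclone mount s3:bucket /mnt/s3 --use-server-modtime
```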
#### Avoiding GET requests to read directory listings

Rclone's default directory traversal is to process each directory
individually. This takes one API call per directory. Using the
`--fast-list` flag will read all info about the objects into
memory first using a smaller number of API calls (one per 1000
objects). See the [rclone docs](/docs/#fast-list) for more details.

    rclone sync --fast-list --checksum /path/to/source s3:bucket

`--fast-list` trades off API transactions for memory use. As a rough
guide rclone uses 1k of memory per object stored, so using
`--fast-list` on a sync of a million objects will use roughly 1 GB of
RAM.

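The rough guide above can be turned into a quick back-of-envelope
calculation (the 1k-per-object figure is the approximation from the
paragraph above, not an exact measurement):

```shell
# Estimate --fast-list memory use at ~1 KiB per object (rough guide above).
objects=1000000
echo "$(( objects / 1024 )) MiB"   # ~976 MiB, i.e. roughly 1 GB
```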
If you are only copying a small number of files into a big repository
then using `--no-traverse` is a good idea. This finds objects directly
instead of through directory listings. You can do a "top-up" sync very
cheaply by using `--max-age` and `--no-traverse` to copy only recent
files, eg

    rclone copy --max-age 24h --no-traverse /path/to/source s3:bucket

You'd then do a full `rclone sync` less often.

Note that `--fast-list` isn't required in the top-up sync.

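Putting the two patterns together, a cheap schedule might look like
this sketch (the paths, bucket name, and timings are hypothetical):

```shell
# Frequent, cheap top-up: only files modified in the last 24h, looked up
# directly with --no-traverse rather than via directory listings.
rclone copy --max-age 24h --no-traverse /path/to/source s3:bucket

# Occasional full pass to catch anything the top-up missed (eg files
# created with timestamps in the past).
rclone sync --fast-list --checksum /path/to/source s3:bucket
```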
### Hashes ###

For small objects which weren't uploaded as multipart uploads (objects