Using the "rclone" command-line tool with Content Gateway S3

"rclone" is a fast, stable open source command-line tool for listing and copying files between storage systems and local file systems. It is cross-platform and available for Linux, macOS, and Microsoft Windows.

http://rclone.org/
http://linoxide.com/file-system/configure-rclone-linux-sync-cloud/

rclone v1.55.1 has known issues with object versioning, and v1.57 (the default package in EPEL) has checksum issues, so make sure to install a current rclone release.

Installation and configuration:

Download rclone for your platform from http://rclone.org/downloads/, unzip it, and put the binary in your PATH. OS packages and containers are also available. On CentOS/RHEL, run "yum install epel-release" and "yum install rclone", then run "rclone selfupdate" to avoid the issues in the older v1.57 package.

You can skip "rclone config" by using this template.

[datacore]
type=s3
provider=Other
access_key_id=${S3_ACCESS_KEY}
secret_access_key=${S3_SECRET_KEY}
# Must NOT include default port 443 or 80 with rclone >1.47 to avoid signature errors!
endpoint=${S3_PROTOCOL}://${DOMAIN}:${S3_PORT}
location_constraint=
# The default 5MB part size is inefficient
chunk_size=100M
# This --s3-no-check-bucket option breaks mkdir but is required with gateway < 7.1 and rclone > v1.50 to avoid 409 errors (CLOUD-3213).
# no_check_bucket=true
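
The template above can be filled in from the shell. The following is a minimal sketch, not part of rclone or the gateway tooling: the function name render_rclone_remote is hypothetical, and it assumes the variables named in the template (S3_PROTOCOL, DOMAIN, S3_PORT, S3_ACCESS_KEY, S3_SECRET_KEY) are already set. It leaves the port off the endpoint when it is the protocol default, as the comment in the template requires.

```shell
#!/bin/sh
# Hypothetical helper: print an rclone remote section built from the
# template's variables. Nothing here calls rclone itself.
render_rclone_remote() {
    endpoint="${S3_PROTOCOL}://${DOMAIN}"
    # Append the port only when it is not the default for the protocol
    # (an empty S3_PORT is treated as "use the default").
    case "${S3_PROTOCOL}:${S3_PORT}" in
        https:443|http:80|https:|http:) ;;
        *) endpoint="${endpoint}:${S3_PORT}" ;;
    esac
    cat <<EOF
[datacore]
type = s3
provider = Other
access_key_id = ${S3_ACCESS_KEY}
secret_access_key = ${S3_SECRET_KEY}
endpoint = ${endpoint}
location_constraint =
chunk_size = 100M
EOF
}
```

With S3_PROTOCOL=https and S3_PORT=443 the endpoint line has no port suffix; with S3_PORT=8443 the port is kept. Append the output to your rclone config file to use it.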

For example, if your S3 domain / endpoint is "https://mydomain.cloud.datacore.com" you can create a token with:

$ curl -i -u dcuser -X POST --data-binary '' -H 'X-User-Secret-Key-Meta: secret' \
    -H 'X-User-Token-Expires-Meta: +90' https://mydomain.cloud.datacore.com/.TOKEN/
HTTP/1.1 201 Created
...
Token c63d5b1034c6b41b119683a5e264abd0 issued for dcuser in [root] with secret secret
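
If you script the token request, the token ID can be pulled out of the response text. This is only a sketch: extract_token is a hypothetical helper name, and it assumes the response body contains a line shaped exactly like the sample above ("Token <id> issued for ..."), which may differ between gateway versions.

```shell
#!/bin/sh
# Hypothetical helper: read a token response on stdin and print the token ID.
# Assumes a body line of the form:
#   Token <id> issued for <user> in [<context>] with secret <secret>
extract_token() {
    # tr strips the carriage returns that HTTP header lines carry under curl -i;
    # sed prints only the captured ID, and prints nothing if no line matches.
    tr -d '\r' | sed -n 's/^Token \([^ ]*\) issued for .*$/\1/p'
}
```

Piping the curl command shown above into extract_token would print just the ID, which is the value to use as access_key_id. When no matching line is present (for example on an authentication failure), the output is empty.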


Then add this entry to your ~/.rclone.conf file (or the newer location, ~/.config/rclone/rclone.conf):

[datacore]
type = s3
# Do NOT use V2 sigs, have seen signature problems.
# region = other-v2-signature
access_key_id = c63d5b1034c6b41b119683a5e264abd0
secret_access_key = secret
endpoint = https://mydomain.cloud.datacore.com
location_constraint=
# The --s3-no-check-bucket option is only required with rclone > v1.50 and gateway < 7.1 to avoid 409 errors (CLOUD-3213).
# no_check_bucket=true


Note: If you prefer a GUI client, try the web UI that rclone itself can serve: https://rclone.org/gui/

Here are some example rclone commands:

  • List the buckets in your domain
    $ rclone lsd datacore:
              -1 2015-03-16 20:13:52        -1 public
              -1 2015-11-28 23:10:32        -1 inbox
    Transferred:            0 Bytes (   0.00 kByte/s)
    Errors:                 0
    Checks:                 0
    Transferred:            0
    Elapsed time:  5.653212245s
  • Copy your Pictures directory (recursively) to an "old-pics" bucket. The bucket is created if it does not exist.
    $ rclone copy --s3-upload-concurrency 10 --s3-chunk-size 100M '/Volumes/Backup/Pictures/' datacore:old-pics
    2016/01/12 13:55:47 S3 bucket old-pics: Building file list
    2016/01/12 13:55:48 S3 bucket old-pics: Waiting for checks to finish
    2016/01/12 13:55:48 S3 bucket old-pics: Waiting for transfers to finish
    2016/01/12 13:56:45 
    Transferred:      2234563 Bytes (  36.36 kByte/s)
    Errors:                 0
    Checks:                 0
    Transferred:            1
    Elapsed time:  1m0.015171105s
    Transferring:  histomapwider.jpg
    ...
  • List the files in the bucket
    $ rclone ls datacore:old-pics
        6148 .DS_Store
     4032165 histomapwider.jpg
    ...
  • Quickly see the size of the objects in a bucket:
    $ rclone size datacore:old-pics
    Total objects: 173
    Total size: 9.550 GBytes (10254108727 Bytes)
  • Verify that all files were uploaded (note that the trailing slash on the local directory is necessary!). The check command can also compare two buckets.
    $ rclone check ~/Pictures/test/ datacore:old-pics
    2016/01/12 14:01:18 S3 bucket old-pics: Building file list
    2016/01/12 14:01:18 S3 bucket old-pics: 1 files not in Local file system at /Users/.../Pictures/test
    2016/01/12 14:01:18 .DS_Store: File not in Local file system at /Users/.../Pictures/test
    2016/01/12 14:01:18 Local file system at /Users/..../Pictures/test: 0 files not in S3 bucket old-pics
    2016/01/12 14:01:18 S3 bucket old-pics: Waiting for checks to finish
    2016/01/12 14:01:18 S3 bucket old-pics: 1 differences found
    2016/01/12 14:01:18 Failed to check: 1 differences found

    Note that "check" appears to be confused by the macOS hidden file ".DS_Store".
  • Tips: use "-v" and "--dump headers" or "--dump bodies" to see verbose details. 
  • To ignore system files you don't want compared or uploaded use something like:
       --exclude '.DS_Store' --exclude '.Trashes**' --exclude '.fseventsd**' --exclude '.Spotlight**' --exclude '._*'
  • Increase the part size with --s3-chunk-size 100M (defaults to 5M) to improve the speed and storage efficiency of resulting large streams. 
  • Speed up large transfers with "--transfers=10" and "--s3-upload-concurrency 4".
  • You might want to use --s3-disable-checksum when uploading huge files.
  • Unfortunately, rclone does not copy or let you add metadata, though there are some enhancement requests on GitHub.
  • See if using "rclone ls --fast-list datacore:mybucket" speeds up your large bucket listings. With --fast-list, rclone lists without a "delimiter", and starting with Gateway 7.6 such listings are much faster than ?delimiter=/ listings.
  • Copy a file from a plain http website into Swarm by streaming it directly:
    $ rclone -v --dump headers copy commondatastorage:gtv-videos-bucket/sample/ElephantsDream.mp4 datacore:mybucket/sample-videos/
    This uses an http remote defined in the rclone config file:
      [commondatastorage]
      type = http
      url = https://commondatastorage.googleapis.com
  • Configure rclone using four environment variables instead of a config file:
    $ export RCLONE_S3_ENDPOINT=https://support.cloud.datacore.com
    # handy one-liner to create an S3 token
    $ curl -fsS -u USERNAME -XPOST -H "X-User-Secret-Key-Meta: secret" ${RCLONE_S3_ENDPOINT}/_admin/manage/tenants/datacore/tokens | jq -r ".token,.secret" | { read RCLONE_S3_ACCESS_KEY_ID && read RCLONE_S3_SECRET_ACCESS_KEY && echo "export RCLONE_S3_ACCESS_KEY_ID=${RCLONE_S3_ACCESS_KEY_ID} RCLONE_S3_SECRET_ACCESS_KEY=${RCLONE_S3_SECRET_ACCESS_KEY} RCLONE_CONFIG_MYS3_TYPE=s3" ; } 
    Enter host password for user 'USERNAME':
    export RCLONE_S3_ACCESS_KEY_ID=0d4506108b8aa15f784d6ada317abb90 RCLONE_S3_SECRET_ACCESS_KEY=secret RCLONE_CONFIG_MYS3_TYPE=s3

    # Copy and paste that output into your shell to set the remaining three environment variables
    $ export RCLONE_S3_ACCESS_KEY_ID=0d4506108b8aa15f784d6ada317abb90 RCLONE_S3_SECRET_ACCESS_KEY=secret RCLONE_CONFIG_MYS3_TYPE=s3
    # Now you can e.g. move all numbered directories 1-100 into Swarm
    $ seq 1 100 | sed 's#$#/**#g' > /tmp/xx
    $ rclone move -v --include-from /tmp/xx --delete-empty-src-dirs . MYS3:archive/old-builds/
    $ seq 1 100 | xargs -n 1 rmdir  # rclone only deletes the contents
  • Mount a bucket as a folder in your file system. If you do not use --use-server-modtime, rclone will HEAD every object in the bucket, which is very slow.
    $ mkdir /tmp/tickets
    $ rclone mount --read-only --use-server-modtime support:tickets /tmp/tickets &
    $ ls -lSh /tmp/tickets
    WARNING: Trying to use object storage like a file system usually makes neither client nor server happy. If you use this, be sure the workload is consistent and test it well first.
  • Use rclone purge to delete all object versions in a bucket and then delete the bucket. Note it uses individual DELETE requests instead of multi-delete requests (a POST bucket?delete with the list of objects in the body).
    $ rclone purge --dry-run --transfers 20 datacore:old-pics
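
The filter tips above (excluding system files, and including numbered directories) can be kept in pattern files instead of being typed on every command line. The sketch below is one way to do that; write_exclude_file and numbered_dir_patterns are hypothetical helper names, and the file paths in the usage comments are only examples. The rclone flags referenced in the comments, --exclude-from and --include-from, read one pattern per line from a file.

```shell
#!/bin/sh
# Hypothetical helper: write the system-file patterns from the tips above
# to the file named by the first argument, one pattern per line.
write_exclude_file() {
    cat > "$1" <<'EOF'
.DS_Store
.Trashes**
.fseventsd**
.Spotlight**
._*
EOF
}

# Hypothetical helper: print one "N/**" include pattern for each directory
# number from $1 to $2, matching everything below directories named 1, 2, ...
numbered_dir_patterns() {
    seq "$1" "$2" | sed 's#$#/**#'
}

# Example use (paths are assumptions; run against your own remote):
#   write_exclude_file /tmp/rclone-excludes
#   rclone copy --exclude-from /tmp/rclone-excludes ~/Pictures/ datacore:old-pics
#   numbered_dir_patterns 1 100 > /tmp/xx
#   rclone move -v --include-from /tmp/xx --delete-empty-src-dirs . MYS3:archive/old-builds/
```

Keeping the patterns in a file makes it easier to reuse the same filter for both "copy" and "check", so the two commands agree on which files are ignored.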

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.