Created 1/16/2016 jamshid.afshar · Updated 4/14/2017 jamshid.afshar

The s3cmd command-line utility is a popular open-source tool: http://s3tools.org/s3cmd

It has two main uses with Content Gateway:

  • Easy command-line syncing of files to and from a Swarm domain

  • Help with diagnosing and verifying a Content Gateway environment

Installing and Configuring s3cmd

The .s3cfg file configures the s3cmd utility so that it can access your Content Gateway domain. In this example, the domain you've created and want to access is mydomain.example.com and the cloudgateway S3 endpoint is running at 192.168.99.100:8082.

Important: Your machine must be able to resolve the domain name as the Content Gateway S3 gateway IP address. In a production environment, this would involve DNS configuration of wildcard domains, but you can simply edit your hosts file when using s3cmd locally.
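Once the hosts entry is in place, it can help to confirm that the file really maps the domain to the gateway address before you run s3cmd. The following is only a minimal sketch, not part of s3cmd: hosts_maps is a hypothetical helper name, and the path, IP address, and domain are the example values used on this page.

```shell
#!/bin/sh
# hosts_maps FILE IP NAME
# Hypothetical helper: succeeds (exit 0) if FILE contains a non-comment
# line that maps IP to NAME, as the first hostname or as an alias.
hosts_maps() {
    awk -v ip="$2" -v name="$3" '
        { sub(/#.*/, "") }              # drop comments before matching
        $1 == ip {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example values from this page; adjust for your environment.
if hosts_maps /etc/hosts 192.168.99.100 mydomain.example.com; then
    echo "hosts entry found"
else
    echo "hosts entry missing"
fi
```

This only inspects the hosts file; it does not tell you whether DNS would resolve the name in a production setup with wildcard domains.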

  1. Using OS X brew or python pip, install s3cmd. Windows: Install python 2.7 and pip. For more info, see the README.

    sudo pip install s3cmd

  2. Verify that s3cmd is version 1.5.2 or later:

    s3cmd --version

  3. Edit your /etc/hosts (or C:\Windows\System32\drivers\etc\hosts) file and add a mapping for your domain to your Content Gateway IP address.

    192.168.99.100 mydomain.example.com

  4. Edit your ~/.s3cfg file and paste into it all of these settings. Note: if you don't increase the part size here, use the command-line argument --multipart-chunk-size-mb=100 on s3cmd put/sync:

    # This should be your ~/.s3cfg file. It configures the s3cmd utility
    # to access your Caringo CloudScaler domain. 
    [default]
    access_key = {access-key-for-token}
    secret_key = {secret-key-for-token}
    host_base = mydomain.example.com:8082
    host_bucket = mydomain.example.com:8082
    # Below format might be needed under older s3cmd versions, but requires wildcard DNS.
    #host_bucket = %(bucket)s.mydomain.example.com:8082
    signature_v2 = True
    check_ssl_certificate = False
    use_https = False
    # Important for improving Swarm performance and reducing storage overhead!
    multipart_chunk_size_mb = 100
  5. Remember to replace "mydomain.example.com:8082" in all places with your actual Content Gateway domain and S3 port!

  6. Generate a new access key (token) via the Content Portal or a command-line curl, e.g.:

    # Create an S3 token that expires in 90 days, assumes gateway's scsp port is 8081
    $ curl -v -u "caringoadmin" -X POST --data-binary "" -H "X-User-Secret-Key-Meta: secret" -H "X-User-Token-Expires-Meta: +90" "http://mydomain.example.com:8081/.TOKEN/"

  7. Set access_key to the 32-character token uuid and set secret_key to the secret string that was used.

  8. You're now ready to use s3cmd to list and create buckets, and copy files in or out.

    # List all your buckets in the domain
    $ s3cmd ls
    ...

    # Problems connecting, signature mismatch? Show debug
    # output to see exactly what's sent and returned.
    $ s3cmd ls -d

    # Download all the files from your "images" bucket
    $ mkdir headshots && s3cmd get -r s3://images headshots

    # Generate a signed url that expires in an hour
    $ s3cmd signurl s3://mybucket/file.html +3600
    http://mybucket.mydomain.example.com:8082/file.html?AWSAccessKeyId=0e71169c9ab10b293bda2b454bf20c35&Expires=1447998649&Signature=KKwTgl0x%2Fk96jaPzp60LQ97ozO0%3D
    # The bucket can be moved from the hostname into the path. It always outputs "http", but you can
    # use "https"; make sure your front-end proxy routes requests with the "AWSAccessKeyId" query arg
    # to the Content Gateway S3 port.

    # List S3 multipart uploads in progress that were begun in 2015 and delete them, including parts:
    $ s3cmd multipart s3://inbox | grep '^2015-' | sed 's/ /%20/g' | awk -F$'\t' '{print $2, $3}' | xargs -p -r -t -n 2 s3cmd abortmp
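When a signed URL like the one from s3cmd signurl is rejected with a signature mismatch, it can help to recompute the signature by hand and compare. The sketch below assumes the standard AWS signature v2 query-string scheme (in line with signature_v2 = True in the .s3cfg), where the signature is the base64-encoded HMAC-SHA1 of the string "GET\n\n\n{expires}\n/{bucket}/{key}". That the gateway uses exactly this string-to-sign is an assumption, sign_v2 is a hypothetical helper name, openssl must be installed, and the secret, bucket, and object values are placeholders.

```shell
#!/bin/sh
# sign_v2 SECRET EXPIRES BUCKET KEY
# Hypothetical helper: prints the base64 HMAC-SHA1 signature for a
# signature v2 presigned GET, assuming the standard AWS string-to-sign.
sign_v2() {
    printf 'GET\n\n\n%s\n/%s/%s' "$2" "$3" "$4" |
        openssl dgst -sha1 -hmac "$1" -binary |
        openssl base64
}

# Placeholder values; "secret" is the secret string from the token example.
expires=$(( $(date +%s) + 3600 ))      # one hour from now, like "+3600"
sig=$(sign_v2 "secret" "$expires" "mybucket" "file.html")
echo "Expires=$expires Signature=$sig"
```

The printed signature is raw base64. Before it goes into a URL it has to be percent-encoded ("+" as %2B, "/" as %2F, "=" as %3D), which is why the Signature value in the signurl output looks different.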

The following S3 multipart / SCSP parallel write requests rely on internal implementation details that will change and are intended for diagnostic use only.

CloudScaler 4.x (S3 Multipart)

# SCSP: list S3 multipart uploads in progress
$ curl -u "${myusername}" 'http://mydomain.example.com:8081/?content-type=application/caringo-multipart-id&fields=x-multipart-id,x-multipart-part-meta,X-Multipart-Content-Bucket-Meta,X-Multipart-Object-Meta,name,tmborn,etag,content-md5,content-type,X-Multipart-Content-type-Meta&stype=unnamed&format=json&sort=x-multipart-id-meta,x_multipart_part_meta'
...

{"content_type":"application/caringo-multipart-id", "name":"4bbc3b023f5d8e38d8da5064a9168d5d", "x_multipart_object_meta":"3076_20151017201832_mwi_9_3.iso", "hash":"4a66ed2e13c8a2b5e5165a288d8d02b2", "last_modified":"2015-11-17T18:18:33.898100Z", "x_multipart_content_type_meta":"application/octet-stream"},
...

# SCSP: list the uploaded parts for a specific "upload id"

$ curl -u "${myusername}" 'http://mydomain.example.com:8081/?x-multipart-id-meta=4bbc3b023f5d8e38d8da5064a9168d5d&fields=x-multipart-id-meta,x-multipart-part-meta,X-Multipart-Bucket-Meta,X-Multipart-Object-Meta,name,tmborn,etag,content-md5,content-type,X-Multipart-Content-type-Meta&stype=unnamed&format=json&sort=x-multipart-id-meta,x_multipart_part_meta&size=10000'
...

{"content_type":"application/caringo-multipart-part", "name":"97d528ebcb0545248ed57980f562a062", 
"x_multipart_id_meta":"4bbc3b023f5d8e38d8da5064a9168d5d", "x_multipart_part_meta":"02479", "x_multipart_bucket_meta":"inbox", "x_multipart_object_meta":"biglogs.tgz", "hash":"97d528ebcb0545248ed57980f562a062", "content_md5":"fD8MJjqMOwoUBNuSYz586A==", "last_modified":"2016-01-05T08:16:23.042100Z"},
...
 

Swarm 9 (SCSP parallel write) / Gateway 5.x (SCSP parallel write and S3 Multipart)

# SCSP: list multipart uploads in progress (POST-initiated or PUT-initiated)
$ curl -i --location-trusted -H 'Host: mydomain.example.com' "http://${SWARM_ENDPOINT}/?stype=all&castor_system_partnumber=0&fields=context,name,tmborn,content-length,castor_system_uploadid,castor_system_partnumber&format=json&sort=tmborn:ASC"

...

# Direct Elasticsearch query to list the uploadIds of all uploads in progress,
# even if the initiating ("part 0") stream is missing
$ curl -i -XPOST "http://ELASTICSEARCH:9200/CARINGO-CLUSTER-NAME/IMMUTABLE/_search?pretty" -d '{ "size" : 0, "aggregations" : { "castor_system_uploadid" : { "terms" : { "field" : "castor_system_uploadid" } } } }'
...
{
"took" : 3,
"timed_out" : false,
...
"hits" : {
"total" : 5645,
...
},
 "aggregations" : {
"castor_system_uploadid" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "f8e96f441d2e32b57a8f3a3af84dc00ad7f9644799b818a158ad60b25abe3ac6d7f9644799b818a158ad60b25abe3ac60U",
"doc_count" : 2048
}, {
"key" : "93e9937cf0b1e1e282b17d9b3c2fae301fe01052b949300bde8d1ed34c69507f1fe01052b949300bde8d1ed34c69507f0U",
"doc_count" : 1863
}, {
"key" : "f207289dae46079bd182a9c3a41bb8993f10b199ea37143f3dcb1fa062a40d083f10b199ea37143f3dcb1fa062a40d081P",
"doc_count" : 965
}, {
"key" : "f207289dae46079bd182a9c3a41bb899e7be48f920601b8c3f1a4f4ece5e7a3be7be48f920601b8c3f1a4f4ece5e7a3b1P",
"doc_count" : 449
}, {
"key" : "0fb87a6d6c64af9db6e315ba76980da236afdcbe99ccff8f310637bede00b77c36afdcbe99ccff8f310637bede00b77c0U",
"doc_count" : 289
}, {
"key" : "5539b3f8ad46a76b5f54a892c02e41032284fe283d4a8724597c58b1a34287de2284fe283d4a8724597c58b1a34287de1P",
"doc_count" : 10
...