Created 1/16/2016 jamshid.afshar · Updated 4/14/2017 jamshid.afshar

The s3cmd command-line utility is a popular open-source tool: http://s3tools.org/s3cmd

It has two main uses with Content Gateway:

  • Easy command-line syncing of files to and from a Swarm domain

  • Help with diagnosing and verifying a Content Gateway environment

Installing and Configuring s3cmd

The .s3cfg file configures the s3cmd utility so that it can access your Content Gateway domain. In this example, the domain you've created and want to access is mydomain.example.com, and the Gateway S3 endpoint is running at 192.168.99.100:80.

Important: Your machine must be able to resolve the domain name as the Content Gateway S3 gateway IP address. In a production environment, this would involve DNS configuration of wildcard domains, but you can simply edit your hosts file when using s3cmd locally.
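For local testing, the hosts mapping described above can be scripted. The following is a minimal sketch, not part of the original instructions: the helper name add_hosts_entry is hypothetical, and the example runs against a scratch file rather than the real hosts file, which normally requires administrator rights to edit.

```shell
#!/bin/sh
# Append "IP NAME" to a hosts-style file unless NAME is already mapped.
# add_hosts_entry is a hypothetical helper, not part of s3cmd or Gateway.
add_hosts_entry() {
    ip=$1; name=$2; file=$3
    # Look for NAME as a whole field on any non-comment line.
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$file" 2>/dev/null
    then
        echo "already mapped: $name"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added: $ip $name"
    fi
}

# Example against a scratch file; point it at /etc/hosts (with sudo) for real use.
hosts_file=$(mktemp)
add_hosts_entry 192.168.99.100 mydomain.example.com "$hosts_file"
add_hosts_entry 192.168.99.100 mydomain.example.com "$hosts_file"
cat "$hosts_file"
```

Running the function twice leaves a single mapping in the file, so it is safe to call from a setup script more than once.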

  1. Install s3cmd using OS X brew or Python pip. On Windows, first install Python 2.7 and pip. For more info, see the README.

    Code Block
    sudo pip install s3cmd
  2. Verify that s3cmd is version 1.5.2 or later:

    Code Block
    s3cmd --version
  3. Edit your /etc/hosts (or C:\Windows\System32\drivers\etc\hosts) file and add a mapping for your domain to your Content Gateway IP address.

          192.168.99.100 mydomain.example.com

  4. Edit your ~/.s3cfg file and paste into it all of these settings. Note: if you don't increase the part size here, use the command-line argument --multipart-chunk-size-mb=100 on s3cmd put/sync:

    # This should be your ~/.s3cfg file. It configures the s3cmd utility
    # to access your Swarm Content Gateway domain.
    [default]
    access_key = {access-key-for-token}
    secret_key = {secret-key-for-token}
    # Must use default port 80 to avoid "S3 error: 403 (SignatureDoesNotMatch)".
    # Or you can use a custom S3 port if you configure V2 signatures below.
    host_base = mydomain.example.com:80
    host_bucket = mydomain.example.com:80
    # Below format might be needed under older s3cmd versions, but requires wildcard dns.
    #host_bucket = %(bucket)s.mydomain.example.com:80
    signature_v2 = True
    check_ssl_certificate = False
    use_https = False
    # Important for improving Swarm performance and reducing storage overhead!
    multipart_chunk_size_mb = 100

    Remember to replace "mydomain.example.com:80" in all places with your actual Content Gateway domain and S3 port!

  5. Generate a new access key (token) via the Content Portal or a command-line curl, e.g.:

    # Create an S3 token that expires in 90 days, assumes gateway's scsp port is 8081
    $ curl -v -u "caringoadmin" -X POST --data-binary "" -H "X-User-Secret-Key-Meta: secret" -H "X-User-Token-Expires-Meta: +90" "http://mydomain.example.com:8081/.TOKEN/"

    Set access_key to the 32-character token uuid and set secret_key to the secret string that was used.

  6. You're now ready to use s3cmd to list and create buckets, and copy files in or out.

    # List all your buckets in the domain
    $ s3cmd ls ...

    # Problems connecting, signature mismatch? Show debug
    # output to see exactly what's sent and returned.
    $ s3cmd ls -
    ...

    # Download all the files from your "images" bucket
    $ mkdir headshots && s3cmd get -r s3://images headshots

    # Generate a signed url that expires in an hour
    $ s3cmd signurl s3://mybucket/file.html +3600
    http://mybucket.mydomain.example.com:80/file.html?AWSAccessKeyId=0e71169c9ab10b293bda2b454bf20c35&Expires=1447998649&Signature=KKwTgl0x%2Fk96jaPzp60LQ97ozO0%3D

    The bucket can be moved from the hostname into the path. The command always outputs "http", but you can use "https". Make sure your front-end proxy routes requests with the "AWSAccessKeyId" query arg to the Content Gateway S3 port.

    # List S3 multipart uploads in progress that were begun in 2015 and delete them, including parts:
    $ s3cmd multipart s3://inbox | grep '^2015-' | sed 's/ /%20/g' | awk -F$'\t' '{print $2, $3}' | xargs -p - ... -t -n 2 s3cmd abortmp
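Before deleting anything, it can help to preview which uploads the multipart pipeline above would select. The following is a minimal dry-run sketch: it only prints the abortmp commands instead of running them. The helper name plan_aborts and the sample listing are made up for illustration, and the input is assumed to be tab-separated columns of initiated time, object URI and upload id, as the awk fields in the command above imply.

```shell
#!/bin/sh
# Dry run: read an "s3cmd multipart" style listing on stdin and print the
# abortmp command for every upload initiated in the given year.
# Assumes tab-separated columns: initiated time, object URI, upload id.
plan_aborts() {
    year=$1
    awk -F '\t' -v y="$year" '
        index($1, y "-") == 1 {
            uri = $2
            gsub(/ /, "%20", uri)          # same escaping as the sed step
            print "s3cmd abortmp " uri " " $3
        }'
}

# Made-up sample listing, for illustration only.
printf '%s\t%s\t%s\n' \
    'Initiated' 'Path' 'Id' \
    '2015-11-17T18:18:33.898Z' 's3://inbox/big file.iso' 'aaaa1111' \
    '2016-01-05T08:16:23.042Z' 's3://inbox/biglogs.tgz' 'bbbb2222' |
    plan_aborts 2015
```

With real data, the listing would come from s3cmd multipart s3://inbox piped into plan_aborts 2015. Review the printed commands before running any of them.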

-------------------------

The following S3 multipart / SCSP parallel write requests rely on internal implementation details that will change and are intended for diagnostic use only.

CloudScaler 4.x (S3 Multipart)

# SCSP: list S3 multipart uploads in progress
$ curl -u "${myusername}" 'http://mydomain.example.com:8081/?content-type=application/caringo-multipart-id&fields=x-multipart-id,x-multipart-part-meta,X-Multipart-Content-Bucket-Meta,X-Multipart-Object-Meta,name,tmborn,etag,content-md5,content-type,X-Multipart-Content-type-Meta&stype=unnamed&format=json&sort=x-multipart-id-meta,x_multipart_part_meta'
...

{"content_type":"application/caringo-multipart-id", "name":"4bbc3b023f5d8e38d8da5064a9168d5d", "x_multipart_object_meta":"3076_20151017201832_mwi_9_3.iso", "hash":"4a66ed2e13c8a2b5e5165a288d8d02b2", "last_modified":"2015-11-17T18:18:33.898100Z", "x_multipart_content_type_meta":"application/octet-stream"},
...

# SCSP: list the uploaded parts for a specific "upload id"
$ curl -u "${myusername}" 'http://mydomain.example.com:8081/?x-multipart-id-meta=4bbc3b023f5d8e38d8da5064a9168d5d&fields=x-multipart-id-meta,x-multipart-part-meta,X-Multipart-Bucket-Meta,X-Multipart-Object-Meta,name,tmborn,etag,content-md5,content-type,X-Multipart-Content-type-Meta&stype=unnamed&format=json&sort=x-multipart-id-meta,x_multipart_part_meta&size=10000'
...

{"content_type":"application/caringo-multipart-part", "name":"97d528ebcb0545248ed57980f562a062", 
"x_multipart_id_meta":"4bbc3b023f5d8e38d8da5064a9168d5d", "x_multipart_part_meta":"02479", "x_multipart_bucket_meta":"inbox", "x_multipart_object_meta":"biglogs.tgz", "hash":"97d528ebcb0545248ed57980f562a062", "content_md5":"fD8MJjqMOwoUBNuSYz586A==", "last_modified":"2016-01-05T08:16:23.042100Z"},
... 
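When an upload has thousands of parts, the raw listing is hard to read. The following is a minimal sketch that pulls just the part numbers out of a saved listing using plain grep and sed, so no JSON tool is required. The helper name part_numbers and the two-part sample are made up for illustration, and the sketch assumes the "x_multipart_part_meta":"NNNNN" form shown in the output above.

```shell
#!/bin/sh
# Print the part numbers found in a saved SCSP parts listing (JSON on stdin),
# one per line in numeric order. Assumes the field appears exactly as
# "x_multipart_part_meta":"NNNNN", with no space after the colon.
part_numbers() {
    grep -o '"x_multipart_part_meta":"[0-9]*"' |
        sed 's/.*:"\([0-9]*\)"$/\1/' |
        sort -n
}

# Made-up two-part sample in the same shape as a real listing.
sample='{"name":"aa", "x_multipart_part_meta":"00002", "x_multipart_bucket_meta":"inbox"},
{"name":"bb", "x_multipart_part_meta":"00001", "x_multipart_bucket_meta":"inbox"},'

printf '%s\n' "$sample" | part_numbers          # the part numbers, sorted
printf '%s\n' "$sample" | part_numbers | wc -l  # how many parts were listed
```

With real data, save the curl output to a file and feed it to part_numbers on stdin.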

Swarm 9 (SCSP parallel write) / Gateway 5.x (SCSP parallel write and S3 Multipart)

# SCSP: list multipart uploads in progress (POST-initiated or PUT-initiated)
# Double quotes around the URL let the shell expand ${SWARM_ENDPOINT}.
$ curl -i --location-trusted -H 'Host: mydomain.example.com' "http://${SWARM_ENDPOINT}/?stype=all&castor_system_partnumber=0&fields=context,name,tmborn,content-length,castor_system_uploadid,castor_system_partnumber&format=json&sort=tmborn:ASC"

...

# Query Elasticsearch directly to list the uploadIds of all uploads in progress,
# even if the initiating ("part 0") stream is missing
$ curl -i -XPOST "http://ELASTICSEARCH:9200/CARINGO-CLUSTER-NAME/IMMUTABLE/_search?pretty" -d '{ "size" : 0, "aggregations" : { "castor_system_uploadid" : { "terms" : { "field" : "castor_system_uploadid" } } } }'

...

{
"took" : 3,
"timed_out" : false,
...
"hits" : {
"total" : 5645,
...
},

"aggregations" : {
"castor_system_uploadid" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "f8e96f441d2e32b57a8f3a3af84dc00ad7f9644799b818a158ad60b25abe3ac6d7f9644799b818a158ad60b25abe3ac60U",
"doc_count" : 2048
}, {
"key" : "93e9937cf0b1e1e282b17d9b3c2fae301fe01052b949300bde8d1ed34c69507f1fe01052b949300bde8d1ed34c69507f0U",
"doc_count" : 1863
}, {

"key" : "f207289dae46079bd182a9c3a41bb8993f10b199ea37143f3dcb1fa062a40d083f10b199ea37143f3dcb1fa062a40d081P",
"doc_count" : 965
}, {
"key" : "f207289dae46079bd182a9c3a41bb899e7be48f920601b8c3f1a4f4ece5e7a3be7be48f920601b8c3f1a4f4ece5e7a3b1P",
"doc_count" : 449
}, {
"key" : "0fb87a6d6c64af9db6e315ba76980da236afdcbe99ccff8f310637bede00b77c36afdcbe99ccff8f310637bede00b77c0U",
"doc_count" : 289
}, {
"key" : "5539b3f8ad46a76b5f54a892c02e41032284fe283d4a8724597c58b1a34287de2284fe283d4a8724597c58b1a34287de1P",
"doc_count" : 10
...
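The aggregation response can be condensed to one line per upload id. The following is a minimal sketch using awk, assuming the pretty-printed layout with one "key" and one "doc_count" per line. The helper name upload_counts and the sample buckets are made up for illustration.

```shell
#!/bin/sh
# Summarize a saved Elasticsearch terms-aggregation response (pretty JSON on
# stdin) as "doc_count uploadid" lines, largest first. Assumes each "key" and
# each "doc_count" sits on its own line, key first.
upload_counts() {
    awk '
        /"key"[ ]*:/ {
            key = $0
            sub(/^.*"key"[ ]*:[ ]*"/, "", key)   # drop everything up to the value
            sub(/".*$/, "", key)                 # drop the closing quote onward
        }
        /"doc_count"[ ]*:/ {
            n = $0
            gsub(/[^0-9]/, "", n)                # keep only the digits
            print n, key
        }' | sort -rn
}

# Made-up sample in the same shape as a real response.
cat <<'EOF' | upload_counts
"doc_count_error_upper_bound" : 0,
"buckets" : [ {
"key" : "aaaa0U",
"doc_count" : 10
}, {
"key" : "bbbb1P",
"doc_count" : 2048
} ]
EOF
```

The patterns require a closing quote directly after doc_count, so lines such as "doc_count_error_upper_bound" and "sum_other_doc_count" are ignored.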