The Apache Hadoop S3 connector "S3A" works with Content Gateway S3. Here is a small but complete example that uses a single-node Hadoop system, which can easily be run on any Docker server. It shows bucket listing, distcp, and a simple MapReduce job against a bucket in a Caringo Swarm domain.
- Create a container using the sequenceiq/hadoop-docker:2.7.0 Docker image:
$ docker run -it sequenceiq/hadoop-docker:2.7.0 /etc/ -bash
- At the container's shell prompt, add the path containing the "hadoop" binary:
- Copy the jars needed for the S3A libraries:
cd /usr/local/hadoop-2.7.0/share/hadoop/tools/lib/ && cp -p hadoop-aws-2.7.0.jar aws-java-sdk-1.7.4.jar jackson-core-2.2.3.jar jackson-databind-2.2.3.jar jackson-annotations-2.2.3.jar /usr/local/hadoop-2.7.0/share/hadoop/hdfs/lib/
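Before moving on, it can help to confirm that the copy succeeded. The following is a minimal sketch and not part of the original procedure: missing_s3a_jars is a hypothetical helper name, and it only prints whichever of the five jar files named above is absent from the directory passed to it.

```shell
# Hypothetical helper: print the name of each S3A-related jar that is
# missing from the directory given as the first argument.
missing_s3a_jars() {
    dir="$1"
    for jar in hadoop-aws-2.7.0.jar aws-java-sdk-1.7.4.jar \
               jackson-core-2.2.3.jar jackson-databind-2.2.3.jar \
               jackson-annotations-2.2.3.jar; do
        # Print the jar name only when the file is not present.
        [ -f "$dir/$jar" ] || echo "$jar"
    done
}
```

Inside the container, running missing_s3a_jars /usr/local/hadoop-2.7.0/share/hadoop/hdfs/lib/ should print nothing if every jar was copied.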
- Make sure your domain and bucket (hadoop-test) have been created, and that your /etc/hosts or DNS is configured to resolve the domain's hostname to the Content Gateway server hosting the S3 port.
- Create an S3 token:
curl -i -u USERNAME -X POST --data-binary '' -H 'X-User-Secret-Key-Meta: secret' -H 'X-User-Token-Expires-Meta: +90'
HTTP/1.1 201 Created
Token e181dcb1d01d5cf24f76dd276b95a638 issued for USERNAME in [root] with secret secret
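The token ID in the response body is what the later commands pass as fs.s3a.access.key. As a sketch, assuming the body line always has the form shown above, the ID could be pulled out of a saved response with sed. The function name extract_token is hypothetical.

```shell
# Hypothetical helper: read a gateway response on stdin and print the token ID.
# Assumes the body contains a line of the form shown above:
#   Token <hex-id> issued for <user> in [<realm>] with secret <secret>
extract_token() {
    # -n plus the trailing p prints only lines where the substitution matched.
    sed -n 's/^Token \([0-9a-f][0-9a-f]*\) issued for .*/\1/p'
}
```

For example, piping the response line shown above into extract_token prints e181dcb1d01d5cf24f76dd276b95a638, while header lines such as the HTTP status produce no output.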
- List your bucket (should be empty):
hadoop fs -Dfs.s3a.access.key=e181dcb1d01d5cf24f76dd276b95a638 -Dfs.s3a.secret.key=secret -ls s3a://hadoop-test/
Note: the error ls: `s3a://hadoop-test': No such file or directory is expected if the bucket is empty or if you omit the trailing slash.
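Every hadoop command in this example repeats the same two -D options, and a missing trailing slash is easy to overlook. The sketch below shows one way to reduce both risks. The variable and function names (S3A_ACCESS_KEY, S3A_SECRET_KEY, s3a_opts, with_trailing_slash) are illustrative assumptions, not part of Hadoop or Content Gateway.

```shell
# Values taken from the token step; the variable names are illustrative.
S3A_ACCESS_KEY=e181dcb1d01d5cf24f76dd276b95a638
S3A_SECRET_KEY=secret

# Print the two -D options that each hadoop command in this example repeats.
s3a_opts() {
    printf '%s %s\n' "-Dfs.s3a.access.key=$S3A_ACCESS_KEY" \
                     "-Dfs.s3a.secret.key=$S3A_SECRET_KEY"
}

# Append a trailing slash to a URI unless it already ends with one.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Intended use inside the container (commented out so the sketch runs anywhere):
#   hadoop fs $(s3a_opts) -ls "$(with_trailing_slash s3a://hadoop-test)"
```

The $(s3a_opts) substitution is deliberately left unquoted so the shell splits it into two arguments. This only works while neither key contains whitespace.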
- Copy the sample "input" files into your bucket:
hadoop distcp -Dfs.s3a.access.key=e181dcb1d01d5cf24f76dd276b95a638 -Dfs.s3a.secret.key=secret input s3a://hadoop-test/input
- Verify with "-ls" or in the Content Gateway UI that the bucket now has ~31 streams:
hadoop fs -Dfs.s3a.access.key=e181dcb1d01d5cf24f76dd276b95a638 -Dfs.s3a.secret.key=secret -ls s3a://hadoop-test/input/
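To compare the listing against the expected stream count without counting by eye, the output can be piped through a small filter. This is a sketch under an assumption about the listing format: that each regular file appears on a line beginning with "-" (the permission column), while the "Found N items" header and any directory lines do not. The name count_listed_files is hypothetical.

```shell
# Hypothetical helper: count file entries in "hadoop fs -ls" output on stdin.
# Assumes regular files are listed on lines starting with "-"; header lines
# and directory lines (starting with "d") are not counted.
count_listed_files() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^-' || true
}
```

Appending | count_listed_files to the -ls command above should then print a number close to 31, provided all copied objects are listed as regular files.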