These are instructions for using docker-compose to deploy a DataCore Swarm environment in containers. The commands below set up a complete environment on a single server or laptop for demonstration purposes or for functional integration testing.
For more information see:
https://www.brighttalk.com/webcast/13173/413805/cloud-seeding-with-object-storage-containers-tech-tuesday-webinar
Prepare the Docker Host
Install Docker on a Linux server (e.g. using the convenience script https://docs.docker.com/engine/install/centos/#install-using-the-convenience-script ) or install Docker for Desktop on Windows or macOS (https://docs.docker.com/engine/install/#desktop ).
a. Verify the docker server has at least 8GB RAM available to containers (check Resources in Docker for Desktop) and 40GB disk space available.
b. Verify the sysctl value vm.max_map_count = 262144; it is required for the elasticsearch containers to start.
Verify the docker server has suitable sysctl settings, memory, and disk space. Sample output follows each command.
docker run --rm --privileged centos:7.9.2009 sysctl -a | grep -E 'file-max|max_user_instances|max_user_watches|max_map_count'
fs.file-max = 131072
fs.inotify.max_user_instances = 128
vm.max_map_count = 262144
docker run --rm --privileged centos:7.9.2009 free -h
total used free shared buff/cache available
Mem: 7.7G 263M 6.7G 163M 778M 7.0G
Swap: 1.0G 187M 836M
docker run --rm --privileged centos:7.9.2009 df -h .
Filesystem Size Used Avail Use% Mounted on
overlay 59G 7.3G 49G 14% /
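The sysctl check above can be scripted. The following is a minimal sketch, not something shipped with the demo images: `sysctl_value` and `meets_minimum` are hypothetical helper names, and it assumes `sysctl -a` prints `key = value` lines as in the sample output. It treats 262144 as a minimum, which is how elasticsearch itself reports the requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (assumptions for illustration, not part of caringo:demo).

# Print the value of one key from `sysctl -a` style "key = value" lines on stdin.
sysctl_value() {
  awk -v key="$1" -F ' *= *' '$1 == key { print $2 }'
}

# Succeed when an integer setting is at least the required minimum.
meets_minimum() {
  local current="$1" required="$2"
  [ "$current" -ge "$required" ]
}

# Example use against the docker server:
#   value="$(docker run --rm --privileged centos:7.9.2009 sysctl -a | sysctl_value vm.max_map_count)"
#   meets_minimum "$value" 262144 || echo "vm.max_map_count is too low: $value"
```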
The default vm.max_map_count on macOS is fine, but Windows and Linux users have to raise it. The following command makes the change temporarily; unfortunately it must be repeated every time a Windows machine is rebooted:
docker run --rm -ti --privileged centos:7.9.2009 sysctl vm.max_map_count=262144
On Linux, make this change permanent by creating this file as root, rebooting, and then installing docker-ce:
echo 'vm.max_map_count = 262144' > /etc/sysctl.d/98-elasticsearch.conf
curl -fsS https://get.docker.com | sh # Or see https://docs.docker.com
c. This is no longer necessary with Swarm 15, but if you run an older version like Swarm 11.3.0 (use image caringo:v11 instead of caringo:demo) be sure docker info shows Cgroup Version: 1. Recent Docker for Desktop and recent Linux (e.g. Ubuntu 22) default to Cgroup Version 2, which causes "WMM05 unexpected memory stats" errors in castor.log prior to Swarm 15.
To switch back to Cgroup Version 1, set "deprecatedCgroupv1": true in ~/Library/Group Containers/group.com.docker/settings.json and restart Docker for Desktop. On Linux, set GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0" in /etc/default/grub, regenerate the GRUB configuration (e.g. with update-grub on Ubuntu), and reboot.
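To confirm which cgroup version the docker server reports, the `Cgroup Version` line can be picked out of the `docker info` text. This is a small sketch; `cgroup_version` is a hypothetical helper name, and it assumes the line is labelled exactly `Cgroup Version:` as quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cgroup version found in `docker info` text on stdin.
cgroup_version() {
  sed -n 's/^[[:space:]]*Cgroup Version:[[:space:]]*//p'
}

# Example use:
#   if [ "$(docker info 2>/dev/null | cgroup_version)" != "1" ]; then
#     echo "Cgroup Version is not 1; Swarm releases before 15 may log WMM05 errors"
#   fi
```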
Download and Initialize the DataCore Swarm Containers
The bash commands below contain the access and secret keys needed to access DataCore’s public image repo, now at quay.io/perifery/. Further instructions and files for an offline install are located at:
https://jam.cloud.caringo.com/public/offline-demo/README.md
Note: DOCKER_INTERFACE is the IP or hostname of the docker server, so localhost if running Docker for Mac.
For Linux, macOS, or WSL2 (running with elevated privileges, or as root):
export REGISTRY_URL="quay.io/perifery/" REGISTRY_USER="perifery+demo" REGISTRY_PASSWORD="25VM6XA9JBRRT2ENFZ4KCWXK6Z65PCSHMM7QHD50QK7VYCJA07T4HYFUE5HMV4AW"
docker login --username "${REGISTRY_USER}" --password "${REGISTRY_PASSWORD}" quay.io
Login Succeeded
(ignore the WARNING about using --password)
docker pull ${REGISTRY_URL}caringo:demo
demo: Pulling from perifery/caringo
Digest: sha256:222449b510c6a9d680fa95bdead0a50ee2a0291416016e4aa8e1d5b6a4713be6
Status: Downloaded newer image for quay.io/perifery/caringo:demo
quay.io/perifery/caringo:demo
docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock -e DOCKER_INTERFACE=localhost -e REGISTRY_URL -e REGISTRY_USER -e REGISTRY_PASSWORD -e GATEWAY_ADMIN_PASSWORD=datacore ${REGISTRY_URL}caringo:demo init.sh
The init.sh script outputs any errors; more detail can be found in the container logs (e.g. docker logs caringo42_elasticsearch_1). When successful, it outputs the URLs for accessing the Swarm storage console and content portal, e.g.:
Content Portal: http://localhost/_admin/portal
Storage UI: http://localhost:91/_admin/storage
Swarm legacy console: http://localhost:4290/storage/swarm/
Grafana dashboards: http://localhost:4230/
Note that GATEWAY_ADMIN_USER:GATEWAY_ADMIN_PASSWORD now default to dcadmin:datacore.
Use caringo:demo-min instead of caringo:demo if the machine has only 8GB RAM. That image lowers some memory settings and simplifies elasticsearch.
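Once init.sh finishes, a quick way to see whether the consoles answer is to request each URL it printed. The sketch below is illustrative only: `demo_urls` and `check_demo_urls` are hypothetical names, the paths and ports are copied from the sample output above (they may differ in other configurations), and it assumes the host in those URLs follows DOCKER_INTERFACE.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; ports and paths are taken from the sample init.sh output.

# Print the four demo URLs for a given DOCKER_INTERFACE value (default: localhost).
demo_urls() {
  local host="${1:-localhost}"
  printf 'http://%s/_admin/portal\n' "$host"
  printf 'http://%s:91/_admin/storage\n' "$host"
  printf 'http://%s:4290/storage/swarm/\n' "$host"
  printf 'http://%s:4230/\n' "$host"
}

# Report "up" for every URL that returns any HTTP response within 5 seconds.
check_demo_urls() {
  local url
  demo_urls "$1" | while read -r url; do
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "up   $url"
    else
      echo "down $url"
    fi
  done
}

# Example use: check_demo_urls localhost
```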
Swarm on arm64 (EXPERIMENTAL)
Although Swarm is not supported on ARM64, there is an experimental build available, and the other containers have been built for arm64. Thus there is experimental support for running the demo containers on a non-Intel Mac. Note that the REGISTRY_URL is different from above (multi-arch images are not yet being used). The caringo:alpha image is specified to pull upcoming releases and a more recent elasticsearch version.
export REGISTRY_URL="quay.io/perifery/arm64/" REGISTRY_USER="perifery+demo" REGISTRY_PASSWORD="25VM6XA9JBRRT2ENFZ4KCWXK6Z65PCSHMM7QHD50QK7VYCJA07T4HYFUE5HMV4AW"
docker login --username "${REGISTRY_USER}" --password "${REGISTRY_PASSWORD}" quay.io
docker run --pull always -ti --rm -e UNIQ_PORT=77 -e REGISTRY_USER -e REGISTRY_PASSWORD -e REGISTRY_URL -v /var/run/docker.sock:/var/run/docker.sock ${REGISTRY_URL}caringo:alpha init.sh
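Because the arm64 images live under a separate registry path, a script shared between Intel and ARM machines could pick REGISTRY_URL from the host architecture. This is a sketch under that assumption; `registry_url_for_arch` is a hypothetical name, and the two paths are the ones given in this document. The image tag (caringo:alpha versus caringo:demo) still has to be chosen separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a `uname -m` value to the registry path used in this guide.
registry_url_for_arch() {
  case "$1" in
    arm64|aarch64) echo "quay.io/perifery/arm64/" ;;  # experimental ARM builds
    *)             echo "quay.io/perifery/" ;;        # default (Intel) images
  esac
}

# Example use:
#   export REGISTRY_URL="$(registry_url_for_arch "$(uname -m)")"
```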
Next Steps to Attempt
A mini DataCore Swarm environment is now running. Use Content Portal and Storage UI as with a production environment.
Exec into this container that has a few S3 clients installed and configured:
$ docker exec -it caringo42_s3ql_1 bash
# showconfigs
# s3cmd ls
# fallocate -l 1G 1G
# rclone -v copy 1G caringo:mybucket
Configure an external S3 client to use this environment. An /etc/hosts (or \WINDOWS\system32\drivers\etc\hosts) entry is needed on the S3 client machine to map the domain backup42 to the IP of the machine running docker. Use 127.0.0.1 if using Docker for Desktop and the S3 client is on the local machine.
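For illustration, the hosts entry is a single line with the IP followed by the domain name. The sketch below only prints that line; `hosts_entry` is a hypothetical helper, and the defaults (127.0.0.1 and backup42) are the values mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a hosts-file line mapping a storage domain to the docker server's IP.
hosts_entry() {
  local ip="${1:-127.0.0.1}" domain="${2:-backup42}"
  printf '%s %s\n' "$ip" "$domain"
}

# Example use (as root on the S3 client machine):
#   hosts_entry 127.0.0.1 backup42 >> /etc/hosts
```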
Create a different domain using Portal, or set docker run ... -e DOMAIN=mylaptop.example.com ... init.sh to change the name of the domain the init script creates.
All logs are visible in the syslog container, where support tools like swarmctl can be run to see or change Swarm settings, or indexer-enumerator.sh to list all objects.
$ docker exec -it caringo42_syslog_1 bash
# tail -F cloudgateway_audit.log &
# swarmctl -d swarm -a
# indexer-enumerator.sh
Bring up an existing environment after a reboot, or after it was stopped with docker run … stop.sh, using docker run … up.sh. Use the setting docker run -e PROJECT_RESTART=always … init.sh to automatically start on reboot.
If the docker server has a service already using ports 80 and 443, resulting in “ERROR: for caringo42_https_1 Cannot start service … 0.0.0.0:443: bind: address already in use”, change those published ports by adding HTTPS_HTTP_PORT and HTTPS_HTTPS_PORT:
docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock -e HTTPS_HTTP_PORT=4280 -e HTTPS_HTTPS_PORT=4243 -e REGISTRY_URL -e REGISTRY_USER -e REGISTRY_PASSWORD ${REGISTRY_URL}caringo:demo init.sh
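On a Linux docker server, the port conflict can be detected before running init.sh by inspecting the listening sockets. The sketch below assumes the standard column layout of `ss -ltn` (state in the first column, local address:port in the fourth); `port_in_use` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if `ss -ltn` output on stdin shows a listener on the given port.
port_in_use() {
  awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Example use on the docker server:
#   for p in 80 443; do
#     ss -ltn | port_in_use "$p" && echo "port $p is taken; set HTTPS_HTTP_PORT / HTTPS_HTTPS_PORT"
#   done
```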
Put any configuration to reuse into a text file and use --env-file my.env to simplify the docker run commands.
cat > my.env <<EOF
DOCKER_INTERFACE=mylaptop.example.com
PROJECT_RESTART=always
SWARM_CLUSTER_NAME=swarmtest.example.com
SWARM_DISK_SIZE=10g
DOMAIN=mylaptop.example.com
GATEWAY_ADMIN_PASSWORD=datacore
EOF
docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock --env-file my.env -e REGISTRY_URL -e REGISTRY_USER -e REGISTRY_PASSWORD ${REGISTRY_URL}caringo:demo init.sh
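Docker reads an env file as plain NAME=VALUE lines (or a bare NAME to pass a variable through), with lines starting with # treated as comments; shell syntax such as `export` or quoting is not interpreted. A minimal sanity check along those lines is sketched below; `check_env_file` is a hypothetical helper, not part of the demo image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list lines of an env file that are not blank, a # comment,
# or NAME[=VALUE]; succeed only when no such line exists.
check_env_file() {
  ! grep -nEv '^[[:space:]]*$|^#|^[A-Za-z_][A-Za-z0-9_]*(=.*)?$' "$1"
}

# Example use:
#   check_env_file my.env || echo "my.env has lines docker may not accept"
```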
WARNING: changing the Swarm cluster.name loses the “persistent settings UUID”, including the Search Feed. It needs to be recreated by re-running init.sh.
The default 2TB license is sufficient. To use a license devlicense.txt, add SWARM_CFG_1=license.url = file:///license/devlicense.txt to my.env and copy the license to a volume shared by the syslog and swarm containers.
Bring up the "syslog" service, which has the new license volume, using "--pull always" to download the latest caringo:demo image.
docker run -ti --pull always --rm -v /var/run/docker.sock:/var/run/docker.sock -e REGISTRY_URL -e REGISTRY_USER -e REGISTRY_PASSWORD --env-file my.env ${REGISTRY_URL}caringo:demo up.sh syslog
Copy the license file into the volume in the syslog container.
docker cp /tmp/devlicense.txt caringo42_syslog_1:/var/www/html/license/
Now rerun "up.sh" so swarm comes up with the new "license.url" setting and license volume.
docker run -ti --pull always --rm -v /var/run/docker.sock:/var/run/docker.sock -e REGISTRY_URL -e REGISTRY_USER -e REGISTRY_PASSWORD --env-file my.env ${REGISTRY_URL}caringo:demo up.sh
Add this to my.env to allow anonymous read and write access to Gateway, e.g. to test an application that makes requests directly to Swarm. This assumes the docker environment is only accessible by trusted clients.
EXTRA_ROOT_POLICY_STATEMENTS={"Effect": "Allow", "Sid": "Anonymous Full Access", "Action": ["*"], "Resource": "*", "Principal": {"anonymous": ["*"]}}
See all variables used to configure this environment and run docker-compose directly in the test container.
% docker exec -it caringo42_test_1 bash
# ./diff-env.sh
...shows non-default container config...
# docker-compose ps
# less config.env
Use the images without creating Swarm, e.g. to run a support tool:
docker run -v /tmp/for-support:/tmp ${REGISTRY_URL}caringo-syslog:stable /root/dist/indexer-grab.sh -t elasticsearch1.example.com:9200
ls /tmp/for-support
indexgrab-175f84275a6d-07234012172020.tar.gz
Add -e ADD_COMPOSE_FILE=:docker-compose-systemd.yml (the colon prefix is required) to make the gateway and elasticsearch containers use systemd, to more closely match a regular environment. This currently requires "deprecatedCgroupv1": true in ~/Library/Group Containers/group.com.docker/settings.json.
Run multiple gateways behind the haproxy load balancer by adding -e GATEWAY_SCALE=3.
Add these environment variables to bring up an environment that does not use elasticsearch. This means no Search Feed is created, and object listings, Swarm metrics, and Gateway metering and quotas are disabled.
-e ELASTICSEARCH_SCALE=0 -e ESHOST= -e INDEXER_HOSTS= -e SKIP_VERIFY_ELASTICSEARCH=true -e GATEWAY_METERING=false
Use tcpdump to monitor the http traffic to/from the Gateway S3 port:
docker run --net=container:caringo42_cloudgateway_1 fish/tcpdump-docker -i eth0 -vv -s 0 -A port 8085
caringo42_s3ql_1.caringo42_default.37592 > 914f010e83e6.8085: Flags [P.], cksum 0x5b00 (incorrect -> 0xc540), seq 1305:1957, ack 1051, win 501, options [nop,nop,TS val 1815248495 ecr 3455121625], length 652
E...g.@.@.x.............@...*g......[...... l2~o....PUT /locker/hello.txt?legal-hold HTTP/1.1
Host: backup42:8085
Accept-Encoding: identity
Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==
User-Agent: aws-cli/2.2.22 Python/3.8.8 Linux/5.10.25-linuxkit exe/x86_64.centos.7 prompt/off command/s3api.put-object-legal-hold
X-Amz-Date: 20210723T030338Z
X-Amz-Content-SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Authorization: AWS4-HMAC-SHA256 Credential=4ed7e53e89b25a911a5c62557dd5fdc4/20210723/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=42496864845c0ddf8b450cd68756dd1ffe4ac1d01fff51dada3d0127a251a35d
Remove the environment with clean.sh, deleting all containers and volumes and reclaiming any space they used:
docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock -e DOCKER_INTERFACE=localhost -e REGISTRY_URL -e REGISTRY_USER -e REGISTRY_PASSWORD -e GATEWAY_ADMIN_PASSWORD=caringo ${REGISTRY_URL}caringo:demo clean.sh