Gateway Configuration

These configuration files reside on the system after installing the Content Gateway service:

/etc/caringo/cloudgateway/gateway.cfg
/etc/caringo/cloudgateway/logging.yaml

Logging: See Gateway Logging after completing the Gateway configuration. The configuration file for logging changed from logging.cfg to logging.yaml as of Gateway 6.0 to support newer versions of Elasticsearch and to add customizations to the YAML file. See the Apache documentation for logging.

Password Security

Plain-text passwords in both the Gateway configuration and IDSYS files are replaced by encrypted versions on startup. When management passwords need to be changed, enter the new passwords in plain text and restart Gateway; startup replaces those strings with encrypted versions. (v7.1)

If the adminDomain is deleted or changed, these config items must be changed back to plain text so they can be encrypted with the new key.

Configuring the Content Gateway

Minimum Configuration

While cluster administrators must understand the details of configuring Content Gateway, this section summarizes the minimum steps required to configure and run Gateway. To deploy Gateway into production, additional customization is needed.

  1. Check either that IPTABLES are off or that inbound access for the front-end protocols is allowed. These commands turn off and disable the firewall daemon.

    systemctl disable firewalld
    systemctl stop firewalld
  2. Edit the /etc/caringo/cloudgateway/gateway.cfg file:

    1. Set adminDomain to the name of the administrative domain that is created in step 3.

    2. Set hosts for the storage cluster nodes. Including 4 to 5 nodes is sufficient for most deployments.

    3. Set indexerHosts to the Elasticsearch servers (required for S3 and Content Metering).

    4. Enable at least one of the front-end protocols: SCSP or S3.
      Alternatively, for Service Proxy use (to host the Swarm UI), set both to disabled and complete the [cluster_admin] section.

  3. Create the administrative domain by running the following on the first Gateway server:

    /opt/caringo/cloudgateway/bin/initgateway

    Password Security: This one-time step initializes password encryption for the Gateway configuration and IDSYS files. If upgrading from a version prior to 7.1, this initialization must be run again on one Gateway server to enable the feature. (v7.1)
    See Gateway Administrative Domain

  4. Start the Gateway service:

    systemctl start cloudgateway

  5. Enable automatic startup of the Gateway service:

    systemctl enable cloudgateway

Production deployments require customizing the configuration parameters described below.
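
As a rough illustration of step 2, a minimal gateway.cfg might look like the sketch below. The domain name, the host names, and the choice to enable only SCSP are hypothetical placeholders, not values from a real deployment; only the section and parameter names come from this page.

```ini
# Minimal sketch only; every value below is a hypothetical placeholder.
[gateway]
adminDomain = admin.example.com

[storage_cluster]
hosts = swarm1.example.com swarm2.example.com swarm3.example.com swarm4.example.com
indexerHosts = es1.example.com es2.example.com es3.example.com

[scsp]
enabled = true

[s3]
enabled = false
```

The indexerHosts entry is listed because it is required for S3 and Content Metering; a deployment that uses neither may not need it.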

Configuration Sections of gateway.cfg

The gateway.cfg file controls the core operations of the Content Gateway. It is a plain text, INI-formatted file read when the Gateway is first started. The parameters within the file are organized into the following sections, and colored rows are generally essential entries.

[gateway]

This section configures client communications.

adminDomain

gatewayAdminDomain

Required. The administrative domain where meta information about tenants and storage domains is kept.

Important

This parameter must be set to the same value for all Gateway servers.

Changing the adminDomain invalidates encrypted passwords in idsys.json and gateway.cfg and all tokens.

Setting adminDomain to match the Swarm default domain (cluster.name) is not recommended. Doing so leads to “Invalid token” errors if cluster.enforceTenancy=False, which is also not recommended.

threads

200

The number of threads allocated to handling client requests. Set to 100 times the number of CPU cores. The minimum is 200.

For CPUs with hyperthreading enabled, this calculation is based on the number of virtual cores, not physical.
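
For example, on a hypothetical Gateway server with 4 physical cores and hyperthreading enabled (8 virtual cores), the rule above gives 100 x 8 = 800 threads. On a single-core host the same rule would give 100, so the minimum of 200 would apply instead.

```ini
[gateway]
# Hypothetical host: 8 virtual cores x 100 = 800
threads = 800
```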

tokenTTLHours

24

The default number of hours an authentication token is valid if no time is defined when it is created.

multipartSpoolDir

/var/spool/cloudgateway

The location of the spool directory for HTTP multipart MIME upload temporary space.

Note

Uploads through the Content UI use SCSP multipart uploads rather than multipart MIME uploads. (Gateway v6.2)

multipartUsageAllowed

50

The percentage of the file system that can be used for multipart MIME upload temporary space.

recursiveDeleteMaxThreads

50

The maximum number of parallel delete operations to dispatch when processing recursive delete requests.

sanitizeErrors

false

Set to true to hide identity management configuration details from authentication errors.

cookieDomains



One or more base domains used to scope the Set-Cookie response header (instead of the FQDN from the request) when an authentication token is created within a child domain of one of these base domains. This is useful when the Content UI is used to access multiple storage domains that share a common base domain and the same authentication token should work across those domains. (v5.2.2)

Example:

cookieDomains = cloud.example.com cloud.example.net

veeamKbBlockSize

8192

Gateway implements the Veeam SOSAPI extension (v7.10.3). This setting configures the block size. The default and recommended value is 8192. Set to 0 to disable SOSAPI handling.

The capacity and availability returned in a GET of pseudo-object .system-d26a9498-cb7c-4a87-a44a-8ae204f5ba6c/capacity.xml are estimated based on the bucket's evaluated EC setting which is cached for 5 minutes. The values are based on cluster capacity; bucket quotas are not currently used.

recursiveDeleteMaxItems

10000

The maximum number of items in a multidelete request (SCSP only). S3 has a fixed limit of 1000, which is defined by AWS.

recursiveDeleteMaxSize

2560000

The maximum multidelete request body size (~2.5 MB).

recursiveDeleteMaxRetries

3

Number of retries when a delete request receives a 503 response.

recursiveDeleteRetryDelay

500

Number of milliseconds to wait before retrying.

recursiveDeleteSynchronousIndexing

true

Whether to request synchronous ES index update during each delete.
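
As an illustration of how the recursive delete settings fit together, the sketch below dispatches fewer parallel deletes and waits longer between retries than the defaults (50 threads, 3 retries, 500 ms). The values are hypothetical and untested; whether such a change helps with 503 responses in a given cluster is an assumption to verify.

```ini
[gateway]
# Hypothetical tuning; the defaults are 50, 3, and 500 respectively.
recursiveDeleteMaxThreads = 25
recursiveDeleteMaxRetries = 5
recursiveDeleteRetryDelay = 1000
```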

[storage_cluster] 

This section configures the back-end storage cluster.

locatorType

"static"

Zeroconf is not supported.

hosts

server1 server2 server3

Space or comma delimited list of IP addresses or host names of the storage cluster nodes.

port

80

Integer socket port number for SCSP on the storage nodes.

clusterName



The name of the storage cluster.

indexerHosts

indexer1 indexer2 indexer3

Space or comma delimited list of the Elasticsearch metadata index servers used by the storage cluster. Must be from the same ES cluster: do not mix old and new clusters.

Required for the S3 protocol and for Content Metering

indexerPort

9200

The socket port on which the Elasticsearch servers listen.

managementPort
managementUser
managementPassword

91 (managementPort)

Provide these credentials for the storage cluster to enable Gateway version and component information to be included in the cluster health report that provides proactive support from DataCore. (v6.0)

Required when using [cluster_admin].

clientBindAddress

0.0.0.0

Set to the IP address of the network interface connected to the storage cluster subnet. A non-default value is required on a multi-homed Gateway server, such as one connected to both a front-end client network and a back-end storage network.
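
A minimal sketch for a multi-homed Gateway, assuming a hypothetical back-end interface address of 10.10.0.5 on the storage network:

```ini
[storage_cluster]
# Hypothetical address of the interface on the back-end storage network.
clientBindAddress = 10.10.0.5
```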

maxConnectionsPerRoute

100

The maximum number of open connections to a specific storage node.

maxConnections

250

The maximum number of open connections to allow. This includes both active and idle connections.

connectTimeout

60

The time in seconds allowed to connect to a node.

socketTimeout

10

The time in seconds allowed for an active connection to deliver data. The default is 10 seconds starting with v8.1.0. Set to -1 to disable the timeout.

idleTimeout

120

The time in seconds an idle socket is allowed to remain in the connection pool.

indexerSocketTimeout

120

The time in seconds an indexer socket is allowed to remain in the connection pool. This affects the ability to list larger buckets. (v7.1)

continueWaitTimeout

30

The time in seconds to wait for client response after a 100 continue reply.

dataProtection

"immediate"

Controls whether synchronous (immediate, using replicate on write) or asynchronous (delayed) data protection is requested when writing to the storage cluster.

Values:

  • "immediate" (for replicate on write) - requires storage cluster setting of scsp.replicateOnWrite=true

  • "delayed" (disables replicate on write) - requires storage cluster setting of scsp.replicateOnWrite=false 

See Configuring ROW Replicate On Write
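
As a sketch, a Gateway fronting a storage cluster that runs with scsp.replicateOnWrite=false could request delayed protection as shown below. Writing the value without quotation marks is an assumption, made by analogy with entries such as enabled=true elsewhere on this page.

```ini
[storage_cluster]
# Asynchronous protection; pairs with scsp.replicateOnWrite=false on the storage cluster.
dataProtection = delayed
```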

blockUndeletableWrites

true

When enabled, the Gateway rejects any SCSP write (PUT, POST, COPY, APPEND) that includes a deletable=no/false lifepoint. This restriction applies to both named and unnamed (alias and immutable) objects. The request is refused with a 400 error message, "Unable to write undeletable object".

[scsp]

This section configures the front-end SCSP protocol. This protocol must be enabled for any Gateway that services Content UI requests.

enabled

true

Activates this protocol. Values are: "true", "false".

bindAddress

0.0.0.0

The IP address of the network interface to which the listening socket binds. Defaults to all interfaces.

bindPort

80

Integer socket port number for protocol.

externalHTTPPort
externalHTTPSPort

80
443

Optional, one or both. Allows Gateway to be used either behind a proxy or within a Docker environment, taking effect when X-Forwarded-Proto is found on the request. Gateway uses X-Forwarded-Proto to determine which port to use. (v5.4)

allowSwarmAdminIP

undefined

Allows the use of internal Swarm requests for content replication to pass through the Gateway. This is useful if using replication feeds between clusters that use Gateway as the front-end.

Values are "all", full IP addresses, IP address prefixes, a list of IPs/prefixes, or CIDR format such as 172.30.15.0/24. 
When undefined, no clients are allowed to send Swarm admin requests through the Gateway.
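
For instance, to accept Swarm admin requests only from the subnet used in the CIDR example above, the setting might be written as follows. This is a sketch; the subnet is illustrative and should be replaced with the range that the replicating cluster actually uses.

```ini
[scsp]
# Allow Swarm admin requests only from this subnet (CIDR taken from the text above).
allowSwarmAdminIP = 172.30.15.0/24
```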

[s3] 

This section configures the front-end S3 protocol, which is optional.

enabled

false

The protocol must be explicitly enabled. Values are: "true", "false".

bindAddress

0.0.0.0

The IP address of the network interface to which the listening socket binds. Defaults to all interfaces.

bindPort

80

Integer socket port number for protocol.

externalHTTPPort
externalHTTPSPort

80
443

Optional, one or both. Allows Gateway to be used either behind a proxy or within a Docker environment, taking effect when X-Forwarded-Proto is found on the request. Gateway uses X-Forwarded-Proto to determine which port to use. (v5.4)

enhancedListingConsistency

true

Improves compatibility with S3 clients and software libraries that expect consistent listings (despite the documented nature of listings to be eventually consistent). Can be disabled to boost write throughput (especially for small objects), if listing consistency is not critical. (v5.2.1)

Exceptions to synchronous indexing:

  • Deletes of manifests for canceled multipart uploads are done asynchronously.

  • On a delete, when there is not enough space on the local node to write a delete marker for a named object, Swarm writes to another node and indexes asynchronously.

  • On a rename, Swarm indexes the new name synchronously, but the old name is deleted asynchronously.

  • On a parallel write complete, the init stream is deleted asynchronously.

region

 

The Amazon S3 GET Bucket Location request returns the AWS region in which the bucket is located.
By default, Gateway returns an empty value for the location, which S3 clients interpret as us-east-1. If another region is required, there are two options:

  • Supply the location in the bucket creation operation using LocationConstraint.

  • Set the region option in the Gateway configuration file to the preferred region. This applies to all buckets unless the location is specified during creation.

If you require the behavior prior to Content Gateway 7.10.2 of returning the cluster name, set region to that cluster name.
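
A sketch of the second option, assuming eu-west-1 is the region that clients expect (a hypothetical choice):

```ini
[s3]
# Hypothetical region reported for buckets created without a LocationConstraint.
region = eu-west-1
```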

forcedDomain

 

Set forcedDomain to the name of an existing domain to force Content Gateway to use that domain for S3 requests regardless of the incoming Host or X-Forwarded-Host header. This allows S3 clients to use Gateway hostnames or IP addresses as the endpoint instead of requiring the endpoint to be a domain name. The S3 clients must use the "bucket in path" style of access for all requests, not the "bucket in Host" style. This feature is supported as of v7.10.7.
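
A minimal sketch, assuming a storage domain named data.example.com already exists (the name is hypothetical):

```ini
[s3]
enabled = true
# Hypothetical existing storage domain used for all S3 requests.
forcedDomain = data.example.com
```

With this in place, an S3 client would address the Gateway by hostname or IP address and use path-style bucket URLs.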

[metering] 

This section configures usage metering, which is optional. See Content Metering

enabled

false

The feature must be explicitly enabled.

flushIntervalSeconds

300 (5 minutes)

How frequently to send usage reports to Elasticsearch. Minimum is 10 seconds. The default value is optimized for the resolution of the queries.

retentionDays

100 (days)

How long to retain usage records. Minimum is 2 days. Allow for additional storage space if significantly increasing the retention period.

storageSampleIntervalSeconds

3600 (1 hour)

How frequently to sample the disk usage. Minimum is 900 (15 minutes). Larger values reduce the query workload on Elasticsearch.
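
Put together, a metering section might look like the sketch below. It keeps the default flush and sample intervals and raises retentionDays to a hypothetical 180 days, which per the note above calls for additional storage space.

```ini
[metering]
enabled = true
flushIntervalSeconds = 300
# Hypothetical increase from the default of 100 days.
retentionDays = 180
storageSampleIntervalSeconds = 3600
```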

[caching]

This section configures cache expiration. Times are in seconds. To disable a cache, set its value to 0.

authRefresh

300

Time before authorization is revalidated with a request to the identity management system.

tokenRefresh

300

Time before an authentication token is revalidated with a request to the administration domain.

idsysRefresh

300

Time an IDSYS document, or its nonexistence, is cached in memory.

policyRefresh

300

Time a tenant, domain, or bucket Policy document, or its nonexistence, is cached in memory.

xformRefresh

300

Time an XFORM document, or its nonexistence, is cached in memory.

metadataRefresh

300

Time that metadata for a tenant, domain, or bucket, or its nonexistence, is cached in memory.
This includes the owner for a tenant/domain/bucket and whether a bucket exists.

domainExistenceRefresh

300

Time that the knowledge of a domain's existence or nonexistence is cached.

socketTimeout

10 (seconds)

Default timeout starting from v8.1.0. Set to -1 to disable this configuration.

[quota]

This section configures storage and network usage quotas. See Setting Quotas

When enabled, Gateway regularly refreshes the cache of quota information using an Elasticsearch query against usage metrics. If any quota limit is reached, it changes the quota state and performs the action specified by policy.

enabled

false

The feature must be explicitly enabled.

minRefreshDeadline
maxRefreshDeadline

60
3600

The global limits on the speed of quota data refreshing. To increase the precision of the usage data, lower these values. To reduce the load on Elasticsearch, increase these values.

To optimize the load on Elasticsearch, Gateway refreshes with a dynamic algorithm: slower when metrics are still far from the limit and faster when the limit approaches, slower when approaching a limit and faster as the overage nears an end. The minimum and maximum deadlines refer to the caps to apply to this refresh rate (no faster and no slower than these values).

numRefreshThreads

4

The number of threads in the pool that continuously look at the most urgent deadlines in the queue and perform the refreshes (Elasticsearch queries) as needed.

maxRefreshRetries

3

The number of times a refresh can fail due to a failing Elasticsearch query before an error is logged and the refresh is dropped.

maxQueueSize

10000

Maximum queue size for scope quota evaluations. The internal implementation uses a deadline queue; if the queue overflows, the least urgent items are pushed out of the queue.

queryTTL

maxRefreshDeadline

This avoids unnecessary load on Elasticsearch by allowing the results of a quota check performed when a scope (tenant, domain, bucket) is accessed to be cached for this period of time. If the time since the last access is less than this value, the scope is not scanned in the background. Setting this parameter to 0 disables the access caching function.

refreshRetryDelay

10

Number of seconds to wait before retrying a refresh after the previous failed due to a failing Elasticsearch query.

refreshIdleSleep

3

Seconds to wait after finishing the work in a queue and before starting again.

smtpHost

localhost

Required. The hostname or IP address of the SMTP server that sends the email notifications.

smtpPort

25

Optional. The port where the SMTP server listens.

smtpUser
smtpPassword



Optional. The user and password to authenticate with the SMTP server.

mailFrom

donotreply@localhost

Email address for the sender of the notification.

mailSubjectTemplate

Quota state change notification

Email templates for subject line and body. These variables can be used in both the subject line and message body templates.

  • %metric%

  • %state%

  • %contextType%

  • %contextName%

The %xxx% strings render current values when the message is generated.

mailTemplate

Metric %metric% changed to %state% state in %contextType% %contextName%.
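
The sketch below combines the quota settings with email notification. The SMTP host, sender address, and subject line are hypothetical; the refresh deadlines and the message body are the defaults listed above.

```ini
[quota]
enabled = true
minRefreshDeadline = 60
maxRefreshDeadline = 3600
# Hypothetical mail settings; replace with the site's SMTP server and sender.
smtpHost = smtp.example.com
smtpPort = 25
mailFrom = quota-alerts@example.com
mailSubjectTemplate = Quota %state% for %contextType% %contextName%
mailTemplate = Metric %metric% changed to %state% state in %contextType% %contextName%.
```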

[dynamic_features]

If optional dynamic features such as Video Clipping for Partial File Restore are installed, their configuration settings appear in this Dynamic Features section. (v11.0)

resultObjectLifetime

5

In days. Sets a lifepoint to trigger cleanup of any JSON result objects for video clips that are created asynchronously.

[folder_listings]

This section configures options related to object listings.

usePaths

A setting introduced in Gateway v7.10.0 to improve S3 delimiter listing performance. The default is true as of Gateway v8.0.3. See also usePathsMaxDirs.

It requires Swarm 14.1 or later with search.enableDelimiterPaths set to True, which is the default for new Swarm 15.0+ clusters (see Settings Reference).
If upgrading from Swarm 14.1, set it explicitly:

  • Run swarmctl -C search.enableDelimiterPaths -V True and create a new search feed. See also Add Search Feed.
    The new feed can use the same Elasticsearch cluster if it has at least half of its disk space free.

  • After the new feed completes, make it the default and restart the Gateways with usePaths=true under [folder_listings] in gateway.cfg.
    This makes top-level delimiter listings (used by Veeam and S3 Browser) faster.

usePathsMaxDirs

A setting introduced in Gateway v8.0.3 that determines which delimiter listings are affected by usePaths=true. It defaults to 5, which means that it affects delimiter listings without a prefix or with the prefix having up to five subdirectories.
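
A sketch of the resulting section, with the usePathsMaxDirs default written out explicitly:

```ini
[folder_listings]
usePaths = true
# Default depth, shown explicitly for clarity.
usePathsMaxDirs = 5
```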

[cluster_admin]

This section configures options related to the Service Proxy.

enabled

false

Enables the Service Proxy functionality.

bindAddress

<IP | hostname>

Specifies the IP address or host name where Service Proxy listens for incoming storage cluster management API and Metering Query requests.

bindPort

91

Specifies the port where Service Proxy listens. By convention, this is port 91.

externalHTTPPort
externalHTTPSPort

<port>
<port>

Optional, one or both. Allows Gateway to be used either behind a proxy or within a Docker environment, taking effect when X-Forwarded-Proto is found on the request. Gateway uses X-Forwarded-Proto to determine which port to use. (v5.4)

platformHost

<IP | hostname>

Required for Platform Server if running Service Proxy/Swarm UI on a standalone Gateway.
See Configuring Swarm for Platform Server

testMode

true | false

Enables testMode when troubleshooting, which stops obfuscation of the backend Swarm Storage and Elasticsearch node IPs.

[metrics]

This section configures the metrics server that Gateway exposes for Prometheus. Configure Prometheus to poll /metrics on this address and port. Metrics are prefixed with caringo_gateway.

metricsEnabled

true

Metrics are enabled by default.

metricsPort

9100

Port for Prometheus to poll

metricsHost

0.0.0.0

Address the metrics server binds to. The default, 0.0.0.0, refers to all IP addresses. This can be set to a private address if Prometheus can connect to it. (v7.10.6)
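
As a sketch, binding the metrics server to a private interface might look like this. The address 10.10.0.5 is a hypothetical example of an interface that Prometheus can reach.

```ini
[metrics]
metricsEnabled = true
metricsPort = 9100
# Hypothetical private address reachable by Prometheus.
metricsHost = 10.10.0.5
```

With these values, Prometheus would poll /metrics on 10.10.0.5 port 9100.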

[debug]

This section contains configuration that Support might ask to be temporarily enabled for diagnosis:

debugConnLeaks

true

Set this to true as directed by DataCore Support to diagnose connection pool or stuck thread issues.

Setting Ports for Docker or Proxies

Gateway manages communications through assigned ports. As of release 5.4, Gateway can be configured to run either within a Docker environment or behind a proxy. The configuration has two settings (externalHTTPPort and externalHTTPSPort) per protocol: [scsp] and [cluster_admin], the Service Proxy. These settings take effect when X-Forwarded-Proto appears on the request.

SCSP, S3, and Service Proxy requests each route to the correct port. Browser requests must use the correct port:

  • Content UI (/_admin/portal): SCSP port, configured in [scsp]

  • Swarm UI (/_admin/storage): Service Proxy port, configured in [cluster_admin]

Gateway can redirect users if they attempt to access a UI on the wrong port; to accomplish this,

  • The load balancer must set X-Forwarded- headers, which Gateway uses to determine which port to use

  • Configure externalHTTP[S]Port correctly in gateway.cfg

Example Load Balancer Setup

Example Settings in gateway.cfg

If an HAProxy load balancer at haproxy.example.com is proxying requests for SCSP and S3 (on a shared port) and for Service Proxy:

...then expose both HTTP and HTTPS in these sections:
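
A sketch of what such settings could look like, assuming the load balancer exposes SCSP and S3 on the shared ports 80 and 443 and exposes Service Proxy on ports 8091 and 8443. All four port numbers are hypothetical and must match whatever the load balancer actually publishes.

```ini
# Hypothetical external ports; use the ports the load balancer actually exposes.
[scsp]
externalHTTPPort = 80
externalHTTPSPort = 443

[s3]
externalHTTPPort = 80
externalHTTPSPort = 443

[cluster_admin]
externalHTTPPort = 8091
externalHTTPSPort = 8443
```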



Redirection: This is how redirection is achieved given the example above. A user incorrectly attempts to access /_admin/storage on the SCSP/S3 port exposed by HAProxy.

HAProxy proxies this request to Gateway's SCSP port as:

Gateway SCSP knows that it does not handle /_admin/storage requests and that /_admin/storage is handled by the [cluster_admin] port, so it responds with a redirect to the [cluster_admin] externalHTTPSPort (because X-Forwarded-Proto specifies HTTPS; otherwise, it uses externalHTTPPort).

Enabling the Service Proxy

For most implementations, one Gateway is dedicated to running as Service Proxy to support cluster administration (using Swarm UI and Management API), and a pool of additional Gateways handles all content management at scale. For test or lightly used clusters, enable both cluster administration and content management on a single Gateway instance.

On the Gateway instance that runs as Service Proxy, make the following changes to the configuration (gateway.cfg file):

[cluster_admin]

enabled=true

Enables the Service Proxy functionality.

bindAddress=<IP|hostname>

Specifies the IP address or host name where Service Proxy listens for incoming storage cluster management API and Metering Query requests.

bindPort=91

Specifies the port where Service Proxy listens. By convention, this is port 91.

externalHTTPPort=<port>
externalHTTPSPort=<port>

Optional, one or both. Allows Gateway to be used either behind a proxy or within a Docker environment, taking effect when X-Forwarded-Proto is found on the request. Gateway uses X-Forwarded-Proto to determine which port to use. (v5.4)

platformHost=<IP|hostname>

platformPort=<port>

Required for Platform Server if running Service Proxy/Swarm UI on a standalone Gateway. 

See Configuring Swarm for Platform Server

testMode=<true|false>

Enables testMode when troubleshooting, which stops obfuscation of the backend Swarm Storage and Elasticsearch node IPs.

[storage_cluster]

managementPort=91

Specifies the port where Swarm listens for storage cluster management API requests. By convention, this is port 91.

managementUser=<Swarm admin user>

Specifies the user known to Swarm allowed to perform management API requests against the storage cluster.

managementPassword=<Swarm admin password>

Specifies the password of the managementUser.

[s3]

enabled=false

[scsp] 

enabled=false
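
Taken together, the changes above might produce a gateway.cfg fragment like the following sketch for a Gateway dedicated to Service Proxy. The bind address and the management credentials are hypothetical placeholders.

```ini
# Sketch of a Gateway dedicated to Service Proxy; values are hypothetical placeholders.
[cluster_admin]
enabled = true
bindAddress = 10.10.0.5
bindPort = 91

[storage_cluster]
managementPort = 91
managementUser = admin
managementPassword = changeme

[scsp]
enabled = false

[s3]
enabled = false
```

As described under Password Security, a plain-text managementPassword entered this way is replaced by an encrypted version when Gateway starts.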

Authentication and authorization for the Service Proxy use Content Gateway's root IDSYS and root Policy. The root Policy must grant all actions to the storage administrator users and/or groups:

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443816826 and https://perifery.atlassian.net/wiki/spaces/public/pages/2443816981

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.