The Content Gateway platform architecture consists of hardware and software components.
Client applications and users
Client applications reach the Gateway over a network and communicate either with the SCSP storage protocol or with another protocol, such as S3, that a protocol personality translates. Client applications include common web browsers such as Firefox and Chrome, software from third-party ISVs, and custom software developed in-house. Users are the people who work with these client applications; in other words, a user always interacts with the Gateway through one or more client applications.
Load balancer
When more than one Gateway is deployed, load balancer appliances are a common method for automatically distributing client requests across all Gateways and for excluding Gateways that are offline due to failure or maintenance. Load balancers may optionally implement upstream features such as SSL/TLS end-point termination, protocol firewall rules, quality of service, and geographic traffic management.
The Gateway appears as a normal web proxy to the upstream load balancers and, since the Gateways are stateless, it is not necessary to implement session affinity on the load balancers. Load balancing schemes such as weighted round-robin that prefer to dispatch to the most responsive Gateways are a good choice.
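The dispatch behavior described above can be illustrated with a minimal sketch of smooth weighted round-robin that skips offline back ends. This is not taken from any particular load balancer product. The host names, weights, and the healthy flag are hypothetical, and a real deployment would derive weights and health from its own monitoring. Because the Gateways are stateless, the picker keeps no per-client session data.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str             # hypothetical Gateway host name
    weight: int           # higher weight receives proportionally more requests
    healthy: bool = True  # set to False when a health check fails
    current: int = 0      # running score used by the selection algorithm


class WeightedRoundRobin:
    """Smooth weighted round-robin over the healthy back ends only."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self):
        live = [b for b in self.backends if b.healthy]
        if not live:
            raise RuntimeError("no Gateway is available")
        total = sum(b.weight for b in live)
        # Every live back end earns its weight, the leader is chosen,
        # and the leader then pays back the total so others catch up.
        for b in live:
            b.current += b.weight
        chosen = max(live, key=lambda b: b.current)
        chosen.current -= total
        return chosen.name
```

For example, with weights 2, 1, and 1, the first Gateway receives half of the requests. Marking a Gateway as unhealthy removes it from rotation without any change on the client side.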
The load-balancing layer is optional for Gateway, and, while hardware appliance load balancers can be well-suited for sites with heavy traffic and sophisticated operational requirements, it is also possible to implement this layer using virtual machines or modest hardware running open-source software. For example, the Pound reverse proxy running on Linux provides transport encryption and load balancing with Layer 7 inspection capabilities.
Protocol personalities
Protocol personalities are optional protocol translators allowing client applications to communicate using a different storage protocol than SCSP. By translating communications in this manner, all client applications, regardless of the protocol they use, share the same content in the back-end cluster and they utilize a common authentication/ACL scheme. An analogy for this universal storage protocol access is ODBC for database communications.
The SCSP and S3 protocol handling is implemented natively within the Gateway. Third-party or user-developed personalities may also be added to, or in front of, the Gateway server.
Gateway
The Gateway is a value-added front-end for the Swarm storage cluster. At the core, it is a stateless reverse proxy deployed in an n+1 configuration for horizontal scaling and high availability. The value-added features provided by the Gateway enhance the Swarm SCSP client protocol and provide storage management and protection for the back-end storage cluster.
These value-added features include:
Authentication for users and access control for content
Usage metering of storage and bandwidth
Audit logging for client operations
Automatic object metadata transformation rules
Cluster node pool management for load balancing and handling offline nodes
Reverse proxy to handle SCSP redirects locally to optimize and simplify client communications
Token-based authentication
Multi-part MIME uploads
The Gateway is a Java software component running within a Jetty servlet container and provides front-end HTTP web services to client applications. The Gateway servers are typically deployed with dual-homed network interfaces to provide proxy services between a front-end client network and a private, back-end storage network.
The Gateway provides protocol isolation and performs SCSP protocol inspection for the incoming client storage requests before passing them along to the storage cluster. This allows for, among other things, the implementation of business rules for content metadata, access control and administrative override for tenant content, and audit/billing event logging.
The Gateway implements the following authentication and request authorization logic:
The following Swarm features cannot be used when communicating with Swarm through the Gateway front-end:
Integrity Seal hash-type upgrades
Trailing Content-MD5 headers
Swarm legacy auth/auth mechanism (deprecated)
Swarm legal hold mechanism
Metadata search servers
Elasticsearch servers provide a NoSQL data query engine enabling metadata searching with the Swarm storage cluster. The query engine software allows for n+1 deployments providing horizontal scalability for load sharing and high availability.
Swarm storage cluster
The Swarm storage cluster provides a scalable object storage engine for the Gateway platform. The storage engine consists of standard x86 hardware that manages and protects storage for multiple tenants.
See Storage Implementation and Swarm Storage Cluster.
Remote replication
The Swarm replication feeds between clusters can be used with Gateway in the following deployment scenarios:
Clusters communicate directly with each other without passing through the Gateway
The Gateway acts as a front-end reverse proxy for a cluster
Direct communication between the clusters happens through internal routing rules between the storage networks or over a VPN connection between the storage networks. The key aspect of this communication is that no inter-cluster traffic passes through the Gateway.
The allowSwarmAdminIP setting must be configured in the [scsp] section of the Gateway's configuration file if the Gateway is to act as a front-end reverse proxy for a storage cluster that is the target of a Swarm replication feed from another cluster. The value is the IP address list or prefix of every replication source contacting this Gateway.
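As a rough illustration of how such a setting restricts replication sources, the sketch below parses a hypothetical configuration fragment and checks a source address against it. Only the section name [scsp] and the option name allowSwarmAdminIP come from the text above. The sample addresses, the comma-separated list syntax, and the reading of "prefix" as a CIDR network are assumptions; the Gateway's actual value syntax and matching rules should be taken from its configuration reference.

```python
import configparser
import ipaddress

# Hypothetical fragment of a Gateway configuration file. The addresses and
# the comma-separated, CIDR-style syntax are assumptions for illustration.
SAMPLE = """
[scsp]
allowSwarmAdminIP = 10.1.1.20, 10.2.0.0/16
"""


def allowed_sources(text):
    """Return the configured replication sources as a list of networks."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the mixed-case option name as written
    parser.read_string(text)
    raw = parser.get("scsp", "allowSwarmAdminIP")
    # A single address becomes a one-host network (/32 for IPv4).
    return [
        ipaddress.ip_network(item.strip(), strict=False)
        for item in raw.split(",")
        if item.strip()
    ]


def is_allowed(source_ip, networks):
    """Check whether a replication source falls inside any listed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)
```

Calling is_allowed("10.2.5.9", allowed_sources(SAMPLE)) accepts a source inside the listed network, while an address outside every entry is rejected.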