Application Best Practices

The following are important concepts and approaches for building efficient, reliable integrations with Swarm.

Use an HTTP or S3 Library

Rather than connecting an application directly to Swarm nodes, connect it to Content Gateway using a modern HTTP library (for SCSP) or an S3 SDK (for the S3 API). Content Gateway handles Swarm redirects internally and provides additional functionality such as authentication, authorization, and Swarm node connection pooling.

For example, with Python, use the popular ‘requests’ library. If an application uses the S3 API, use the official AWS SDK for Java, C#, or Go. To make SCSP requests from Java, use a standard HTTP library such as Apache HttpClient.
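As a minimal sketch of this approach, the Python class below sends SCSP requests to Content Gateway over one persistent HTTP/1.1 connection. It uses the standard-library `http.client` module instead of `requests` only so that it has no third-party dependencies. The class name, host, port, and object paths are hypothetical, authentication is omitted, and writing named objects with POST is an assumption to check against the SCSP reference for your Swarm version.

```python
import http.client


class ScspClient:
    """Minimal SCSP client that reuses one persistent HTTP/1.1 connection.

    Assumes all requests go to a Content Gateway endpoint, which handles
    Swarm redirects internally, so this class does not follow 3xx responses.
    """

    def __init__(self, host: str, port: int = 80, timeout: float = 30.0) -> None:
        # http.client connects lazily, on the first request.
        self.conn = http.client.HTTPConnection(host, port, timeout=timeout)

    def write(self, path: str, data: bytes,
              content_type: str = "application/octet-stream") -> int:
        """POST data to path and return the HTTP status code."""
        self.conn.request("POST", path, body=data,
                          headers={"Content-Type": content_type})
        resp = self.conn.getresponse()
        resp.read()  # drain the body so the connection can be reused
        return resp.status

    def read(self, path: str) -> bytes:
        """GET the object at path, raising on any non-200 status."""
        self.conn.request("GET", path)
        resp = self.conn.getresponse()
        body = resp.read()  # always drain, even on errors
        if resp.status != 200:
            raise RuntimeError(f"GET {path} failed: {resp.status} {resp.reason}")
        return body

    def close(self) -> None:
        self.conn.close()
```

For example, `ScspClient("gateway.example.com").write("/mybucket/hello.txt", b"hello", "text/plain")` would store a small object through a hypothetical gateway host, and a later `read("/mybucket/hello.txt")` on the same client reuses the same socket.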

Protect Data in Transit

The Content-MD5 metadata header provides an end-to-end integrity check of an object's content (excluding metadata) as the object is sent to and returned from Swarm.

A client application can:

  • Check this header to detect modification of the object's body in transit.

  • Provide this header to have Swarm compute and check it when storing or returning the data.

If a Content-MD5 header is present on a POST, Swarm computes an MD5 digest during the data transfer and compares it to the digest provided in the header. If the two do not match, Swarm returns a 400 Bad Request error response, abandons the object, and closes the client connection.
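A client can compute the header value itself before a POST. The sketch below assumes Swarm expects the RFC 1864 form of Content-MD5 (the base64 encoding of the raw 16-byte digest, not a hex string); the helper names `content_md5` and `post_with_md5` are hypothetical.

```python
import base64
import hashlib
import http.client


def content_md5(data: bytes) -> str:
    """Return a Content-MD5 value: base64 of the raw MD5 digest (RFC 1864)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def post_with_md5(conn: http.client.HTTPConnection, path: str, data: bytes) -> int:
    """POST data with a Content-MD5 header so Swarm verifies what it receives.

    Returns the HTTP status. A 400 can mean the digest did not match and the
    object was not stored; the caller may retry with the original data.
    """
    conn.request("POST", path, body=data,
                 headers={"Content-MD5": content_md5(data)})
    resp = conn.getresponse()
    resp.read()  # drain the body before the connection is reused or closed
    if resp.status == 400:
        # Swarm closes the connection after a digest mismatch; drop our side too.
        # http.client opens a new connection on the next request.
        conn.close()
    return resp.status
```

Computing the digest from the same bytes that are sent is what makes the check end to end: any change between the client's buffer and Swarm's disk shows up as a mismatch.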

Content-MD5 headers are stored with the object metadata and returned on all subsequent GET or HEAD requests. If a Content-MD5 header is included with a GET request, Swarm computes the hash as the bytes are read. If the computed and provided hashes do not match, Swarm closes the connection before the last bytes are transmitted, which is the standard way to indicate that something went wrong with the transfer.
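On the read side, a client can hash the body as it arrives and compare the result with the stored Content-MD5 header, while also checking that it received every byte promised by Content-Length, because an early close is how a failed transfer is signaled. The function and exception below are hypothetical helpers, sketched under the same RFC 1864 base64 assumption; they work with any file-like stream, including an `http.client.HTTPResponse`.

```python
import base64
import hashlib
from typing import BinaryIO, Optional


class TransferIntegrityError(Exception):
    """Raised when a body is truncated or fails its Content-MD5 check."""


def read_and_verify(stream: BinaryIO, expected_md5: Optional[str],
                    content_length: Optional[int],
                    chunk_size: int = 64 * 1024) -> bytes:
    """Read a body in chunks, hashing incrementally, then verify it."""
    digest = hashlib.md5()
    chunks = []
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        chunks.append(chunk)
        received += len(chunk)
    if content_length is not None and received != content_length:
        # The connection closed before the last bytes: treat as a failed transfer.
        raise TransferIntegrityError(
            f"truncated body: received {received} of {content_length} bytes")
    if expected_md5 is not None:
        actual = base64.b64encode(digest.digest()).decode("ascii")
        if actual != expected_md5:
            raise TransferIntegrityError(
                f"Content-MD5 mismatch: expected {expected_md5}, got {actual}")
    return b"".join(chunks)
```

For example, after `resp = conn.getresponse()`, pass `resp`, `resp.getheader("Content-MD5")`, and the integer value of the Content-Length header, using `None` for any header that is missing.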

The Content-MD5 header provides an extra level of insurance, protecting against potential damage both in transit and while in storage.

See Configuring Swarm Storage for configuration parameters and how to edit the configuration files.

See Content-MD5 Checksums.

See Lifepoint Metadata Headers for more information about lifecycle management.

See Content Integrity Assurance.

Use Multithreading

Swarm is a multithreaded, multi-node cluster. Every node in a storage cluster can establish and maintain connections with many different client applications at the same time. Normally, an application opens one SCSP connection to the cluster and sends requests and receives responses sequentially. For high-volume workloads, a single client application may open more than one connection to Swarm to achieve better response times and higher read or write throughput.

When more throughput is needed, this multithreaded client strategy can be very effective because Swarm automatically load balances requests by redirecting them to a less busy node in the cluster that is capable of servicing the request. Each client thread (or process) can be connected to a different node within the cluster.
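A minimal sketch of a multithreaded reader, assuming all requests go to a single Content Gateway endpoint that handles redirects: each worker thread lazily opens its own persistent connection and reuses it for every request it processes, so the number of open connections never exceeds the number of workers. The function name `fetch_all` and the default of eight workers are illustrative, not tuned values.

```python
import http.client
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fetch_all(host: str, port: int, paths: List[str],
              workers: int = 8) -> List[Tuple[str, int, bytes]]:
    """GET every path in parallel; each worker thread reuses its own connection.

    Results are returned in the same order as paths.
    """
    local = threading.local()
    opened: List[http.client.HTTPConnection] = []
    opened_lock = threading.Lock()

    def fetch(path: str) -> Tuple[str, int, bytes]:
        conn = getattr(local, "conn", None)
        if conn is None:
            # First request on this thread: open a persistent connection.
            conn = http.client.HTTPConnection(host, port, timeout=30)
            local.conn = conn
            with opened_lock:
                opened.append(conn)
        conn.request("GET", path)
        resp = conn.getresponse()
        return path, resp.status, resp.read()

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fetch, paths))
    finally:
        # Close every per-thread connection once all work is done.
        for conn in opened:
            conn.close()
```

For example, `fetch_all("gateway.example.com", 80, ["/mybucket/a.jpg", "/mybucket/b.jpg"])` (hypothetical host and paths) returns `(path, status, body)` tuples in input order. Separate processes, each with its own connection, follow the same pattern.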

Maintain One Open Connection

The Swarm software implements HTTP/1.1 persistent connections. This means a client application is not required to close the socket or connection after each request. Swarm holds connections open and allows the client to continue sending requests and receiving responses until the client either closes the connection explicitly or stops sending requests for some period of time.

The client must close the connection and open a new one when a Swarm response includes the header Connection: close. Swarm sends this header when an error leaves the meaning of the remaining bytes on the connection ambiguous.
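The sketch below shows one way to honor Connection: close with Python's `http.client`: always read the full response body before reusing the connection, and close the connection when Swarm asks for it. `http.client` usually drops such a connection on its own and reconnects on the next request, so the explicit check mainly makes the rule visible for readers porting it to other libraries. The helper name `send_request` is hypothetical.

```python
import http.client
from typing import Optional, Tuple


def send_request(conn: http.client.HTTPConnection, method: str, path: str,
                 body: Optional[bytes] = None) -> Tuple[int, bytes]:
    """Send one request on a persistent connection, honoring Connection: close."""
    conn.request(method, path, body=body)
    resp = conn.getresponse()
    data = resp.read()  # drain the body before the connection is reused
    if (resp.getheader("Connection") or "").lower() == "close":
        # The remaining bytes on this socket cannot be trusted: drop it.
        # The next request() on conn opens a fresh connection.
        conn.close()
    return resp.status, data
```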

When requests are redirected to other nodes, keep the client's open connections under control using one of these methods:

  • Close the old connection before opening the new, redirected connection.

  • Maintain a pool of connections to several nodes in the cluster. The pool approach can considerably improve response times because, for smaller clusters, the client eventually holds open connections to all nodes.
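One possible shape for the pooling approach is sketched below, assuming the client talks to Swarm nodes directly (rather than through Content Gateway) and that a redirect arrives as a 3xx response with an absolute `http://` URL in the Location header. The pool keeps at most one connection per node and closes the least recently used connection when it reaches `max_size`, which also bounds the total number of open sockets. The class, function, and parameter names are hypothetical.

```python
import http.client
from collections import OrderedDict
from typing import Tuple
from urllib.parse import urlsplit


class NodeConnectionPool:
    """At most one persistent connection per (host, port), capped at max_size."""

    def __init__(self, max_size: int = 16, timeout: float = 30.0) -> None:
        self.max_size = max_size
        self.timeout = timeout
        self._conns: "OrderedDict[Tuple[str, int], http.client.HTTPConnection]" = OrderedDict()

    def get(self, host: str, port: int = 80) -> http.client.HTTPConnection:
        key = (host, port)
        conn = self._conns.pop(key, None)
        if conn is None:
            if len(self._conns) >= self.max_size:
                # Evict and close the least recently used connection.
                _, oldest = self._conns.popitem(last=False)
                oldest.close()
            conn = http.client.HTTPConnection(host, port, timeout=self.timeout)
        self._conns[key] = conn  # (re)insert as the most recently used entry
        return conn

    def close_all(self) -> None:
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()


def get_following_redirects(pool: NodeConnectionPool, host: str, port: int,
                            path: str, max_hops: int = 3) -> Tuple[int, bytes]:
    """GET path, following 3xx redirects over pooled node connections."""
    for _ in range(max_hops + 1):
        conn = pool.get(host, port)
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()  # drain so the pooled connection stays usable
        location = resp.getheader("Location")
        if 300 <= resp.status < 400 and location:
            # Assumes an absolute http:// URL pointing at another node.
            target = urlsplit(location)
            host, port = target.hostname, target.port or 80
            path = target.path + ("?" + target.query if target.query else "")
            continue
        return resp.status, body
    raise RuntimeError(f"more than {max_hops} redirects for {path}")
```

Because a redirected request reuses an already open connection to the target node when one is pooled, only the first request to each node pays the connection setup cost.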

Caution

For very large storage clusters, do not exceed the operating system limits on the number of simultaneously open connections.
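On Unix-like systems, each open socket consumes a file descriptor, so the per-process open-file limit is one concrete ceiling to respect. The sketch below derives a connection cap from that soft limit using Python's Unix-only `resource` module. The function name `max_pool_size`, the reserve of 64 descriptors for files and other sockets, the 50% fraction, and the fallback used when the limit is unlimited are illustrative assumptions, not recommended values.

```python
import resource  # standard library, Unix-only


def max_pool_size(reserve: int = 64, fraction: float = 0.5,
                  unlimited_fallback: int = 65536) -> int:
    """Suggest a connection cap from the process's open-file soft limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # No finite limit reported: fall back to an arbitrary ceiling.
        soft = unlimited_fallback
    # Leave room for descriptors used by files, logs, and other sockets.
    return max(1, int((soft - reserve) * fraction))
```

The result could be passed as the size limit of whatever connection pool the client uses; other operating systems and network devices may impose lower limits that this check does not see.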

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.