Content-MD5 checksums provide an end-to-end message integrity check of the content (excluding metadata) as it is sent to and returned from Swarm. A proxy or client can check the Content-MD5 header to detect modifications to the entity-body while in transit. A client can provide this header so that Swarm computes and checks the digest while storing or returning the object data.
See SCSP Headers.
Client-Provided Content-MD5
During a POST or PUT, the client can provide the following Content-MD5 header as specified in section 14.15 of the HTTP/1.1 RFC:
Content-MD5 = "Content-MD5" ":" md5-digest
where md5-digest is the base64 encoding of the 128-bit MD5 digest (see RFC 1864 for more information).
The md5-digest is computed based on the content of the entity body, including any content coding that was applied, but not including any transfer-encoding applied to the message body.
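As an illustration of the md5-digest format, the following Python sketch computes a Content-MD5 header value for a request body using only the standard hashlib and base64 modules. The function name content_md5 is hypothetical. A common mistake is to base64-encode the 32-character hexadecimal digest; the header carries the encoding of the raw 16-byte digest instead.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value for an entity body.

    The value is the base64 encoding of the raw 16-byte (128-bit) MD5
    digest, not of its hexadecimal representation.
    """
    digest = hashlib.md5(body).digest()  # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")


body = b"hello world"
headers = {"Content-MD5": content_md5(body)}
print(headers["Content-MD5"])  # XrY7u+Ae7tCTyyK7j1rNww==
```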
If this header is present, Swarm computes an MD5 digest during data transfer and then compares the computed digest to the digest provided in the header.
If the digests match, the Content-MD5 value is stored with the object and returned on subsequent GET or HEAD requests.
If the hashes do not match, Swarm returns a 400 Bad Request error response, abandons the object, and closes the client connection.
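The check described above can be modeled as an incremental hash over the incoming chunks. This is a minimal sketch under stated assumptions, not Swarm's implementation: the names store_with_md5_check and DigestMismatch are hypothetical, and the body is buffered in memory only to keep the example short.

```python
import base64
import hashlib
from typing import Iterable


class DigestMismatch(Exception):
    """Raised when the computed digest differs from the client's Content-MD5."""


def store_with_md5_check(chunks: Iterable[bytes], expected_md5: str) -> bytes:
    """Hash the body incrementally as it arrives, then compare digests.

    Hypothetical model of the behavior described above: on a mismatch the
    data is discarded (a server would answer 400 Bad Request and close the
    connection) instead of being kept.
    """
    md5 = hashlib.md5()
    received = bytearray()
    for chunk in chunks:
        md5.update(chunk)  # the digest is computed during the transfer
        received.extend(chunk)
    computed = base64.b64encode(md5.digest()).decode("ascii")
    if computed != expected_md5:
        raise DigestMismatch(f"expected {expected_md5}, computed {computed}")
    return bytes(received)  # would be stored along with the Content-MD5 value
```

Hashing chunk by chunk means the digest is ready as soon as the last byte arrives, so the comparison adds no second pass over the data.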
Swarm-Provided Content-MD5
Another way to associate a Content-MD5 value with an object is to have Swarm compute the Content-MD5 for the body data of the request. To do this, include the gencontentmd5 query argument in the request. Swarm returns the Content-MD5 as a header in the 201 Created response. Once computed, the Content-MD5 value is stored with the object and returned as a response header for any subsequent GET or HEAD requests. Note that the gencontentmd5 query argument replaces the "Expect: Content-MD5" request header, which is deprecated per RFC 7231. (v9.2)
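A client-side sketch of the gencontentmd5 flow follows. It builds a PUT request with the query argument appended and compares the Content-MD5 that a 201 Created response would carry against a locally computed digest. The host swarm.example.com, the path mybucket/report.txt, and both helper functions are hypothetical; the request is built but not sent.

```python
import base64
import hashlib
import urllib.request


def gencontentmd5_request(url: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT that asks Swarm to compute Content-MD5."""
    sep = "&" if "?" in url else "?"  # keep any existing query arguments
    return urllib.request.Request(url + sep + "gencontentmd5", data=body, method="PUT")


def matches_returned_md5(body: bytes, response_headers: dict) -> bool:
    """Compare the Content-MD5 from a 201 Created response with a local digest."""
    local = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return response_headers.get("Content-MD5") == local


req = gencontentmd5_request("http://swarm.example.com/mybucket/report.txt", b"hello world")
print(req.get_method(), req.full_url)
# PUT http://swarm.example.com/mybucket/report.txt?gencontentmd5
```

Response headers obtained through http.client are case-insensitive; the plain dict used here is case-sensitive, so the header name must match exactly.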
Tip
The Swarm setting scsp.autoContentMD5Computation automates Content-MD5 hashing. With it enabled, writes do not need to include the gencontentmd5 query argument or the deprecated Expect: Content-MD5 header, although a client may still supply a Content-MD5 header to have Swarm verify content integrity. The setting is ignored wherever it cannot be applied, such as on a multipart initiate/complete or an EC APPEND. (v9.1)
Ranges - When including ?gencontentmd5 on a GET request with a Range header, any Content-MD5 header stored with the object is omitted from the response headers. Instead, a Content-MD5 of the selected range is returned as a trailing header of the GET response.
For details about Range headers, see section 14.35 (Range) in the HTTP/1.1 RFC.
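To see what the trailing Content-MD5 of a range read covers, the sketch below computes the digest of only the bytes selected by a single-range "bytes=" header. The function name range_content_md5 is hypothetical, and multi-range requests are deliberately not handled.

```python
import base64
import hashlib


def range_content_md5(data: bytes, range_header: str) -> str:
    """Content-MD5 of the bytes selected by a single-range 'bytes=' header.

    The digest covers only the selected range, not the whole object.
    """
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":  # suffix range such as bytes=-500: the last N bytes
        start, end = max(len(data) - int(last), 0), len(data) - 1
    else:
        start = int(first)
        end = int(last) if last else len(data) - 1  # open-ended bytes=N-
        end = min(end, len(data) - 1)  # clamp an end past the object size
    selected = data[start:end + 1]  # HTTP byte ranges are inclusive
    return base64.b64encode(hashlib.md5(selected).digest()).decode("ascii")
```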
Validation failures
Because Swarm reports a hash validation failure by closing the connection, an SCSP read that requests Content-MD5 validation and encounters a hash mismatch causes the Gateway to remove the storage node from its connection pool temporarily.
Storing Content-MD5 Headers
Content-MD5 headers are stored with the object metadata and returned on all subsequent GET or HEAD requests.
If a Content-MD5 header is included with a GET request, Swarm computes the hash as the bytes are read, regardless of whether the header was originally stored with the object.
If the computed and provided hashes do not match, the connection is closed before the last bytes are transmitted, which is the standard way to indicate something went wrong with the transfer.
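Because a read-time mismatch is signaled by closing the connection early, a client can detect it by comparing the number of bytes received with Content-Length before checking the digest. The sketch below assumes the body and headers have already been collected; check_download and its return labels are hypothetical. With Python's http.client, a short read of a body with a known Content-Length typically raises http.client.IncompleteRead before this point.

```python
import base64
import hashlib


def check_download(body: bytes, headers: dict) -> str:
    """Classify a completed GET body against its Content-Length and Content-MD5.

    A body shorter than Content-Length suggests the server closed the
    connection before the last bytes were sent.
    """
    expected_len = headers.get("Content-Length")
    if expected_len is not None and len(body) < int(expected_len):
        return "truncated"
    expected_md5 = headers.get("Content-MD5")
    if expected_md5 is None:
        return "unverified"  # nothing to compare against
    actual = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return "ok" if actual == expected_md5 else "mismatch"
```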
Content-MD5 and Replication
When providing the gencontentmd5 query argument in a request on a replicated object, the following applies:
On a write request (POST, PUT, COPY, or APPEND), the Content-MD5 is calculated, stored with the object, and returned as a response header for that write operation.
The Content-MD5 is always returned on any GET or HEAD request for an object that was written with the gencontentmd5 query argument.
When including ?gencontentmd5 on a range read (a GET request with the Range header), Swarm suppresses any stored Content-MD5 from the response headers and instead returns a Content-MD5 for the requested range as a trailing header.
Content-MD5 and Erasure-Coding
When providing the gencontentmd5 query argument in a request on an erasure-coded object, the following applies:
APPEND is no longer supported with gencontentmd5: if an APPEND includes the gencontentmd5 query argument, Swarm returns a 400 Bad Request error response.
COPY is supported only if the existing object was originally written with the gencontentmd5 query argument; otherwise, the COPY operation fails.
For a range read (a GET request with the Range header), Swarm suppresses any stored Content-MD5 from the response headers and instead returns a Content-MD5 for the requested range as a trailing header.
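The replication and erasure-coding rules for writes can be summarized as a small decision function. This is a hypothetical client-side helper, not part of any Swarm API; it encodes only the write cases listed above and assumes that POST and PUT behave the same under both protection schemes, since no restriction is listed for them. Range reads behave the same under both schemes and are not modeled.

```python
from dataclasses import dataclass

WRITE_METHODS = {"POST", "PUT", "COPY", "APPEND"}


@dataclass
class ExistingObject:
    erasure_coded: bool
    written_with_gencontentmd5: bool  # was the original write made with ?gencontentmd5


def gencontentmd5_write_outcome(method: str, obj: ExistingObject) -> str:
    """Predict the result of a write that includes ?gencontentmd5.

    Hypothetical helper encoding the rules listed above; Swarm itself
    makes the actual decision.
    """
    method = method.upper()
    if method not in WRITE_METHODS:
        raise ValueError(f"{method} is not a write method")
    if obj.erasure_coded:
        if method == "APPEND":
            return "400 Bad Request"
        if method == "COPY" and not obj.written_with_gencontentmd5:
            return "COPY fails"
    # Replicated objects, and the remaining erasure-coded cases
    return "Content-MD5 computed, stored, and returned"
```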