Swarm FAQ
- Anjali Tyagi
- Bala Harish A(AC)
- Donald Baker (Deactivated)
This page answers the questions customers ask most often.
No. Swarm does no such caching. The data is fully protected when the write request completes. As a result, Swarm does not match the performance of products that do this sort of caching, but those products often cannot handle the sustained throughput that Swarm can, because of the problems that occur when such a cache eventually overflows.
For EC encoding, we use k and p values mostly from the zfec (erasure coding) module. The EC literature will also mention an "m" value, which is just k + p.
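To make the notation concrete, here is a minimal sketch that derives m and the raw storage overhead from k and p. The function name and the 5:2 values are illustrative assumptions, not Swarm settings or recommendations.

```python
def ec_parameters(k: int, p: int) -> dict:
    """Summarize a k:p erasure-coding scheme (illustrative helper, not a Swarm API)."""
    if k < 1 or p < 0:
        raise ValueError("k must be >= 1 and p must be >= 0")
    m = k + p  # total segments, the "m" of the EC literature
    return {
        "k": k,
        "p": p,
        "m": m,
        # Any k of the m segments are enough to read, so up to p may be lost.
        "tolerated_segment_losses": p,
        # Raw bytes stored per byte of user data.
        "storage_overhead": m / k,
    }


params = ec_parameters(5, 2)
print(params["m"], params["storage_overhead"])
```

With k = 5 and p = 2, the object is stored as 7 segments and occupies 7/5 of its logical size.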
When an EC object is written, there are k data segments and p parity segments. Data is striped into those segments so that each segment is written incrementally during an EC write, with relatively little buffering. The segments are distributed throughout the cluster to minimize data loss or data inaccessibility during outages. No two segments are ever put on the same volume. Some "doubling up" of segments on the same chassis can occur, but never more than k. In larger clusters, we strive for one segment per node. This distribution is established on the original write and is maintained by the health processor over the lifetime of the object. The health processor maintains the k+p segments through failures and segment loss, whatever the cause. On read, only k segments are needed to reconstruct the data. We generally choose which k of the k+p segments to read based on performance.
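The placement rules above can be pictured with a small sketch. This is not Swarm's actual placement algorithm: the `(chassis_id, volume_id)` model, the round-robin strategy, and the `place_segments` name are all assumptions made for illustration. It enforces only the two constraints stated in the text: one segment per volume, and a cap on segments per chassis.

```python
from collections import defaultdict


def place_segments(volumes, k, p, max_per_chassis=None):
    """Assign k+p segments to distinct volumes, spreading across chassis.

    volumes: list of unique (chassis_id, volume_id) tuples (hypothetical model).
    max_per_chassis: cap on segments per chassis; defaults to k, per the text.
    """
    total = k + p
    cap = k if max_per_chassis is None else max_per_chassis

    by_chassis = defaultdict(list)
    for chassis, volume in volumes:
        by_chassis[chassis].append(volume)

    placement = []
    used = defaultdict(int)
    # Round-robin over chassis so segments spread out before doubling up.
    while len(placement) < total:
        progressed = False
        for chassis in sorted(by_chassis):
            if len(placement) == total:
                break
            if by_chassis[chassis] and used[chassis] < cap:
                # Each volume is popped once, so no volume holds two segments.
                placement.append((chassis, by_chassis[chassis].pop(0)))
                used[chassis] += 1
                progressed = True
        if not progressed:
            raise RuntimeError("not enough volumes to place all segments")
    return placement


volumes = [
    ("c1", "v1"), ("c1", "v2"), ("c1", "v3"),
    ("c2", "v4"), ("c2", "v5"),
    ("c3", "v6"), ("c3", "v7"),
]
placement = place_segments(volumes, k=4, p=2)
print(placement)
```

With three chassis and a 4:2 encoding, the six segments land two per chassis, each on its own volume.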
The PAN/SAN distinction largely applies to how clients interact with the cluster. Once the SAN is chosen, say on an EC write request, the segment writes and manifest writes are orchestrated without any redirections, because each node has a model of the cluster, its resources, and how busy it is.
The overlay index keeps a record of each EC segment, but segments are treated like any other object in the Swarm cluster. Overlay interaction for a GET of a particular stream (replica or segment) results in two UDP round-trips. The first round-trip is to the node with the overlay index for the stream. This gives us the most likely location(s) of the stream in the cluster. A second round-trip gets current bid information for the replica. When the SAN performs an EC GET, we usually look for the k lowest bids and then attempt to read those segments to assemble the EC object. There are a number of caveats here, and retries may be needed, because segments can become unavailable during the request.
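The "k lowest bids, with retries" idea can be sketched as follows. This is a simplified illustration, not Swarm's implementation: the bid values, the `read_segment` callable, and the use of `OSError` to signal an unavailable segment are all assumptions.

```python
def read_ec_segments(bids, k, read_segment):
    """Collect k segments, trying the lowest bids first.

    bids: dict mapping segment id -> bid (lower is better).
    read_segment: callable returning bytes, or raising OSError if the
                  segment is unavailable (hypothetical interface).
    """
    collected = {}
    for seg_id in sorted(bids, key=bids.get):
        if len(collected) == k:
            break
        try:
            collected[seg_id] = read_segment(seg_id)
        except OSError:
            # Segment became unavailable; fall back to the next-lowest bid.
            continue
    if len(collected) < k:
        raise OSError(f"only {len(collected)} of {k} required segments readable")
    return collected


bids = {"s0": 12, "s1": 3, "s2": 7, "s3": 20, "s4": 5}


def fake_read(seg_id):
    # Simulate one segment disappearing mid-request.
    if seg_id == "s4":
        raise OSError("segment unavailable")
    return seg_id.encode()


result = read_ec_segments(bids, 3, fake_read)
print(sorted(result))
```

Here the second-lowest bid fails, so the read falls through to the next candidates and still gathers the three segments it needs.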
Smaller objects are "wholly replicated", meaning that the cluster will have two (or often three) replicas written at the same time to different volumes on different chassis. As with EC, we maintain proper replica counts and distribution over the lifetime of the object. On reads, we only need to choose one replica (usually the one with the lowest bid) to service the request. The PAN then redirects to the SAN, and the object is served from there.
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.