Overview
Listing Cache (LC) is a performance optimization feature designed to improve the speed of listing large datasets within Swarm storage. It works by caching pseudo-folder listings, reducing the time and resource consumption required to fetch and display object listings repeatedly.
The Listing Cache solves a scalability problem with the gateway's delimited folder listing functionality. To determine if a folder has subfolders, an Elasticsearch query has to enumerate all objects with the folder name as a prefix to their object names. This can run into the millions of objects for large buckets. When such queries are issued repeatedly and at high frequency, the resulting CPU use can bring an entire Elasticsearch cluster to a halt.
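For example, a single delimited listing like the one below (the bucket name, prefix, and endpoint are placeholders) forces the gateway to enumerate every object under the prefix in Elasticsearch in order to compute the common prefixes (subfolders) it returns:
aws s3api list-objects-v2 --endpoint-url https://domaina.acme.com --bucket mybucket --prefix "projects/" --delimiter "/"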
Limitations
Client-specific binding: Bound to a dedicated gateway; no cross-gateway sharing is allowed. Once you decide to serve one or more domains on a listing-cache-enabled gateway, that gateway must serve all requests to those domains exclusively. This is achieved by configuring your load balancer with dedicated host-based traffic redirection rules.
Non-persistent cache: The disk/memory cache is discarded by default on restart.
Limited lifecycle and recursive deletion support: No support for bucket lifecycle policies, delete lifepoints, or recursive deletes. All writes and deletes must originate from the gateway.
Memory constraints: Caching large volumes of data can quickly consume system memory. Misconfiguring cache sizes can lead to memory exhaustion or excessive eviction, reducing cache effectiveness.
Delimiters support: Custom delimiters are not yet supported; only the forward slash ("/") is supported.
Replication support: <pending engineering feedback>, do not set up replication when LC is enabled.
Unsupported functionality: Custom delimiters, S3 lifecycles, and recursive deletes.
Prerequisites
The Listing Cache can be enabled on Gateway version 8.1.2 or later. Ensure the following prerequisites are met before deploying Listing Cache:
Hardware Requirements:
8 vCPUs
16GB RAM
200GB dedicated partition formatted with XFS
Load Balancing Configuration:
Hardcode domains to a single gateway with Listing Cache (LC).
Info
Shared gateway support is currently not available.
Assuming you are using recommended settings, you will need to do the following:
Set Java Memory Heap
vim /etc/sysconfig/cloudgateway
HEAP_MIN="12228m"
HEAP_MAX="12228m"
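Heap settings are read when the gateway starts, so restart the gateway service afterwards. On a typical RPM-based install the service is managed with systemd; verify the service name used in your environment:
systemctl restart cloudgateway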
Create disk cache partition
vgcreate swarmspool /dev/sdb
lvcreate -L 195G -n diskcache swarmspool
mkfs.xfs /dev/swarmspool/diskcache
mount /dev/swarmspool/diskcache /var/spool/caringo/
Persist the mount by adding the following line at the end of /etc/fstab:
/dev/mapper/swarmspool-diskcache /var/spool/caringo xfs defaults 0 0
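A quick sanity check that the dedicated partition is mounted and formatted as expected (paths match the commands above):
df -h /var/spool/caringo
xfs_info /var/spool/caringo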
Do Not Use Listing Cache If:
You use multipart S3 operations.
You use custom delimiters in search queries.
You need the ability to do recursive deletes of domains and buckets.
You use S3 lifecycle policies.
You need support for delete lifepoints.
You do not use pseudo folders, or all objects are in a single pseudo folder.
How to Enable Listing Cache
The procedure to enable Listing Cache in Swarm is outlined below:
Add the following to /etc/caringo/cloudgateway/gateway.cfg:
[storage_cluster]
disableListingCache=false
After testing in a staging environment, roll out the Listing Cache to production by deploying the necessary configuration changes.
Monitor performance impact closely during the rollout phase.
Optional: Pre-warm the cache with commonly accessed listings before enabling it in production, so that initial requests are served from the cache (see the sketch after this list).
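A minimal pre-warming sketch, assuming S3 credentials for the cached domain and a file prefixes.txt listing the commonly accessed pseudo folders (the endpoint, bucket, and file name are placeholders):
while read -r prefix; do
    # Each delimited listing primes that folder's database in the Listing Cache.
    aws s3api list-objects-v2 --endpoint-url https://domaina.acme.com \
        --bucket mybucket --prefix "$prefix" --delimiter "/" > /dev/null
done < prefixes.txt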
How Listing Cache Works
Ensure Sufficient Disk Space: Listing Cache stores each folder in a separate SQLite database, which consumes disk space. Provide ample disk space to avoid frequent evictions of folder databases, as this impacts performance.
Automatic Folder Detection: Listing Cache automatically learns about folders through ongoing list, write, and delete requests. No manual intervention is required to create or manage databases for each folder.
Monitor Cache Population: Initially, for any new folder, the cache starts with an "infinite gap," meaning it has no data cached and queries Elasticsearch. Over time, as more listings are cached, the gap reduces until the folder is fully cached and can be served without querying Elasticsearch.
Real-Time Cache Updates: Ongoing write and delete requests are intercepted and used to keep the folder databases updated, ensuring the cache remains consistent with the actual data.
LRU-Based Eviction: The system automatically evicts the least recently used (LRU) databases when disk space is full. If a folder's database is evicted and later requested, the cache process restarts for that folder.
Disk Space Directly Impacts Performance: The more disk space available, the fewer evictions occur, allowing more folders to remain fully cached and reducing the need for frequent Elasticsearch queries (a quick disk-usage check is shown after this list).
Prepare for Elasticsearch Querying: In case of cache misses or folder database evictions, Elasticsearch will be queried. Ensure that Elasticsearch is properly configured to handle such requests, especially during periods of high cache turnover.
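To keep an eye on the disk-space points above, check how much of the cache partition the per-folder SQLite databases are consuming (the mount point is the one created in the Prerequisites section; the layout underneath it may vary by release):
df -h /var/spool/caringo
du -sh /var/spool/caringo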
How to Determine If the Listing Cache is Working Correctly
Monitor Cache Hit Rate
If you have telemetry and Grafana available, check the Listing Cache dashboard.
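If you prefer raw numbers over the dashboard, the listing-cache metrics listed at the end of this page can be queried directly. A rough sketch, assuming they are scraped into a Prometheus server (replace the address; the _count series are the standard counters exposed for Summary metrics). The ratio of Elasticsearch list queries to total list requests approximates the miss rate, so a falling value indicates the cache is warming up:
curl -s 'http://YOUR-PROMETHEUS:9090/api/v1/query' \
    --data-urlencode 'query=sum(rate(caringo_listingcache_backend_query_count{method="list"}[15m])) / sum(rate(caringo_listingcache_request_count{method="list"}[15m]))'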
Check Response Time
Compare the response time before and after enabling the Listing Cache. Reduced response times, particularly for frequently requested folder listings, indicate the cache functions correctly.
Resource Utilization
Monitor memory usage and CPU utilization. Increased memory usage and steady CPU activity are normal in a caching system, but excessively high CPU or memory usage may indicate misconfiguration.
Deployment Steps
Follow these steps to deploy the Listing Cache:
Step 1: Prepare the Environment
Provision a server with the specified hardware requirements.
Ensure the server’s 200GB partition is formatted with the XFS file system.
Verify network connectivity to other components of the S3 environment.
Step 2: Configure Load Balancer
Modify load-balancing rules to hardcode domains to a single gateway.
Ensure that all LC-enabled domains point to the appropriate gateway.
Test the load balancer configuration to confirm proper routing, for example with the curl check shown below.
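A quick routing check, assuming domaina.acme.com is an LC domain (as in the HAProxy example later on this page) and YOUR-LB-IP is the load balancer address:
# Both requests carry a dummy S3-style Authorization header; the first should be routed to the
# listing-cache backend, the second to the regular S3 pool. Confirm on the HAProxy stats page
# which backend served each request.
curl -sk -o /dev/null -w '%{http_code}\n' --resolve domaina.acme.com:443:YOUR-LB-IP \
    -H 'Authorization: AWS test:test' https://domaina.acme.com/
curl -sk -o /dev/null -w '%{http_code}\n' --resolve other.acme.com:443:YOUR-LB-IP \
    -H 'Authorization: AWS test:test' https://other.acme.com/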
Step 3: Install and Configure Listing Cache
Download the LC installation package from the designated repository.
Install the package on the prepared server.
Configure LC settings according to your environment’s specifications:
Set up domain-specific configurations.
Enable pseudo folder support as required.
Step 4: Validate Deployment
Perform basic functionality tests:
Verify data retrieval and storage through LC.
Test operations within pseudo folders.
Check system logs for any errors or warnings (see the example after this list).
Monitor performance metrics to ensure hardware is sufficient.
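For the log check, something like the following is a reasonable starting point (the path shown is the usual Gateway log location on an RPM install; adjust it if your logging configuration differs):
grep -iE 'error|warn' /var/log/caringo/cloudgateway_server.log | tail -n 50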
Step 5: Go Live
Enable LC for production workloads.
Monitor system performance and address any issues promptly.
Post-deployment Recommendations
Regularly monitor LC’s performance and resource utilization.
Plan for updates as new features and improvements are released.
Document any environment-specific configurations for future reference.
HAProxy Configuration for LC-Enabled Gateway
Below is the suggested generic HAProxy configuration tailored for a Listing Cache (LC)-enabled gateway.
This configuration is designed for HAProxy version 2.2 and higher.
Failover without failback is enabled for Listing Cache. Since restarting LC clears its cache, it is optimal to fail over only if the gateway becomes unavailable.
SCSP traffic is not routed to the Listing Cache. The configuration is primarily intended for handling S3 traffic.
Specific domains are redirected to the LC-enabled gateway, while all other traffic is routed to the regular non-cached pool.
The following is an example /etc/haproxy.conf file.
global
    log 127.0.0.1 local2 alert
    chroot /var/lib/haproxy
    stats socket /var/lib/haproxy/stats mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    ca-base /etc/pki/ca-trust/
    crt-base /etc/haproxy/certs
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS:!3DES
    ssl-default-bind-options no-sslv3
    maxconn 2048
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option forwardfor
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 130000

frontend HTTP_IN
    bind *:80 name *:80
    option http-keep-alive
    acl acl_is_http req.proto_http
    http-request redirect scheme https if acl_is_http

frontend stats
    mode http
    bind 0.0.0.0:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend HTTPS_IN
    bind *:443 name *:443 ssl alpn h2,http/1.1 crt /etc/haproxy/certs/wildcard.acme.com.pem
    mode http
    option http-keep-alive
    option httplog
    acl acl_is_content_ui path -m beg /_admin/portal
    acl acl_awsauth hdr_sub(Authorization) -i AWS
    acl acl_aws url_reg -i (?<=[?&])(AWSAccessKeyId|X-Amz-Credential)=
    # Define an acl per domain you want to send to LC
    acl acl_is_domain_a hdr(host) -i domaina.acme.com
    use_backend POOL-S3-listingcache if acl_is_domain_a
    use_backend POOL-S3 if acl_awsauth || acl_aws
    use_backend POOL-scsp if acl_is_content_ui

backend POOL-scsp
    mode http
    balance leastconn
    stick-table type ip size 50k expire 30m
    stick on src
    http-reuse safe
    server GW01 10.11.21.33:8080 check inter 10s
    server GW02 10.11.21.34:8080 check inter 10s

backend POOL-S3-listingcache
    balance source
    stick-table type ip size 50k expire 24d
    stick on src
    option httpchk
    http-check connect
    http-check send meth HEAD uri / ver HTTP/1.1 hdr Host haproxy-healthcheck
    http-check expect status 403
    server GW03 10.11.21.35:8090 check inter 10s fall 3 rise 2
    server GW04 10.11.21.36:8090 check inter 10s fall 3 rise 2 backup

backend POOL-S3
    balance leastconn
    stick-table type ip size 50k expire 30m
    stick on src
    option httpchk
    http-check connect
    http-check send meth HEAD uri / ver HTTP/1.1 hdr Host haproxy-healthcheck
    http-check expect status 403
    server GW01 10.11.21.33:8090 check inter 10s fall 3 rise 2
    server GW02 10.11.21.34:8090 check inter 10s fall 3 rise 2
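Before reloading, it is worth validating the file with HAProxy's built-in check mode (adjust the path if your distribution keeps the configuration in /etc/haproxy/haproxy.cfg):
haproxy -c -f /etc/haproxy.conf
systemctl reload haproxy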
Metrics
caringo_listingcache_request (Summary): Request counts and latencies for write/delete/list, versioned/nonversioned. Labels: method=[write, delete, list], mode=[V, NV]
caringo_listingcache_request_errors (Counter): Request error counts for write/delete/list, versioned/nonversioned. Labels: method=[write, delete, list], mode=[V, NV]
caringo_listingcache_listed_recs (Counter): Total number of records returned by the listing cache, versioned/nonversioned. Labels: mode=[V, NV]
caringo_listingcache_backend_query (Summary): Counts and latencies of ES queries for priming/listing, versioned/nonversioned. Labels: method=["list", "prime"], mode=[V, NV]
caringo_listingcache_backend_query_recs (Counter): Number of ES records queried for priming/listing, versioned/nonversioned. Labels: method=["list", "prime"], mode=[V, NV]
caringo_listingcache_cache_query (Summary): Counts and latencies of SqliteDB queries for priming/listing, versioned/nonversioned. Labels: method=["list", "prime", "reconciliation"], mode=[V, NV]
caringo_listingcache_cache_query_recs (Counter): Number of SqliteDB records queried for priming/listing, versioned/nonversioned. Labels: method=["list", "prime", "reconciliation"], mode=[V, NV]
caringo_listingcache_flushes_pending (Gauge): Folder updates pending flush to SqliteDB disk cache.
caringo_listingcache_flushes_done (Counter): Folder updates flushed to SqliteDB disk cache.
caringo_listingcache_trims_pending (Gauge): Folders pending trim in memory cache.
caringo_listingcache_trims_done (Counter): Folders trimmed in memory cache.
caringo_listingcache_folder_pulls_pending (Gauge): Folders marked to be internally pulled into cache.
caringo_listingcache_folder_pulls_done (Counter): Folders internally pulled into cache.
caringo_listingcache_mem_cached (Gauge): Folders currently in memory cache.
caringo_listingcache_mem_evicted (Counter): Folders evicted from memory cache.
caringo_listingcache_dbhandle_cached (Gauge): SqliteDB handles currently in memory cache.
caringo_listingcache_dbhandle_evicted (Counter): SqliteDB handles evicted from memory cache.
caringo_listingcache_disk_cached (Gauge): SqliteDBs currently in disk cache.
caringo_listingcache_disk_evicted (Counter): Folders evicted from disk cache.
caringo_listingcache_disk_cached_bytes (Gauge): Size in bytes of SqliteDBs currently in disk cache.
caringo_listingcache_disk_evicted_bytes (Counter): Size in bytes of SqliteDBs evicted from disk cache.
caringo_listingcache_reconciliations_done (Counter): Number of cache records reconciled (versionid mismatches corrected based on etag). Labels: origin=[backend, cache]
caringo_listingcache_memory_used (Gauge): Memory use as perceived by the listing cache.
caringo_listingcache_disk_free (Gauge): Disk free space as perceived by the listing cache.
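As a rough sketch of how these metrics might be used to watch eviction pressure, assuming they are scraped into a Prometheus server whose address you substitute:
# Folders evicted from the disk cache per second over the last 15 minutes.
curl -s 'http://YOUR-PROMETHEUS:9090/api/v1/query' \
    --data-urlencode 'query=rate(caringo_listingcache_disk_evicted[15m])'
# Free space on the cache partition as seen by the listing cache.
curl -s 'http://YOUR-PROMETHEUS:9090/api/v1/query' \
    --data-urlencode 'query=caringo_listingcache_disk_free'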