Overview
Listing Cache (LC) is a performance optimization feature designed to improve the speed of listing large datasets within Swarm storage. It works by caching pseudo-folder listings, reducing the time and resource consumption required to fetch and display object listings repeatedly.
The Listing Cache solves a scalability problem with the gateway's delimited folder listing functionality. To determine if a folder has subfolders, an Elasticsearch query has to enumerate all objects with the folder name as a prefix to their object names. This can run into the millions of objects for large buckets. When such queries are issued repeatedly and at high frequency, the resulting CPU use can bring an entire Elasticsearch cluster to a halt.
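For example, a single delimited listing like the one below (the bucket name, prefix, and endpoint are placeholders) forces the gateway to enumerate every object under the prefix in Elasticsearch in order to compute the common prefixes (subfolders) it returns:
aws s3api list-objects-v2 --endpoint-url https://domaina.acme.com --bucket mybucket --prefix "projects/" --delimiter "/"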
Limitations
Client-specific binding: Bound to a dedicated gateway; no cross-gateway sharing is allowed. Once you decide to serve one or more domains on a listing-cache-enabled gateway, that gateway must serve all requests to those domains exclusively. This is achieved by configuring your load balancer with dedicated host-based traffic redirection rules.
Non-persistent cache: The disk/memory cache is discarded by default on restart.
Limited lifecycle and recursive deletion support: No support for bucket lifecycle policies, delete lifepoints, or recursive deletes. All writes and deletes must originate from the gateway.
Memory constraints: Caching large volumes of data can quickly consume system memory. Misconfiguring cache sizes can lead to memory exhaustion or excessive eviction, reducing cache effectiveness.
Delimiters support: Custom delimiters are not yet supported; only the forward slash ("/") is supported.
Replication support: <pending engineering feedback>, do not set up replication when LC is enabled.
Unsupported functionality: Custom delimiters, S3 lifecycles, and recursive deletes.
Prerequisites
The Listing Cache can be enabled on Gateway version 8.1.2 or later. Ensure the following prerequisites are met before deploying Listing Cache:
Hardware Requirements:
8 vCPUs
16GB RAM
200GB dedicated partition formatted with XFS
Load Balancing Configuration:
Hardcode domains to a single gateway with Listing Cache (LC).
Info
Shared gateway support is currently not available.
Assuming you are using recommended settings, you will need to do the following:
Set Java Memory Heap
vim /etc/sysconfig/cloudgateway
HEAP_MIN="12228m"
HEAP_MAX="12228m"
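Heap settings are read when the gateway starts, so restart the gateway service afterwards. On a typical RPM-based install the service is managed with systemd; verify the service name used in your environment:
systemctl restart cloudgateway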
Create disk cache partition
vgcreate swarmspool /dev/sdb
lvcreate -L 195G -n diskcache swarmspool
mkfs.xfs /dev/swarmspool/diskcache
mount /dev/swarmspool/diskcache /var/spool/caringo/
Persist the mount by adding the following line at the end of /etc/fstab:
/dev/mapper/swarmspool-diskcache /var/spool/caringo xfs defaults 0 0
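A quick sanity check that the dedicated partition is mounted and formatted as expected (paths match the commands above):
df -h /var/spool/caringo
xfs_info /var/spool/caringo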
Do Not Use Listing Cache If:
You use multipart S3 operations.
You use custom delimiters in search queries.
You need the ability to do recursive deletes of domains and buckets.
You use S3 lifecycle policies.
You need support for delete lifepoints.
You do not use pseudo folders, or all objects are in a single pseudo folder.
How to Enable Listing Cache
The procedure to enable Listing Cache in Swarm is outlined below:
Add the following to /etc/caringo/cloudgateway/gateway.cfg:
[storage_cluster]
disableListingCache=false
After testing in a staging environment, roll out the Listing Cache to production by deploying the necessary configuration changes.
Monitor performance impact closely during the rollout phase.
Optional: Pre-warm the cache with commonly accessed listings before enabling it in production, so that initial requests are served from the cache (see the sketch after this list).
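A minimal pre-warming sketch, assuming S3 credentials for the cached domain and a file prefixes.txt listing the commonly accessed pseudo folders (the endpoint, bucket, and file name are placeholders):
while read -r prefix; do
    # Each delimited listing primes that folder's database in the Listing Cache.
    aws s3api list-objects-v2 --endpoint-url https://domaina.acme.com \
        --bucket mybucket --prefix "$prefix" --delimiter "/" > /dev/null
done < prefixes.txt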
How Listing Cache Works
Ensure Sufficient Disk Space: Listing Cache stores each folder in a separate SQLite database, which consumes disk space. Provide ample disk space to avoid frequent evictions of folder databases, as this impacts performance.
Automatic Folder Detection: Listing Cache automatically learns about folders through ongoing list, write, and delete requests. No manual intervention is required to create or manage databases for each folder.
Monitor Cache Population: Initially, for any new folder, the cache starts with an "infinite gap," meaning it has no data cached and queries Elasticsearch. Over time, as more listings are cached, the gap reduces until the folder is fully cached and can be served without querying Elasticsearch.
Real-Time Cache Updates: Ongoing write and delete requests are intercepted and used to keep the folder databases updated, ensuring the cache remains consistent with the actual data.
LRU-Based Eviction: The system automatically evicts the least recently used (LRU) databases when disk space is full. If a folder's database is evicted and later requested, the cache process restarts for that folder.
Disk Space Directly Impacts Performance: The more disk space available, the fewer evictions occur, allowing more folders to remain fully cached and reducing the need for frequent Elasticsearch queries (a quick disk-usage check is shown after this list).
Prepare for Elasticsearch Querying: In case of cache misses or folder database evictions, Elasticsearch will be queried. Ensure that Elasticsearch is properly configured to handle such requests, especially during periods of high cache turnover.
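To keep an eye on the disk-space points above, check how much of the cache partition the per-folder SQLite databases are consuming (the mount point is the one created in the Prerequisites section; the layout underneath it may vary by release):
df -h /var/spool/caringo
du -sh /var/spool/caringo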
How to Determine If the Listing Cache is Working Correctly
Monitor Cache Hit Rate
If you have telemetry and Grafana available, check the Listing Cache dashboard.
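If you prefer raw numbers over the dashboard, the listing-cache metrics listed at the end of this page can be queried directly. A rough sketch, assuming they are scraped into a Prometheus server (replace the address; the _count series are the standard counters exposed for Summary metrics). The ratio of Elasticsearch list queries to total list requests approximates the miss rate, so a falling value indicates the cache is warming up:
curl -s 'http://YOUR-PROMETHEUS:9090/api/v1/query' \
    --data-urlencode 'query=sum(rate(caringo_listingcache_backend_query_count{method="list"}[15m])) / sum(rate(caringo_listingcache_request_count{method="list"}[15m]))'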
Check Response Time
Compare the response time before and after enabling the Listing Cache. Reduced response times, particularly for frequently requested folder listings, indicate the cache functions correctly.
Resource Utilization
Monitor memory usage and CPU utilization. Increased memory usage and steady CPU activity are normal in a caching system, but excessively high CPU or memory usage may indicate misconfiguration.
Deployment Steps
Follow these steps to deploy the Listing Cache:
Step 1: Prepare the Environment
Provision a server with the specified hardware requirements.
Ensure the server’s 200GB partition is formatted with the XFS file system.
Verify network connectivity to other components of the S3 environment.
Step 2: Configure Load Balancer
Modify load-balancing rules to hardcode domains to a single gateway.
Ensure that all LC-enabled domains point to the appropriate gateway.
Test the load balancer configuration to confirm proper routing, for example with the curl check shown below.
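A quick routing check, assuming domaina.acme.com is an LC domain (as in the HAProxy example later on this page) and YOUR-LB-IP is the load balancer address:
# Both requests carry a dummy S3-style Authorization header; the first should be routed to the
# listing-cache backend, the second to the regular S3 pool. Confirm on the HAProxy stats page
# which backend served each request.
curl -sk -o /dev/null -w '%{http_code}\n' --resolve domaina.acme.com:443:YOUR-LB-IP \
    -H 'Authorization: AWS test:test' https://domaina.acme.com/
curl -sk -o /dev/null -w '%{http_code}\n' --resolve other.acme.com:443:YOUR-LB-IP \
    -H 'Authorization: AWS test:test' https://other.acme.com/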
Step 3: Install and Configure Listing Cache
Download the LC installation package from the designated repository.
Install the package on the prepared server.
Configure LC settings according to your environment’s specifications:
Set up domain-specific configurations.
Enable pseudo folder support as required.
Step 4: Validate Deployment
Perform basic functionality tests:
Verify data retrieval and storage through LC.
Test operations within pseudo folders.
Check system logs for any errors or warnings (see the example after this list).
Monitor performance metrics to ensure hardware is sufficient.
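For the log check, something like the following is a reasonable starting point (the path shown is the usual Gateway log location on an RPM install; adjust it if your logging configuration differs):
grep -iE 'error|warn' /var/log/caringo/cloudgateway_server.log | tail -n 50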
Step 5: Go Live
Enable LC for production workloads.
Monitor system performance and address any issues promptly.
Post-deployment Recommendations
Regularly monitor LC’s performance and resource utilization.
Plan for updates as new features and improvements are released.
Document any environment-specific configurations for future reference.
HAProxy Configuration for LC-Enabled Gateway
Below is the suggested generic HAProxy configuration tailored for a Listing Cache (LC)-enabled gateway.
This configuration is designed for HAProxy version 2.2 and higher.
Failover without failback is enabled for Listing Cache. Since restarting LC clears its cache, it is optimal to fail over only if the gateway becomes unavailable.
SCSP traffic is not routed to the Listing Cache. The configuration is primarily intended for handling S3 traffic.
Specific domains are redirected to the LC-enabled gateway, while all other traffic is routed to the regular non-cached pool.
The following is an example /etc/haproxy.conf file.
global
    log 127.0.0.1 local2 alert
    chroot /var/lib/haproxy
    stats socket /var/lib/haproxy/stats mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    ca-base /etc/pki/ca-trust/
    crt-base /etc/haproxy/certs
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS:!3DES
    ssl-default-bind-options no-sslv3
    maxconn 2048
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option forwardfor
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 130000

frontend HTTP_IN
    bind *:80 name *:80
    option http-keep-alive
    acl acl_is_http req.proto_http
    http-request redirect scheme https if acl_is_http

frontend stats
    mode http
    bind 0.0.0.0:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend HTTPS_IN
    bind *:443 name *:443 ssl alpn h2,http/1.1 crt /etc/haproxy/certs/wildcard.acme.com.pem
    mode http
    option http-keep-alive
    option httplog
    acl acl_is_content_ui path -m beg /_admin/portal
    acl acl_awsauth hdr_sub(Authorization) -i AWS
    acl acl_aws url_reg -i (?<=[?&])(AWSAccessKeyId|X-Amz-Credential)=
    # Define an acl per domain you want to send to LC
    acl acl_is_domain_a hdr(host) -i domaina.acme.com
    use_backend POOL-S3-listingcache if acl_is_domain_a
    use_backend POOL-S3 if acl_awsauth || acl_aws
    use_backend POOL-scsp if acl_is_content_ui

backend POOL-scsp
    mode http
    balance leastconn
    stick-table type ip size 50k expire 30m
    stick on src
    http-reuse safe
    server GW01 10.11.21.33:8080 check inter 10s
    server GW02 10.11.21.34:8080 check inter 10s

backend POOL-S3-listingcache
    balance source
    stick-table type ip size 50k expire 24d
    stick on src
    option httpchk
    http-check connect
    http-check send meth HEAD uri / ver HTTP/1.1 hdr Host haproxy-healthcheck
    http-check expect status 403
    server GW03 10.11.21.35:8090 check inter 10s fall 3 rise 2
    server GW04 10.11.21.36:8090 check inter 10s fall 3 rise 2 backup

backend POOL-S3
    balance leastconn
    stick-table type ip size 50k expire 30m
    stick on src
    option httpchk
    http-check connect
    http-check send meth HEAD uri / ver HTTP/1.1 hdr Host haproxy-healthcheck
    http-check expect status 403
    server GW01 10.11.21.33:8090 check inter 10s fall 3 rise 2
    server GW02 10.11.21.34:8090 check inter 10s fall 3 rise 2
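Before reloading, it is worth validating the file with HAProxy's built-in check mode (adjust the path if your distribution keeps the configuration in /etc/haproxy/haproxy.cfg):
haproxy -c -f /etc/haproxy.conf
systemctl reload haproxy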
Metrics
caringo_listingcache_request (Summary): Request counts and latencies for write/delete/list, versioned/nonversioned. Labels: method=[write, delete, list], mode=[V, NV]
caringo_listingcache_request_errors (Counter): Request error counts for write/delete/list, versioned/nonversioned. Labels: method=[write, delete, list], mode=[V, NV]
caringo_listingcache_listed_recs (Counter): Total number of records returned by the listing cache, versioned/nonversioned. Labels: mode=[V, NV]
caringo_listingcache_backend_query (Summary): Counts and latencies of ES queries for priming/listing, versioned/nonversioned. Labels: method=["list", "prime"], mode=[V, NV]
caringo_listingcache_backend_query_recs (Counter): Number of ES records queried for priming/listing, versioned/nonversioned. Labels: method=["list", "prime"], mode=[V, NV]
caringo_listingcache_cache_query (Summary): Counts and latencies of SqliteDB queries for priming/listing, versioned/nonversioned. Labels: method=["list", "prime", "reconciliation"], mode=[V, NV]
caringo_listingcache_cache_query_recs (Counter): Number of SqliteDB records queried for priming/listing, versioned/nonversioned. Labels: method=["list", "prime", "reconciliation"], mode=[V, NV]
caringo_listingcache_flushes_pending (Gauge): Folder updates pending flush to SqliteDB disk cache.
caringo_listingcache_flushes_done (Counter): Folder updates flushed to SqliteDB disk cache.
caringo_listingcache_trims_pending (Gauge): Folders pending trim in memory cache.
caringo_listingcache_trims_done (Counter): Folders trimmed in memory cache.
caringo_listingcache_folder_pulls_pending (Gauge): Folders marked to be internally pulled into cache.
caringo_listingcache_folder_pulls_done (Counter): Folders internally pulled into cache.
caringo_listingcache_mem_cached (Gauge): Folders currently in memory cache.
caringo_listingcache_mem_evicted (Counter): Folders evicted from memory cache.
caringo_listingcache_dbhandle_cached (Gauge): SqliteDB handles currently in memory cache.
caringo_listingcache_dbhandle_evicted (Counter): SqliteDB handles evicted from memory cache.
caringo_listingcache_disk_cached (Gauge): SqliteDBs currently in disk cache.
caringo_listingcache_disk_evicted (Counter): Folders evicted from disk cache.
caringo_listingcache_disk_cached_bytes (Gauge): Size in bytes of SqliteDBs currently in disk cache.
caringo_listingcache_disk_evicted_bytes (Counter): Size in bytes of SqliteDBs evicted from disk cache.
caringo_listingcache_reconciliations_done (Counter): Number of cache records reconciled (versionid mismatches corrected based on etag). Labels: origin=[backend, cache]
caringo_listingcache_memory_used (Gauge): Memory use as perceived by the listing cache.
caringo_listingcache_disk_free (Gauge): Disk free space as perceived by the listing cache.
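As a rough sketch of how these metrics might be used to watch eviction pressure, assuming they are scraped into a Prometheus server whose address you substitute:
# Folders evicted from the disk cache per second over the last 15 minutes.
curl -s 'http://YOUR-PROMETHEUS:9090/api/v1/query' \
    --data-urlencode 'query=rate(caringo_listingcache_disk_evicted[15m])'
# Free space on the cache partition as seen by the listing cache.
curl -s 'http://YOUR-PROMETHEUS:9090/api/v1/query' \
    --data-urlencode 'query=caringo_listingcache_disk_free'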