Overview

Listing Cache (LC) is a performance optimization feature designed to improve the speed of listing large datasets within Swarm storage. It works by caching pseudo-folder listings, reducing the time and resource consumption required to fetch and display object listings repeatedly.

The Listing Cache solves a scalability problem with the gateway's delimited folder listing functionality. To determine whether a folder has subfolders, an Elasticsearch query has to enumerate all objects whose names begin with the folder name. For large buckets this can run into millions of objects. When such queries are issued repeatedly and at high frequency, the resulting CPU load can bring an entire Elasticsearch cluster to a halt.
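To make the cost concrete, the sketch below emulates a delimited listing over a plain list of object names. It is only an illustration of the idea, not the gateway's or Elasticsearch's actual implementation; the function name list_delimited and the sample object names are hypothetical. Note that every name under the prefix has to be scanned just to discover the subfolders.

```shell
# list_delimited PREFIX
# Reads object names (one per line) on stdin and prints the immediate
# children of PREFIX, the way a "/"-delimited listing would.
# Hypothetical helper for illustration only.
list_delimited() {
  awk -v prefix="$1" '
    index($0, prefix) == 1 {                 # name starts with the prefix
      rest = substr($0, length(prefix) + 1)  # part after the prefix
      if (rest == "") next
      cut = index(rest, "/")                 # first delimiter after the prefix
      if (cut > 0) {
        folder = prefix substr(rest, 1, cut)
        if (!(folder in seen)) {             # report each subfolder once
          seen[folder] = 1
          print "FOLDER " folder
        }
      } else {
        print "OBJECT " $0
      }
    }'
}

# Example with made-up names:
printf '%s\n' \
  'photos/2023/a.jpg' 'photos/2023/b.jpg' \
  'photos/2024/c.jpg' 'photos/readme.txt' 'docs/x.txt' |
  list_delimited 'photos/'
```

With the sample input, the listing contains two folders (photos/2023/ and photos/2024/) and one object (photos/readme.txt), yet all four names under photos/ had to be read to produce it.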

Limitations

  • Client-specific binding: Bound to a dedicated client, with no cross-gateway sharing allowed. Once you decide to serve one or more domains on a Listing Cache-enabled gateway, that gateway must serve all requests to those domains exclusively. This is achieved by configuring your load balancer with dedicated host-based traffic redirection rules.

  • Non-persistent cache: The disk/memory cache is discarded by default on restart.

  • Limited lifecycle and recursive deletion support: No support for bucket lifecycle policies, delete lifepoints, or recursive deletes. All writes and deletes must originate from the gateway.

  • Memory constraints: Caching large volumes of data can quickly consume system memory. Misconfiguring cache sizes can lead to memory exhaustion or excessive eviction, reducing cache effectiveness.

  • Delimiters support: Custom delimiters are not yet supported, only forward slash "/".

  • Replication support: Pending engineering feedback. Do not set up replication when LC is enabled.

  • Unsupported functionality: Custom delimiters, S3 lifecycle policies, and recursive deletes.

Prerequisites

The Listing Cache can be enabled on gateway 8.1.2 or above. Ensure the following prerequisites are met before deploying Listing Cache:

  • Hardware Requirements:

    • 8 vCPUs

    • 16GB RAM

    • 200GB dedicated partition formatted with XFS

  • Load Balancing Configuration:

    • Hardcode domains to a single gateway with Listing Cache (LC).

Info

Shared gateway support is currently not available.

Assuming you are using recommended settings, you will need to do the following:

Set Java Memory Heap

vim /etc/sysconfig/cloudgateway

HEAP_MIN="12228m"
HEAP_MAX="12228m"

Create disk cache partition

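# Create a volume group on the dedicated disk (adjust /dev/sdb to your device)
vgcreate swarmspool /dev/sdb
# Create a 195G logical volume and format it with XFS
lvcreate -L 195G -n diskcache swarmspool
mkfs.xfs /dev/swarmspool/diskcache
# Make sure the mount point exists, then mount the volume
mkdir -p /var/spool/caringo
mount /dev/swarmspool/diskcache /var/spool/caringo/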

Persist the mount by adding the following line at the end of /etc/fstab:

/dev/mapper/swarmspool-diskcache /var/spool/caringo xfs defaults 0 0
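Before rebooting, it can be worth confirming that the entry was recorded as intended. The helper below is a minimal sketch: fstab_fstype is a hypothetical name, and it only reads the file, so it does not prove that the volume is currently mounted.

```shell
# fstab_fstype FILE MOUNTPOINT
# Prints the filesystem type recorded for MOUNTPOINT in an fstab-style
# file and returns non-zero if no entry exists.
# Hypothetical helper for illustration only.
fstab_fstype() {
  awk -v mp="$2" '
    /^[ \t]*#/ { next }                       # skip comment lines
    $2 == mp   { print $3; found = 1; exit }  # field 2 = mount point, 3 = type
    END        { exit (found ? 0 : 1) }' "$1"
}

# Example (run on the gateway host):
#   fstab_fstype /etc/fstab /var/spool/caringo
```

For the entry shown above, the helper should print xfs. To see what is actually mounted right now, `findmnt -n -o FSTYPE -T /var/spool/caringo` reports the live filesystem type.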

Do Not Use Listing Cache If:

  1. You use multipart S3 operations.

  2. You use custom delimiters in search queries.

  3. You need the ability to do recursive deletes of domains and buckets.

  4. You use S3 lifecycle policies.

  5. You need support for delete lifepoints.

  6. You do not use pseudo folders, or all objects are in a single pseudo folder.

How to Enable Listing Cache

The procedure to enable Listing Cache in Swarm is outlined below:

  1. Add the following setting to /etc/caringo/cloudgateway/gateway.cfg:

[storage_cluster]
disableListingCache=false

  2. After testing in a staging environment, roll out the Listing Cache to production by deploying the necessary configurations and code changes.

  3. Monitor performance impact closely during the rollout phase.

  4. Optional. Pre-warm the cache with commonly accessed listings before enabling it in production, so the initial requests are served from the cache.
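As a quick sanity check during rollout, the setting can be verified on each gateway before it takes traffic. The sketch below assumes gateway.cfg is a plain INI-style file as shown above; listing_cache_enabled is a hypothetical helper name, and it only looks for the explicit disableListingCache=false line in the [storage_cluster] section, so it says nothing about defaults when the line is absent.

```shell
# listing_cache_enabled FILE
# Succeeds (exit 0) if the [storage_cluster] section of FILE contains
# disableListingCache=false; fails otherwise.
# Hypothetical helper for illustration only.
listing_cache_enabled() {
  awk '
    /^[ \t]*\[/ {                     # section header, e.g. [storage_cluster]
      section = $0
      sub(/^[ \t]*\[/, "", section)
      sub(/\].*$/, "", section)
      next
    }
    section == "storage_cluster" {
      line = $0
      gsub(/[ \t]/, "", line)         # tolerate spaces around "="
      if (line == "disableListingCache=false") found = 1
    }
    END { exit (found ? 0 : 1) }' "$1"
}

# Example (run on the gateway host):
#   listing_cache_enabled /etc/caringo/cloudgateway/gateway.cfg && echo "LC enabled"
```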

How Does Listing Cache Work

  • Ensure Sufficient Disk Space: Listing Cache stores each folder in a separate SQLite database, which consumes disk space. Provide ample disk space to avoid frequent evictions of folder databases, as this impacts performance.

  • Automatic Folder Detection: Listing Cache automatically learns about folders through ongoing list, write, and delete requests. No manual intervention is required to create or manage databases for each folder.

  • Monitor Cache Population: Initially, for any new folder, the cache starts with an "infinite gap," meaning it has no data cached and queries Elasticsearch. Over time, as more listings are cached, the gap reduces until the folder is fully cached and can be served without querying Elasticsearch.

  • Real-Time Cache Updates: Ongoing write and delete requests are intercepted and used to keep the folder databases updated, ensuring the cache remains consistent with the actual data.

  • LRU-Based Eviction: The system automatically evicts the least recently used (LRU) databases when disk space is full. If a folder's database is evicted and later requested, the cache process restarts for that folder.

  • Disk Space Directly Impacts Performance: The more disk space available, the fewer evictions occur, allowing more folders to remain fully cached and reducing the need for frequent Elasticsearch queries.

  • Prepare for Elasticsearch Querying: In case of cache misses or folder database evictions, Elasticsearch will be queried. Ensure that Elasticsearch is properly configured to handle such requests, especially during periods of high cache turnover.
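The relationship between disk budget and evictions described above can be explored with a toy model. The sketch below is not the gateway's eviction code: simulate_lru, the input format (one "folder size" access per line), and the byte budget are all assumptions made for illustration. It keeps a per-folder size and last-access position and evicts the least recently used folder whenever the budget is exceeded.

```shell
# simulate_lru BUDGET
# Reads accesses on stdin as "folder size" lines (size counted once per
# folder) and prints "EVICT folder" whenever the total exceeds BUDGET,
# followed by a summary line. Toy model for illustration only.
simulate_lru() {
  awk -v budget="$1" '
    NF < 2 { next }
    {
      folder = $1; size = $2 + 0
      if (!(folder in bytes)) { bytes[folder] = size; used += size }
      last[folder] = NR                   # remember the most recent access
      while (used > budget) {
        victim = ""
        for (f in last)                   # find the least recently used folder
          if (f != folder && (victim == "" || last[f] < last[victim]))
            victim = f
        if (victim == "") break           # only the current folder remains
        used -= bytes[victim]
        print "EVICT " victim
        delete bytes[victim]
        delete last[victim]
      }
    }
    END {
      for (f in bytes) n++
      printf "CACHED %d USED %d\n", n, used
    }'
}

# Example: three 40-unit folders against a budget of 100 units.
printf '%s\n' 'a 40' 'b 40' 'a 40' 'c 40' | simulate_lru 100
```

In the example, folder a is touched again before c arrives, so b is the one evicted. Raising the budget to 120 makes the eviction disappear, which is the effect the disk space recommendation above aims for.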

How to Determine if the Listing Cache is Working Correctly

  1. Monitor Cache Hit Rate

    • If you have telemetry and Grafana available, check the Listing Cache dashboard.

  2. Check Response Time

    • Compare the response time before and after enabling the Listing Cache. Reduced response times, particularly for frequently requested folder listings, indicate that the cache is functioning correctly.

  3. Resource Utilization

    • Monitor memory usage and CPU utilization. Increased memory usage and steady CPU activity are normal in a caching system, but excessively high CPU or memory usage may indicate misconfiguration.
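Where no dashboard is available, a rough indicator can be derived from the raw metrics listed in the Metrics section. The sketch below assumes the metrics are exposed in the Prometheus text format, that Summary metrics carry the usual _count series, and that labels are rendered as method="list"; none of this is confirmed by this page, and the metrics endpoint is not specified here, so the function simply reads the scraped text from stdin. backend_ratio is a hypothetical name. A ratio close to 0 suggests most list requests are answered without an Elasticsearch list query; treat it as a proxy, not an exact hit rate.

```shell
# backend_ratio
# Reads Prometheus-style metrics text on stdin and compares the number of
# Elasticsearch list queries with the number of list requests.
# Hypothetical helper; metric rendering is an assumption.
backend_ratio() {
  awk '
    /^#/ { next }                                    # skip HELP/TYPE lines
    index($0, "method=\"list\"") == 0 { next }       # only list operations
    index($0, "caringo_listingcache_request_count{") == 1       { requests += $NF }
    index($0, "caringo_listingcache_backend_query_count{") == 1 { backend += $NF }
    END {
      if (requests == 0) { print "no list requests recorded"; exit 1 }
      printf "list_requests=%d backend_list_queries=%d backend_ratio=%.2f\n",
             requests, backend, backend / requests
    }'
}

# Example: save a scrape of the gateway metrics to a file, then run
#   backend_ratio < metrics.txt
```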

Deployment Steps

Follow these steps to deploy the Listing Cache:

Step 1: Prepare the Environment

  1. Provision a server with the specified hardware requirements.

  2. Ensure the server’s 200GB partition is formatted with the XFS file system.

  3. Verify network connectivity to other components of the S3 environment.

Step 2: Configure Load Balancer

  1. Modify load-balancing rules to hardcode domains to a single gateway.

  2. Ensure that all LC-enabled domains point to the appropriate gateway.

  3. Test the load balancer configuration to confirm proper routing.
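One way to review the routing rules before testing live traffic is to export them as a simple "domain gateway" list and check that no domain is mapped to more than one gateway. The sketch below assumes such a list exists; the file format, the helper name check_domain_pinning, and the sample host names are hypothetical and not tied to any particular load balancer.

```shell
# check_domain_pinning
# Reads "domain gateway" pairs on stdin and prints "SPLIT domain" for every
# domain routed to more than one gateway. Returns non-zero if any is found.
# Hypothetical helper for illustration only.
check_domain_pinning() {
  awk '
    NF >= 2 && $1 !~ /^#/ {
      key = $1 SUBSEP $2
      if (!(key in pair)) {      # count each distinct gateway once per domain
        pair[key] = 1
        count[$1]++
      }
    }
    END {
      for (d in count)
        if (count[d] > 1) { print "SPLIT " d; bad = 1 }
      exit (bad ? 1 : 0)
    }'
}

# Example with made-up names: other.example.com is routed to two gateways.
#   printf '%s\n' 'lc.example.com gw1' 'other.example.com gw2' \
#                 'other.example.com gw3' | check_domain_pinning
```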

Step 3: Install and Configure Listing Cache

  1. Download the LC installation package from the designated repository.

  2. Install the package on the prepared server.

  3. Configure LC settings according to your environment’s specifications:

    • Set up domain-specific configurations.

    • Enable pseudo folder support as required.

Step 4: Validate Deployment

  1. Perform basic functionality tests:

    • Verify data retrieval and storage through LC.

    • Test operations within pseudo folders.

  2. Check system logs for any errors or warnings.

  3. Monitor performance metrics to ensure hardware is sufficient.

Step 5: Go Live

  1. Enable LC for production workloads.

  2. Monitor system performance and address any issues promptly.

Post-deployment Recommendations

  • Regularly monitor LC’s performance and resource utilization.

  • Plan for updates as new features and improvements are released.

  • Document any environment-specific configurations for future reference.

Metrics

caringo_listingcache_request (Summary)
        Request counts and latencies for write/delete/list, versioned/nonversioned.
        Labels: method=[write, delete, list], mode=[V, NV]

caringo_listingcache_request_errors (Counter)
        Request error counts for write/delete/list, versioned/nonversioned.
        Labels: method=[write, delete, list], mode=[V, NV]

caringo_listingcache_listed_recs (Counter)
        Total number of records returned by the listing cache, versioned/nonversioned.
        Labels: mode=[V, NV]

caringo_listingcache_backend_query (Summary)
        Counts and latencies of ES queries for priming/listing, versioned/nonversioned.
        Labels: method=["list", "prime"], mode=[V, NV]

caringo_listingcache_backend_query_recs (Counter)
        Number of ES records queried for priming/listing, versioned/nonversioned.
        Labels: method=["list", "prime"], mode=[V, NV]

caringo_listingcache_cache_query (Summary)
        Counts and latencies of SqliteDB queries for priming/listing, versioned/nonversioned.
        Labels: method=["list", "prime", "reconciliation"], mode=[V, NV]

caringo_listingcache_cache_query_recs (Counter)
        Number of SqliteDB records queried for priming/listing, versioned/nonversioned.
        Labels: method=["list", "prime", "reconciliation"], mode=[V, NV]

caringo_listingcache_flushes_pending (Gauge)
        Folder updates pending flush to SqliteDB disk cache.

caringo_listingcache_flushes_done (Counter)
        Folder updates flushed to SqliteDB disk cache.

caringo_listingcache_trims_pending (Gauge)
        Folders pending trim in memory cache.

caringo_listingcache_trims_done (Counter)
        Folders trimmed in memory cache.

caringo_listingcache_folder_pulls_pending (Gauge)
        Folders marked to be internally pulled into cache.

caringo_listingcache_folder_pulls_done (Counter)
        Folders internally pulled into cache.

caringo_listingcache_mem_cached (Gauge)
        Folders currently in memory cache.

caringo_listingcache_mem_evicted (Counter)
        Folders evicted from memory cache.

caringo_listingcache_dbhandle_cached (Gauge)
        SqliteDB handles currently in memory cache.

caringo_listingcache_dbhandle_evicted (Counter)
        SqliteDB handles evicted from memory cache.

caringo_listingcache_disk_cached (Gauge)
        SqliteDBs currently in disk cache.

caringo_listingcache_disk_evicted (Counter)
        Folders evicted from disk cache.

caringo_listingcache_disk_cached_bytes (Gauge)
        Size in bytes of SqliteDBs currently in disk cache.

caringo_listingcache_disk_evicted_bytes (Counter)
        Size in bytes of SqliteDBs evicted from disk cache.

caringo_listingcache_reconciliations_done (Counter)
        Number of cache records reconciled (versionid mismatches corrected based on etag).
        Labels: origin=[backend,cache]

caringo_listingcache_memory_used (Gauge)
        Memory use as perceived by the listing cache.

caringo_listingcache_disk_free (Gauge)
        Disk free space as perceived by the listing cache.
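
For ad hoc checks from the command line, individual metrics can be pulled out of a saved scrape. The sketch below assumes the Prometheus text format and allows for a _total suffix on counters, which some client libraries add; lc_metric is a hypothetical helper name, and it sums all label combinations of the named metric.

```shell
# lc_metric NAME
# Reads Prometheus-style metrics text on stdin and prints the sum of all
# samples of metric NAME (with or without labels, with or without a
# "_total" suffix). Returns non-zero if the metric is absent.
# Hypothetical helper for illustration only.
lc_metric() {
  awk -v name="$1" '
    /^#/ { next }                                 # skip HELP/TYPE lines
    {
      metric = $1
      brace = index(metric, "{")                  # strip the label set, if any
      if (brace > 0) metric = substr(metric, 1, brace - 1)
      if (metric == name || metric == name "_total") { sum += $NF; seen = 1 }
    }
    END {
      if (!seen) exit 1
      printf "%.0f\n", sum
    }'
}

# Example: watch disk evictions and free space between two scrapes
#   lc_metric caringo_listingcache_disk_evicted < metrics.txt
#   lc_metric caringo_listingcache_disk_free    < metrics.txt
```

A steadily growing caringo_listingcache_disk_evicted value alongside a low caringo_listingcache_disk_free value would be consistent with the disk cache being too small for the working set.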