Search Feeds
Adding a Search Feed
To add a Search feed to the cluster, click the +Add button at the top of the Feeds page and select a Search feed. If errors occur during feed creation, verify that the [storage cluster] managementPassword is set correctly in the gateway.cfg file. Correct the value and restart the gateway service if a change is needed.
Naming: Swarm applies a naming scheme that guarantees Elasticsearch index names are always unique, within and across clusters. If a second search feed is created through the UI, Swarm creates a new Elasticsearch index and alias for that feed. (v9.0)
Multiples: Swarm allows the creation of more than one Search feed, which facilitates the transition from one Elasticsearch cluster to another. During the transition, continue using the primary feed for queries; the second feed is incomplete until it fully clears its backlog. When the second feed is caught up, transition to it (apply Make primary to the second feed) as soon as reasonable for current operations. Delete the original feed once the new primary feed target is verified as working. Multiple feeds are usually for temporary use only, because every feed incurs cluster activity, even when paused.
Important
Restart all Gateway servers to pick up the new feed. No restarts are needed if only a feed definition is updated.
The following table describes the data entry fields in the dialog box.
Field | Description |
---|---|
ID (Existing feeds) | Read-only; system-assigned identifier. |
Status (Existing feeds) | Read-only; the current feed processing state. Primary - Flags the Search feed used for all search queries. Only one feed can be Primary. Set from the Feeds command menu. |
Name | The name attached to this feed. |
Batch Size | Defaults to 100. The maximum number of objects sent concurrently to be processed. |
Batch Timeout (Seconds) | Defaults to 1. The maximum amount of time (in seconds) before a batch is resent to be processed after a timeout. |
Search Full Metadata | Enabled - (default) Swarm storage indexes all object metadata, including baseline and custom metadata fields. See Metadata Field Matching for a list of baseline and custom fields. |
Server Host(s) or IP(s) | The IP addresses, or server names resolvable by DNS, of the Elasticsearch servers. Separate with a comma or space if entering more than one. Important: If the list of ES servers on an active feed is changed, refresh the feed to prevent it from becoming blocked. |
Server Port | Defaults to 9200. The default port for a host. |
Alias (Existing feeds) | Read-only; system-assigned name by which Elasticsearch references the Swarm feed. |
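The Server Host(s) and Server Port fields can be illustrated with a small Python sketch. It only shows how a comma- or space-separated host entry might be normalized before it is pasted into the dialog or reused in scripts; the function names and example hostnames are hypothetical, and this is not how Swarm itself parses the field.

```python
from __future__ import annotations

import re

DEFAULT_PORT = 9200  # documented default for the Server Port field


def parse_hosts(field: str) -> list[str]:
    """Split a comma- and/or space-separated host entry into a clean list."""
    return [h for h in re.split(r"[,\s]+", field.strip()) if h]


def base_urls(field: str, port: int = DEFAULT_PORT, scheme: str = "http") -> list[str]:
    """Build one base URL per host (assumes every host shares the same port)."""
    return [f"{scheme}://{host}:{port}" for host in parse_hosts(field)]


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    for url in base_urls("es1.example.com, es2.example.com 10.0.0.3"):
        print(url)
```

Keeping the list in one normalized form makes it easier to compare the hosts entered in the feed definition with the hosts configured elsewhere.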
Changing Default Search Feed
You may need to change your default search feed to another search feed for any of the following reasons:
Decommissioning an older Elasticsearch cluster.
A new search feed is required for Per Domain Indexing.
Before changing the default search feed, ensure that both search feeds are synchronized and up to date. Typically, the newer feed should be nearly 100% complete as shown in the UI.
To confirm the consistency of the feeds, use the es-count-docs-all-indices.sh script from the Support Tools bundle. This script counts the number of documents in each domain for both feeds. Comparing the results will help ensure that the new feed is ready to become the default.
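The comparison step can be sketched in Python. The output format of es-count-docs-all-indices.sh is not described here, so this sketch assumes the per-domain counts for each feed have already been collected into dictionaries; the function name, the tolerance parameter, and the domain names are hypothetical.

```python
def compare_counts(old: dict, new: dict, tolerance: int = 0) -> dict:
    """Return the domains whose document counts differ by more than `tolerance`.

    The result maps domain -> (old_count, new_count). A domain missing from
    one feed is treated as having a count of 0 in that feed.
    """
    mismatches = {}
    for domain in sorted(set(old) | set(new)):
        old_count = old.get(domain, 0)
        new_count = new.get(domain, 0)
        if abs(old_count - new_count) > tolerance:
            mismatches[domain] = (old_count, new_count)
    return mismatches


if __name__ == "__main__":
    # Hypothetical per-domain counts for the current and the new feed.
    current_feed = {"a.example.com": 100, "b.example.com": 50}
    new_feed = {"a.example.com": 100, "b.example.com": 48}
    print(compare_counts(current_feed, new_feed))
```

A small tolerance may be reasonable while writes are still arriving, since the two feeds are unlikely to be sampled at exactly the same moment.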
When ready, switch to the new feed by applying "Make primary" to the second feed. Do this at a time that minimizes disruption to current operations.
If the new Elasticsearch feed points to a different set of IP addresses (indicating that new ES nodes were deployed for this feed), update the IP addresses in the /etc/caringo/cloudgateway/gateway.cfg file on each Content Gateway:
indexerHosts = [ list of ES nodes ]
After updating this parameter (or even if no changes are needed), restart the cloudgateway service on each Content Gateway to apply the changes:
systemctl restart cloudgateway
You can delete the original feed once the new primary feed is verified as working. However, it is best practice to keep both feeds active for at least a week to ensure no issues arise with your applications. This allows you to revert to the old feed if needed.
Using Feed Actions
Clicking on an existing search feed in the Feeds list opens its Feed Settings page, with the existing settings populated. The gear icon menu at the top right supports multiple feed actions, appropriate to the type of feed:
Action | Description |
---|---|
Pause / Resume | Occasionally it is desirable to pause feed processing to perform system maintenance. Pause the search feed before stopping the Elasticsearch service in the search cluster. Return to the action menu and select the Resume action to resume feed processing after completing system maintenance. |
Make Primary | For Search feeds only. Select the 'Make primary' option from the feed actions menu to change which search feed is the primary feed used for all search queries. |
Refresh | Object data is sent to the feed target in near real-time (NRT) as objects are written or updated. Any objects that cannot be processed immediately are retried each HP cycle until successful, at which point they are marked as complete and are not resent. If a data loss failure occurs on the remote feed target and a restore from backup cannot be completed, select the Refresh option from the feed action menu, which verifies and rehydrates all previously sent content to the Elasticsearch cluster. This process takes some time, as it must revisit all objects in the cluster. For search feeds, if an Elasticsearch index for the cluster does not exist, it is created. To recreate an existing index 'fresh' (such as for case-insensitive searching where case-sensitive was previously used), drop the existing index before refreshing the feed. |
Delete | Deleting a feed frees source cluster resources. To delete a feed, select the Delete option from the feed action menu and confirm the intention to permanently delete the feed. The deleted feed is removed from the remaining cluster nodes within 60 seconds. If desired, delete the search data previously sent by the feed. |
View Feed Table | Displays the SNMP Repository Dump for the selected node, for feed diagnostics and troubleshooting (see below). |
Best Practice
While Swarm allows multiple search feeds, no more than 8 feeds may match any one domain. For optimal performance and scalability, avoid creating an excessive number of search feeds.
Troubleshooting Feeds
Feed Diagnostics: To troubleshoot a blocked feed, double-click it to open its settings page, click the gear icon, and select View feed table, which displays the SNMP Repository Dump for the selected node. (v2.0)
Review the feedPluginState status to identify the blockage.
Idle Feeds: A feed can appear to be idle with items still queued for processing. Plan for the fact that feed status reporting is a best-effort snapshot, not a low-latency or guaranteed transaction mechanism.
Feed Prioritization: Domain and bucket context objects are prioritized for all types of feeds; this improves usability when initiating remote sites.
Retries for Blocked Feeds: Blocked feeds are retried every 20 minutes, but if the definition for a blocked feed is changed, it triggers an immediate attempt with the new definition, which may clear the blockage. (v10.1)
Blocked Search Feeds: If Swarm cannot find the Elasticsearch index associated with a search feed, it marks the feed with the status Blocked, and messages report that the index is missing. If the search index is gone, delete the feed and recreate it with the same settings.
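When investigating a blocked search feed, it can help to check on the Elasticsearch side whether the feed's alias still resolves to an index. The Python sketch below assumes the JSON output of Elasticsearch's `GET /_cat/aliases?format=json` request has already been fetched and saved as text; the function name and the alias and index names are hypothetical, and this check is an illustration rather than a documented Swarm procedure.

```python
from __future__ import annotations

import json


def indices_for_alias(cat_aliases_json: str, alias: str) -> list[str]:
    """Return the sorted index names that a given alias points to.

    `cat_aliases_json` is the text returned by GET /_cat/aliases?format=json,
    which is a JSON array of rows that include "alias" and "index" keys.
    """
    rows = json.loads(cat_aliases_json)
    return sorted({row["index"] for row in rows if row.get("alias") == alias})


if __name__ == "__main__":
    # Hypothetical response body, trimmed to the two relevant columns.
    sample = json.dumps([
        {"alias": "feed-alias-a", "index": "feed-index-a"},
        {"alias": "feed-alias-b", "index": "feed-index-b"},
    ])
    print(indices_for_alias(sample, "feed-alias-a"))
```

An empty result for the alias shown in the feed's Alias field would be consistent with the missing-index condition described above.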
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.