If you want to enumerate an entire cluster and you have a Search (Indexer) Feed already configured, you may use the indexer-enumerator.sh script from the support tools bundle to do so.

For a smaller query, it might be easier to use the Content UI portal (if it’s installed on a Content Gateway). This script is for enumerating potentially large data sets where the UI would be less helpful.

Tips

  • You can run the script with bash -x to get examples of the curl syntax that you can adapt for your own custom indexer calls.

  • You can search by domain, bucket, prefix, size, date written, and type of object.

  • When you have the match you want, you can remove the -orc options and from there output the object match results to file.
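The bash -x tip can be narrowed so that only the traced curl commands are shown. The following is a minimal sketch, not part of the support tools bundle: trace_curl is a hypothetical helper name, and it assumes the traced script issues its requests with plain curl commands.

```bash
#!/usr/bin/env bash
# Sketch: run a script under "bash -x" and keep only the traced curl commands.
# trace_curl is a hypothetical helper name, not part of the support tools bundle.
trace_curl() {
    # bash -x writes its trace to stderr; each traced line starts with one or more "+".
    bash -x "$@" 2>&1 | grep -E '^\++ curl '
}

# Example (assumes indexer-enumerator.sh is on your PATH):
#   trace_curl "$(command -v indexer-enumerator.sh)" -D
```

Because the trace shows each command after variable expansion, the curl lines it prints can be copied and adjusted by hand.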

Note

Be careful to run this script from a directory/partition with plenty of disk space if you are returning millions of objects.

For full enumerations of larger data sets, you may want to add the -s option to echo the enumerator loop count. Each call to the indexer has a maximum of 10k returned values, so knowing how many iterations of that 10k figure the script has returned is valuable for larger enumerations.
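As a rough guide to what the loop count means, the number of indexer calls for a full enumeration can be estimated from the expected object count and the 10k limit. This is a small sketch; indexer_calls is a hypothetical helper name, and the object count is an example value.

```bash
# Sketch: estimate how many indexer calls a full enumeration needs,
# given the 10k-results-per-call limit described above.
# indexer_calls is a hypothetical helper name.
indexer_calls() {
    local total=$1 page=${2:-10000}
    # Integer ceiling division: a partial last page still costs one call.
    echo $(( (total + page - 1) / page ))
}

indexer_calls 2500000   # prints 250
```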

Instructions

This is an extended example of how you can use this script to investigate what is in your cluster.

Info

The environment variable SCSP_HOST is set to a storage node IP to avoid having to put -a [storage-node-ip] on every example below.
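A minimal sketch of that setup follows. The address 192.0.2.10 is a placeholder from the documentation address range, not a real storage node; substitute one of your own storage node IPs.

```bash
# Sketch: set SCSP_HOST once per shell session.
# 192.0.2.10 is a placeholder address; use one of your storage node IPs.
export SCSP_HOST=192.0.2.10

# With the variable set, -a [storage-node-ip] can be left off.
# The guard only runs the script if it is actually on the PATH.
if command -v indexer-enumerator.sh >/dev/null 2>&1; then
    indexer-enumerator.sh -D
fi
```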

Listing domains

Run indexer-enumerator.sh -D to find out what domains exist in your cluster.

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -D
A complete domain listing can be found here: ./OUTPUTDIR-2020_0722-124732/domains.txt

Because a domain listing should be short, I use the -or options to output the results to stdout:

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -D -or

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

...

Counting objects and space usage

Now I know the domains but not what’s in them. Next, I want to find out how many objects are in each domain and how much space each takes. For that, I combine the -c option with the -d ALL option:

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -d ALL -c

Enumerating all domains in the cluster:
A complete domain listing can be found here: ./OUTPUTDIR-2020_0722-124949/domains.txt
test1.c-csn1.enfield.com/ has 3147 unique matching objects of stype: all, withreps, uses 458.44MB disk space
caringodrive.c-csn1.enfield.com/ has 20 unique matching objects of stype: all, withreps, uses 156.55MB disk space
filefly-c-csn1.enfield.com/ has 1597 unique matching objects of stype: all, withreps, uses 9.32GB disk space
c-csn1-test1.enfield.com/ has 1114 unique matching objects of stype: all, withreps, uses 971.29MB disk space
c-csn1-admindomain/ has 38 unique matching objects of stype: all, withreps, uses 382.00bytes disk space
m-csn4.enfield.com/ has 8 unique matching objects of stype: all, withreps, uses 184.14MB disk space
nfstest1.enfield.com/ has 19 unique matching objects of stype: all, withreps, uses 13.59MB disk space
filefly-s3-target.c-csn1.enfield.com/ has 8217 unique matching objects of stype: all, withreps, uses 656.16MB disk space
es-backups.enfield.com/ has 41360 unique matching objects of stype: all, withreps, uses 3.69GB disk space
c-csn1.enfield.com/ has 129 unique matching objects of stype: all, withreps, uses 2.12GB disk space
bob.is.great.com/ has 11 unique matching objects of stype: all, withreps, uses 10.81MB disk space
s3-compatible/ has 5 unique matching objects of stype: all, withreps, uses 5.86MB disk space
c-csn1-cfs1.enfield.com/ has 9853 unique matching objects of stype: all, withreps, uses 259.00MB disk space
c-csn1-s3-target.enfield.com/ has 76 unique matching objects of stype: all, withreps, uses 428.23MB disk space


Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 65594 unique matching objects of stype: all, withreps, uses 18.21GB disk space

This gives me a good idea of what’s in my cluster.
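With many domains, it can help to rank the count output by object count. The sketch below relies only on the "<domain>/ has <N> unique ..." line format shown above; rank_domains is a hypothetical helper name, and it assumes the count lines are written to stdout.

```bash
# Sketch: rank domains by object count from "-d ALL -c" output.
# rank_domains is a hypothetical helper; it relies only on the
# "<domain>/ has <N> unique ..." line format shown above.
rank_domains() {
    awk '$2 == "has" && $3 ~ /^[0-9]+$/ { sub(/\/$/, "", $1); print $3, $1 }' | sort -rn
}

# Usage (assumes the count lines go to stdout):
#   indexer-enumerator.sh -d ALL -c | rank_domains | head -5
```

The "All domains:" summary line is skipped because its second field is not "has".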

Counting untenanted objects

What it does not show me are the untenanted objects (those not in any domain). Older clusters may not have any domains, so all of their objects would be untenanted. Newer clusters may have most or all objects tenanted and use enforceTenancy=true in the cluster configuration to ensure that all objects are in a domain.

We can see if we have any untenanted objects by using the -t option. I will again use the -c option to get just a count of the objects.

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -t -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

Untenanted streams enumerated: 9 unique objects, withreps, uses 101.44KB disk space

By this, I learn that I have only 9 untenanted objects in this particular cluster.

Counting buckets

Going back to the all-domains output, the c-csn1-test1.enfield.com domain looks interesting to me because the domain name doesn’t give me a good idea of what’s in it (in the way that filefly-c-csn1.enfield.com and es-backups.enfield.com do).

...

There appear to be 20 buckets here, and they seem to use no disk space. That’s because I asked for only bucket objects, which don’t take up data. To see how much data resides inside a particular bucket, I would need to do a query on that bucket. Also, there might be unnamed objects in this domain (that is, objects that are named by UUID and do not live in a bucket).

Let’s see what buckets exist in this domain (not just count them, as we did above):

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -B

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/.TOKEN
c-csn1-test1.enfield.com/Bucket15917374547579_0
c-csn1-test1.enfield.com/Bucket15917374547579_1
c-csn1-test1.enfield.com/Bucket15917374547579_2
c-csn1-test1.enfield.com/Bucket15917374547579_3
c-csn1-test1.enfield.com/Bucket15917374547579_4
c-csn1-test1.enfield.com/Bucket15917374547579_5
c-csn1-test1.enfield.com/Bucket15917374547579_6
c-csn1-test1.enfield.com/Bucket15917374547579_7
c-csn1-test1.enfield.com/Bucket15917374547579_8
c-csn1-test1.enfield.com/Bucket15917374547579_9
c-csn1-test1.enfield.com/Bucket15917383799242_0
c-csn1-test1.enfield.com/Bucket15917383799242_1
c-csn1-test1.enfield.com/Bucket15917383799242_2
c-csn1-test1.enfield.com/Bucket15917383799242_3
c-csn1-test1.enfield.com/Bucket15917383799242_4
c-csn1-test1.enfield.com/pants
c-csn1-test1.enfield.com/10kbuckettest
c-csn1-test1.enfield.com/superpants
c-csn1-test1.enfield.com/20200622


c-csn1-test1.enfield.com/ has 20 unique objects of stype: bucket, withreps, uses 0 disk space.
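A bucket listing like this can drive one count query per bucket. The following is a sketch under stated assumptions: bucket_names and count_buckets are hypothetical helper names, the listing is assumed to have been saved to a file, and only flags already used in this article (-ro, -d, -b, -c) appear.

```bash
# Sketch: turn a saved bucket listing into per-bucket counts.
# bucket_names and count_buckets are hypothetical helper names.
bucket_names() {
    # Keep only "<domain>/<bucket>" lines and print the bucket part.
    sed -n 's#^[^/ ]*/\([^/ ]\{1,\}\)$#\1#p'
}

count_buckets() {
    local domain=$1 bucket
    while IFS= read -r bucket; do
        # Same flags as the examples in this article; stdin is redirected
        # so the script cannot consume the remaining bucket names.
        indexer-enumerator.sh -ro -d "$domain" -b "$bucket" -c </dev/null
    done
}

# Usage (assumes the listing was saved to buckets.txt):
#   bucket_names < buckets.txt | count_buckets c-csn1-test1.enfield.com
```

The status and summary lines are dropped because they contain spaces, so only the listed items (including .TOKEN) are passed on.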

Searching objects

I see that I have a bucket named “pants”. Let’s see how many objects live in my pants bucket.

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -b pants -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/pants has 3 unique objects of stype: all, withreps, uses 11.83KB disk space.

As there are only three, I will output them to stdout (keeping the -or flags and removing the -c flag):

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -b pants

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/pants/vimium-options.json
c-csn1-test1.enfield.com/pants/vimium-options-2020-mbp.json
c-csn1-test1.enfield.com/pants/plugins.txt


c-csn1-test1.enfield.com/pants has 3 unique objects of stype: all, withreps, uses 11.83KB disk space. 

I keep using the -c option because I could potentially make a query that returns millions if not billions of results, and I certainly don’t want to do that right now. From the above, I see that I have 3 files in that bucket.

Because that domain should be an FQDN that resolves to the Content Gateway (or Swarm cluster, if not using a Content Gateway), I can just curl info any of these files to see more information:

...

I can see that I wrote this object recently (June 19, 2020), and there aren’t many objects in this bucket. I am going to poke around more.

Searching unnamed objects

So, if that bucket doesn’t contain a majority of the objects in my domain, what bucket does? Or perhaps unnamed objects are the majority of my objects. We can search only for unnamed objects using the -u option:

...

We might be tempted to use the -t option for “untenanted” objects, because untenanted objects are always unnamed, but these objects are tenanted (meaning they live in a domain) and also unnamed. Therefore, using -d [domain] -t will error.

Ok, we have 79 unnamed objects that live in c-csn1-test1.enfield.com. I want to get a few examples of these to show you what unnamed objects in a domain look like, but I don’t want to output all 79 to stdout. I will use the -u -1 -M 5 options to say “only send a single request for results (-1), only return 5 items (-M 5) in that single request, and only return unnamed (-u) objects”:

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -u -1 -M 5

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/7f7c9ecde7f265ac7dd4ba81e4388540
c-csn1-test1.enfield.com/f5b214774783d8bcc91ceae67c50a080
c-csn1-test1.enfield.com/e0896cec233e382c17840ae1c7d92054
c-csn1-test1.enfield.com/6fe0dbcdbf8bc538250f655dd152b5fd
c-csn1-test1.enfield.com/3122faaa7d02f9f7438702bf6bedb6ff


c-csn1-test1.enfield.com/ has 79 unique objects of stype: unnamed, withreps, uses 80.33KB disk space.

I can now curl any of these objects to see their headers:

Code Block
[root@c-csn1 tmp]# curl -IL c-csn1-test1.enfield.com/3122faaa7d02f9f7438702bf6bedb6ff -ucaringoadmin:caringo
HTTP/1.1 200 OK
Date: Wed, 22 Jul 2020 18:42:48 GMT
Gateway-Request-Id: FE7629C67527A767
Server: CAStor Cluster/11.2.0
Via: 1.1 c-csn1-test1.enfield.com (Cloud Gateway SCSP/6.4.0)
Gateway-Protocol: scsp
Castor-System-CID: 21876415934a554d1072804cfc776e10
Castor-System-Cluster: c-csn1.enfield.com
Castor-System-Created: Tue, 23 Jun 2020 15:07:45 GMT
Content-Type: application/x-www-form-urlencoded
Last-Modified: Tue, 23 Jun 2020 15:07:45 GMT
X-Last-Modified-By-Meta:
X-Owner-Meta:
x-bob-meta-apples: dunkin
ETag: "3122faaa7d02f9f7438702bf6bedb6ff"
Castor-System-Domain: c-csn1-test1.enfield.com
Volume: fa52b18e98d6164c5c0b700bba9652bb
Content-MD5: Ep4TEA3HwH8cOehCM1zZIQ==
Content-Length: 412
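To pick the metadata headers out of a response like this one, the header output can be filtered. This is a sketch only: meta_headers is a hypothetical helper name, and it simply keeps any header whose name contains "-meta", which is an assumption based on the header names shown above.

```bash
# Sketch: show only the metadata headers from a HEAD response.
# meta_headers is a hypothetical helper; it matches any header whose
# name contains "-meta", case-insensitively.
meta_headers() {
    grep -i -E '^[a-z0-9-]*-meta[a-z0-9-]*:'
}

# Usage: curl -sIL <object-url> -u <user>:<password> | meta_headers
```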

Searching metadata

Notice I have a metadata header called “x-bob-meta-apples” with a value of “dunkin”. That’s interesting to me. I wonder if I have that metadata elsewhere in this domain:

...

The -m and -v options together show me that I do indeed have 43 matching objects. I wonder if I have any other objects that match the header but not necessarily that value. For this test, I simply remove the -v dunkin part of the command:

...

Since I have more results here, I know that I have that header with a different value.

Searching across multiple domains

One of the more powerful things about indexer-enumerator.sh is that I can search across all domains, not just one. Let’s see how many objects matching that metadata header I have across my whole cluster. For this query, I change the domain name to “ALL” and again use the -c option to get just a count:

...

That shows me that 4 different domains have objects with that metadata (although it doesn’t show me untenanted objects that may match). I can then narrow the search down to match that particular header value, “dunkin”:

...

Ah! This shows me that all of the objects matching that header have a value of either “dunkin” or “donuts”.

Searching by age

What if I were only interested in objects written long ago? Maybe I want to find all objects written x days ago so that I can delete them…

Let’s get a single object from the matching output above and then do a curl INFO.

...

I can see that it was written on June 23 of this year. Were all of the objects matching that header written this year? We can check by using the -G 1 and -g 1 options:

...

Yes, they were all written this year. Since the example object was written on June 23 (today is July 22), I can narrow things down further based on my example. June 23 was 29 days before the day I am running these examples:

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 29 -c

Only enumerating streams written at least 29 day(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 14 unique objects of stype: all, withreps, uses 14.64KB disk space.
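The 29-day figure passed to -f can be double-checked with date arithmetic. This is a small sketch that assumes GNU date (for the -d option); days_between is a hypothetical helper name.

```bash
# Sketch: count whole days between two dates (requires GNU date for -d).
# days_between is a hypothetical helper name.
days_between() {
    local from to
    # -u keeps both timestamps in UTC so daylight saving changes cannot skew the result.
    from=$(date -u -d "$1" +%s) || return 1
    to=$(date -u -d "$2" +%s) || return 1
    echo $(( (to - from) / 86400 ))
}

days_between 2020-06-23 2020-07-22   # prints 29
```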

14 objects matched that, and you can see that our test object in particular matches as expected:

...

Let’s back out a little bit and add another option.

Searching by size

How about if we were looking for small files with that same metadata? Let’s try to match objects about the same size as our example above, which was 858 bytes. To that end, I will add the -l 859 (l for “littler”) option to our query.

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 29 -l 859 -c

Only streams smaller than 859 bytes are listed.
Only enumerating streams written at least 29 day(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 12 unique objects of stype: all, withreps, uses 10.11KB disk space.

Ok, 12 of the 14 objects were smaller than 859 bytes. Nice to know. I want to verify that my example object is in that result set as a sanity check. The UUID started with e, so let’s use yet another option: the prefix match. I will add -p e to match any object whose name or UUID starts with “e”. I will remove the -c option so that I actually see the match:

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 29 -l 859 -p e

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/e0896cec233e382c17840ae1c7d92054

Only streams smaller than 859 bytes are listed.
Only streams with names (or UUIDs) starting with "e" are listed.
Only enumerating streams written at least 29 day(s) ago

c-csn1-test1.enfield.com/ has 1 unique objects of stype: all, withreps, uses 1.67KB disk space.
[root@c-csn1 tmp]#

Sure enough, there’s our object!

Now, what if I want to match all objects larger than that object across all domains, matching that same header, written at least 29 days ago? I will use the capital -L option and change the domain to “ALL”:

Code Block
[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d ALL -m x-bob-meta-apples -v dunkin -f 29 -L 859 -c

Enumerating all domains in the cluster:

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

test1.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
caringodrive.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-test1.enfield.com/ has 2 unique matching objects of stype: all, withreps, uses 4.53KB disk space
c-csn1-admindomain/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
m-csn4.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
nfstest1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-s3-target.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
es-backups.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
bob.is.great.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
s3-compatible/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-cfs1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-s3-target.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space


Only streams larger than 859 bytes are listed.
Only enumerating streams written at least 29 day(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 2 unique matching objects of stype: all, withreps, uses 4.53KB disk space

I can see that only 2 objects match. I can remove the -c option to get those results if I want them.

Hopefully the above gives you a good understanding of how the indexer-enumerator.sh script works and the power of its flexibility.
