
If you want to enumerate a cluster and you already have an Indexer Feed configured, you can use the indexer-enumerator.sh script from the support tools bundle. If you have a simple query, it might be easier to use the Content UI Portal, provided it is installed on a Content Gateway. This script is for enumerating potentially large data sets where the UI may not be as helpful. You can also run the script with “bash -x” to see examples of the curl syntax for your own custom indexer calls if the API documentation does not cover what you need.

Instructions

This is not an exhaustive overview of the script, but an in-depth example of how you can use this script to investigate what is in your cluster.

I have the environment variable SCSP_HOST set to a storage node IP to avoid having to put -a [storage-node-ip] on every example below.
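Setting that variable for the current shell session might look like the sketch below. The address is a placeholder from the 192.0.2.0/24 documentation range, not a real storage node, so substitute the IP of a node in your own cluster.

```shell
# Hypothetical storage node address; replace with a real node IP from your cluster.
export SCSP_HOST=192.0.2.10
```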

Run indexer-enumerator.sh -D to find out what domains exist in your cluster.

[root@c-csn1 tmp]# indexer-enumerator.sh -D
A complete domain listing can be found here: ./OUTPUTDIR-2020_0722-124732/domains.txt

Since this output is likely pretty short in most cases, I can run the command with the -or options to send the results to stdout:

[root@c-csn1 tmp]# indexer-enumerator.sh -D -or

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

That’s great, but I have no idea what’s in these domains. I would like to find out how many objects are in each of these domains and how much space each takes. For that, I will use the -c option along with the -d ALL option.

[root@c-csn1 tmp]# indexer-enumerator.sh -d ALL -c

Enumerating all domains in the cluster:
A complete domain listing can be found here: ./OUTPUTDIR-2020_0722-124949/domains.txt
test1.c-csn1.enfield.com/ has 3147 unique matching objects of stype: all, withreps, uses 458.44MB disk space
caringodrive.c-csn1.enfield.com/ has 20 unique matching objects of stype: all, withreps, uses 156.55MB disk space
filefly-c-csn1.enfield.com/ has 1597 unique matching objects of stype: all, withreps, uses 9.32GB disk space
c-csn1-test1.enfield.com/ has 1114 unique matching objects of stype: all, withreps, uses 971.29MB disk space
c-csn1-admindomain/ has 38 unique matching objects of stype: all, withreps, uses 382.00bytes disk space
m-csn4.enfield.com/ has 8 unique matching objects of stype: all, withreps, uses 184.14MB disk space
nfstest1.enfield.com/ has 19 unique matching objects of stype: all, withreps, uses 13.59MB disk space
filefly-s3-target.c-csn1.enfield.com/ has 8217 unique matching objects of stype: all, withreps, uses 656.16MB disk space
es-backups.enfield.com/ has 41360 unique matching objects of stype: all, withreps, uses 3.69GB disk space
c-csn1.enfield.com/ has 129 unique matching objects of stype: all, withreps, uses 2.12GB disk space
bob.is.great.com/ has 11 unique matching objects of stype: all, withreps, uses 10.81MB disk space
s3-compatible/ has 5 unique matching objects of stype: all, withreps, uses 5.86MB disk space
c-csn1-cfs1.enfield.com/ has 9853 unique matching objects of stype: all, withreps, uses 259.00MB disk space
c-csn1-s3-target.enfield.com/ has 76 unique matching objects of stype: all, withreps, uses 428.23MB disk space


Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 65594 unique matching objects of stype: all, withreps, uses 18.21GB disk space
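If you save that count output to a file, a little awk can total it or rank the domains for you. This is only a sketch: the function names are my own, and it assumes the per-domain lines keep the "<domain>/ has <N> unique ..." wording shown above and that you have captured them as plain text.

```shell
#!/usr/bin/env bash
# Sum the per-domain object counts read on stdin.
# Assumes per-domain lines look like "<domain>/ has <N> unique ..." (hypothetical helper).
sum_domain_counts() {
    awk '$2 == "has" && $3 ~ /^[0-9]+$/ { total += $3 } END { print total + 0 }'
}

# Print "<count> <domain>" for the busiest domains, largest first.
# $1 = how many domains to show (default 3).
top_domains() {
    awk '$2 == "has" && $3 ~ /^[0-9]+$/ { print $3, $1 }' | sort -rn | head -n "${1:-3}"
}
```

For example, with the output saved as counts.txt, `sum_domain_counts < counts.txt` should agree with the "All domains" figure, and `top_domains 3 < counts.txt` shows where most of the objects live.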

This gives me a good idea of what’s in my cluster. The only thing it does not show me is the untenanted streams, that is, streams not in a domain. Older clusters may not have ANY domains, so all of their streams may be untenanted. Newer clusters may have almost all streams tenanted, and may even have enforceTenancy=true in the cluster configuration, which requires all streams to be in a domain.

We can see if we have any untenanted streams by using the -t option. I will again use the -c option just to get a count of the # of streams.

[root@c-csn1 tmp]# indexer-enumerator.sh -t -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

Untenanted streams enumerated: 9 unique objects, withreps, uses 101.44KB disk space

We can see by the above, we do not have many untenanted streams in this particular cluster.

Going back to the all domains output, I see the c-csn1-test1.enfield.com domain looks interesting to me because the domain name doesn’t give me a good idea what’s in it (like the filefly-c-csn1.enfield.com and es-backups.enfield.com do).

So, let’s drill down into that domain by using the -d c-csn1-test1.enfield.com option.

How many buckets live in here?

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -B -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 20 unique objects of stype: bucket, withreps, uses 0 disk space.

There appear to be 20 buckets here. You can see that those 20 buckets take no space. That’s because I asked for ONLY buckets, which themselves hold no data. If I want to see how much data resides in a particular bucket, I need to query that bucket. Also, there might be unnamed streams in this domain (i.e., streams that do not live in a bucket and whose name is a UUID). Let’s see what buckets exist in this domain (not just count them as we did above):

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -B

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/.TOKEN
c-csn1-test1.enfield.com/Bucket15917374547579_0
c-csn1-test1.enfield.com/Bucket15917374547579_1
c-csn1-test1.enfield.com/Bucket15917374547579_2
c-csn1-test1.enfield.com/Bucket15917374547579_3
c-csn1-test1.enfield.com/Bucket15917374547579_4
c-csn1-test1.enfield.com/Bucket15917374547579_5
c-csn1-test1.enfield.com/Bucket15917374547579_6
c-csn1-test1.enfield.com/Bucket15917374547579_7
c-csn1-test1.enfield.com/Bucket15917374547579_8
c-csn1-test1.enfield.com/Bucket15917374547579_9
c-csn1-test1.enfield.com/Bucket15917383799242_0
c-csn1-test1.enfield.com/Bucket15917383799242_1
c-csn1-test1.enfield.com/Bucket15917383799242_2
c-csn1-test1.enfield.com/Bucket15917383799242_3
c-csn1-test1.enfield.com/Bucket15917383799242_4
c-csn1-test1.enfield.com/pants
c-csn1-test1.enfield.com/10kbuckettest
c-csn1-test1.enfield.com/superpants
c-csn1-test1.enfield.com/20200622


c-csn1-test1.enfield.com/ has 20 unique objects of stype: bucket, withreps, uses 0 disk space.

I see that I have a bucket named “pants”. Let’s see how many streams live in my pants bucket.

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -b pants -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/pants has 3 unique objects of stype: all, withreps, uses 11.83KB disk space.

Since there are only three, I will output them to stdout (keeping the -ro flags and removing the -c flag):

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -b pants

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/pants/vimium-options.json
c-csn1-test1.enfield.com/pants/vimium-options-2020-mbp.json
c-csn1-test1.enfield.com/pants/plugins.txt


c-csn1-test1.enfield.com/pants has 3 unique objects of stype: all, withreps, uses 11.83KB disk space. 

I keep using the -c option because I could potentially make a query that returns millions if not billions of results. Certainly I don’t want to do that right now. Ok, I see from the above that I have 3 files in that bucket.
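That count-first habit can be scripted. The sketch below reads a -c summary on stdin and only succeeds when the count is under a limit you choose. The function names and the limit are my own invention, and it assumes the summary line keeps the "... has N unique ..." wording shown in the examples.

```shell
#!/usr/bin/env bash
# Print the object count from an enumerator summary read on stdin.
# Assumes a line of the form "<path> has <N> unique ..." (hypothetical helper).
summary_count() {
    awk '$2 == "has" && $3 ~ /^[0-9]+$/ { print $3; exit }'
}

# Succeed only when the count found on stdin is at or below the limit in $1.
small_enough() {
    local limit=$1 count
    count=$(summary_count)
    [ -n "$count" ] && [ "$count" -le "$limit" ]
}

# Example use (not run here), listing a bucket only if it holds 100 streams or fewer:
#   if indexer-enumerator.sh -ro -d "$domain" -b "$bucket" -c | small_enough 100; then
#       indexer-enumerator.sh -ro -d "$domain" -b "$bucket"
#   fi
```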

Since that domain should be an FQDN that resolves to the Content Gateway (or Swarm cluster if not using a Content Gateway), I can just run a curl INFO (curl -I, an HTTP HEAD request) on any of these files to see more information:

[root@c-csn1 tmp]# curl -IL c-csn1-test1.enfield.com/pants/plugins.txt -ucaringoadmin:caringo
HTTP/1.1 200 OK
Date: Wed, 22 Jul 2020 18:28:05 GMT
Gateway-Request-Id: ED6BE75CEE440295
Server: CAStor Cluster/11.2.0
Via: 1.1 c-csn1-test1.enfield.com (Cloud Gateway SCSP/6.4.0)
Gateway-Protocol: scsp
Castor-System-CID: 15a648db93dc29a6819bb256643915fc
Castor-System-Cluster: c-csn1.enfield.com
Castor-System-Created: Fri, 19 Jun 2020 21:31:53 GMT
Castor-System-Name: plugins.txt
Castor-System-Version: 1592602313.352
Content-Type: application/x-www-form-urlencoded
Last-Modified: Fri, 19 Jun 2020 21:31:53 GMT
X-Last-Modified-By-Meta: acepelon@
X-Owner-Meta: acepelon
ETag: "f877345eb91e9b72ad44d2a4480af33c"
Castor-System-Path: /c-csn1-test1.enfield.com/pants/plugins.txt
Castor-System-Domain: c-csn1-test1.enfield.com
Volume: 53a22d293eea60eb4bfaacc9933f12d6
Content-MD5: Li8xabfpx+wi+MMZFE3Uqg==
Content-Length: 616

I can see that I wrote this stream on June 19, 2020. This bucket isn’t all that exciting to me at this point because now I see it doesn’t have very many streams in it. I am going to start poking around a bit more.

So, if that bucket doesn’t contain a majority of the streams in my domain, what bucket does? Or, perhaps unnamed streams are the majority of my streams. We can search for only unnamed streams by using the -u option:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -u -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 79 unique objects of stype: unnamed, withreps, uses 80.33KB disk space.

We might be tempted to use the -t option for “untenanted” streams, because untenanted streams are always unnamed, but these streams ARE tenanted (meaning, they live in a domain) and are also unnamed. Therefore, using -d [domain] -t will return an error.

Ok, we have 79 unnamed streams that live in c-csn1-test1.enfield.com. I want to get a few examples of these to show you what unnamed streams in a domain look like, but I don’t want to output all 79 to stdout. I will use the -u -1 -M 5 options to say “only send a single request for results (-1), only return 5 items (-M 5) in that single request, and only return unnamed (-u) streams”:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -u -1 -M 5

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/7f7c9ecde7f265ac7dd4ba81e4388540
c-csn1-test1.enfield.com/f5b214774783d8bcc91ceae67c50a080
c-csn1-test1.enfield.com/e0896cec233e382c17840ae1c7d92054
c-csn1-test1.enfield.com/6fe0dbcdbf8bc538250f655dd152b5fd
c-csn1-test1.enfield.com/3122faaa7d02f9f7438702bf6bedb6ff


c-csn1-test1.enfield.com/ has 79 unique objects of stype: unnamed, withreps, uses 80.33KB disk space.

I can now curl any of these streams if I want to see their headers:

[root@c-csn1 tmp]# curl -IL c-csn1-test1.enfield.com/3122faaa7d02f9f7438702bf6bedb6ff -ucaringoadmin:caringo
HTTP/1.1 200 OK
Date: Wed, 22 Jul 2020 18:42:48 GMT
Gateway-Request-Id: FE7629C67527A767
Server: CAStor Cluster/11.2.0
Via: 1.1 c-csn1-test1.enfield.com (Cloud Gateway SCSP/6.4.0)
Gateway-Protocol: scsp
Castor-System-CID: 21876415934a554d1072804cfc776e10
Castor-System-Cluster: c-csn1.enfield.com
Castor-System-Created: Tue, 23 Jun 2020 15:07:45 GMT
Content-Type: application/x-www-form-urlencoded
Last-Modified: Tue, 23 Jun 2020 15:07:45 GMT
X-Last-Modified-By-Meta:
X-Owner-Meta:
x-bob-meta-apples: dunkin
ETag: "3122faaa7d02f9f7438702bf6bedb6ff"
Castor-System-Domain: c-csn1-test1.enfield.com
Volume: fa52b18e98d6164c5c0b700bba9652bb
Content-MD5: Ep4TEA3HwH8cOehCM1zZIQ==
Content-Length: 412

Notice I have a metadata header called “x-bob-meta-apples” with value of “dunkin”. That’s interesting to me. I wonder if I have that metadata elsewhere in this domain:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 43 unique objects of stype: all, withreps, uses 3.42MB disk space.

The -m and -v options together show me that indeed I do have 43 matching streams. I wonder if I have any other streams that match the header but not necessarily that value. For this test, I simply remove the -v dunkin part of the command:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples  -c

Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 78 unique objects of stype: all, withreps, uses 3.46MB disk space.

Since I have more results here, I know that I have that header with a different value.

One of the more powerful things about the indexer-enumerator.sh is that I can search across all domains, not just one domain. Let’s see how many streams matching that metadata header I have across my whole cluster. For this query, I change the domain name to “ALL” and I am just going to get a count match by using the -c option again:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d ALL -m x-bob-meta-apples -c

Enumerating all domains in the cluster:

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

test1.c-csn1.enfield.com/ has 7 unique matching objects of stype: all, withreps, uses 7.32KB disk space
caringodrive.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-c-csn1.enfield.com/ has 42 unique matching objects of stype: all, withreps, uses 41.67KB disk space
c-csn1-test1.enfield.com/ has 78 unique matching objects of stype: all, withreps, uses 3.46MB disk space
c-csn1-admindomain/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
m-csn4.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
nfstest1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-s3-target.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
es-backups.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
bob.is.great.com/ has 6 unique matching objects of stype: all, withreps, uses 10.81MB disk space
s3-compatible/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-cfs1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-s3-target.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space


Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 133 unique matching objects of stype: all, withreps, uses 14.32MB disk space
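With this many zero-count lines, it can help to print only the domains that actually matched. A minimal sketch, assuming the same "<domain>/ has <N> unique ..." line format as above (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Print only the domains whose match count is greater than zero,
# dropping the trailing slash from the domain name.
matching_domains() {
    awk '$2 == "has" && $3 ~ /^[0-9]+$/ && $3 > 0 { sub(/\/$/, "", $1); print $1 }'
}
```

Feeding it the saved output of the query above would leave just the domains with a non-zero count.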

That shows me 4 different domains (although it doesn’t show me untenanted streams that may match) have streams with that metadata. I can then narrow the search down to match that particular header value “dunkin”:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d ALL -m x-bob-meta-apples -v dunkin -c

Enumerating all domains in the cluster:

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

test1.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
caringodrive.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-c-csn1.enfield.com/ has 14 unique matching objects of stype: all, withreps, uses 14.64KB disk space
c-csn1-test1.enfield.com/ has 43 unique matching objects of stype: all, withreps, uses 3.42MB disk space
c-csn1-admindomain/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
m-csn4.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
nfstest1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-s3-target.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
es-backups.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
bob.is.great.com/ has 3 unique matching objects of stype: all, withreps, uses 5.40MB disk space
s3-compatible/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-cfs1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-s3-target.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space


Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 60 unique matching objects of stype: all, withreps, uses 8.84MB disk space

73 fewer objects. Let’s try a different header value:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d ALL -m x-bob-meta-apples -v donuts -c

Enumerating all domains in the cluster:

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

test1.c-csn1.enfield.com/ has 7 unique matching objects of stype: all, withreps, uses 7.32KB disk space
caringodrive.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-c-csn1.enfield.com/ has 28 unique matching objects of stype: all, withreps, uses 27.02KB disk space
c-csn1-test1.enfield.com/ has 35 unique matching objects of stype: all, withreps, uses 36.62KB disk space
c-csn1-admindomain/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
m-csn4.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
nfstest1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-s3-target.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
es-backups.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
bob.is.great.com/ has 3 unique matching objects of stype: all, withreps, uses 5.40MB disk space
s3-compatible/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-cfs1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-s3-target.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space


Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 73 unique matching objects of stype: all, withreps, uses 5.47MB disk space

Ah! This shows me that all of the streams matching that header have a value of either “dunkin” or “donuts”.
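The reasoning is simple arithmetic: 60 “dunkin” streams plus 73 “donuts” streams equals the 133 streams that carry the header at all. If you want to script that sanity check, something like the following would do. The function name is my own, and matching totals are a strong hint rather than a proof that no third value exists.

```shell
#!/usr/bin/env bash
# Succeed when the per-value counts ($2 onward) add up to the header-only total ($1).
values_cover_total() {
    local total=$1 sum=0 n
    shift
    for n in "$@"; do
        sum=$((sum + n))
    done
    [ "$sum" -eq "$total" ]
}

# Counts from the runs above: 133 with the header, 60 "dunkin", 73 "donuts".
values_cover_total 133 60 73 && echo "all values accounted for"
```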

What if I was only interested in streams written long ago? Maybe I want to find all streams written x days ago so that I can delete them…

Let’s get a single stream from the matching output above and then do a curl INFO.

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -1 -M 1

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/e0896cec233e382c17840ae1c7d92054


c-csn1-test1.enfield.com/ has 43 unique objects of stype: all, withreps, uses 3.42MB disk space.
[root@c-csn1 tmp]# curl -IL c-csn1-test1.enfield.com/e0896cec233e382c17840ae1c7d92054 -ucaringoadmin:caringo
HTTP/1.1 200 OK
Date: Wed, 22 Jul 2020 18:57:27 GMT
Gateway-Request-Id: 58E8B3631B9490E0
Server: CAStor Cluster/11.2.0
Via: 1.1 c-csn1-test1.enfield.com (Cloud Gateway SCSP/6.4.0)
Gateway-Protocol: scsp
Castor-System-CID: 21876415934a554d1072804cfc776e10
Castor-System-Cluster: c-csn1.enfield.com
Castor-System-Created: Tue, 23 Jun 2020 15:07:45 GMT
Content-Type: application/x-www-form-urlencoded
Last-Modified: Tue, 23 Jun 2020 15:07:45 GMT
X-Last-Modified-By-Meta:
X-Owner-Meta:
x-bob-meta-apples: dunkin
ETag: "e0896cec233e382c17840ae1c7d92054"
Castor-System-Domain: c-csn1-test1.enfield.com
Volume: fa52b18e98d6164c5c0b700bba9652bb
Content-MD5: 6AspDUv0/7hEBsMFALI5Ig==
Content-Length: 858

I can see that it was written on June 23 of this year. Were ALL of the streams matching that header written within the past year? We can check by using the -G 1 and -g 1 options:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -G 1 -c

Only enumerating streams written since 1 year(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 43 unique objects of stype: all, withreps, uses 3.42MB disk space.

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -g 1 -c

Only enumerating streams written at least 1 year(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 0 unique objects of stype: all, withreps, uses 0 disk space.

Yes, they were all written within the past year. Since our example stream was written on June 23 and today is July 22, I can narrow things down further based on that example. June 23 was 29 days before the day I am running these examples:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 29 -c

Only enumerating streams written at least 29 day(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 14 unique objects of stype: all, withreps, uses 14.64KB disk space.

14 streams matched that, and you can see that our test stream in particular matches as expected:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 29 | grep e08
c-csn1-test1.enfield.com/e0896cec233e382c17840ae1c7d92054 

But only 14 of 43 objects were written at least 29 days ago. Were any written at least 30 days ago?

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 30 -c

Only enumerating streams written at least 30 day(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 0 unique objects of stype: all, withreps, uses 0 disk space.

Nope.
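The day offsets passed to -f above were counted by hand. If you would rather compute them, here is a minimal sketch that assumes GNU date (the -d option, standard on most Linux systems). It counts whole calendar days in UTC; how the script itself treats the time of day is not something I have verified, so treat the result as a starting point.

```shell
#!/usr/bin/env bash
# Whole days from date $1 to date $2 (YYYY-MM-DD), computed in UTC.
# Assumes GNU date; the function name is hypothetical.
days_between() {
    local start end
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}

days_between 2020-06-23 2020-07-22   # prints 29
```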

I can further winnow my results as desired.

Let’s back out a little bit and add another option.

How about if we were looking for small files with that same metadata? Let’s try to match streams about the same size as our example above, which was 858 bytes. To that end, I will add the -l 859 (l for “littler”) option to our query.

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 29 -l 859 -c

Only streams smaller than 859 bytes are listed.
Only enumerating streams written at least 29 day(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.

c-csn1-test1.enfield.com/ has 12 unique objects of stype: all, withreps, uses 10.11KB disk space.

Ok, 12 of the 14 streams were smaller than 859 bytes. Nice to know. I want to verify that my example stream is in that result set as a sanity check. The UUID started with e, so let’s use yet another option: the prefix match. I will add -p e to match any stream whose name/UUID starts with “e”. I will remove the -c option so that I actually see the match:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d c-csn1-test1.enfield.com -m x-bob-meta-apples -v dunkin -f 29 -l 859 -p e

Starting to enumerate the requested streams in domain: c-csn1-test1.enfield.com

c-csn1-test1.enfield.com/e0896cec233e382c17840ae1c7d92054

Only streams smaller than 859 bytes are listed.
Only streams with names (or UUIDs) starting with "e" are listed.
Only enumerating streams written at least 29 day(s) ago

c-csn1-test1.enfield.com/ has 1 unique objects of stype: all, withreps, uses 1.67KB disk space.
[root@c-csn1 tmp]#

Sure enough, there’s our stream!

Now, what if I want to match all streams larger than that stream across all domains, matching that same header and value, written at least 29 days ago? I will use the capital -L option and change the domain to “ALL”:

[root@c-csn1 tmp]# indexer-enumerator.sh -ro -d ALL -m x-bob-meta-apples -v dunkin -f 29 -L 859 -c

Enumerating all domains in the cluster:

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

test1.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
caringodrive.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-test1.enfield.com/ has 2 unique matching objects of stype: all, withreps, uses 4.53KB disk space
c-csn1-admindomain/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
m-csn4.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
nfstest1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
filefly-s3-target.c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
es-backups.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
bob.is.great.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
s3-compatible/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-cfs1.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space
c-csn1-s3-target.enfield.com/ has 0 unique matching objects of stype: all, withreps, uses 0 disk space


Only streams larger than 859 bytes are listed.
Only enumerating streams written at least 29 day(s) ago
Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 2 unique matching objects of stype: all, withreps, uses 4.53KB disk space

I can see only 2 streams match. I can remove the -c option and get those results if I want to.

Hopefully the above gives you a good understanding of how the indexer-enumerator.sh script works and the power of its flexibility. You can search by domain, bucket, prefix, metadata header and value, size, date written and type of stream. When you have decided you have the match you want, you can remove the -o, -r and -c options so that the matching streams are written to a file instead. Be careful to run this script from a directory/partition with plenty of disk space if you are returning millions of streams. For full enumerations of larger data sets, you may want to add the -s option to echo the enumerator loop count. Each call to the indexer returns a maximum of 10k values, so knowing how many iterations of that 10k figure the script has completed is valuable for larger enumerations.
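To get a feel for how many loop iterations a large enumeration will take, you can divide the count from a -c run by the per-call maximum and round up. A small sketch, assuming “10k” means 10,000 results per call (the function name is my own):

```shell
#!/usr/bin/env bash
# Number of indexer calls needed for $1 results at $2 results per call.
# The default page size of 10000 is an assumption based on the "10k" figure above.
calls_needed() {
    local total=$1 page=${2:-10000}
    echo $(( (total + page - 1) / page ))
}

calls_needed 65594   # prints 7
```

So enumerating every tenanted stream in the example cluster above would take roughly 7 calls, while a cluster with a billion streams would need on the order of 100,000.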
