How to use indexer-enumerator.sh

If you want to enumerate an entire cluster and you already have a Search (Indexer) Feed configured, you can use the indexer-enumerator.sh script from the support tools bundle to do so.

For a smaller query, it might be easier to use the Content UI portal (if it’s installed on a Content Gateway). This script is for enumerating potentially large data sets where the UI would be less helpful.

Tips

You can run the script with “bash -x” to get examples of the curl syntax that you can adapt for your own custom indexer calls.
You can search by domain, bucket, prefix, size, date written, and type of object.
Once the query matches what you want, you can remove the -orc options so that the matching objects are written to a file.
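
The bash -x tip relies on Bash's trace mode, which prints each command to stderr before running it. The sketch below shows the filtering idea on a stand-in script, because the real curl calls depend on your cluster. The temporary file, the stubbed curl function, and the storage-node.example URL are placeholders and are not part of indexer-enumerator.sh.

```shell
# Stand-in script (hypothetical) so the trace filter can be tried on any host.
demo=$(mktemp)
cat > "$demo" <<'EOF'
curl() { :; }    # stub: keeps the demo from making a network call
curl -s http://storage-node.example/placeholder
EOF

# bash -x writes the trace to stderr, so merge it into stdout before filtering.
trace=$(bash -x "$demo" 2>&1 | grep -E '^\++ curl')
rm -f "$demo"
echo "$trace"
```

Against the real script, the same filter would be applied to something like bash -x /path/to/indexer-enumerator.sh -D -or, with the path adjusted to wherever the bundle is installed.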

Be careful to run this script from a directory/partition with plenty of disk space if you are returning millions of objects.
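
One way to check free space before a large run is to ask df about the current directory. This is only a sketch: the 1 GiB threshold is an arbitrary placeholder, so size it to the result set you expect.

```shell
# Available space, in KiB, on the filesystem holding the current directory.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
min_kb=$((1024 * 1024))    # placeholder threshold: 1 GiB

if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "Only ${avail_kb} KiB free here; consider another partition."
else
    echo "${avail_kb} KiB free in $(pwd)."
fi
```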

For full enumerations of larger data sets, you may want to add the -s option to echo the enumerator loop count. Each call to the indexer has a maximum of 10k returned values, so knowing how many iterations of that 10k figure the script has returned is valuable for larger enumerations.
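
To estimate how many loop iterations to expect, divide the object count by the 10k per-call maximum and round up. The sketch below uses the all-domains total from the example later in this article; substitute your own count.

```shell
total_objects=65594     # example total; substitute your own count
page_size=10000         # maximum results per indexer call
# Integer ceiling division: add (page_size - 1) before dividing.
iterations=$(( (total_objects + page_size - 1) / page_size ))
echo "$iterations"      # prints 7
```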

Instructions

This is an extended example of how you can use this script to investigate what is in your cluster.

The environment variable SCSP_HOST is set to a storage node IP to avoid having to put -a [storage-node-ip] on every example below.
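
In a Bash session, setting the variable might look like the following. The address is a documentation-only placeholder, not a value from this cluster.

```shell
# 192.0.2.10 is a placeholder; use the IP of one of your storage nodes.
export SCSP_HOST=192.0.2.10
echo "SCSP_HOST=$SCSP_HOST"
```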

1 Listing domains
2 Counting objects and space usage
3 Counting untenanted objects
4 Counting buckets
5 Searching objects
6 Searching unnamed objects
7 Searching metadata
8 Searching across multiple domains
9 Searching by age
10 Search by size

Listing domains

Run indexer-enumerator.sh -D to find out what domains exist in your cluster.

[root@c-csn1 tmp]# indexer-enumerator.sh -D
A complete domain listing can be found here: ./OUTPUTDIR-2020_0722-124732/domains.txt

Because a domain listing should be short, I use the -or options to output the results to stdout:

[root@c-csn1 tmp]# indexer-enumerator.sh -D -or

Here are the domains:
test1.c-csn1.enfield.com
caringodrive.c-csn1.enfield.com
filefly-c-csn1.enfield.com
c-csn1-test1.enfield.com
c-csn1-admindomain
m-csn4.enfield.com
nfstest1.enfield.com
filefly-s3-target.c-csn1.enfield.com
es-backups.enfield.com
c-csn1.enfield.com
bob.is.great.com
s3-compatible
c-csn1-cfs1.enfield.com
c-csn1-s3-target.enfield.com

Counting objects and space usage

Now I know the domains but not what’s in them. Next, to find out how many objects are in each domain and how much space each takes, I combine the -c option with the -d ALL option:

[root@c-csn1 tmp]# indexer-enumerator.sh -d ALL -c

Enumerating all domains in the cluster:
A complete domain listing can be found here: ./OUTPUTDIR-2020_0722-124949/domains.txt
test1.c-csn1.enfield.com/ has 3147 unique matching objects of stype: all, withreps, uses 458.44MB disk space
caringodrive.c-csn1.enfield.com/ has 20 unique matching objects of stype: all, withreps, uses 156.55MB disk space
filefly-c-csn1.enfield.com/ has 1597 unique matching objects of stype: all, withreps, uses 9.32GB disk space
c-csn1-test1.enfield.com/ has 1114 unique matching objects of stype: all, withreps, uses 971.29MB disk space
c-csn1-admindomain/ has 38 unique matching objects of stype: all, withreps, uses 382.00bytes disk space
m-csn4.enfield.com/ has 8 unique matching objects of stype: all, withreps, uses 184.14MB disk space
nfstest1.enfield.com/ has 19 unique matching objects of stype: all, withreps, uses 13.59MB disk space
filefly-s3-target.c-csn1.enfield.com/ has 8217 unique matching objects of stype: all, withreps, uses 656.16MB disk space
es-backups.enfield.com/ has 41360 unique matching objects of stype: all, withreps, uses 3.69GB disk space
c-csn1.enfield.com/ has 129 unique matching objects of stype: all, withreps, uses 2.12GB disk space
bob.is.great.com/ has 11 unique matching objects of stype: all, withreps, uses 10.81MB disk space
s3-compatible/ has 5 unique matching objects of stype: all, withreps, uses 5.86MB disk space
c-csn1-cfs1.enfield.com/ has 9853 unique matching objects of stype: all, withreps, uses 259.00MB disk space
c-csn1-s3-target.enfield.com/ has 76 unique matching objects of stype: all, withreps, uses 428.23MB disk space


Only streams counts are listed.  To get the streams themselves, remove the -c flag.
All domains: 65594 unique matching objects of stype: all, withreps, uses 18.21GB disk space

This gives me a good idea of what’s in my cluster.
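
The per-domain lines are regular enough to post-process. As a sketch, the awk call below sums the object counts and picks out the largest domain. It assumes the "domain/ has N unique matching objects" layout shown above and reads the sample output from this article inline rather than from a real output file.

```shell
# Sum the third field of every "... has N unique ..." line and track the maximum.
summary=$(awk '$2 == "has" { sum += $3; if ($3 > max) { max = $3; top = $1 } }
               END { print sum, top }' <<'EOF'
test1.c-csn1.enfield.com/ has 3147 unique matching objects of stype: all, withreps, uses 458.44MB disk space
caringodrive.c-csn1.enfield.com/ has 20 unique matching objects of stype: all, withreps, uses 156.55MB disk space
filefly-c-csn1.enfield.com/ has 1597 unique matching objects of stype: all, withreps, uses 9.32GB disk space
c-csn1-test1.enfield.com/ has 1114 unique matching objects of stype: all, withreps, uses 971.29MB disk space
c-csn1-admindomain/ has 38 unique matching objects of stype: all, withreps, uses 382.00bytes disk space
m-csn4.enfield.com/ has 8 unique matching objects of stype: all, withreps, uses 184.14MB disk space
nfstest1.enfield.com/ has 19 unique matching objects of stype: all, withreps, uses 13.59MB disk space
filefly-s3-target.c-csn1.enfield.com/ has 8217 unique matching objects of stype: all, withreps, uses 656.16MB disk space
es-backups.enfield.com/ has 41360 unique matching objects of stype: all, withreps, uses 3.69GB disk space
c-csn1.enfield.com/ has 129 unique matching objects of stype: all, withreps, uses 2.12GB disk space
bob.is.great.com/ has 11 unique matching objects of stype: all, withreps, uses 10.81MB disk space
s3-compatible/ has 5 unique matching objects of stype: all, withreps, uses 5.86MB disk space
c-csn1-cfs1.enfield.com/ has 9853 unique matching objects of stype: all, withreps, uses 259.00MB disk space
c-csn1-s3-target.enfield.com/ has 76 unique matching objects of stype: all, withreps, uses 428.23MB disk space
EOF
)
echo "$summary"    # prints: 65594 es-backups.enfield.com/
```

The sum agrees with the "All domains" total reported by the script, which is a quick consistency check on the listing.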

Counting untenanted objects

What it does not show me are the untenanted objects (those not in any domain). Older clusters may not have any domains and so all of the objects would be untenanted. Newer clusters will have most or all objects tenanted and use enforceTenancy=true in the cluster configuration to ensure that all objects are in a domain.

We can see if we have any untenanted objects by using the -t option. I will again use the -c option just to get a count of the number of objects.

By this, I learn that I have only 9 untenanted objects in this particular cluster.
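
The command that produced this count is not shown above. A plausible reconstruction, assuming -t and -c combine the same way -d ALL and -c did earlier, is sketched below. It prints the command and runs it only where the script is actually installed.

```shell
# Assumed reconstruction: -t = untenanted objects only, -c = count only.
# SCSP_HOST should already point at a storage node.
untenanted_cmd="indexer-enumerator.sh -t -c"
echo "$untenanted_cmd"

if command -v indexer-enumerator.sh >/dev/null 2>&1; then
    $untenanted_cmd    # unquoted on purpose so the flags split into words
fi
```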

Counting buckets

Going back to the all domains output, I see the c-csn1-test1.enfield.com domain looks interesting to me because the domain name doesn’t give me a good idea what’s in it (in the way that the filefly-c-csn1.enfield.com and es-backups.enfield.com do).

So, let’s drill down into that domain by using the -d c-csn1-test1.enfield.com option.

How many buckets live in here?

There appear to be 20 buckets here, and they seem to use no disk space. That’s because I asked for only bucket objects, which hold no data themselves. To see how much data resides inside a particular bucket, I would need to do a query on that bucket. Also, there might be unnamed objects that live in this domain (that is, are named by UUID and do not live in a bucket).

Let’s see what buckets exist in this domain (not just count them, as we did above):

Searching objects

I see that I have a bucket named “pants”. Let’s see how many objects live in my pants bucket.

As there are only three, I will output them to stdout (keeping the -or flags and removing the -c flag):

I keep using the -c option because I could potentially make a query that returns millions if not billions of results. Certainly I don’t want to do that right now. From the above, I see that I have 3 files in that bucket.

Because that domain should be an FQDN that resolves to the Content Gateway (or Swarm cluster, if not using Gateway), I can just run a curl INFO request against any of these files to see more information:

I can see that I wrote this object recently, and there aren’t many objects in this bucket. I am going to poke around more.

Searching unnamed objects

So, if that bucket doesn’t contain a majority of the objects in my domain, what bucket does? Or, perhaps unnamed objects are the majority of my objects. We can search only for unnamed objects like so using the -u option:

We might be tempted to use the -t option for “untenanted” objects, because untenanted objects are always unnamed. These objects, however, ARE tenanted (they live in a domain) and are merely unnamed. Therefore, using -d [domain] -t returns an error.

Ok, we have 79 unnamed objects that live in c-csn1-test1.enfield.com. I want to get a few examples of these to show you what unnamed objects in a domain look like, but I don’t want to output all 79 to stdout. I will use the -u -1 -M 5 options to say “only send a single request for results (-1), only return 5 items (-M 5) in that single request, and only return unnamed (-u) objects”:
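
Put together, the options described above would look roughly like the following. This is a reconstruction from the text, not captured output, and adding -or to print to stdout is an assumption based on the earlier domain listing.

```shell
domain="c-csn1-test1.enfield.com"
# -u: unnamed objects only; -1: a single request; -M 5: at most 5 results;
# -or: print results to stdout (assumed here, as in the domain listing).
set -- -d "$domain" -u -1 -M 5 -or
sample_cmd="indexer-enumerator.sh $*"
echo "$sample_cmd"

# On a host with the support tools bundle, run it with the same arguments.
if command -v indexer-enumerator.sh >/dev/null 2>&1; then
    indexer-enumerator.sh "$@"
fi
```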

I can now curl any of these objects if I wanted to see their headers:

Searching metadata

Notice I have a metadata header called “x-bob-meta-apples” with value of “dunkin”. That’s interesting to me. I wonder if I have that metadata elsewhere in this domain:

The -m and -v options together show me that indeed I do have 43 matching objects. I wonder if I have any other objects that match the header but not necessarily that value. For this test, I simply remove the -v dunkin part of the command:

Since I have more results here, I know that I have that header with a different value.

Searching across multiple domains

One of the more powerful things about the indexer-enumerator.sh is that I can search across all domains, not just one domain. Let’s see how many objects matching that metadata header I have across my whole cluster. For this query, I change the domain name to “ALL” and I am just going to get a count match by using the -c option again:

That shows me 4 different domains (although it doesn’t show me untenanted objects that may match) have objects with that metadata. I can then narrow the search down to match that particular header value “dunkin”:

73 fewer objects. Let’s try a different header value:

Ah! This shows me that all of the objects matching that header have a value of either “dunkin” or “donuts”.
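
To compare header values side by side, the count query can be wrapped in a loop. The sketch below only prints the commands it would run. It assumes -m takes the header name exactly as it appears on the object and -v takes the value, which matches how the options are described above but is not confirmed by captured commands.

```shell
header="x-bob-meta-apples"
plan=$(
    for value in dunkin donuts; do
        # -d ALL: every domain; -c: counts only, so no large result files.
        echo "indexer-enumerator.sh -d ALL -m $header -v $value -c"
    done
)
echo "$plan"
```

Each printed line can be pasted into a shell on a host that has the script, once SCSP_HOST is set.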

Searching by age

What if I were only interested in objects written long ago? Maybe I want to find all objects written x days ago so that I can delete them…

Let’s get a single object from the matching output above and then do a curl INFO.

I can see that it was written on June 23 of this year. Were ALL of the objects matching that header written this year? We can check by using the -G 1 and -g 1 options:

Yes, they were all written this year. Since the example object was written on June 23 (today is July 22), I can narrow the search further based on it. June 23 was 29 days before the day I ran these examples:

14 objects matched that, and our test object in particular you can see matches as expected:

But only 14 of 43 objects were written at least 29 days ago. Were any written more than 30 days ago?

Nope.

I can further winnow my results as desired.
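
The 29-day figure used above can be double-checked with date arithmetic. The sketch assumes GNU date (the -d option), as found on typical Linux hosts, and takes the year from the 2020 timestamps in the output directory names earlier in this article.

```shell
written="2020-06-23"
today="2020-07-22"
# Convert both dates to epoch seconds in UTC, then divide by seconds per day.
days=$(( ( $(date -u -d "$today" +%s) - $(date -u -d "$written" +%s) ) / 86400 ))
echo "$days"    # prints 29
```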

Let’s back out a little bit and add another option.

Search by size

What if we were looking for small files with that same metadata? Let’s try to match objects about the same size as our example above, which was 858 bytes. To that end, I will add the -l 859 (l for “littler”) option to our query.

Ok, 12 of the 14 objects were 858 bytes or smaller. Nice to know. I want to verify that my example object is in that result set as a sanity check. The UUID started with e, so let’s use yet another option: the prefix match. I will add -p e to match any object whose name/UUID starts with “e”. I will remove the -c option so that I actually see the matches:

Sure enough, there’s our object!

Now, what if I want to match all objects larger than that object, across all domains, that match the same header and were written more than 29 days ago? I will use the capital L option and change the domain to “ALL”:

I can see only 2 objects match. I can remove the -c option and get those results if I wanted.

Hopefully the above gives you a good understanding of how the indexer-enumerator.sh script works and the power of its flexibility.