Determining the size of your Swarm-stored content

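The Swarm Search API can compute the total size of the objects stored in a bucket or domain through the du query argument.

Example request and response for calculating the space used by all content in the domain, including replicas (du=withreps). Because size=0 is set, the response body is an empty listing; the totals are returned in the Castor-Object-Count and Castor-Bytes-Used-With-Reps headers: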

GET http://cloud.caringo.com?format=json&du=withreps&stype=all&size=0

HTTP/1.1 200 OK
Gateway-Request-Id: 26F809F67D883E6D
Content-Type: application/json; charset=utf-8
Castor-Object-Count: 17
Castor-Bytes-Used-With-Reps: 121590
Date: Tue, 19 Jun 2012 22:00:16 GMT
Server: CAStor Cluster/6.0.0-2.es
Via: cloud.caringo.com (Cloud Gateway/1.1)
Content-Length: 4

[
]
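If you want to script this check, the following minimal Python sketch (standard library only) builds the same request and reads the two totals from the response headers. The helper names build_du_url, totals_from_headers and fetch_du_totals are hypothetical, and the sketch assumes the totals are always returned in those headers, as in the example above.

```python
import urllib.parse
import urllib.request
from typing import Mapping, Tuple


def build_du_url(base_url: str) -> str:
    """Build the du=withreps request from the example (hypothetical helper)."""
    params = {"format": "json", "du": "withreps", "stype": "all", "size": "0"}
    # size=0 keeps the listing empty; only the totals in the headers are needed.
    return base_url.rstrip("/") + "/?" + urllib.parse.urlencode(params)


def totals_from_headers(headers: Mapping[str, str]) -> Tuple[int, int]:
    """Return (object count, bytes used with replicas) from the response headers."""
    count = headers.get("Castor-Object-Count")
    used = headers.get("Castor-Bytes-Used-With-Reps")
    if count is None or used is None:
        raise ValueError("du totals not found in response headers")
    return int(count), int(used)


def fetch_du_totals(base_url: str) -> Tuple[int, int]:
    """Send the request and read the totals (needs network access to the domain)."""
    with urllib.request.urlopen(build_du_url(base_url)) as resp:
        # resp.headers is an http.client.HTTPMessage; its get() is case-insensitive.
        return totals_from_headers(resp.headers)
```

For example, fetch_du_totals("http://cloud.caringo.com") would return the object count and the bytes used with replicas for the example domain. Treat it as a starting point rather than a tested client.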

1) To determine the space occupied by each content type stored in your entire cluster, you can use the Terms Stats Facet API of the Elasticsearch indexer (if the indexer runs Elasticsearch 0.9.x or lower):

curl -XGET -d @query.json "http://<IP_of_Elasticsearch_Indexer_Node>:9200/<name_of_your_Swarm_cluster>/_search?pretty"

where query.json contains:

{
  "facets" : {
    "contentTypes_stats" : {
      "terms_stats" : {
        "key_field" : "contentType",
        "value_field" : "size",
        "size" : 0
      },
      "global" : true
    }
  }
}

Set value_field to "size" to exclude replicas from the totals, or to "sizewithreps" to include them. Setting "size" to 0 returns every content type rather than only the top terms, and "global" : true computes the facet over all documents in the index.

This returns the statistics for every content type in your cluster.
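A minimal Python sketch of the same query, assuming the indexer accepts the JSON body on a GET request exactly as curl -XGET -d sends it. The function names and the all_documents flag are hypothetical; all_documents=False drops "global" : true so that the facet follows whatever query is given in the URL.

```python
import json
import urllib.request

VALUE_FIELDS = ("size", "sizewithreps")


def build_content_type_query(value_field: str = "size", all_documents: bool = True) -> dict:
    """Build the terms_stats facet body shown above (Elasticsearch 0.9.x facets)."""
    if value_field not in VALUE_FIELDS:
        raise ValueError(f"value_field must be one of {VALUE_FIELDS}")
    facet = {
        "terms_stats": {
            "key_field": "contentType",
            "value_field": value_field,
            "size": 0,
        }
    }
    if all_documents:
        # "global": true computes the facet over every document, ignoring the query.
        facet["global"] = True
    return {"facets": {"contentTypes_stats": facet}}


def run_facet_query(search_url: str, body: dict) -> dict:
    """Send the body with a GET request, like curl -XGET -d @query.json."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(search_url, data=data, method="GET",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping the value_field check in the builder catches a typo before the request reaches the indexer.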

Example response:

"facets" : {
  "contentTypes_stats" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [ {
      "term" : "text/html",
      "count" : 56,
      "total_count" : 56,
      "min" : 29.0,
      "max" : 30016.0,
      "total" : 112787.0,
      "mean" : 2014.0535714285713
    }, {
      "term" : "image/png",
      "count" : 54,
      "total_count" : 54,
      "min" : 284.0,
      "max" : 326408.0,
      "total" : 602879.0,
      "mean" : 11164.425925925925
    }, ...

For each entry, facets[contentTypes_stats][terms][total] is the total space occupied by objects of that content type (with or without replicas, depending on the value_field in your query.json), and count is the number of objects of that type.
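To post-process the response, a small hypothetical helper can map each content type to its total. This sketch assumes the response has already been parsed into a Python dict (for example with json.load) and that the facet is named contentTypes_stats, as in query.json.

```python
from typing import Dict


def totals_by_content_type(response: dict, facet_name: str = "contentTypes_stats") -> Dict[str, float]:
    """Map each content type (term) to its total, read from a terms_stats facet response."""
    terms = response["facets"][facet_name]["terms"]
    return {entry["term"]: entry["total"] for entry in terms}
```

Applied to the two entries in the example response, it yields {"text/html": 112787.0, "image/png": 602879.0}.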

 

2) To determine the space occupied by each content type stored in a particular domain, follow these steps:

a) Retrieve the name-to-ID mapping for all domains in your cluster:

curl -XGET "http://<IP_of_Swarm_node>/?domains&format=json&fields=domainid,name"

Example response:

[
{"domainid":"f9336c9ceecca321bb6c6408b008d141", "name":"cloudscaler3demo.internal"},
{"domainid":"9dc45e197fc307229d53db5762c5b232", "name":"thesmiths"},
{"domainid":"672f29fd63b90a644206c101888399de", "name":"thebrowns"},
{"domainid":"38c22d3f971d0abd9147399fffd9592f", "name":"theflintstones"},
{"domainid":"c7a7212bc42efbc1fdf4943a6a368efc", "name":"gatewayadmindomain"},
{"domainid":"ad18fffec74600e932a1f16025aba265", "name":"therubbles"}
]
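The domain listing can also be fetched and turned into a name-to-ID dictionary. A minimal sketch, assuming swarm_node is the host or IP of a Swarm node reachable over plain HTTP; both function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def domain_ids_by_name(domains: List[dict]) -> Dict[str, str]:
    """Turn the ?domains listing into a {name: domainid} dictionary."""
    return {d["name"]: d["domainid"] for d in domains}


def fetch_domains(swarm_node: str) -> List[dict]:
    """Fetch the domain listing from a Swarm node (placeholder host or IP)."""
    url = f"http://{swarm_node}/?domains&format=json&fields=domainid,name"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Usage: domain_ids_by_name(fetch_domains("<IP_of_Swarm_node>")) with a real node address.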

b) For each domain, run the following query against your Elasticsearch indexer, substituting the domain ID obtained in step a):

curl -XGET -d @query.json "http://<IP_of_Elasticsearch_Indexer_Node>:9200/<name_of_your_Swarm_cluster>/_search?q=domainid:<ID_of_Swarm_Domain>&pretty"

where query.json contains:

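{
  "facets" : {
    "contentTypes_stats" : {
      "terms_stats" : {
        "key_field" : "contentType",
        "value_field" : "size",
        "size" : 0
      }
    }
  }
}

value_field can be "size" (totals exclude replicas) or "sizewithreps" (totals include replicas). Unlike the cluster-wide query, this facet omits "global" : true: a global facet ignores the search query, so it would report the whole cluster instead of only the domain selected by the domainid filter in the URL.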
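To repeat step b) for every domain, the search URLs can be generated from the name-to-ID mapping. A hypothetical Python sketch, assuming the indexer listens on port 9200 as in the curl command; indexer_ip and cluster_name stand for the placeholders in that command.

```python
import urllib.parse
from typing import Dict


def domain_search_url(indexer_ip: str, cluster_name: str, domain_id: str) -> str:
    """Build the per-domain _search URL used in step b)."""
    # Keep ':' literal so the query string reads q=domainid:<id>, as in the curl example.
    query = urllib.parse.urlencode({"q": f"domainid:{domain_id}"}, safe=":")
    return f"http://{indexer_ip}:9200/{cluster_name}/_search?{query}&pretty"


def domain_search_urls(indexer_ip: str, cluster_name: str,
                       domain_ids: Dict[str, str]) -> Dict[str, str]:
    """Map each domain name to its search URL, given a {name: domainid} dict."""
    return {name: domain_search_url(indexer_ip, cluster_name, domain_id)
            for name, domain_id in domain_ids.items()}
```

Each URL is then queried with the query.json body above, which has no "global" setting.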

Example response:

"facets" : {
  "contentTypes_stats" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [ {
      "term" : "application/castorcontext",
      "count" : 3,
      "total_count" : 3,
      "min" : 0.0,
      "max" : 0.0,
      "total" : 0.0,
      "mean" : 0.0
    }, {
      "term" : "application/test",
      "count" : 2,
      "total_count" : 2,
      "min" : 2.00863744E8,
      "max" : 2.00863744E8,
      "total" : 4.01727488E8,
      "mean" : 2.00863744E8
      ...

As in the previous case, for each entry, facets[contentTypes_stats][terms][total] is the total space occupied by objects of that content type within the queried domain (with or without replicas, depending on the value_field in your query.json).
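Summing the total of every term gives the space used by the whole domain. A minimal sketch, assuming the parsed response follows the structure above; documents with no contentType value are reported under missing rather than under a term, so the hypothetical domain_total helper returns that count alongside the sum to show whether anything was left out.

```python
from typing import Tuple


def domain_total(response: dict, facet_name: str = "contentTypes_stats") -> Tuple[float, int]:
    """Sum the per-content-type totals; also return the facet's "missing" count."""
    facet = response["facets"][facet_name]
    total = sum(entry["total"] for entry in facet["terms"])
    # Documents with no contentType value are counted in "missing", not under any
    # term, so a non-zero value means the sum may not cover every object.
    return total, facet.get("missing", 0)
```

With the two entries shown in the example response, the sum is 0.0 + 4.01727488E8 = 401727488.0 with missing = 0.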


© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.