Determining the size of your Swarm-stored content
The Swarm Search API lets you compute the total size of the objects stored in a bucket or domain by using the du query argument.
Example request, and its response, for calculating the space used by all content in the cloud.caringo.com domain, including replicas (du=withreps):
GET http://cloud.caringo.com?format=json&du=withreps&stype=all&size=0
HTTP/1.1 200 OK
Gateway-Request-Id: 26F809F67D883E6D
Content-Type: application/json; charset=utf-8
Castor-Object-Count: 17
Castor-Bytes-Used-With-Reps: 121590
Date: Tue, 19 Jun 2012 22:00:16 GMT
Server: CAStor Cluster/6.0.0-2.es
Via: cloud.caringo.com (Cloud Gateway/1.1)
Content-Length: 4
[
]
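The same du request can be scripted. Below is a minimal Python sketch. It assumes the domain URL is reachable without authentication, so add credentials yourself if your gateway requires them. The names build_du_url, parse_du_headers and fetch_du are hypothetical helpers, not part of any Swarm client library. The sketch reads the results from the Castor-Object-Count and Castor-Bytes-Used-With-Reps headers, because the body of this response is an empty array.

```python
import urllib.request
from urllib.parse import urlencode


def build_du_url(base_url: str) -> str:
    """Build a du query URL with the same parameters as the example request."""
    params = {"format": "json", "du": "withreps", "stype": "all", "size": "0"}
    return f"{base_url}?{urlencode(params)}"


def parse_du_headers(headers: dict) -> tuple:
    """Return (object_count, bytes_used_with_reps) from the du response headers."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    object_count = int(lowered["castor-object-count"])
    bytes_with_reps = int(lowered["castor-bytes-used-with-reps"])
    return object_count, bytes_with_reps


def fetch_du(base_url: str) -> tuple:
    """Send the du request and parse its headers (hypothetical helper, no auth)."""
    with urllib.request.urlopen(build_du_url(base_url)) as response:
        return parse_du_headers(dict(response.headers.items()))


# Usage with the header values from the example response:
example_headers = {
    "Castor-Object-Count": "17",
    "Castor-Bytes-Used-With-Reps": "121590",
}
print(parse_du_headers(example_headers))  # (17, 121590)
```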
1) To determine the space occupied by each content type stored across your entire cluster, use the Elasticsearch indexer's Terms Stats Facet API (this applies if you run Elasticsearch version 0.9.x or lower):
curl -XGET -d @query.json "http://<IP_of_Elasticsearch_Indexer_Node>:9200/<name_of_your_Swarm_cluster>/_search?pretty"
where query.json looks like:
{
  "facets" : {
    "contentTypes_stats" : {
      "terms_stats" : {
        "key_field" : "contentType",
        "value_field" : "size",
        "size" : 0
      },
      "global" : true
    }
  }
}
Set "value_field" to "size" for totals that exclude replicas, or to "sizewithreps" for totals that include them. JSON does not allow inline comments, so make this choice directly in the file.
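This returns statistics for every content type in your cluster. "size" : 0 requests all terms instead of only the top ones, and "global" : true computes the facet over every indexed object.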
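The same request can also be built and sent from a script. Below is a minimal Python sketch. It assumes the indexer listens on port 9200 without authentication. It also sends the body with POST instead of curl's GET-with-body, because Elasticsearch's _search endpoint accepts both. The names build_facet_query and run_facet_query are hypothetical helpers.

```python
import json
import urllib.request

# "size" excludes replicas from the totals; "sizewithreps" includes them.
VALID_VALUE_FIELDS = ("size", "sizewithreps")


def build_facet_query(value_field: str = "size") -> dict:
    """Build the cluster-wide terms_stats facet body shown in query.json."""
    if value_field not in VALID_VALUE_FIELDS:
        raise ValueError(f"value_field must be one of {VALID_VALUE_FIELDS}")
    return {
        "facets": {
            "contentTypes_stats": {
                "terms_stats": {
                    "key_field": "contentType",
                    "value_field": value_field,
                    "size": 0,  # 0 = return every content type
                },
                "global": True,  # compute over all indexed objects
            }
        }
    }


def run_facet_query(indexer_host: str, cluster_name: str, body: dict) -> dict:
    """POST the body to the indexer's _search endpoint and decode the JSON reply."""
    url = f"http://{indexer_host}:9200/{cluster_name}/_search"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (hypothetical host and cluster name):
# result = run_facet_query("10.0.0.5", "mycluster", build_facet_query("sizewithreps"))
```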
Example response:
"facets" : {
  "contentTypes_stats" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [ {
      "term" : "text/html",
      "count" : 56,
      "total_count" : 56,
      "min" : 29.0,
      "max" : 30016.0,
      "total" : 112787.0,
      "mean" : 2014.0535714285713
    }, {
      "term" : "image/png",
      "count" : 54,
      "total_count" : 54,
      "min" : 284.0,
      "max" : 326408.0,
      "total" : 602879.0,
      "mean" : 11164.425925925925
    }, ...
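In each entry of facets.contentTypes_stats.terms, the "total" field is the space, in bytes, occupied by objects of that content type. Replicas are included or excluded depending on the value_field set in query.json. "count" is the number of objects of that type, and "mean" is total divided by count.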
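To pull these totals out of a decoded response (for example, the dict that json.load returns for the reply), a minimal Python sketch follows. content_type_totals is a hypothetical helper name. It assumes the facet is named contentTypes_stats, as in query.json.

```python
def content_type_totals(search_response: dict) -> dict:
    """Map each content type to its terms_stats "total", rounded to whole bytes."""
    terms = search_response["facets"]["contentTypes_stats"]["terms"]
    # Totals are returned as floats (e.g. 112787.0); object sizes are whole bytes.
    return {entry["term"]: round(entry["total"]) for entry in terms}


# Usage with only the two entries visible in the truncated example response,
# trimmed to the fields used here, so the sum is not a cluster-wide figure.
example_response = {
    "facets": {
        "contentTypes_stats": {
            "_type": "terms_stats",
            "missing": 0,
            "terms": [
                {"term": "text/html", "count": 56, "total": 112787.0},
                {"term": "image/png", "count": 54, "total": 602879.0},
            ],
        }
    }
}
totals = content_type_totals(example_response)
print(totals)                # {'text/html': 112787, 'image/png': 602879}
print(sum(totals.values()))  # 715666
```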
2) To determine the space occupied by each content type stored in a particular domain, follow these steps:
a) Determine the name-id mappings for all the domains in your cluster:
curl -XGET "http://<IP_of_Swarm_node>/?domains&format=json&fields=domainid,name"
Example response:
[
{"domainid":"f9336c9ceecca321bb6c6408b008d141", "name":"cloudscaler3demo.internal"},
{"domainid":"9dc45e197fc307229d53db5762c5b232", "name":"thesmiths"},
{"domainid":"672f29fd63b90a644206c101888399de", "name":"thebrowns"},
{"domainid":"38c22d3f971d0abd9147399fffd9592f", "name":"theflintstones"},
{"domainid":"c7a7212bc42efbc1fdf4943a6a368efc", "name":"gatewayadmindomain"},
{"domainid":"ad18fffec74600e932a1f16025aba265", "name":"therubbles"}
]
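The listing can be turned into a lookup table for step b. Below is a minimal Python sketch that keys the table by domain ID, because IDs are guaranteed to be distinct while the sketch makes no assumption about names. domain_names_by_id and fetch_domain_listing are hypothetical helpers, and the fetch assumes the Swarm node needs no authentication.

```python
import json
import urllib.request


def domain_names_by_id(listing_json: str) -> dict:
    """Turn the ?domains listing (fields=domainid,name) into {domainid: name}."""
    return {entry["domainid"]: entry["name"] for entry in json.loads(listing_json)}


def fetch_domain_listing(swarm_node: str) -> str:
    """Fetch the raw domain listing from a Swarm node (hypothetical helper, no auth)."""
    url = f"http://{swarm_node}/?domains&format=json&fields=domainid,name"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Usage with two entries from the example response:
sample = """[
{"domainid":"9dc45e197fc307229d53db5762c5b232", "name":"thesmiths"},
{"domainid":"672f29fd63b90a644206c101888399de", "name":"thebrowns"}
]"""
names = domain_names_by_id(sample)
for domain_id, name in names.items():
    print(domain_id, name)
```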
b) For each domain, run the following query against your Elasticsearch indexer, replacing <ID_of_Swarm_Domain> with the domain ID from step a:
curl -XGET -d @query.json "http://<Indexer_IP>:9200/<name_of_SWARM_Cluster>/_search?q=domainid:<ID_of_Swarm_Domain>&pretty"
where query.json is:
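{
  "facets" : {
    "contentTypes_stats" : {
      "terms_stats" : {
        "key_field" : "contentType",
        "value_field" : "size",
        "size" : 0
      }
    }
  }
}
As in step 1, set "value_field" to "size" (replicas excluded) or "sizewithreps" (replicas included). Unlike the cluster-wide query, this facet has no "global" : true. A global facet ignores the search query, so it would count objects from every domain instead of only those matching q=domainid:<ID_of_Swarm_Domain>.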
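Below is a minimal Python sketch of the per-domain request. It builds the facet body without "global" and the _search URL carrying the domainid query. build_domain_query and build_domain_search_url are hypothetical helpers. The indexer address and cluster name in the usage line are placeholders, and port 9200 without authentication is assumed. Any HTTP client can then POST the body to the resulting URL.

```python
from urllib.parse import urlencode

VALID_VALUE_FIELDS = ("size", "sizewithreps")


def build_domain_query(value_field: str = "size") -> dict:
    """terms_stats facet body for one domain: as in step 1 but without "global"."""
    if value_field not in VALID_VALUE_FIELDS:
        raise ValueError(f"value_field must be one of {VALID_VALUE_FIELDS}")
    return {
        "facets": {
            "contentTypes_stats": {
                "terms_stats": {
                    "key_field": "contentType",
                    "value_field": value_field,
                    "size": 0,
                }
            }
        }
    }


def build_domain_search_url(indexer_host: str, cluster_name: str, domain_id: str) -> str:
    """_search URL whose q=domainid:<ID> query string limits hits to one domain."""
    # urlencode percent-encodes the ':' in "domainid:<ID>" as %3A; the server
    # decodes it before parsing the query string.
    query = urlencode({"q": f"domainid:{domain_id}"})
    return f"http://{indexer_host}:9200/{cluster_name}/_search?{query}"


# Example with a hypothetical indexer address and cluster name:
print(build_domain_search_url("10.0.0.5", "mycluster", "9dc45e197fc307229d53db5762c5b232"))
# http://10.0.0.5:9200/mycluster/_search?q=domainid%3A9dc45e197fc307229d53db5762c5b232
```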
Example response:
"facets" : {
  "contentTypes_stats" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [ {
      "term" : "application/castorcontext",
      "count" : 3,
      "total_count" : 3,
      "min" : 0.0,
      "max" : 0.0,
      "total" : 0.0,
      "mean" : 0.0
    }, {
      "term" : "application/test",
      "count" : 2,
      "total_count" : 2,
      "min" : 2.00863744E8,
      "max" : 2.00863744E8,
      "total" : 4.01727488E8,
      "mean" : 2.00863744E8
      ...
As in the previous case, the "total" field of each entry in facets.contentTypes_stats.terms is the space, in bytes, occupied by objects of that content type within the queried domain. Replicas are included or excluded depending on the value_field set in query.json.
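The totals are reported as floating-point numbers, sometimes in E notation. For example, 4.01727488E8 is 401,727,488 bytes, the two 200,863,744-byte objects combined. Below is a minimal Python sketch that turns a per-domain response into readable lines. human_readable and summarize_domain are hypothetical helpers, "example-domain" is a placeholder name, and the 1024-based units are a formatting choice, not something Swarm reports.

```python
def human_readable(num_bytes: float) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"


def summarize_domain(domain_name: str, search_response: dict) -> list:
    """Return one 'domain<TAB>content type<TAB>size' line per terms_stats entry."""
    terms = search_response["facets"]["contentTypes_stats"]["terms"]
    return [
        f"{domain_name}\t{entry['term']}\t{human_readable(entry['total'])}"
        for entry in terms
    ]


# Usage with the two entries from the example response (other fields omitted):
response = {
    "facets": {
        "contentTypes_stats": {
            "terms": [
                {"term": "application/castorcontext", "count": 3, "total": 0.0},
                {"term": "application/test", "count": 2, "total": 4.01727488E8},
            ]
        }
    }
}
for line in summarize_domain("example-domain", response):
    print(line)
```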
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.