Walkthrough: Ordering Sets of Filtered Objects
This walkthrough works through a complex example: how to paginate (list ordered subsets of) the search results for objects that match specific metadata. It shows how and why to combine three related Search Query Arguments: size, marker, and sort.
How to Count Objects in a Bucket
This query returns an empty result set (size=0), so focus on the header output alone:
$ curl -si -u jdoe "https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&size=0"
Enter host password for user 'jdoe':
HTTP/1.1 200 OK
Date: Wed, 16 Dec 2020 15:55:42 GMT
Gateway-Request-Id: 5BF093C3AECC45AD
Server: CAStor Cluster/12.0.0
Via: 1.1 jdoe.cloud.acme.com (Cloud Gateway SCSP/7.1.0)
Gateway-Protocol: scsp
Allow-Encoding: *;q=0
Castor-System-Alias: ac611714399ae0e5f22a628d4e8c26f4
Castor-System-CID: 924273bee8a6e01865d7b2a315ea5ae3
Castor-System-Cluster: foo.tx.acme.com
Castor-System-Created: Thu, 10 Sep 2015 19:45:24 GMT
Castor-System-Name: public
Castor-System-Version: 1441914324.106
X-Last-Modified-By-Meta: jdoe@
X-Owner-Meta: jdoe
X-Timestamp: Thu, 10 Sep 2015 19:45:24 GMT
X-timestamp: Wed, 16 Dec 2020 15:55:42 GMT
Content-Type: application/json;charset=utf-8
Castor-Object-Count: 62
Castor-System-Object-Count: 62
Last-Modified: Wed, 16 Dec 2020 15:55:42 GMT
Transfer-Encoding: chunked
[
]
Check the value of Castor-Object-Count to determine how many objects match the search. Per the output above, the "public" bucket in domain "jdoe.cloud.acme.com" contains 62 objects.
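When this check is scripted rather than read by eye, the count can be pulled out of the header text that curl -si prints. The following is a minimal Python sketch under that assumption; object_count is a hypothetical helper name, and the sample headers are a trimmed copy of the output above.

```python
from typing import Optional


def object_count(raw_headers: str) -> Optional[int]:
    """Return the Castor-Object-Count value from raw HTTP response headers.

    Header names are compared case-insensitively, as HTTP requires.
    Returns None if the header is absent.
    """
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "castor-object-count":
            return int(value.strip())
    return None


# Trimmed copy of the headers shown above.
headers = """HTTP/1.1 200 OK
Content-Type: application/json;charset=utf-8
Castor-Object-Count: 62
Castor-System-Object-Count: 62
"""
print(object_count(headers))  # 62
```

The status line has no colon and is skipped, and Castor-System-Object-Count is not matched because only the exact header name is compared.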
How to Count Filtered Objects
Drill down further by focusing on objects that match a metadata characteristic. For example, filter for a specific kind of content stored in the object (application, audio, image, text, video), which is recorded in the Content-Type metadata header. Note: objects can be filtered by custom metadata as well.
This search filters for objects holding MP4 video content:
$ curl -si -u jdoe "https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&size=0&content-type=video/mp4"
Enter host password for user 'jdoe':
HTTP/1.1 200 OK
Date: Wed, 16 Dec 2020 17:10:58 GMT
Gateway-Request-Id: C28EB97FE6EF3914
Server: CAStor Cluster/12.0.0
Via: 1.1 jdoe.cloud.acme.com (Cloud Gateway SCSP/7.1.0)
Gateway-Protocol: scsp
Allow-Encoding: *;q=0
Castor-System-Alias: ac611714399ae0e5f22a628d4e8c26f4
Castor-System-CID: 924273bee8a6e01865d7b2a315ea5ae3
Castor-System-Cluster: foo.tx.acme.com
Castor-System-Created: Thu, 10 Sep 2015 19:45:24 GMT
Castor-System-Name: public
Castor-System-Version: 1441914324.106
X-Last-Modified-By-Meta: jdoe@
X-Owner-Meta: jdoe
X-Timestamp: Thu, 10 Sep 2015 19:45:24 GMT
X-timestamp: Wed, 16 Dec 2020 17:10:58 GMT
Content-Type: application/json;charset=utf-8
Castor-Object-Count: 39
Castor-System-Object-Count: 39
Last-Modified: Wed, 16 Dec 2020 17:10:58 GMT
Transfer-Encoding: chunked
[
]
Filtering the "public" bucket in domain "jdoe.cloud.acme.com" for MP4 content (content-type=video/mp4) produces a count of 39 videos (Castor-Object-Count: 39).
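A script can assemble the same filtered query from a mapping of arguments. This minimal Python sketch uses only the standard library; search_url is a hypothetical helper, and the host, bucket, and domain are the example values from this walkthrough. Leaving '/' and ':' unescaped is done only so the result matches the curl examples.

```python
from typing import Mapping
from urllib.parse import urlencode


def search_url(host: str, bucket: str, args: Mapping[str, str]) -> str:
    """Build a bucket search URL from query arguments, kept in the order given.

    '/' and ':' are left unescaped so values such as 'video/mp4' and
    'etag:desc' read the same as in the curl examples; other special
    characters are escaped by urlencode (spaces become '+').
    """
    return f"https://{host}/{bucket}/?" + urlencode(args, safe="/:")


url = search_url(
    "jdoe.cloud.acme.com",
    "public",
    {
        "format": "json",
        "domain": "jdoe.cloud.acme.com",
        "size": "0",  # count only, no listing
        "content-type": "video/mp4",
    },
)
print(url)
# https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&size=0&content-type=video/mp4
```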
How to Limit (Page) the Results
Limit the size of the search results when only a portion of them is needed or when the full set of objects is too large to display. Combining three search query arguments provides the control needed:
size - Controls the size of the result set (the number of objects listed), unrelated to object size (content-length). Set it to 0 when the actual listing is not needed.
marker - Used with size to paginate large result sets. Use an empty marker to begin a new search, then use the last sort key value of the results on the next request to continue pagination.
sort - Sorts the results on one or more fields, in the order listed. Sorting defaults to ascending, so add descending (:desc) as needed. Sorting is computationally intensive, so sort output only when necessary.
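The sort value can also be built from a list of fields. In this minimal Python sketch, sort_arg is a hypothetical helper, and joining several fields with commas is an assumption that this walkthrough does not confirm; only the single-field form etag:desc appears in the examples.

```python
from typing import Iterable, Tuple


def sort_arg(fields: Iterable[Tuple[str, bool]]) -> str:
    """Build a sort value from (field, descending) pairs, in priority order.

    Ascending is the default, so only descending fields get ':desc'.
    Joining multiple fields with ',' is an assumption.
    """
    parts = []
    for field, descending in fields:
        parts.append(f"{field}:desc" if descending else field)
    return ",".join(parts)


print(sort_arg([("etag", True)]))                  # etag:desc
print(sort_arg([("name", False), ("etag", True)]))  # name,etag:desc
```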
$ curl -s -u jdoe "https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&content-type=video/mp4&marker=&size=5&sort=etag:desc"
Enter host password for user 'jdoe':
[
{
"last_modified": "2018-09-04T17:14:44.848000Z",
"bytes": 261671693,
"name": "recording-a.mp4",
"hash": "ff3ea60737fe1aec9b4a506a23c29fe9",
"written": "2018-09-04T17:14:44.848000Z",
"accessed": "2018-09-04T17:14:44.848000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2017-07-31T15:37:45.580000Z",
"bytes": 77337274,
"name": "recording-b.mp4",
"hash": "f2402263315cad55c0909f50f7154c13",
"written": "2017-07-31T15:37:45.580000Z",
"accessed": "2017-07-31T15:37:45.580000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2017-06-14T18:32:28.592000Z",
"bytes": 24926795,
"name": "recording-c.mp4",
"hash": "ed35d20e43af0a5a1757f000905ff653",
"written": "2017-06-14T18:32:28.592000Z",
"accessed": "2017-06-14T18:32:28.592000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2019-07-19T15:50:53.444000Z",
"bytes": 3810394,
"name": "recording-d.mp4",
"hash": "ec3c93febe2ff19e3c6a6561f8c25363",
"written": "2019-07-19T15:50:53.444000Z",
"accessed": "2019-07-19T15:50:53.444000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2018-06-29T19:02:45.724000Z",
"bytes": 55816215,
"name": "recording-e.mp4",
"hash": "e7e4a3d4cd8ee0df2894520d0624ceca",
"written": "2018-06-29T19:02:45.724000Z",
"accessed": "2018-06-29T19:02:45.724000Z",
"content_type": "video/mp4"
}
]
This request omits the -i option, so the response headers are not printed: the total count of objects matching the filter is already known from the previous query.
marker= (set to empty) starts at the beginning of the result set.
size=5 returns the first 5 of the filtered objects.
sort=etag:desc sorts the objects in descending order by the "hash" value (ETag) associated with each object, which is covered in detail below.
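One way to confirm the ordering in the response is to check that each hash value is greater than the next. This minimal Python sketch uses the five hashes returned above (other fields omitted); is_sorted_desc_by_hash is a hypothetical helper. Plain string comparison is enough here because the hashes are fixed-length lowercase hexadecimal, so their string order matches their numeric order.

```python
import json

# The first page from the example above, reduced to name and hash.
page = json.loads("""[
  {"name": "recording-a.mp4", "hash": "ff3ea60737fe1aec9b4a506a23c29fe9"},
  {"name": "recording-b.mp4", "hash": "f2402263315cad55c0909f50f7154c13"},
  {"name": "recording-c.mp4", "hash": "ed35d20e43af0a5a1757f000905ff653"},
  {"name": "recording-d.mp4", "hash": "ec3c93febe2ff19e3c6a6561f8c25363"},
  {"name": "recording-e.mp4", "hash": "e7e4a3d4cd8ee0df2894520d0624ceca"}
]""")


def is_sorted_desc_by_hash(objects: list) -> bool:
    """True if each object's hash is strictly greater than the next one's."""
    hashes = [obj["hash"] for obj in objects]
    return all(a > b for a, b in zip(hashes, hashes[1:]))


print(is_sorted_desc_by_hash(page))  # True
```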
How to Pull the Next Result Set
To get the next five results in the set, select a marker for the subsequent query. The marker is a characteristic (metadata field) returned for the last object in the current set, and there are many to choose from:
{
"last_modified": "2018-06-29T19:02:45.724000Z",
"bytes": 55816215,
"name": "recording-e.mp4",
"hash": "e7e4a3d4cd8ee0df2894520d0624ceca",
"written": "2018-06-29T19:02:45.724000Z",
"accessed": "2018-06-29T19:02:45.724000Z",
"content_type": "video/mp4"
}
The best practice is to use the "hash" field:
Field | Downsides of use as a marker |
---|---|
name | Effort: special characters must be URL-encoded. Not guaranteed to be unique except within a given bucket. |
last_modified | Not guaranteed to be unique. Can introduce gaps when paging through the result sets. Can change in real time, even while the query is running. |
hash | None |
The "hash" is the object's ETag (entity tag), which is guaranteed to be unique across the entire cluster, so it also supports queries spanning multiple buckets and domains.
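The "Effort" downside of name as a marker comes from characters such as spaces, '#', '&', and '?', which have special meaning in a URL. This minimal Python sketch uses the standard urllib.parse.quote function; the object name is a hypothetical example, not an object from this bucket.

```python
from urllib.parse import quote, unquote

# Hypothetical object name containing a space, '#', '&', and '?'.
name = "my clip #2 & notes?.mp4"

# safe="" also escapes '/', so the whole name stays a single query value.
marker = quote(name, safe="")
print(marker)                  # my%20clip%20%232%20%26%20notes%3F.mp4
print(unquote(marker) == name)  # True
```

A hash value needs none of this, since it consists only of hexadecimal digits.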
How to Use the Hash as Marker
It takes two steps to page through result sets using the hash value as the marker:
Parse the hash value out of the output for the last object in the previous set.
Set the marker argument to be the hash string.
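The two steps above can be sketched in Python: decode the JSON array, take the last object's hash, and use it as the next marker. This assumes the response body has the format shown in this walkthrough; next_marker is a hypothetical helper, and returning None for an empty page (such as the [ ] body of a size=0 query) gives the caller a stopping point.

```python
import json
from typing import Optional


def next_marker(body: str) -> Optional[str]:
    """Return the hash of the last object in a JSON result page, or None if empty."""
    objects = json.loads(body)
    if not objects:
        return None
    return objects[-1]["hash"]  # step 1: parse the last object's hash


body = """[
  {"name": "recording-d.mp4", "hash": "ec3c93febe2ff19e3c6a6561f8c25363"},
  {"name": "recording-e.mp4", "hash": "e7e4a3d4cd8ee0df2894520d0624ceca"}
]"""
# Step 2: this string becomes the marker argument of the next request.
print(next_marker(body))  # e7e4a3d4cd8ee0df2894520d0624ceca
```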
The hash value listed for the last object in the first result set is "e7e4a3d4cd8ee0df2894520d0624ceca", so start the next search after that object as follows:
$ curl -s -u jdoe "https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&content-type=video/mp4&marker=e7e4a3d4cd8ee0df2894520d0624ceca&size=5&sort=etag:desc"
Enter host password for user 'jdoe':
[
{
"last_modified": "2017-09-01T16:10:01.496000Z",
"bytes": 23902924,
"name": "recording-f.mp4",
"hash": "e7d46a777a2c67f5ebc016f8a8626ac5",
"written": "2017-09-01T16:10:01.496000Z",
"accessed": "2017-09-01T16:10:01.496000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2018-05-10T20:05:19.500000Z",
"bytes": 57463240,
"name": "recording-g.mp4",
"hash": "df9fc440b0a230e2d771e29b08829fe8",
"written": "2018-05-10T20:05:19.500000Z",
"accessed": "2018-05-10T20:05:19.500000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2016-01-04T21:42:05.896000Z",
"bytes": 76657180,
"name": "recording-h.mp4",
"hash": "cf65f6fac3d9683be29b6e37f1bc5910",
"written": "2016-01-04T21:42:05.896000Z",
"accessed": "2016-01-04T21:42:05.896000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2017-02-28T19:45:25.664000Z",
"bytes": 312294768,
"name": "recording-i.mp4",
"hash": "b89823900fcbe09c762f9946cf598612",
"written": "2017-02-28T19:45:25.664000Z",
"accessed": "2017-02-28T19:45:25.664000Z",
"content_type": "video/mp4"
},
{
"last_modified": "2017-09-08T21:26:44.148000Z",
"bytes": 442920798,
"name": "recording-j.mp4",
"hash": "b6e556acd26d43f052490afd0fe42e4f",
"written": "2017-09-08T21:26:44.148000Z",
"accessed": "2017-09-08T21:26:44.148000Z",
"content_type": "video/mp4"
}
]
This returns the next 5 objects in descending ETag order (sort=etag:desc).
For the next set, parse out the hash of the last object listed (b6e556acd26d43f052490afd0fe42e4f) and repeat until you have walked through all the matching objects.
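The complete paging loop can be sketched in Python. So that the sketch runs offline, the HTTP request is replaced by a hypothetical fetch_page(marker, size) callable that returns the decoded JSON list; a real client would send the GET request shown in the curl examples with marker, size, and sort=etag:desc. Stopping as soon as a page holds fewer than size objects is an assumption of this sketch rather than documented server behavior.

```python
from typing import Callable, Dict, Iterator, List

Page = List[Dict[str, str]]


def iter_objects(fetch_page: Callable[[str, int], Page], size: int = 5) -> Iterator[Dict[str, str]]:
    """Yield every matching object by following hash markers page by page.

    fetch_page(marker, size) must return up to `size` objects that come
    after `marker` in the sort order ('' means start at the beginning).
    """
    marker = ""  # empty marker starts at the beginning of the result set
    while True:
        page = fetch_page(marker, size)
        yield from page
        if len(page) < size:  # short or empty page: no more results
            return
        marker = page[-1]["hash"]  # last object's hash becomes the next marker


# Offline stand-in for the server: 12 objects sorted by hash, descending.
ALL = sorted(({"name": f"obj-{i}", "hash": f"{i * 7919:032x}"} for i in range(12)),
             key=lambda o: o["hash"], reverse=True)


def fake_fetch(marker: str, size: int) -> Page:
    # With a descending sort, "after the marker" means a smaller hash.
    rest = [o for o in ALL if marker == "" or o["hash"] < marker]
    return rest[:size]


names = [o["name"] for o in iter_objects(fake_fetch, size=5)]
print(len(names), len(set(names)))  # 12 12
```

The fake server mirrors the behavior seen in the two pages above: with sort=etag:desc, each page continues with objects whose hash is smaller than the marker, so no object is skipped or repeated.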
Important
The "sort" argument is computationally intensive. Watch the load on the Elasticsearch cluster to gauge the performance impact when running queries like this.
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.