How to add metadata for better searches

Swarm Search is most useful when your objects carry metadata that helps you organize your data. But how do you write metadata into your objects? Let's look at an example using curl.

The examples below are named write operations sent directly to a Swarm node (not through a proxy). The domain, c-csn2.example.com, was already created in the Swarm Admin Console. The bucket, "bucket1", does not exist yet, so we need to create it first. The node IP address is 192.168.202.84.

Create the bucket:

curl -i --location-trusted -XPOST --data-binary "" "http://192.168.202.84/bucket1"

Example output:

HTTP/1.1 201 Created
Location: http://192.168.202.86:80/bucket1?domain=c-csn2.example.com
Volume: 711578efa80e74a7977500b9afa5512a
Location: http://192.168.202.92:80/bucket1?domain=c-csn2.example.com
Volume: ddf04417655c2a5203f69c0854e9080e
Entity-MD5: pxGQhM7wTPBgzjplyV5dJA==
Stored-Digest: a7119084cef04cf060ce3a65c95e5d24
Last-Modified: Tue, 02 Feb 2016 22:00:42 GMT
Content-UUID: 77009875f227a7dba5daa572b89387f4
Castor-System-Version: 1454450442.476
Etag: "f3f0e2136b282bd91ff1c2829c8b13ba"
Castor-System-Alias: 77009875f227a7dba5daa572b89387f4
Replica-Count: 2
Date: Tue, 02 Feb 2016 22:00:42 GMT
Server: CAStor Cluster/8.0.0
Content-Length: 46
Content-Type: text/html
Keep-Alive: timeout=14400
<html><body>New stream created</body></html>
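
When these commands are scripted, it can help to confirm that the write really returned 201 Created. Because --location-trusted follows redirects, the output of curl -i may contain more than one status line, so the sketch below keeps only the last one. The function name last_status is hypothetical and is not part of Swarm or curl.

```shell
# Print the status code of the final response found in a `curl -i`
# header dump read from standard input.
# last_status is an illustrative name only.
last_status() {
  awk '/^HTTP\// { code = $2 } END { print code }'
}

# Demo on the first lines of the example output above.
printf 'HTTP/1.1 201 Created\r\nReplica-Count: 2\r\n' | last_status   # prints 201
```

For example, piping the bucket-creation command through it, as in curl -si --location-trusted -XPOST --data-binary "" "http://192.168.202.84/bucket1" | last_status, should print 201 when the bucket is created.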

Now we write the object from a local file called uuids.txt, attaching some real, usable metadata as request headers:

# curl -i --location-trusted -XPOST --data-binary @uuids.txt \
  -H "CAStor-application: manual" \
  -H "x-band-meta-llica: unforgiven" "http://192.168.202.84/bucket1/uuids.txt"
HTTP/1.1 201 Created
Location: http://192.168.202.92:80/bucket1/uuids.txt?domain=c-csn2.example.com
Volume: eebf1d49c3018c49e5bd8a9193fae1b0
Location: http://192.168.202.84:80/bucket1/uuids.txt?domain=c-csn2.example.com
Volume: 04b2b33a65e3c94b6298a1cad06a8f1d
Entity-MD5: +sXQs75gA96HGppViMkrpQ==
Stored-Digest: fac5d0b3be6003de871a9a5588c92ba5
Last-Modified: Tue, 02 Feb 2016 22:10:57 GMT
Castor-System-Version: 1454451057.124
Etag: "97d226be57d95a31f8f01e10a2df3ae0"
Replica-Count: 2
Date: Tue, 02 Feb 2016 22:10:57 GMT
Server: CAStor Cluster/8.0.0
Content-Length: 46
Content-Type: text/html
Keep-Alive: timeout=14400
<html><body>New stream created</body></html>
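
If an object needs many metadata headers, the command line gets long. One option is to keep the headers in a curl config file and pass that file with curl's -K option. This is a minimal sketch, assuming the header values contain no double quotes or backslashes; the function name meta_config and the file name meta.cfg are hypothetical.

```shell
# Emit one curl config line per "Name: value" argument.
# Assumption: values contain no double quotes or backslashes,
# which would need escaping inside a curl config file.
meta_config() {
  for header in "$@"; do
    printf 'header = "%s"\n' "$header"
  done
}

# The two headers used in this article.
meta_config "CAStor-application: manual" "x-band-meta-llica: unforgiven"
```

Redirecting that output to meta.cfg, the write above could then be issued as curl -i --location-trusted -XPOST --data-binary @uuids.txt -K meta.cfg "http://192.168.202.84/bucket1/uuids.txt".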

To see the metadata, we can INFO the object (an HTTP HEAD request, which is what curl's -I option sends):

# curl -iI --location-trusted "http://192.168.202.84/bucket1/uuids.txt"
HTTP/1.1 200 OK
CAStor-application: manual
Castor-System-CID: 77009875f227a7dba5daa572b89387f4
Castor-System-Cluster: c-csn2.example.com
Castor-System-Created: Tue, 02 Feb 2016 22:10:57 GMT
Castor-System-Name: uuids.txt
Castor-System-Version: 1454451057.124
Content-Length: 561
Content-Type: application/x-www-form-urlencoded
Last-Modified: Tue, 02 Feb 2016 22:10:57 GMT
x-band-meta-llica: unforgiven
Etag: "97d226be57d95a31f8f01e10a2df3ae0"
Castor-System-Path: /c-csn2.example.com/bucket1/uuids.txt
Castor-System-Domain: c-csn2.example.com
Volume: 04b2b33a65e3c94b6298a1cad06a8f1d
Date: Tue, 02 Feb 2016 22:11:49 GMT
Server: CAStor Cluster/8.0.0
Keep-Alive: timeout=14400
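
The INFO response mixes our metadata with system headers. To pick out only the headers written in this example, the output can be filtered. The sketch below assumes the two naming patterns used in this article (CAStor-application and x-...-meta-...); the function name show_metadata is hypothetical.

```shell
# Keep only the metadata headers used in this article from a header
# dump read on standard input. Carriage returns from the HTTP response
# are stripped first. show_metadata is an illustrative name only.
show_metadata() {
  tr -d '\r' | grep -i -E '^(CAStor-application|x-[^:]+-meta-[^:]+):'
}

# Demo on a few lines copied from the INFO output above.
printf '%s\r\n' \
  'HTTP/1.1 200 OK' \
  'CAStor-application: manual' \
  'Castor-System-Name: uuids.txt' \
  'x-band-meta-llica: unforgiven' | show_metadata
```

Against a live node this could be used as curl -sI --location-trusted "http://192.168.202.84/bucket1/uuids.txt" | show_metadata.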

Provided that Swarm Search is configured, we can easily search for any objects that were written with that metadata.

Example searching for the "CAStor-application: manual" metadata:

# curl --location-trusted 'http://192.168.202.84/?size=10000&format=json&stype=all&domain=c-csn2.example.com&fields=name,context&sort=context,name&CAStor-application=manual'
[
{"name":"uuids.txt", "context":"c-csn2.example.com/bucket1"}
]
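
In a script, you usually want just the object names from a result like this. The sketch below pulls them out with grep and cut. It assumes the compact JSON formatting shown in this article and names that contain no double quotes; a real JSON parser such as jq would be more robust. The function name list_names is hypothetical.

```shell
# Print the value of every "name" field found in search results read
# from standard input. Assumes the compact formatting shown above.
# list_names is an illustrative name only.
list_names() {
  grep -o '"name":"[^"]*"' | cut -d'"' -f4
}

# Demo on the search result shown above.
printf '%s\n' '[' \
  '{"name":"uuids.txt", "context":"c-csn2.example.com/bucket1"}' \
  ']' | list_names   # prints uuids.txt
```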

Example searching on the x-band-meta-llica header using a wildcard (*) as the value. This matches any object that has the x-band-meta-llica header, whatever its value:

# curl --location-trusted 'http://192.168.202.84/?size=10000&format=json&stype=all&domain=c-csn2.example.com&fields=name,context&sort=context,name&x-band-meta-llica=*'
[
{"name":"uuids.txt", "context":"c-csn2.example.com/bucket1"}
]

Now suppose we write another object that has the x-band-meta-llica header but with a different value. The wildcard search still matches it. This lets us store distinct values under a particular header while still matching every object that carries that header.

Let's write a second object named uuids2.txt, changing the value of the first header (CAStor-application) so that it no longer matches the first search, and giving x-band-meta-llica a different value, just as an example:

# curl -i --location-trusted -XPOST --data-binary @uuids.txt \
  -H "CAStor-application: curl" \
  -H "x-band-meta-llica: sad but true" "http://192.168.202.84/bucket1/uuids2.txt"

HTTP/1.1 201 Created
Location: http://192.168.202.91:80/bucket1/uuids2.txt?domain=c-csn2.example.com
Volume: 4186c53996d5f1511d4c8dfe969dd5a3
Location: http://192.168.202.84:80/bucket1/uuids2.txt?domain=c-csn2.example.com
Volume: 04b2b33a65e3c94b6298a1cad06a8f1d
Entity-MD5: Vw2Sokb1t1c+VHxhP/3AFQ==
Stored-Digest: 570d92a246f5b7573e547c613ffdc015
Last-Modified: Tue, 02 Feb 2016 22:24:26 GMT
Castor-System-Version: 1454451866.709
Etag: "c8f0d1ee9779735b42acfe23947b1c13"
Replica-Count: 2
Date: Tue, 02 Feb 2016 22:24:26 GMT
Server: CAStor Cluster/8.0.0
Content-Length: 46
Content-Type: text/html
Keep-Alive: timeout=14400
<html><body>New stream created</body></html>

Now, here are the same searches again, starting with the exact-value search on CAStor-application:

# curl --location-trusted 'http://192.168.202.84/?size=10000&format=json&stype=all&domain=c-csn2.example.com&fields=name,context&sort=context,name&CAStor-application=manual'
[
{"name":"uuids.txt", "context":"c-csn2.example.com/bucket1"}
]

This only matches one object because only the first object we wrote has that specific header value.

This search matches both objects because we used a wildcard in the value and they have the same header:

# curl --location-trusted 'http://192.168.202.84/?size=10000&format=json&stype=all&domain=c-csn2.example.com&fields=name,context&sort=context,name&x-band-meta-llica=*'
[
{"name":"uuids.txt", "context":"c-csn2.example.com/bucket1"},
{"name":"uuids2.txt", "context":"c-csn2.example.com/bucket1"}
]
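
The search URLs above differ only in the final header=value pair, so a script can assemble them from parts. A value such as "sad but true" contains spaces and has to be percent-encoded before it goes into a URL. The sketch below does both. The function names urlencode and search_url are hypothetical, the encoder assumes ASCII input, and it is an assumption that an exact-value search on x-band-meta-llica behaves like the CAStor-application=manual search shown earlier.

```shell
# Percent-encode a string for use as a query value (ASCII input assumed).
# Unreserved characters and "*" are kept literal, so a wildcard is sent
# exactly as in the article's examples. The ( ) body runs in a subshell,
# so the helper variables do not leak into the caller.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    s=$rest
    case "$c" in
      [A-Za-z0-9._~*-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
)

# Build a search URL with the same fixed parameters as the examples above.
# Arguments: node address, domain, header name, header value.
search_url() {
  printf 'http://%s/?size=10000&format=json&stype=all&domain=%s&fields=name,context&sort=context,name&%s=%s\n' \
    "$1" "$2" "$3" "$(urlencode "$4")"
}

search_url 192.168.202.84 c-csn2.example.com x-band-meta-llica "sad but true"
```

The result can be passed straight to curl, for example curl --location-trusted "$(search_url 192.168.202.84 c-csn2.example.com x-band-meta-llica '*')" for the wildcard search.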

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.