Metadata Annotation

In addition to updating object metadata directly (via COPY), additional metadata can be appended to existing objects without altering the originals. This provides a way to extend the metadata of immutable objects and historical versions, because each object's create date, original metadata, and version sequence remain undisturbed. Annotations also provide another way to find and manage objects; for example, they can store S3 object-level ACLs for the Gateway to enforce.

Important

Swarm cannot be downgraded to an earlier version once this feature is used.

Benefits - Keeping metadata annotation separate from the object itself provides several advantages:

  • Add helpful metadata without changing the object’s create date, original metadata, and version sequence.

  • Retrieve objects as originally written, so applications can distinguish between what was original and what was added later.

  • Annotate immutable objects.

  • Annotate historical versions of objects, independent of the current version of the object. This is especially important when the metadata is derived from analysis of the data, which changes from version to version, or when capturing information about specific versions.

Note

Annotation Cleanup

There are two key features of this annotation method: (1) validation that target objects exist before annotations are written, and (2) the Health Processor's automated tracking and cleanup of annotation objects after the target object is removed. An annotated target object may be removed from Swarm in any of several ways:

  • SCSP Delete

  • SCSP Write (invalidating the old version)

  • Lifepoint Delete

  • Recursive delete of a parent context (domain or bucket)

Note

The Health Processor logs a “DECORATION DELETE” AUDIT-level message when purging an annotation during garbage collection. Annotation objects "decorate" a targeted content object.

Regardless of the type of Swarm object annotated (named, alias, immutable, historical version) and the protection type (replicated or erasure-coded), metadata annotation operates largely the same way:

  • If an annotation is created and the target object is later deleted, Swarm deletes the orphaned annotation during garbage collection.

  • If an annotation is created and later deleted, the target object is completely unaffected.

    • For named objects only, Swarm replaces the annotation object with a delete marker.

  • If a domain or bucket containing both the original object and the annotation is deleted, Swarm deletes both recursively.

  • If a versioned object is updated, create separate annotations for any historical versions; when a version is deleted, Swarm deletes its orphaned annotation during garbage collection.

  • If an annotation on a versioned object is created and later deleted, the outcome depends on the position in the version chain:

    • Historical versions: Swarm removes the annotation.

    • Current versions: Swarm replaces the annotation object with a delete marker.

Creating Annotations

Metadata annotation makes use of a persisted header, Castor-System-Decorates, whose value is the ETag of the target object that the annotation object extends (decorates). If this header is present, the object is an annotation object and is subject to special Health Processor management. The header is valid for all Swarm object types (immutable, alias, and named), but not for context objects (domains and buckets). Both the annotator (decorator) and the annotated target object may be versioned.

To create a new annotation object, create an object that points to the ETag of the target and includes the custom metadata to be added, such as GPS coordinates extracted from an existing, uploaded photo:

Extending Metadata with Post-Processed Data
Content-Length: 0
Castor-System-Decorates: 9282727ffcca3a09e0843281aafc13af
X-GPS-Meta-Longitude: 36; 16; 48.36000000000589
X-GPS-Meta-Latitude: 115; 10; 20.79299999981990
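As a rough sketch, the headers above could be sent with curl in the same style as the other requests on this page. The host swarm.example.com is only a placeholder domain, and the helper name annotation_curl_cmd is hypothetical. The function prints the command rather than running it, so it can be reviewed before anything is written to the cluster.

```shell
# Hypothetical helper: print (do not run) a curl command that creates an
# empty annotation object decorating the target whose ETag is passed as $1.
# Any remaining arguments are extra "Name: value" metadata headers.
# Assumption: swarm.example.com stands in for your own domain.
annotation_curl_cmd() {
  target_etag=$1
  shift
  printf '%s' 'curl -i --location-trusted -X POST --post301'
  # printf repeats the format once per argument, giving one -H per header.
  printf ' -H "%s"' "Content-Length: 0" \
    "Castor-System-Decorates: ${target_etag}" "$@"
  printf ' "%s"\n' 'http://swarm.example.com/'
}

# Reproduce the GPS example above.
annotation_curl_cmd 9282727ffcca3a09e0843281aafc13af \
  "X-GPS-Meta-Longitude: 36; 16; 48.36000000000589" \
  "X-GPS-Meta-Latitude: 115; 10; 20.79299999981990"
```

Printing the command first keeps the sketch free of side effects; paste the output into a shell once the target ETag and headers look right.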

Searching for Annotations

In the annotation (decorator) object’s Elasticsearch record, the Castor-System-Decorates header value is indexed under the key decorates, and the Elasticsearch configuration templates include the decorates field. Most Swarm queries return this value, if present, as part of the results.

Query argument - Use a “decorates=<uuid>” query argument in Swarm listing queries to find annotation objects for a given ETag (or earlier query result “hash”).

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443821892.
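A minimal sketch of building such a listing query from an ETag is shown here. It assumes the placeholder host swarm.example.com and a format=json listing argument; the helper name decorates_query_url is hypothetical. Quotes are stripped because the Etag response header carries the value in double quotes.

```shell
# Hypothetical helper: build a listing-query URL that asks for annotation
# objects decorating the given ETag.
# Assumptions: placeholder host swarm.example.com, JSON listing output.
decorates_query_url() {
  # Drop the double quotes that surround the value in an Etag header.
  etag=$(printf '%s' "$1" | tr -d '"')
  printf 'http://swarm.example.com/?format=json&decorates=%s\n' "$etag"
}

decorates_query_url '"9282727ffcca3a09e0843281aafc13af"'
# To run the query, pass the URL to curl, for example:
#   curl -s --location-trusted "$(decorates_query_url "$etag")"
```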

Sample Scenario for Annotations

Suppose a company needs to store surveillance videos as immutable objects (as protection from tampering) in the domain "swarm.example.com". To add a video, use the normal POST, adding the Content-Type of the video and custom metadata for the video's duration, camera location, and camera model:

curl -i --location-trusted -X POST --post301 \
  --data-binary @20170311-972-9928817883.mp4 \
  -H "Expect: 100-continue" \
  -H "x-example-meta-Start-Time: 2017-03-11T12:00:01.678Z" \
  -H "x-example-meta-End-Time: 2017-03-11T13:00:00.421Z" \
  -H "x-example-meta-Building: Annex 2" \
  -H "x-example-meta-Location: 972" \
  -H "x-example-meta-CameraModel: SWDSK-850004A-US" \
  -H "Content-Type: video/mp4" \
  -H "Content-Disposition: inline" \
  "http://swarm.example.com/"

HTTP/1.1 201 Created
Location: http://192.168.1.11:80/e970b3280d5501571c8c6fe9d6838557?domain=swarm.example.com
Location: http://192.168.1.12:80/e970b3280d5501571c8c6fe9d6838557?domain=swarm.example.com
Volume: b3381183a1cfc620d960db3eae1d086d
Volume: 604a44d1a351045553b5481391af0810
Manifest: ec
Content-UUID: e970b3280d5501571c8c6fe9d6838557
Last-Modified: Tue, 28 Mar 2017 19:19:48 GMT
Castor-System-Encoding: zfec 1.4(2, 1, 524288, 200000000)
Castor-System-Version: 1490728788.934
Etag: "681b2470307b9260fb83542903e51828"
Replica-Count: 2
Date: Tue, 28 Mar 2017 19:22:19 GMT
Server: CAStor Cluster/9.2.0
Content-Length: 46
Content-Type: text/html
Keep-Alive: timeout=14400

<html><body>New stream created</body></html>

To verify the video is successfully stored, use a HEAD command:

curl --head --location-trusted "http://swarm.example.com/e970b3280d5501571c8c6fe9d6838557"

HTTP/1.1 200 OK
Castor-System-CID: 7e7fd5d747d244726af93c726672408b
Castor-System-Cluster: swarm.example.com
Castor-System-Created: Tue, 28 Mar 2017 19:19:48 GMT
Content-Disposition: inline
Content-Type: video/mp4
Last-Modified: Tue, 28 Mar 2017 19:19:48 GMT
x-example-meta-Building: Annex 2
x-example-meta-CameraModel: SWDSK-850004A-US
x-example-meta-End-Time: 2017-03-11T13:00:00.421Z
x-example-meta-Location: 972
x-example-meta-Start-Time: 2017-03-11T12:00:01.678Z
Manifest: ec
Content-Length: 1500964975
Etag: "681b2470307b9260fb83542903e51828"
Castor-System-Domain: swarm.example.com
Volume: b3381183a1cfc620d960db3eae1d086d
Date: Tue, 28 Mar 2017 19:24:25 GMT
Server: CAStor Cluster/9.2.0
Keep-Alive: timeout=14400
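Because an annotation points at the target's ETag, a script usually needs to pull that value out of the HEAD response. The following is one possible way to do it with awk; the function name etag_from_headers is hypothetical, and the sample input is a shortened copy of the response above.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# first ETag value, without the surrounding quotes or a trailing CR.
etag_from_headers() {
  awk 'tolower($1) == "etag:" { gsub(/["\r]/, "", $2); print $2; exit }'
}

# Shortened copy of the HEAD response shown above.
printf 'HTTP/1.1 200 OK\r\nContent-Type: video/mp4\r\nEtag: "681b2470307b9260fb83542903e51828"\r\n' \
  | etag_from_headers
# prints 681b2470307b9260fb83542903e51828
```

Against a live cluster, the same function can be fed directly from the HEAD request, for example: curl --head --location-trusted "http://swarm.example.com/e970b3280d5501571c8c6fe9d6838557" | etag_from_headers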

The custom metadata is what makes it possible and practical to identify video of interest. Suppose an incident occurs in the Annex 2 building. Search for immutable video taken at Annex 2 during the time span to find surveillance video relevant to the investigation:

The search correctly finds a video of interest: e970b3280d5501571c8c6fe9d6838557

Adding Metadata Annotation

With the video stored securely, suppose the organization also needs to run an application that performs facial recognition on the video. The application generates data when it runs, including both information on the algorithm and settings and the detailed results. The original video object must remain read-only to serve as evidence, so the derived data and metadata must be stored in a way that associates them with the original object without altering it.

The solution is to annotate the video with a decoration object (which can be named or unnamed) to associate the results with the original video.
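One way this might look with curl is sketched below, as an unnamed annotation whose body holds the results. Only the Castor-System-Decorates header and the video's ETag come from this page; the function name annotate_video, the results file, the x-example-meta-Analysis header, and the JSON content type are illustrative assumptions.

```shell
# Hypothetical helper: store a facial-recognition results file ($1) as an
# unnamed annotation object that decorates the surveillance video.
# Assumptions: the results file is JSON and x-example-meta-Analysis is a
# made-up custom metadata header following the page's naming pattern.
annotate_video() {
  results_file=$1
  curl -i --location-trusted -X POST --post301 \
    --data-binary "@${results_file}" \
    -H "Castor-System-Decorates: 681b2470307b9260fb83542903e51828" \
    -H "x-example-meta-Analysis: facial-recognition" \
    -H "Content-Type: application/json" \
    "http://swarm.example.com/"
}

# Usage (not run here): annotate_video results.json
```

The video object itself is never written to; the annotation is a separate object that only references the video's ETag.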

To find any annotations that hold facial recognition results for the original object, search for objects that decorate the video and qualify the search to match facial recognition results:

The search correctly finds an annotation: 0cb2d9e90a3341b10bc9dba2 

 

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.