This article is part 16 of the "21 Thoughts on Video Streaming in 2021"-series.

Nicolas Weil (Product Manager at AWS Elemental) shares his 2021 thoughts on metadata in video/media streams. Keep reading to get 1) context and 2) my take.

2020 brought an interesting set of new specifications for CMAF, with AOM's ID3 Timed Metadata in the Common Media Application Format and MISB's ST1910 for the carriage of KLV Metadata over CMAF, opening new perspectives for building data-intensive video workflows through Event Message Boxes.

This is however not the end of the road, as MPEG is working on new specifications such as CMAF Timed Metadata Tracks and the Event Message track format (MPEG-B part 18), with a corresponding Event Stream and Timed Metadata Processing model in the core DASH spec.

While all these new specs create interesting perspectives in terms of bandwidth optimization (by centralizing all the events into a dedicated track instead of duplicating them across all audio/video tracks), they also create an additional level of implementation complexity for packagers and players, with the nested timelines of the Timed Metadata Track and the contained emsg boxes.

So far we have seen very limited industry support for the Timed Metadata Track in the context of the new CMAF-based Ingest Protocol developed by DASH-IF/MPEG. It will be interesting to see in 2021 whether the ecosystem finds enough benefit in the new set of specs to cope with the associated complexity, for ingest or player-centric use cases.

Metadata is the new frontier, let's be trailblazers. For more information on KLV metadata in streaming workflows, please check this blog post by Matt Carter: https://aws.amazon.com/blogs/media/misb-st-1910-and-klv-changing-the-game-in-intelligence-entertainment-and-beyond/

Context

We already wrote about the Common Media Application Format (CMAF), but it should be added that "CMAF containers" can also carry timed metadata. An example of timed metadata would be a signal, carried in the container, indicating that an ad break starts at second 15 and stops at second 45.
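To make this concrete, here's a minimal sketch (TypeScript, with hypothetical names rather than any particular player's API) of how such a cue could look to an application:

```typescript
// Hypothetical shape of a timed metadata cue as a player might expose it.
interface TimedMetadataCue {
  startTime: number;    // position on the media timeline in seconds, e.g. 15
  endTime: number;      // e.g. 45
  type: string;         // e.g. 'ad-break'
  payload?: Uint8Array; // raw metadata bytes, if any
}

// The application reacts when the playhead enters the cue.
function onCueEnter(cue: TimedMetadataCue): void {
  if (cue.type === 'ad-break') {
    console.log(`Ad break from ${cue.startTime}s to ${cue.endTime}s`);
  }
}
```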

There are (generally speaking) two types of timed metadata:

  1. In-band timed metadata, which is carried inside the container, for example ID3 in the TS containers of an HLS stream, or Event Message (=emsg) boxes in CMAF containers.
  2. Out-of-band timed metadata, which is carried outside of the container, for example the timed metadata in the manifest of an HLS (=> #EXT-X-DATERANGE) or MPEG-DASH (=> EventStream) stream. A basic parser for such tags is sketched right after this list.
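To illustrate the out-of-band flavor, here's a rough sketch that pulls ad-break cues out of #EXT-X-DATERANGE tags. The attribute names (ID, START-DATE, DURATION) come from the HLS spec (RFC 8216); the parsing itself is deliberately simplified and not production-grade:

```typescript
// A minimal sketch: extract #EXT-X-DATERANGE cues from an HLS playlist string.
interface DateRangeCue {
  id: string;
  startDate: Date;
  duration?: number; // seconds
}

function parseDateRanges(playlist: string): DateRangeCue[] {
  const cues: DateRangeCue[] = [];
  for (const line of playlist.split('\n')) {
    if (!line.startsWith('#EXT-X-DATERANGE:')) continue;
    // Collect ATTRIBUTE=value pairs, tolerating quoted values with commas.
    const attrs: Record<string, string> = {};
    const re = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(line.slice('#EXT-X-DATERANGE:'.length))) !== null) {
      attrs[m[1]] = m[2].replace(/^"|"$/g, '');
    }
    cues.push({
      id: attrs['ID'] ?? '',
      startDate: new Date(attrs['START-DATE'] ?? ''),
      duration: attrs['DURATION'] ? parseFloat(attrs['DURATION']) : undefined,
    });
  }
  return cues;
}
```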

Note that you typically have "faster access" to out-of-band timed metadata: you first fetch and parse your HLS/DASH manifest (and thereby acquire the out-of-band timed metadata), and only after processing the manifest do you start downloading (and processing) your segments (=containers), which carry the in-band timed metadata.
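For the in-band flavor, here's a minimal sketch of parsing a single emsg box out of a CMAF segment, following the box layout defined in MPEG-DASH (ISO/IEC 23009-1). It assumes the caller has already located the byte offset where the box starts, and it skips edge cases such as 64-bit box sizes:

```typescript
// A sketch, not production code: parse one 'emsg' box (version 0 or 1).
interface EmsgEvent {
  schemeIdUri: string;
  value: string;
  timescale: number;
  presentationTime: number; // v1: absolute time; v0: delta from segment start
  eventDuration: number;
  id: number;
  messageData: Uint8Array;
}

function parseEmsg(buf: ArrayBuffer, offset: number): EmsgEvent {
  const view = new DataView(buf, offset);
  let pos = 0;
  const size = view.getUint32(pos); pos += 4; // assumes no 64-bit largesize
  const type = String.fromCharCode(
    view.getUint8(pos), view.getUint8(pos + 1),
    view.getUint8(pos + 2), view.getUint8(pos + 3)); pos += 4;
  if (type !== 'emsg') throw new Error(`expected emsg, got ${type}`);
  const version = view.getUint8(pos); pos += 4; // version (8) + flags (24)

  // Strings in emsg are null-terminated UTF-8.
  const readString = (): string => {
    const start = pos;
    while (view.getUint8(pos) !== 0) pos++;
    const bytes = new Uint8Array(buf, offset + start, pos - start);
    pos++; // skip the null terminator
    return new TextDecoder().decode(bytes);
  };

  let schemeIdUri = '', value = '';
  let timescale = 0, presentationTime = 0;
  if (version === 0) {
    schemeIdUri = readString();
    value = readString();
    timescale = view.getUint32(pos); pos += 4;
    presentationTime = view.getUint32(pos); pos += 4; // presentation_time_delta
  } else if (version === 1) {
    timescale = view.getUint32(pos); pos += 4;
    presentationTime = Number(view.getBigUint64(pos)); pos += 8;
  } else {
    throw new Error(`unsupported emsg version ${version}`);
  }
  const eventDuration = view.getUint32(pos); pos += 4;
  const id = view.getUint32(pos); pos += 4;
  if (version === 1) {
    schemeIdUri = readString();
    value = readString();
  }
  const messageData = new Uint8Array(buf, offset + pos, size - pos);
  return { schemeIdUri, value, timescale, presentationTime,
           eventDuration, id, messageData };
}
```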

Nicolas explains that two new types of in-band metadata are available:

  1. ID3 metadata in CMAF containers (see the routing sketch after this list).
    The benefit of this new type is ... .
    The added challenge is ... .
  2. KLV metadata in CMAF containers.
    The benefit of this new type is ... .
    The added challenge is ... .
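Once an emsg box is parsed, routing it comes down to matching its scheme_id_uri. The sketch below uses the ID3 scheme URI registered by the AOM spec (https://aomedia.org/emsg/ID3, to be verified against the current spec text) and leaves the downstream handlers as hypothetical stubs:

```typescript
// Route a parsed emsg event to the right handler based on its scheme.
const ID3_SCHEME = 'https://aomedia.org/emsg/ID3';

function routeEvent(e: { schemeIdUri: string; messageData: Uint8Array }): void {
  if (e.schemeIdUri === ID3_SCHEME) {
    // Per the AOM spec, message_data carries a complete ID3v2 tag.
    handleId3(e.messageData);
  } else {
    // MISB ST 1910 registers its own scheme for KLV payloads (see the spec
    // for the exact URI); unknown schemes are passed through untouched.
    handleOther(e.schemeIdUri, e.messageData);
  }
}

// Hypothetical downstream handlers, left unimplemented here.
declare function handleId3(tag: Uint8Array): void;
declare function handleOther(scheme: string, data: Uint8Array): void;
```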

Additionally, he mentions that three new types are in the pipeline:

  1. CMAF Timed Metadata Tracks. This is a new type of in-band timed metadata.
    The benefit of this new type is ... .
    The added challenge is ... .
  2. Event Message track format (MPEG-B part 18), with a corresponding Event Stream. This new type relates to both in-band and out-of-band timed metadata, as "Event Message" is in-band timed metadata in CMAF containers, and Event Stream is out-of-band timed metadata in MPEG-DASH manifests.
    The benefit of this new type is ... .
    The added challenge is ... .
  3. Timed Metadata Processing model. This is ... ? (A rough sketch of its on-start dispatch idea follows this list.)
    The benefit of this new type is ... .
    The added challenge is ... .
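To give an idea of what a processing model buys you, here's a rough sketch of the "on-start" dispatch concept from the DASH-IF Event and Timed Metadata Processing model: applications subscribe per scheme, and queued events fire when the playhead reaches their presentation time. All names are hypothetical, not an actual API from the specs:

```typescript
// A sketch of on-start event dispatch: events are held until the playhead
// reaches their presentation time, then delivered to subscribers per scheme.
interface MetadataEvent {
  schemeIdUri: string;
  presentationTime: number; // seconds on the media timeline
  duration: number;
  messageData: Uint8Array;
}

type EventCallback = (e: MetadataEvent) => void;

class EventDispatcher {
  private pending: MetadataEvent[] = [];
  private subscribers = new Map<string, EventCallback[]>();

  subscribe(schemeIdUri: string, cb: EventCallback): void {
    const list = this.subscribers.get(schemeIdUri) ?? [];
    list.push(cb);
    this.subscribers.set(schemeIdUri, list);
  }

  enqueue(e: MetadataEvent): void {
    this.pending.push(e); // e.g. straight from the emsg parser
  }

  // Call on every playhead update (e.g. the video element's 'timeupdate').
  onTimeUpdate(currentTime: number): void {
    const due = this.pending.filter(e => e.presentationTime <= currentTime);
    this.pending = this.pending.filter(e => e.presentationTime > currentTime);
    for (const e of due) {
      for (const cb of this.subscribers.get(e.schemeIdUri) ?? []) cb(e);
    }
  }
}
```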

My take