Skip to main content
Release Version

Snowplow Media Player Package

The package source code can be found in the snowplow/dbt-snowplow-media-player repo, and the docs for the model design here.

The package is built as an extension of the dbt-snowplow-web package that transforms raw media player event data into derived tables for easier querying generated by the Snowplow JavaScript tracker in combination with media tracking specific plugins such as the Media Tracking plugin or the YouTube Tracking plugin.

In order to keep the documentations separate and less verbose, this guide will assume the reader is already familiar with configuring and running the web model and will only explain how to operate the media-player package in conjunction with the web model as the media model was designed to be run together with it. Please refer to the snowplow-web dbt docs for a full breakdown of the package and how to set it up.

caution

Please note that the media player package is not compatible with the Flutter tracker as it is reliant on the Snowplow JavaScript tracker.

Overviewโ€‹

This package consists of a series of dbt models with the goal to produce the following main aggregated models from the raw media player events and relevant contexts:

  • snowplow_media_player_base: This derived table summarizes the key media player events and metrics of each media element on a media_id and pageview level which is considered as a base aggregation level for media interactions.

  • snowplow_media_player_plays_by_pageview: This view removes impressions from the '_base' table to summarize media plays on a page_view by media_id level.

  • snowplow_media_player_media_stats: This derived table aggregates the '_base' table to individual media_id level, calculating the main KPIs and overall video/audio metrics.

The package is built on top of the dbt-snowplow-web package taking that as a basis to carry out the incremental update. It is designed to be run together with the web model in a similar manner to how a custom module would run.

The _interactions_this_run table takes the snowplow_web_base_events_this_run table generated by the web package as an input then adds the various contexts to enrich the base table with the additional media related fields. It could be used for custom models for more in-depth event level derived tables and further analysis.

The _base_this_run table then aggregates the _interactions_this_run table to media_id and pageview level and serves as a basis for the incrementalized derived table _media_base.

The main _media_stats derived table will also be updated incrementally based on the _media_base derived table, however not through the snowplow_incremental materialization, but using the native dbt incremental materialization on a pageview basis after a set time window passed. This is to prevent complex and expensive queries due to metrics which need to take the whole page_view events into calculation. This way the metrics will only be calculated once per pageview / media, after no new events are expected.

The additional _pivot_base table is there to calculate the percent_progress boundaries and weights that are used to calculate the total play_time and other related media fields.

Operating with the Web packageโ€‹

Due to its unique relationship with the web package, in order to operate the media player package together with the web model there are several considerations to keep in mind. Depending on the use case one of the following scenarios may happen:

  1. The web package was already being used and the media player package added later
  2. The web package has not been used
  3. Only the media player package needs to be run at any time

Adding the media player data model to an existing dbt project with web model data already runningโ€‹

If you have already been using the web package then your new media player models will need to sync up with the data that has been processed (see the incremental logic section for why), so the backfilling needs to happen before we can process more web data. Please note that during backfill no new web data is allowed to be processed and depending on the snowplow__backfill_limit_days configured and the period that needs backfilling it can take multiple run for all models to sync up and new web events to start to be processed again. You can temporarily overwrite the snowplow__backfill_limit_days variable in your snowplow_web configuration if you want to backfill in less runs.

To begin the synching process please run the following script:

dbt run --select snowplow_web.base snowplow_media_player --vars 'snowplow__start_date: <date_when_media_player_tracking_starts>'
tip

Using the snowplow__start_date allows you to start your media player data at a later date than the rest of your web data if you need to.

The web model's update logic will recognize the new media player models and begin backfilling them between the date you defined within snowplow_start_date and the upper limit defined by the variable snowplow_backfill_limit_days that is set for the web model. You should see the following in the logs:

Snowplow: New Snowplow incremental model. Backfilling

After this you should be able to see all media_player models added to the derived.snowplow_web_incremental_manifest table. Any subsequent run from this point onwards could be carried out using the recommended web model running method, using the snowplow_web selector.

dbt run --selector snowplow_web

Custom modelsโ€‹

There are two custom models included in the package which could potentially be used in downstream models:

  1. the snowplow_media_player_session_stats table, which aggregates the snowplow_media_base table on a session level

  2. the snowplow_media_player_user_stats table, which aggregates the snowplow_media_player_session_stats to user level

By default these are disabled, but you can enable them in the project's profiles.yml, if needed.

dbt_project.yml
  models:
snowplow_media_player:
custom:
enabled: true

Just like in case of the web model, users are encouraged to use the Media Player model and its incremental logic to design their own custom models / modules. The snowplow_media_player_interactions_this_run table is designed with this in mind, where a couple of potentially useful fields are generated that the Media Player model does not use downstream but they nonetheless have the potential to be incorporated into users custom models.

One such example is the player_current_time, which is the playback position of a specific media in seconds whenever a media player event is fired, from which more precise time-based calculations could be made.

e.g. subsequent events' player_current_time could be used to deduct the player_end_time of an event. Subtracting these two would result in calculated play_times instead of taking the percent_progress fields as a base like the Media Player model.

with interaction_ends as (

select
event_id,
event_type,
start_tstamp,
lead(start_tstamp, 1) over(partition by play_id order by start_tstamp) as end_tstamp,
player_current_time,
lead(player_current_time, 1) over(partition by play_id order by start_tstamp) as player_end_time
from scratch.snowplow_media_player_interactions_this_run

)

select
event_id,
player_end_time - player_current_time as play_time_sec_calculated

from interaction_ends

where event_type = 'play'
Was this page helpful?