Passthrough Fields
Passthrough fields are a feature only available in the web package versions 0.16.0 and higher
Passthrough fields are the term used to define any field from your snowplow_web_base_events_this_run
table that you want available as is in a derived table, i.e. these fields are passed through the processing to be available in your derived tables. There are many use cases for this, but the most common is the need to add a field from a custom context to the e.g. the page views table to use as an additional dimension in analysis. Previously you would have to write a custom model to join this field on, or join the original page view event at the time of the analysis, but with passthrough fields this is now far easier (and cheaper) to achieve.
Which event the field(s) are taken from depends on the derived table:
- Page Views uses the field value from the
page_view
event itself, not the pings - Sessions uses the field value from the first
page_view
orpage_ping
event (order is based onderived_tstamp
,dvce_created_tstamp
, thenevent_id
in case of a tie) - Users uses the first and last sessions for that user, which means if you want a field passed through to the users table, you must first make it available in the sessions table
As mentioned, the fields are passed through as is, which means it is not currently possible aggregate the value of a field across a page view or session as we do some other fields in the table, in the case where this is required, you should build a custom model as in older versions of the package.
To enable the passthrough fields, you need to set the relevant variable in your root dbt_project.yml
file; snowplow__page_view_passthroughs
, snowplow__session_passthroughs
, snowplow__user_first_passthroughs
, and snowplow__user_last_passthroughs
for page views, sessions, and users (first and lasts) respectively. The variables are lists of fields; either the name of the field or a dictionary specifying the SQL and alias for the field e.g.
vars:
snowplow_web:
snowplow__page_view_passthroughs: ['v_collector', {'sql': 'event_id || app_id', 'alias': 'event_app_id'}]
would add the v_collector
field and a field called event_app_id
which is the concatenation of the event_id
and the app_id
. A more useful case for the SQL block is to extract a specific field from a context, however you can also just use the context column name as the field to bring the whole context through as is e.g.
vars:
snowplow_web:
snowplow__page_view_passthroughs: ['contexts_my_entity_1']
Note that how to extract a field from your context column will depend on your warehouse (see our querying guide for more information), and you are unable to use dbt macros in this variable. For the users table, any basic field will have first_
or last_
prefixed to the field name automatically to avoid clashes, however if you are using the SQL approach, you will need to add these prefixes as part of your alias.
It is unlikely, although not impossible, that when using the SQL approach you may need to provide a table alias to avoid ambiguous references, in this case please see the model sql file for the specific alias used for the snowplow_web_base_events_this_run
table in each case.