Snowplow Web Models
This page is auto-generated from our dbt packages, so some information may be incomplete.
Snowplow Web
Snowplow Web Base Events This Run
models/base/scratch/<adaptor>/snowplow_web_base_events_this_run.sql
Description
For any given run, this table contains all required events to be consumed by subsequent nodes in the Snowplow dbt web package. This is a cleaned, deduplicated dataset containing all columns from the raw events table, with the page_view_id joined in from the page view context.
Note: This table should be used as the input to any custom modules that require event-level data, rather than selecting straight from atomic.events.
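As a rough illustration of what "cleaned, deduplicated" means here, the Python sketch below keeps one row per event_id and lifts the first page view context entity into a top-level page_view_id. It is a minimal sketch, not the package's implementation: the dict-based rows, the hypothetical web_page key standing in for the page view context column, and the rule that the earliest collector_tstamp wins are all assumptions (the macro's exact tie-breaking may differ).

```python
from datetime import datetime


def dedupe_and_attach_page_view(events):
    """Keep one row per event_id and expose the page view context id as page_view_id."""
    first_seen = {}
    # Visit rows oldest first so setdefault keeps the earliest copy of each event_id
    for event in sorted(events, key=lambda e: e["collector_tstamp"]):
        first_seen.setdefault(event["event_id"], event)

    result = []
    for event in first_seen.values():
        row = dict(event)  # copy so the input rows are left untouched
        # "web_page" is a hypothetical stand-in for the page view context column;
        # only its first entity is used, like [0] / safe_offset(0) in the models
        entities = row.pop("web_page", None) or [{}]
        row["page_view_id"] = entities[0].get("id")
        result.append(row)
    return result
```

Events with no page view context simply end up with a page_view_id of None, which matches how a missing array element reads as null in the warehouse versions.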
File Paths
models/base/scratch/bigquery/snowplow_web_base_events_this_run.sql
models/base/scratch/databricks/snowplow_web_base_events_this_run.sql
models/base/scratch/default/snowplow_web_base_events_this_run.sql
models/base/scratch/snowflake/snowplow_web_base_events_this_run.sql
Details
Columns
The column list for the base events this run table may be incomplete and does not include contexts or unstruct events; please check your warehouse for a more accurate column list.
Column Name | Description |
---|---|
app_id | Application ID e.g. ‘angry-birds’. Used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev. |
platform | Platform e.g. ‘web’ |
etl_tstamp | Timestamp event began ETL e.g. ‘2017-01-26 00:01:25.292’ |
collector_tstamp | Time stamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’ |
dvce_created_tstamp | Timestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’ |
event | The type of event recorded e.g. ‘page_view’ |
event_id | A UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
txn_id | Transaction ID set client-side, used to de-dupe records e.g. 421828 |
name_tracker | Tracker namespace e.g. ‘sp1’ |
v_tracker | Tracker version e.g. ‘js-3.0.0’ |
v_collector | Collector version e.g. ‘ssc-2.1.0-kinesis’ |
v_etl | ETL version e.g. ‘snowplow-micro-1.1.0-common-1.4.2’ |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ |
user_ipaddress | User IP address e.g. ‘92.231.54.234’ |
user_fingerprint | A user fingerprint generated by looking at the individual browser features e.g. 2161814971 |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ |
domain_sessionidx | A visit / session index e.g. 3 |
network_userid | User ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’ |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ |
geo_region | ISO 3166-2 code for the country region the visitor is in e.g. ‘I9’, ‘TX’ |
geo_city | City the visitor is in e.g. ‘New York’, ‘London’ |
geo_zipcode | Postcode the visitor is in e.g. ‘94109’ |
geo_latitude | Visitor location latitude e.g. 37.443604 |
geo_longitude | Visitor location longitude e.g. -122.4124 |
geo_region_name | Visitor region name e.g. ‘Florida’ |
ip_isp | Visitor’s ISP e.g. ‘FDN Communications’ |
ip_organization | Organization associated with the visitor’s IP address – defaults to ISP name if none is found e.g. ‘Bouygues Telecom’ |
ip_domain | Second level domain name associated with the visitor’s IP address e.g. ‘nuvox.net’ |
ip_netspeed | Visitor’s connection type e.g. ‘Cable/DSL’ |
page_url | The page URL e.g. ‘http://www.example.com’ |
page_title | Web page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’ |
page_referrer | URL of the referrer e.g. ‘http://www.referrer.com’ |
page_urlscheme | Scheme aka protocol e.g. ‘https’ |
page_urlhost | Host aka domain e.g. ‘www.snowplow.io’ |
page_urlport | Port if specified, 80 if not e.g. 80 |
page_urlpath | Path to page e.g. ‘/product/index.html’ |
page_urlquery | Querystring e.g. ‘id=GTM-DLRG’ |
page_urlfragment | Fragment aka anchor e.g. ‘4-conclusion’ |
refr_urlscheme | Referer scheme e.g. ‘http’ |
refr_urlhost | Referer host e.g. ‘www.bing.com’ |
refr_urlport | Referer port e.g. 80 |
refr_urlpath | Referer page path e.g. ‘/images/search’ |
refr_urlquery | Referer URL querystring e.g. ‘q=psychic+oracle+cards’ |
refr_urlfragment | Referer URL fragment |
refr_medium | Type of referer e.g. ‘search’, ‘internal’ |
refr_source | Name of referer if recognised e.g. ‘Bing images’ |
refr_term | Keywords if source is a search engine e.g. ‘psychic oracle cards’ |
mkt_medium | Type of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’ |
mkt_source | The company / website where the traffic came from e.g. ‘Google’, ‘Facebook’ |
mkt_term | Any keywords associated with the referrer e.g. ‘new age tarot decks’ |
mkt_content | The content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723 |
mkt_campaign | The campaign ID e.g. ‘diageo-123’ |
se_category | Category of event e.g. ‘ecomm’, ‘video’ |
se_action | Action performed / event name e.g. ‘add-to-basket’, ‘play-video’ |
se_label | The object of the action e.g. the ID of the video played or SKU of the product added-to-basket e.g. ‘pbz00123’ |
se_property | A property associated with the object of the action e.g. ‘HD’, ‘large’ |
se_value | A value associated with the event / action e.g. the value of goods added-to-basket e.g. 9.99 |
tr_orderid | Order ID e.g. ‘#134’ |
tr_affiliation | Transaction affiliation (e.g. store where sale took place) e.g. ‘web’ |
tr_total | Total transaction value e.g. 12.99 |
tr_tax | Total tax included in transaction value e.g. 3.00 |
tr_shipping | Delivery cost charged e.g. 0.00 |
tr_city | Delivery address, city e.g. ‘London’ |
tr_state | Delivery address, state e.g. ‘Washington’ |
tr_country | Delivery address, country e.g. ‘France’ |
ti_orderid | Order ID e.g. ‘#134’ |
ti_sku | Product SKU e.g. ‘pbz00123’ |
ti_name | Product name e.g. ‘Cone pendulum’ |
ti_category | Product category e.g. ‘New Age’ |
ti_price | Product unit price e.g. 9.99 |
ti_quantity | Number of units of the product in the transaction e.g. 2 |
pp_xoffset_min | Minimum page x offset seen in the last ping period e.g. 0 |
pp_xoffset_max | Maximum page x offset seen in the last ping period e.g. 100 |
pp_yoffset_min | Minimum page y offset seen in the last ping period e.g. 0 |
pp_yoffset_max | Maximum page y offset seen in the last ping period e.g. 200 |
useragent | Raw useragent |
br_name | Browser name e.g. ‘Firefox 12’ |
br_family | Browser family e.g. ‘Firefox’ |
br_version | Browser version e.g. ‘12.0’ |
br_type | Browser type e.g. ‘Browser’ |
br_renderengine | Browser rendering engine e.g. ‘GECKO’ |
br_lang | Language the browser is set to e.g. ‘en-GB’ |
br_features_pdf | Whether the browser recognizes PDFs e.g. True |
br_features_flash | Whether Flash is installed e.g. True |
br_features_java | Whether Java is installed e.g. True |
br_features_director | Whether Adobe Shockwave is installed e.g. True |
br_features_quicktime | Whether QuickTime is installed e.g. True |
br_features_realplayer | Whether RealPlayer is installed e.g. True |
br_features_windowsmedia | Whether mplayer2 is installed e.g. True |
br_features_gears | Whether Google Gears is installed e.g. True |
br_features_silverlight | Whether Microsoft Silverlight is installed e.g. True |
br_cookies | Whether cookies are enabled e.g. True |
br_colordepth | Bit depth of the browser color palette e.g. 24 |
br_viewwidth | Viewport width e.g. 1000 |
br_viewheight | Viewport height e.g. 1000 |
os_name | Name of operating system e.g. ‘Android’ |
os_family | Operating system family e.g. ‘Linux’ |
os_manufacturer | Company responsible for OS e.g. ‘Apple’ |
os_timezone | Client operating system timezone e.g. ‘Europe/London’ |
dvce_type | Type of device e.g. ‘Computer’ |
dvce_ismobile | Is the device mobile? e.g. True |
dvce_screenwidth | Screen width in pixels e.g. 1900 |
dvce_screenheight | Screen height in pixels e.g. 1024 |
doc_charset | The page’s character encoding e.g. ‘UTF-8’ |
doc_width | The page’s width in pixels e.g. 1024 |
doc_height | The page’s height in pixels e.g. 3000 |
tr_currency | Currency e.g. ‘USD’ |
tr_total_base | Total in base currency e.g. 12.99 |
tr_tax_base | Total tax in base currency e.g. 3.00 |
tr_shipping_base | Delivery cost in base currency e.g. 0.00 |
ti_currency | Currency e.g. ‘EUR’ |
ti_price_base | Price in base currency e.g. 9.99 |
base_currency | Reporting currency e.g. ‘GBP’ |
geo_timezone | Visitor timezone name e.g. ‘Europe/London’ |
mkt_clickid | The click ID e.g. ‘ac3d8e459’ |
mkt_network | The ad network to which the click ID belongs e.g. ‘DoubleClick’ |
etl_tags | JSON of tags for this ETL run e.g. “[‘prod’]” |
dvce_sent_tstamp | When the event was sent by the client device e.g. ‘2013-11-26 00:03:58.032’ |
refr_domain_userid | The Snowplow domain_userid of the referring website e.g. ‘bc2e92ec6c204a14’ |
refr_dvce_tstamp | The time of attaching the domain_userid to the inbound link e.g. ‘2013-11-26 00:02:05’ |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
derived_tstamp | Timestamp making allowance for inaccurate device clock e.g. ‘2013-11-26 00:02:04’ |
event_vendor | Who defined the event e.g. ‘com.acme’ |
event_name | Event name e.g. ‘link_click’ |
event_format | Format for event e.g. ‘jsonschema’ |
event_version | Version of event schema e.g. ‘1-0-2’ |
event_fingerprint | Hash of client-set event fields e.g. AADCE520E20C2899F4CED228A79A3083 |
true_tstamp | User-set “true timestamp” for the event e.g. ‘2013-11-26 00:02:04’ |
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
original_domain_userid | True User ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ |
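The relationship between the configured identifiers and the original_ columns can be sketched as a simple rename. This hypothetical Python helper assumes a row is a dict holding the session_identifier and user_identifier values produced by the base macro alongside the raw domain_sessionid and domain_userid; it mirrors the aliasing in the select lists of the model code, and is not an actual package function.

```python
# Hypothetical mapping from base-macro columns to the published column names.
RENAMES = {
    "session_identifier": "domain_sessionid",
    "domain_sessionid": "original_domain_sessionid",
    "user_identifier": "domain_userid",
    "domain_userid": "original_domain_userid",
}


def rename_identifiers(row):
    """Return a new dict with identifier columns aliased; other columns pass through."""
    # A single comprehension reads every key from the input row, so the swapped
    # names (e.g. domain_sessionid) never overwrite each other mid-rename
    return {RENAMES.get(col, col): value for col, value in row.items()}
```

With the default project variables the two values in each pair are identical; they only diverge when snowplow__session_identifiers or snowplow__user_identifiers point at another field.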
Code
BigQuery version:
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
{% set base_events_query = snowplow_utils.base_create_snowplow_events_this_run(
sessions_this_run_table='snowplow_web_base_sessions_this_run',
session_identifiers=var('snowplow__session_identifiers', [{"schema" : "atomic", "field" : "domain_sessionid"}]),
session_sql=var('snowplow__session_sql', none),
session_timestamp=var('snowplow__session_timestamp', 'collector_tstamp'),
derived_tstamp_partitioned=var('snowplow__derived_tstamp_partitioned', true),
days_late_allowed=var('snowplow__days_late_allowed', 3),
max_session_days=var('snowplow__max_session_days', 3),
app_ids=var('snowplow__app_ids', []),
snowplow_events_database=var('snowplow__database', target.database) if target.type not in ['databricks', 'spark'] else var('snowplow__databricks_catalog', 'hive_metastore') if target.type in ['databricks'] else var('snowplow__atomic_schema', 'atomic'),
snowplow_events_schema=var('snowplow__atomic_schema', 'atomic'),
snowplow_events_table=var('snowplow__events_table', 'events')) %}
with base_query as (
{{ base_events_query }}
)
select
a.contexts_com_snowplowanalytics_snowplow_web_page_1_0_0[safe_offset(0)].id as page_view_id,
a.session_identifier as domain_sessionid,
a.domain_sessionid as original_domain_sessionid,
a.user_identifier as domain_userid,
a.domain_userid as original_domain_userid,
a.* except(contexts_com_snowplowanalytics_snowplow_web_page_1_0_0, domain_sessionid, domain_userid)
from base_query a
Databricks version:
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
{% set base_events_query = snowplow_utils.base_create_snowplow_events_this_run(
sessions_this_run_table='snowplow_web_base_sessions_this_run',
session_identifiers=var('snowplow__session_identifiers', [{"schema" : "atomic", "field" : "domain_sessionid"}]),
session_sql=var('snowplow__session_sql', none),
session_timestamp=var('snowplow__session_timestamp', 'collector_tstamp'),
derived_tstamp_partitioned=var('snowplow__derived_tstamp_partitioned', true),
days_late_allowed=var('snowplow__days_late_allowed', 3),
max_session_days=var('snowplow__max_session_days', 3),
app_ids=var('snowplow__app_ids', []),
snowplow_events_database=var('snowplow__database', target.database) if target.type not in ['databricks', 'spark'] else var('snowplow__databricks_catalog', 'hive_metastore') if target.type in ['databricks'] else var('snowplow__atomic_schema', 'atomic'),
snowplow_events_schema=var('snowplow__atomic_schema', 'atomic'),
snowplow_events_table=var('snowplow__events_table', 'events')) %}
with base_query as (
{{ base_events_query }}
)
select
a.contexts_com_snowplowanalytics_snowplow_web_page_1[0].id as page_view_id,
a.session_identifier as domain_sessionid,
a.domain_sessionid as original_domain_sessionid,
a.user_identifier as domain_userid,
a.domain_userid as original_domain_userid,
a.* except(contexts_com_snowplowanalytics_snowplow_web_page_1, domain_sessionid, domain_userid)
from base_query a
Default version:
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
sort='collector_tstamp',
dist='event_id',
tags=["this_run"]
)
}}
{# dbt passes variables by reference, so copy the list to avoid appending to the same list multiple times #}
{% set contexts = var('snowplow__entities_or_sdes', []).copy() %}
{% do contexts.append({'name': var('snowplow__page_view_context'), 'prefix': 'page_view', 'single_entity': True}) %}
{% if var('snowplow__enable_iab', false) -%}
{% do contexts.append({'name': var('snowplow__iab_context'), 'prefix': 'iab', 'single_entity': True}) %}
{% endif -%}
{% if var('snowplow__enable_ua', false) -%}
{% do contexts.append({'name': var('snowplow__ua_parser_context'), 'prefix': 'ua', 'single_entity': True}) %}
{% endif -%}
{% if var('snowplow__enable_yauaa', false) -%}
{% do contexts.append({'name': var('snowplow__yauaa_context'), 'prefix': 'yauaa', 'single_entity': True}) %}
{% endif -%}
{% if var('snowplow__enable_consent', false) -%}
{% do contexts.append({'name': var('snowplow__consent_cmp_visible'), 'prefix': 'cmp_visible', 'single_entity': True}) %}
{% do contexts.append({'name': var('snowplow__consent_preferences'), 'prefix': 'consent_pref', 'single_entity': True}) %}
{% endif -%}
{% if var('snowplow__enable_cwv', false) -%}
{% do contexts.append({'name': var('snowplow__cwv_context'), 'prefix': 'cwv', 'single_entity': True}) %}
{% endif -%}
{% set base_events_query = snowplow_utils.base_create_snowplow_events_this_run(
sessions_this_run_table='snowplow_web_base_sessions_this_run',
session_identifiers=var('snowplow__session_identifiers', [{"schema" : "atomic", "field" : "domain_sessionid"}]),
session_sql=var('snowplow__session_sql', none),
session_timestamp=var('snowplow__session_timestamp', 'collector_tstamp'),
derived_tstamp_partitioned=var('snowplow__derived_tstamp_partitioned', true),
days_late_allowed=var('snowplow__days_late_allowed', 3),
max_session_days=var('snowplow__max_session_days', 3),
app_ids=var('snowplow__app_ids', []),
snowplow_events_database=var('snowplow__database', target.database) if target.type not in ['databricks', 'spark'] else var('snowplow__databricks_catalog', 'hive_metastore') if target.type in ['databricks'] else var('snowplow__atomic_schema', 'atomic'),
snowplow_events_schema=var('snowplow__atomic_schema', 'atomic'),
snowplow_events_table=var('snowplow__events_table', 'events'),
entities_or_sdes=contexts) %}
with base_query as (
{{ base_events_query }}
)
{% set base_query_cols = get_column_schema_from_query( 'select * from (' + base_events_query +') a') %}
select
{% for col in base_query_cols | map(attribute='name') | list -%}
{% if col == 'session_identifier' -%}
a.session_identifier as domain_sessionid
{%- elif col == 'domain_sessionid' -%}
a.domain_sessionid as original_domain_sessionid
{%- elif col == 'user_identifier' -%}
a.user_identifier as domain_userid
{%- elif col == 'domain_userid' -%}
a.domain_userid as original_domain_userid
{%- else -%}
a.{{col}}
{%- endif -%}
{%- if not loop.last -%},{%- endif %}
{% endfor %}
from base_query a
Snowflake version:
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
{% set base_events_query = snowplow_utils.base_create_snowplow_events_this_run(
sessions_this_run_table='snowplow_web_base_sessions_this_run',
session_identifiers=var('snowplow__session_identifiers', [{"schema" : "atomic", "field" : "domain_sessionid"}]),
session_sql=var('snowplow__session_sql', none),
session_timestamp=var('snowplow__session_timestamp', 'collector_tstamp'),
derived_tstamp_partitioned=var('snowplow__derived_tstamp_partitioned', true),
days_late_allowed=var('snowplow__days_late_allowed', 3),
max_session_days=var('snowplow__max_session_days', 3),
app_ids=var('snowplow__app_ids', []),
snowplow_events_database=var('snowplow__database', target.database) if target.type not in ['databricks', 'spark'] else var('snowplow__databricks_catalog', 'hive_metastore') if target.type in ['databricks'] else var('snowplow__atomic_schema', 'atomic'),
snowplow_events_schema=var('snowplow__atomic_schema', 'atomic'),
snowplow_events_table=var('snowplow__events_table', 'events')) %}
with base_query as (
{{ base_events_query }}
)
select
a.contexts_com_snowplowanalytics_snowplow_web_page_1[0]:id::varchar as page_view_id,
a.session_identifier as domain_sessionid,
a.domain_sessionid as original_domain_sessionid,
a.user_identifier as domain_userid,
a.domain_userid as original_domain_userid,
a.* exclude(contexts_com_snowplowanalytics_snowplow_web_page_1, domain_sessionid, domain_userid)
from base_query a
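The .copy() call in the default version guards against mutating the shared project variable. The Python sketch below reproduces the effect with a hypothetical var() stand-in that, as the comment in the model describes, hands back the same list object on every call; the context names are placeholders, not real schema URIs.

```python
# Hypothetical stand-in for dbt project variables: var() returns the SAME list
# object on every call, which is the behaviour the model's comment warns about.
PROJECT_VARS = {
    "snowplow__entities_or_sdes": [
        {"name": "custom_context", "prefix": "custom", "single_entity": True},
    ]
}


def var(name, default=None):
    """Return the configured value (by reference) or the default."""
    return PROJECT_VARS.get(name, default)


def build_contexts(enable_iab=False, copy=True):
    """Assemble the entity list the way the default model does."""
    contexts = var("snowplow__entities_or_sdes", [])
    if copy:
        # Equivalent of .copy(): later appends only touch this local list
        contexts = contexts.copy()
    contexts.append({"name": "page_view_context", "prefix": "page_view", "single_entity": True})
    if enable_iab:
        contexts.append({"name": "iab_context", "prefix": "iab", "single_entity": True})
    return contexts
```

Without the copy, every call appends to the shared variable itself, so calling build_contexts(copy=False) twice leaves three entries in it instead of one.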
Depends On
- Models
- Macros
- macro.dbt.get_column_schema_from_query
- macro.snowplow_utils.app_id_filter
- macro.snowplow_utils.base_create_snowplow_events_this_run
- macro.snowplow_utils.return_limits_from_model
- macro.snowplow_utils.set_query_tag
- macro.snowplow_utils.timestamp_add
- macro.snowplow_web.get_iab_context_fields
- macro.snowplow_web.get_ua_context_fields
- macro.snowplow_web.get_yauaa_context_fields
Referenced By
- Models
- model.snowplow_media_player.snowplow_media_player_interactions_this_run
- model.snowplow_web.snowplow_web_consent_events_this_run
- model.snowplow_web.snowplow_web_page_views_this_run
- model.snowplow_web.snowplow_web_pv_engaged_time
- model.snowplow_web.snowplow_web_pv_scroll_depth
- model.snowplow_web.snowplow_web_sessions_this_run
- model.snowplow_web.snowplow_web_user_mapping
- model.snowplow_web.snowplow_web_vital_events_this_run
Snowplow Web Base New Event Limits
models/base/scratch/snowplow_web_base_new_event_limits.sql
Description
This table contains the lower and upper timestamp limits for the given run of the web model. These limits are used to select new events from the events table.
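A simplified Python sketch of how run limits of this kind can be derived from the incremental manifest is shown below. It loosely follows the branches assumed to exist in snowplow_utils.get_run_limits, but the function name sketch_run_limits, the 6-hour lookback and the 30-day backfill cap are assumptions for illustration only; check the macro and your snowplow__ variables for the real behaviour.

```python
from datetime import datetime, timedelta


def sketch_run_limits(min_last_success, max_last_success, models_matched,
                      has_matched_all_models, start_date, now,
                      lookback_hours=6, backfill_days=30):
    """Return (lower_limit, upper_limit) for a run. Simplified, assumed logic."""
    lookback = timedelta(hours=lookback_hours)
    backfill = timedelta(days=backfill_days)
    if models_matched == 0:
        # First run: nothing in the manifest yet, so start from start_date
        return start_date, min(start_date + backfill, now)
    if not has_matched_all_models:
        # A new model joined the run: replay from start_date up to the latest success
        return start_date, max_last_success
    if min_last_success != max_last_success:
        # Models are out of sync: bring the laggards up to the leader
        return min_last_success - lookback, max_last_success
    # Standard incremental run: short lookback, forward window capped by backfill
    return max_last_success - lookback, min(max_last_success + backfill, now)
```

The lookback re-reads a few hours of already-processed data so that late-arriving events are not missed, while the backfill cap keeps any single run from scanning an unbounded range.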
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
lower_limit | The lower collector_tstamp limit for the run | timestamp_ntz |
upper_limit | The upper collector_tstamp limit for the run | timestamp_ntz |
Code
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{ config(
post_hook=["{{snowplow_utils.print_run_limits(this, 'snowplow_web')}}"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
{%- set models_in_run = snowplow_utils.get_enabled_snowplow_models('snowplow_web') -%}
{% set min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models = snowplow_utils.get_incremental_manifest_status(ref('snowplow_web_incremental_manifest'),
models_in_run) -%}
{% set run_limits_query = snowplow_utils.get_run_limits(min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models,
var("snowplow__start_date","2020-01-01")) -%}
{{ run_limits_query }}
Depends On
- Models
- Macros
Referenced By
- Models
- model.snowplow_media_player.snowplow_media_player_base
- model.snowplow_web.snowplow_web_base_sessions_lifecycle_manifest
- model.snowplow_web.snowplow_web_base_sessions_this_run
- model.snowplow_web.snowplow_web_consent_events_this_run
- model.snowplow_web.snowplow_web_consent_log
- model.snowplow_web.snowplow_web_page_views
- model.snowplow_web.snowplow_web_sessions
- model.snowplow_web.snowplow_web_user_mapping
- model.snowplow_web.snowplow_web_users
- model.snowplow_web.snowplow_web_vital_events_this_run
- model.snowplow_web.snowplow_web_vitals
- model.snowplow_web.snowplow_web_vitals_this_run
Snowplow Web Base Quarantined Sessions
models/base/manifest/snowplow_web_base_quarantined_sessions.sql
Description
This table contains any sessions that have been quarantined. Sessions are quarantined once they exceed the maximum allowed session length, defined by snowplow__max_session_days.
Once quarantined, no further events from these sessions will be processed. Events up until the point of quarantine remain in your derived tables.
Removing long sessions reduces table scans on both the events table and all derived tables, which greatly improves performance.
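The quarantine rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a session is quarantined when end_tstamp minus start_tstamp is strictly greater than snowplow__max_session_days (default 3 in the model configs); the helper names are hypothetical and the macro's exact comparison may differ.

```python
from datetime import datetime, timedelta


def sessions_to_quarantine(sessions, max_session_days=3):
    """Return the identifiers of sessions longer than max_session_days."""
    limit = timedelta(days=max_session_days)
    return sorted(
        s["session_identifier"]
        for s in sessions
        if s["end_tstamp"] - s["start_tstamp"] > limit
    )


def filter_quarantined(events, quarantined):
    """Drop events belonging to quarantined sessions from later runs."""
    blocked = set(quarantined)
    return [e for e in events if e["session_identifier"] not in blocked]
```

Only future events are filtered out; rows already written to derived tables for a quarantined session are left in place, as described above.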
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
session_identifier | The session_identifier of the quarantined session | text |
Code
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
full_refresh=snowplow_web.allow_refresh(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
}
)
}}
{% set quarantined_query = snowplow_utils.base_create_snowplow_quarantined_sessions() %}
{{ quarantined_query }}
Depends On
- Macros
Referenced By
Snowplow Web Base Sessions Lifecycle Manifest
models/base/manifest/snowplow_web_base_sessions_lifecycle_manifest.sql
Description
This incremental table is a manifest of all sessions that have been processed by the Snowplow dbt web model. For each session, the start and end timestamps are recorded.
By knowing the lifecycle of a session, the model is able to determine which sessions, and therefore which events, to process for a given timeframe, as well as the complete date range required to reprocess all events of each session.
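The manifest behaves like an upsert keyed on session_identifier (the unique_key in the config below). The Python sketch is a simplified illustration under assumptions: rows are dicts, the session timestamp is collector_tstamp (the snowplow__session_timestamp default), and the first user_identifier seen for a session is kept. The helper names are hypothetical.

```python
from datetime import datetime


def upsert_lifecycle(manifest, new_events):
    """Merge new events into {session_identifier: record}, widening start/end."""
    for e in new_events:
        sid = e["session_identifier"]
        ts = e["collector_tstamp"]  # snowplow__session_timestamp defaults to collector_tstamp
        rec = manifest.get(sid)
        if rec is None:
            # Unseen session: open a new manifest row
            manifest[sid] = {"user_identifier": e["user_identifier"],
                             "start_tstamp": ts, "end_tstamp": ts}
        else:
            # Known session (unique_key match): extend its lifecycle in both directions
            rec["start_tstamp"] = min(rec["start_tstamp"], ts)
            rec["end_tstamp"] = max(rec["end_tstamp"], ts)
    return manifest


def reprocess_window(manifest, session_ids):
    """Earliest start and latest end needed to rebuild the given sessions in full."""
    recs = [manifest[s] for s in session_ids]
    return min(r["start_tstamp"] for r in recs), max(r["end_tstamp"] for r in recs)
```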
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
session_identifier | The session identifier as defined in your project variables. Defaults to domain_sessionid. | text |
user_identifier | The user identifier as defined in your project variables. Defaults to domain_userid. | text |
start_tstamp | The collector_tstamp when the session began | timestamp_ntz |
end_tstamp | The collector_tstamp when the session ended | timestamp_ntz |
Code
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
unique_key='session_identifier',
upsert_date_key='start_tstamp',
sort='start_tstamp',
dist='session_identifier',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_sessions_lifecycle(),
full_refresh=snowplow_web.allow_refresh(),
tags=["manifest"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}
{% set sessions_lifecycle_manifest_query = snowplow_utils.base_create_snowplow_sessions_lifecycle_manifest(
session_identifiers=var('snowplow__session_identifiers', [{"schema" : "atomic", "field" : "domain_sessionid"}]),
session_sql=var('snowplow__session_sql', none),
session_timestamp=var('snowplow__session_timestamp', 'collector_tstamp'),
user_identifiers=var('snowplow__user_identifiers', [{"schema": "atomic", "field" : "domain_userid"}]),
user_sql=var('snowplow__user_sql', none),
quarantined_sessions='snowplow_web_base_quarantined_sessions',
derived_tstamp_partitioned=var('snowplow__derived_tstamp_partitioned', true),
days_late_allowed=var('snowplow__days_late_allowed', 3),
max_session_days=var('snowplow__max_session_days', 3),
app_ids=var('snowplow__app_ids', []),
snowplow_events_database=var('snowplow__database', target.database) if target.type not in ['databricks', 'spark'] else var('snowplow__databricks_catalog', 'hive_metastore') if target.type in ['databricks'] else var('snowplow__atomic_schema', 'atomic'),
snowplow_events_schema=var('snowplow__atomic_schema', 'atomic'),
snowplow_events_table=var('snowplow__events_table', 'events'),
event_limits_table='snowplow_web_base_new_event_limits',
incremental_manifest_table='snowplow_web_incremental_manifest'
) %}
{{ sessions_lifecycle_manifest_query }}
Depends On
- Models
- Macros
Referenced By
Snowplow Web Base Sessions This Run
models/base/scratch/snowplow_web_base_sessions_this_run.sql
Description
For any given run, this table contains all the required sessions.
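One way to picture "the required sessions" is as every manifest session whose lifecycle overlaps the run's limits. The Python sketch below assumes exactly that overlap rule against the limits in snowplow_web_base_new_event_limits; the helper name is hypothetical, and the real macro may apply further bounds (for example on end_tstamp).

```python
from datetime import datetime


def sessions_this_run(manifest, lower_limit, upper_limit):
    """Pick manifest sessions whose lifecycle overlaps the run's limits."""
    selected = []
    for sid, rec in manifest.items():
        # Overlap test: the session ends after the window opens and starts before it closes
        if rec["end_tstamp"] >= lower_limit and rec["start_tstamp"] <= upper_limit:
            selected.append({"session_identifier": sid, **rec})
    return selected
```

Because a selected session keeps its original start_tstamp, a run can reach back before lower_limit to rebuild that session in full.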
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
session_identifier | The session identifier as defined in your project variables. Defaults to domain_sessionid. | text |
user_identifier | The user identifier as defined in your project variables. Defaults to domain_userid. | text |
start_tstamp | The collector_tstamp when the session began | timestamp_ntz |
end_tstamp | The collector_tstamp when the session ended | timestamp_ntz |
Code
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
post_hook=["{{ snowplow_utils.base_quarantine_sessions(var('snowplow__max_session_days', 3), var('snowplow__quarantined_sessions', 'snowplow_web_base_quarantined_sessions')) }}"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
{% set sessions_query = snowplow_utils.base_create_snowplow_sessions_this_run(
lifecycle_manifest_table='snowplow_web_base_sessions_lifecycle_manifest',
new_event_limits_table='snowplow_web_base_new_event_limits') %}
{{ sessions_query }}
Depends On
- Models
- Macros
Referenced By
Snowplow Web Consent Cmp Stats
models/optional_modules/consent/snowplow_web_consent_cmp_stats.sql
Description
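Used for modeling cmp_visible events and related metrics, such as how long the consent management platform (CMP) box took to load and how long users took to make a consent choice.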
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
event_id | A UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
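domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |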
original_domain_userid | True user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
cmp_load_time | The time taken for the consent box to be shown on screen | float |
cmp_tstamp | The timestamp of the cmp_visible event | timestamp_ntz |
first_consent_event_tstamp | The timestamp of the first consent event after a cmp_visible event | timestamp_ntz |
first_consent_event_type | The event type of the first consent event after a cmp_visible event | text |
cmp_interaction_time | The time, in seconds, it takes for the user to make a consent choice after the cmp_visible event is fired | number |
Code
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='table',
enabled=var("snowplow__enable_consent", false),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
{%- if target.type in ('postgres') -%}
with events as (
select
event_id,
domain_userid,
original_domain_userid,
page_view_id,
domain_sessionid,
original_domain_sessionid,
derived_tstamp,
event_name,
event_type,
cmp_load_time,
-- Postgres does not support IGNORE NULLS in last_value(). This workaround gives the same result:
-- array_remove strips the NULLs from the running array of cmp_visible event IDs, and the running COUNT
-- (which only counts non-null values, bounded at the current row) is used as the array index of the most recent one.
(array_remove(array_agg(case when event_name = 'cmp_visible' then event_id else null end) over (partition by domain_userid order by derived_tstamp), null))[count(case when event_name = 'cmp_visible' then event_id else null end) over (partition by domain_userid order by derived_tstamp rows between unbounded preceding and current row)] as cmp_id
from {{ ref('snowplow_web_consent_log') }}
where event_type <> 'pending' or event_type is null
)
{%- elif target.type in ('databricks', 'spark') -%}
with events as (
select
event_id,
domain_userid,
original_domain_userid,
page_view_id,
domain_sessionid,
original_domain_sessionid,
derived_tstamp,
event_name,
event_type,
cmp_load_time,
last_value(case when event_name = 'cmp_visible' then event_id else null end, TRUE)
over (partition by domain_userid order by derived_tstamp
rows between unbounded preceding and current row) as cmp_id
from {{ ref('snowplow_web_consent_log') }}
where event_type <> 'pending' or event_type is null
)
{%- else -%}
with events as (
select
event_id,
domain_userid,
original_domain_userid,
page_view_id,
domain_sessionid,
original_domain_sessionid,
derived_tstamp,
event_name,
event_type,
cmp_load_time,
last_value(case when event_name = 'cmp_visible' then event_id else null end ignore nulls)
over (partition by domain_userid order by derived_tstamp
rows between unbounded preceding and current row) as cmp_id
from {{ ref('snowplow_web_consent_log') }}
where event_type <> 'pending' or event_type is null
)
{%- endif -%}
, event_orders as (
select
event_id,
event_type,
cmp_id,
derived_tstamp,
row_number() over(partition by cmp_id order by derived_tstamp) as row_num
from events
)
, first_consent_events as (
select
event_id,
cmp_id,
event_type,
derived_tstamp as first_consent_event_tstamp
from event_orders
where row_num = 2
)
, cmp_events as (
select distinct
event_id,
domain_userid,
original_domain_userid,
page_view_id,
domain_sessionid,
original_domain_sessionid,
cmp_load_time,
derived_tstamp as cmp_tstamp
from events
where event_name = 'cmp_visible'
)
select
e.event_id,
e.domain_userid,
e.original_domain_userid,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.cmp_load_time,
e.cmp_tstamp,
f.first_consent_event_tstamp,
f.event_type as first_consent_event_type,
{{ datediff('e.cmp_tstamp', 'f.first_consent_event_tstamp', 'second') }} as cmp_interaction_time
from cmp_events e
left join first_consent_events f
on e.event_id = f.cmp_id
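The query above chains several window functions, so the following Python sketch mirrors its logic on in-memory rows to make the steps easier to follow. It assumes each event is a dict using the column names from the SQL, that `pending` events have already been removed, and that timestamps are `datetime` objects; the function name `cmp_stats` is hypothetical. It forward-fills the latest `cmp_visible` event_id per user (what `last_value(... ignore nulls)` does), takes the second event in each `cmp_id` group as the first consent event, and computes the interaction time in seconds. Ties on `derived_tstamp` are broken by input order here, whereas the SQL leaves them undefined.

```python
from collections import defaultdict
from datetime import datetime


def cmp_stats(events):
    """Return one row per cmp_visible event with its first consent interaction."""
    # Group each user's events so they can be walked in derived_tstamp order.
    by_user = defaultdict(list)
    for ev in events:
        by_user[ev["domain_userid"]].append(ev)

    # Forward-fill cmp_id with the event_id of the most recent cmp_visible
    # event, like last_value(... ignore nulls) over a running window.
    tagged = []
    for user_events in by_user.values():
        current_cmp = None
        for ev in sorted(user_events, key=lambda e: e["derived_tstamp"]):
            if ev["event_name"] == "cmp_visible":
                current_cmp = ev["event_id"]
            tagged.append({**ev, "cmp_id": current_cmp})

    # Group by cmp_id (groups are already in derived_tstamp order): row 1 is
    # the cmp_visible event itself, row 2 the first consent event after it.
    by_cmp = defaultdict(list)
    for ev in tagged:
        if ev["cmp_id"] is not None:
            by_cmp[ev["cmp_id"]].append(ev)
    first_consent = {
        cmp_id: group[1] for cmp_id, group in by_cmp.items() if len(group) >= 2
    }

    # Left join each cmp_visible event to its first consent event.
    rows = []
    for ev in tagged:
        if ev["event_name"] != "cmp_visible":
            continue
        first = first_consent.get(ev["event_id"])
        rows.append({
            "event_id": ev["event_id"],
            "cmp_tstamp": ev["derived_tstamp"],
            "first_consent_event_type": first["event_type"] if first else None,
            "cmp_interaction_time": (
                (first["derived_tstamp"] - ev["derived_tstamp"]).total_seconds()
                if first else None
            ),
        })
    return rows


# Example: u1 sees the banner and accepts 5 seconds later; u2 never responds.
t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    {"event_id": "a", "domain_userid": "u1", "event_name": "cmp_visible",
     "event_type": None, "derived_tstamp": t0},
    {"event_id": "b", "domain_userid": "u1", "event_name": "consent_preferences",
     "event_type": "allow_all", "derived_tstamp": datetime(2024, 1, 1, 12, 0, 5)},
    {"event_id": "c", "domain_userid": "u2", "event_name": "cmp_visible",
     "event_type": None, "derived_tstamp": t0},
]
rows = cmp_stats(events)
```

A cmp_visible event with no later consent event keeps NULL-like (`None`) consent fields, which is the effect of the `left join` in the SQL.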
Depends On
- Models
- Macros
Snowplow Web Consent Events This Run
models/optional_modules/consent/scratch/<adaptor>/snowplow_web_consent_events_this_run.sql
Description
This model does not currently have a description.
Type: Table
File Paths
- bigquery
- databricks
- default
- snowflake
models/optional_modules/consent/scratch/bigquery/snowplow_web_consent_events_this_run.sql
models/optional_modules/consent/scratch/databricks/snowplow_web_consent_events_this_run.sql
models/optional_modules/consent/scratch/default/snowplow_web_consent_events_this_run.sql
models/optional_modules/consent/scratch/snowflake/snowplow_web_consent_events_this_run.sql
Details
Columns
Column Name | Description | Type |
---|---|---|
event_id | | text |
domain_userid | | text |
original_domain_userid | | text |
user_id | | text |
geo_country | | text |
page_view_id | | text |
domain_sessionid | | text |
original_domain_sessionid | | text |
derived_tstamp | | timestamp_ntz |
load_tstamp | | timestamp_ntz |
event_name | | text |
event_type | | text |
basis_for_processing | | text |
consent_url | | text |
consent_version | | text |
consent_scopes | | text |
domains_applied | | text |
gdpr_applies | | boolean |
cmp_load_time | | float |
Code
- bigquery
- databricks
- default
- snowflake
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_consent", false) and target.type == 'bigquery' | as_bool(),
)
}}
with prep as (
select
e.event_id,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.geo_country,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.derived_tstamp,
e.load_tstamp,
e.event_name,
{{ snowplow_utils.get_optional_fields(
enabled= true,
fields=consent_fields(),
col_prefix='unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='e') }},
{{ snowplow_utils.get_optional_fields(
enabled= true,
fields=[{'field': 'elapsed_time', 'dtype': 'string'}],
col_prefix='unstruct_event_com_snowplowanalytics_snowplow_cmp_visible_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='e') }}
from {{ ref("snowplow_web_base_events_this_run") }} as e
where e.event_name in ('cmp_visible', 'consent_preferences')
and {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
{% if var("snowplow__ua_bot_filter", false) %}
{{ filter_bots() }}
{% endif %}
)
select
p.event_id,
p.domain_userid,
p.original_domain_userid,
p.user_id,
p.geo_country,
p.page_view_id,
p.domain_sessionid,
p.original_domain_sessionid,
p.derived_tstamp,
p.load_tstamp,
p.event_name,
p.event_type,
p.basis_for_processing,
p.consent_url,
p.consent_version,
{{ snowplow_utils.get_array_to_string('consent_scopes', 'p', ', ') }} as consent_scopes,
{{ snowplow_utils.get_array_to_string('domains_applied', 'p', ', ') }} as domains_applied,
coalesce(safe_cast(p.gdpr_applies as boolean), false) gdpr_applies,
cast(p.elapsed_time as {{ dbt.type_float() }}) as cmp_load_time
from prep p
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_consent", false) and target.type in ['databricks', 'spark'] | as_bool(),
)
}}
with prep as (
select
e.event_id,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.geo_country,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.derived_tstamp,
e.load_tstamp,
e.event_name,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1.event_type::STRING as event_type,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1.basis_for_processing::STRING as basis_for_processing,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1.consent_url::STRING as consent_url,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1.consent_version::STRING as consent_version,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1.consent_scopes::ARRAY<STRING> as consent_scopes,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1.domains_applied::ARRAY<STRING> as domains_applied,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1.gdpr_applies::boolean as gdpr_applies,
e.unstruct_event_com_snowplowanalytics_snowplow_cmp_visible_1.elapsed_time::float as cmp_load_time
from {{ ref("snowplow_web_base_events_this_run") }} as e
where event_name in ('cmp_visible', 'consent_preferences')
and {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
{% if var("snowplow__ua_bot_filter", false) %}
{{ filter_bots() }}
{% endif %}
)
select
p.event_id,
p.domain_userid,
p.original_domain_userid,
p.user_id,
p.geo_country,
p.page_view_id,
p.domain_sessionid,
p.original_domain_sessionid,
p.derived_tstamp,
p.load_tstamp,
p.event_name,
p.event_type,
p.basis_for_processing,
p.consent_url,
p.consent_version,
{{ snowplow_utils.get_array_to_string('consent_scopes', 'p', ', ') }} as consent_scopes,
{{ snowplow_utils.get_array_to_string('domains_applied', 'p', ', ') }} as domains_applied,
coalesce(p.gdpr_applies, false) as gdpr_applies,
p.cmp_load_time
from prep p
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_consent", false) and target.type in ['redshift', 'postgres'] | as_bool(),
)
}}
{%- set lower_limit, upper_limit = snowplow_utils.return_limits_from_model(ref('snowplow_web_base_sessions_this_run'),
'start_tstamp',
'end_tstamp') %}
select
e.event_id,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.geo_country,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.derived_tstamp,
e.load_tstamp,
e.event_name,
e.consent_pref_event_type as event_type,
e.consent_pref_basis_for_processing as basis_for_processing,
e.consent_pref_consent_url as consent_url,
e.consent_pref_consent_version as consent_version,
replace(translate(e.consent_pref_consent_scopes, '"[]', ''), ',', ', ') as consent_scopes,
replace(translate(e.consent_pref_domains_applied, '"[]', ''), ',', ', ') as domains_applied,
coalesce(e.consent_pref_gdpr_applies, false) as gdpr_applies,
e.cmp_visible_elapsed_time as cmp_load_time
from {{ ref("snowplow_web_base_events_this_run") }} as e
where event_name in ('cmp_visible', 'consent_preferences')
and {{ snowplow_utils.is_run_with_new_events('snowplow_web') }}
--returns false if run doesn't contain new events.
{% if var("snowplow__ua_bot_filter", false) %}
{{ filter_bots() }}
{% endif %}
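In the default (Redshift/Postgres) variant above, `replace(translate(x, '"[]', ''), ',', ', ')` turns the flattened array columns into readable comma-separated lists. Assuming those columns hold JSON-style text such as `["analytics","functional"]` (an assumption about the flattened format), the expression behaves like this Python sketch; `clean_array_string` is a hypothetical name.

```python
def clean_array_string(value):
    """Mimic replace(translate(value, '"[]', ''), ',', ', ')."""
    if value is None:
        return None  # SQL string functions propagate NULL
    # translate() with an empty replacement set deletes the listed characters
    stripped = value.translate(str.maketrans("", "", '"[]'))
    # then every comma is followed by a single space
    return stripped.replace(",", ", ")


scopes = clean_array_string('["analytics","functional","advertisement"]')
```

One side effect worth knowing: if the source text already has a space after each comma, the result contains a double space, since the replace step adds another one.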
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_consent", false) and target.type == 'snowflake' | as_bool(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with prep as (
select
e.event_id,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.geo_country,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.derived_tstamp,
e.load_tstamp,
e.event_name,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1:eventType::varchar as event_type,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1:basisForProcessing::varchar as basis_for_processing,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1:consentUrl::varchar as consent_url,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1:consentVersion::varchar as consent_version,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1:consentScopes::array as consent_scopes,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1:domainsApplied::array as domains_applied,
e.unstruct_event_com_snowplowanalytics_snowplow_consent_preferences_1:gdprApplies::boolean as gdpr_applies,
e.unstruct_event_com_snowplowanalytics_snowplow_cmp_visible_1:elapsedTime::float as cmp_load_time
from {{ ref("snowplow_web_base_events_this_run") }} as e
where event_name in ('cmp_visible', 'consent_preferences')
and {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
{% if var("snowplow__ua_bot_filter", false) %}
{{ filter_bots() }}
{% endif %}
)
select
p.event_id,
p.domain_userid,
p.original_domain_userid,
p.user_id,
p.geo_country,
p.page_view_id,
p.domain_sessionid,
p.original_domain_sessionid,
p.derived_tstamp,
p.load_tstamp,
p.event_name,
p.event_type,
p.basis_for_processing,
p.consent_url,
p.consent_version,
{{ snowplow_utils.get_array_to_string('consent_scopes', 'p', ', ') }} as consent_scopes,
{{ snowplow_utils.get_array_to_string('domains_applied', 'p', ', ') }} as domains_applied,
coalesce(p.gdpr_applies, false) as gdpr_applies,
p.cmp_load_time
from prep p
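Across the adapter variants, the model keeps only `cmp_visible` and `consent_preferences` events, extracts the fields of the two self-describing event payloads, joins the two arrays into `', '`-separated strings, and defaults `gdpr_applies` to false. The Python sketch below shows that shape on plain dicts. The nested payload keys (`consent_preferences`, `cmp_visible`) and the camelCase field names (borrowed from the Snowflake variant) are assumptions for illustration, and the bot filter and new-events check are left out.

```python
CONSENT_EVENT_NAMES = ("cmp_visible", "consent_preferences")


def join_array(values):
    """Join array items with ', ' (a missing array stays None, like NULL)."""
    return None if values is None else ", ".join(values)


def flatten_consent_event(ev):
    """Flatten one event into the consent_events_this_run column layout."""
    prefs = ev.get("consent_preferences") or {}  # hypothetical payload key
    cmp_visible = ev.get("cmp_visible") or {}    # hypothetical payload key
    elapsed = cmp_visible.get("elapsedTime")
    gdpr = prefs.get("gdprApplies")
    return {
        "event_id": ev["event_id"],
        "event_name": ev["event_name"],
        "event_type": prefs.get("eventType"),
        "basis_for_processing": prefs.get("basisForProcessing"),
        "consent_url": prefs.get("consentUrl"),
        "consent_version": prefs.get("consentVersion"),
        "consent_scopes": join_array(prefs.get("consentScopes")),
        "domains_applied": join_array(prefs.get("domainsApplied")),
        # coalesce(gdpr_applies, false)
        "gdpr_applies": gdpr if gdpr is not None else False,
        "cmp_load_time": float(elapsed) if elapsed is not None else None,
    }


def consent_events_this_run(events):
    """Keep only consent-related events, then flatten them."""
    return [flatten_consent_event(ev) for ev in events
            if ev["event_name"] in CONSENT_EVENT_NAMES]


rows = consent_events_this_run([
    {"event_id": "1", "event_name": "consent_preferences",
     "consent_preferences": {"eventType": "allow_selected",
                             "consentScopes": ["necessary", "analytics"],
                             "consentVersion": "1.0"}},
    {"event_id": "2", "event_name": "cmp_visible",
     "cmp_visible": {"elapsedTime": 1.5}},
    {"event_id": "3", "event_name": "page_view"},
])
```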
Depends On
- Models
- Macros
Referenced By
Snowplow Web Consent Log
models/optional_modules/consent/snowplow_web_consent_log.sql
Description
Incremental table showing the audit trail of consent and Consent Management Platform (CMP) events.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
event_id | A UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
original_domain_userid | True User ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
derived_tstamp | Timestamp making allowance for an inaccurate device clock e.g. ‘2013-11-26 00:02:04’ | timestamp_ntz |
load_tstamp | The timestamp of the event landing in the data warehouse. | timestamp_ntz |
event_name | Event name e.g. ‘link_click’ | text |
event_type | The action for the consent preferences of a user e.g. allow_all | text |
basis_for_processing | GDPR lawful basis for data collection & processing | text |
consent_url | URI of the privacy policy related document | text |
consent_version | Version of the privacy policy related document | text |
consent_scopes | The scopes allowed after the user finalized their selection of consent preferences e.g. ['analytics', 'functional', 'advertisement'] | text |
domains_applied | The domains to which this consent allows these preferences to persist | text |
gdpr_applies | A boolean which determines if GDPR applies based on the user's geo-location | boolean |
cmp_load_time | The time taken for the consent box to be shown on screen | float |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized= 'incremental',
enabled=var("snowplow__enable_consent", false),
unique_key='event_id',
upsert_date_key='derived_tstamp',
sort='derived_tstamp',
dist='event_id',
tags=["derived"],
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val = {
"field": "derived_tstamp",
"data_type": "timestamp"
}, databricks_val = 'derived_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_consent(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize= true
)
}}
select
*
{% if target.type in ['databricks', 'spark'] -%}
, DATE(derived_tstamp) as derived_tstamp_date
{%- endif %}
from {{ ref('snowplow_web_consent_events_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
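With `materialized='incremental'` and `unique_key='event_id'`, dbt merges each run's rows into the existing table so that an event_id appears at most once; how that is done (for example merge or delete+insert) depends on the adapter's incremental strategy. The simplified Python model below ignores partitioning, `upsert_date_key` and the Databricks date column; `upsert_by_key` and `consent_log_batch` are hypothetical helpers.

```python
def consent_log_batch(consent_events_this_run, run_has_new_events):
    """Mirror the select: rows pass through only when the run has new events."""
    return list(consent_events_this_run) if run_has_new_events else []


def upsert_by_key(existing_rows, new_rows, key="event_id"):
    """Merge new_rows into existing_rows; the newer row wins on a key clash."""
    merged = {row[key]: row for row in existing_rows}
    for row in new_rows:
        merged[row[key]] = row
    return list(merged.values())


table = [{"event_id": "a", "event_type": "pending"}]
batch = consent_log_batch(
    [{"event_id": "a", "event_type": "allow_all"},
     {"event_id": "b", "event_type": "deny_all"}],
    run_has_new_events=True,
)
table = upsert_by_key(table, batch)
```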
Depends On
- Models
- Macros
Referenced By
Snowplow Web Consent Scope Status
models/optional_modules/consent/snowplow_web_consent_scope_status.sql
Description
Aggregate of the current number of users who have consented to each consent scope
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
scope | Consent scope | text |
total_consent | The number of users whose last consent event, sent under the latest consent version, includes this scope | number |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='table',
enabled=var("snowplow__enable_consent", false)
)
}}
with arrays as (
select
u.domain_userid,
{{ snowplow_utils.get_split_to_array('last_consent_scopes', 'u', ', ') }} as scope_array
from {{ ref('snowplow_web_consent_users') }} u
where is_latest_version
)
, unnesting as (
{{ snowplow_utils.unnest('domain_userid', 'scope_array', 'consent_scope', 'arrays') }}
)
select
replace(replace(replace(cast(consent_scope as {{ snowplow_utils.type_max_string() }}), '"', ''), '[', ''), ']', '') as scope,
count(*) as total_consent
from unnesting
group by 1
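The query splits each user's `last_consent_scopes` string on `', '`, unnests it to one row per user and scope, strips stray quotes and brackets, and counts rows per scope. A Python sketch of the same steps, assuming per-user rows shaped like `snowplow_web_consent_users` (the function name is hypothetical, and treating a NULL scope list as producing no rows is an assumption about `unnest`):

```python
from collections import Counter


def consent_scope_status(users):
    """Count users per consent scope, for users on the latest consent version."""
    counts = Counter()
    for user in users:
        # where is_latest_version
        if not user.get("is_latest_version"):
            continue
        scopes = user.get("last_consent_scopes")
        if scopes is None:
            continue  # assumed: unnesting a NULL array yields no rows
        # split to array, then unnest: one row per (user, scope)
        for scope in scopes.split(", "):
            # remove leftover quotes and brackets, like the nested replace() calls
            cleaned = scope.replace('"', "").replace("[", "").replace("]", "")
            counts[cleaned] += 1
    # group by scope -> total_consent
    return dict(counts)


users = [
    {"domain_userid": "u1", "is_latest_version": True, "last_consent_scopes": "analytics, functional"},
    {"domain_userid": "u2", "is_latest_version": True, "last_consent_scopes": "analytics"},
    {"domain_userid": "u3", "is_latest_version": False, "last_consent_scopes": "analytics"},
    {"domain_userid": "u4", "is_latest_version": True, "last_consent_scopes": None},
]
status = consent_scope_status(users)
```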
Depends On
Snowplow Web Consent Totals
models/optional_modules/consent/snowplow_web_consent_totals.sql
Description
Summary of the latest consent status per consent version
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
consent_version | Version of the privacy policy related document | text |
version_start_tstamp | The timestamp of the first allow_all consent event belonging to a consent version | timestamp_ntz |
consent_scopes | The scopes allowed after the user finalized their selection of consent preferences e.g. ['analytics', 'functional', 'advertisement'] | text |
consent_url | URI of the privacy policy related document | text |
domains_applied | The domains to which this consent allows these preferences to persist | text |
is_latest_version | A boolean indicating whether this is the latest consent version, i.e. the one with the most recent version_start_tstamp | boolean |
last_allow_all_event | The load timestamp of the last allow_all consent event for this consent version | timestamp_ntz |
total_visitors | The number of distinct users whose last consent event was sent under this consent version | number |
allow_all | Total number of users whose last consent event under this consent version has type allow_all | number |
allow_selected | Total number of users whose last consent event under this consent version has type allow_selected | number |
allow | Total number of users whose last consent event under this consent version has type allow_all or allow_selected | number |
pending | Total number of users whose last consent event under this consent version has type pending | number |
denied | Total number of users whose last consent event under this consent version has type deny_all | number |
expired | Total number of users whose last consent event under this consent version has type expired | number |
withdrawn | Total number of users whose last consent event under this consent version has type withdrawn | number |
implicit_consent | Total number of users whose last consent event under this consent version has type implicit_consent | number |
expires_in_six_months | The total number of users whose consent expires within the next six months, assuming consent lasts one year from the last consent event; users whose last consent event has type expired are excluded (only the official version is taken into account) | number |
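The `expires_in_six_months` count treats consent as lasting one year from the last consent event and counts it when that expiry falls after today but no later than six months from today, skipping users whose last event is already `expired`. The Python sketch below encodes that rule; the helper names are hypothetical, and clamping month-end dates (31 January plus one month gives the last day of February) is an assumption about how the warehouse's `dateadd` behaves.

```python
import calendar
from datetime import datetime


def add_months(ts, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = ts.month - 1 + months
    year = ts.year + month_index // 12
    month = month_index % 12 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def expires_in_six_months(last_consent_tstamp, last_consent_event_type, today):
    """True if a one-year consent ends after today but within six months of it."""
    if last_consent_event_type == "expired":
        return False
    expiry = add_months(last_consent_tstamp, 12)   # dateadd('year', 1, ...)
    return today < expiry <= add_months(today, 6)  # dateadd('month', 6, current_date)


today = datetime(2024, 6, 1)
flags = [
    expires_in_six_months(datetime(2023, 9, 15), "allow_all", today),  # ends 2024-09-15
    expires_in_six_months(datetime(2023, 1, 1), "allow_all", today),   # already ended
    expires_in_six_months(datetime(2024, 3, 1), "allow_all", today),   # ends 2025-03-01
    expires_in_six_months(datetime(2023, 9, 15), "expired", today),    # excluded type
]
```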
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='table',
enabled=var("snowplow__enable_consent", false),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with totals as (
select
last_consent_version,
count(distinct domain_userid) as total_visitors,
count(case when last_consent_event_type ='allow_all' then 1 end) as allow_all,
count(case when last_consent_event_type ='allow_selected' then 1 end) as allow_selected,
count(case when last_consent_event_type IN ('allow_all', 'allow_selected') then 1 end) as allow,
count(case when last_consent_event_type = 'pending' then 1 end) as pending,
count(case when last_consent_event_type = 'deny_all' then 1 end) as denied,
count(case when last_consent_event_type = 'expired' then 1 end) as expired,
count(case when last_consent_event_type = 'withdrawn' then 1 end) as withdrawn,
count(case when last_consent_event_type = 'implicit_consent' then 1 end) as implicit_consent,
count(case when {{ dateadd('year', '1', 'last_consent_event_tstamp') }} <= {{ dateadd('month', '6', 'current_date') }}
and last_consent_event_type <> 'expired'
and {{ dateadd('year', '1', 'last_consent_event_tstamp') }} > current_date then 1 end) as expires_in_six_months
from {{ ref('snowplow_web_consent_users') }}
where last_consent_event_type is not null
group by 1
)
select
v.*,
t.total_visitors,
t.allow_all,
t.allow_selected,
t.allow,
t.pending,
t.denied,
t.expired,
t.withdrawn,
t.implicit_consent,
t.expires_in_six_months
from {{ ref('snowplow_web_consent_versions') }} v
left join totals t
on t.last_consent_version = v.consent_version
order by v.version_start_tstamp desc
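The `totals` CTE counts per-user rows by `last_consent_version`. Two mappings are easy to misread: `allow` counts both `allow_all` and `allow_selected`, and `denied` counts the `deny_all` type. A Python sketch of the aggregation, assuming rows shaped like `snowplow_web_consent_users` (names are hypothetical; the final left join to the versions table is omitted, so versions with no users simply do not appear here):

```python
from collections import defaultdict

# How each last_consent_event_type feeds the count columns.
TYPE_TO_COLUMNS = {
    "allow_all": ("allow_all", "allow"),
    "allow_selected": ("allow_selected", "allow"),
    "pending": ("pending",),
    "deny_all": ("denied",),
    "expired": ("expired",),
    "withdrawn": ("withdrawn",),
    "implicit_consent": ("implicit_consent",),
}
COUNT_COLUMNS = ("allow_all", "allow_selected", "allow", "pending",
                 "denied", "expired", "withdrawn", "implicit_consent")


def consent_totals(users):
    """Aggregate per-user rows into one row per last_consent_version."""
    rows = {}
    visitors = defaultdict(set)
    for user in users:
        event_type = user.get("last_consent_event_type")
        if event_type is None:
            continue  # where last_consent_event_type is not null
        version = user.get("last_consent_version")
        row = rows.setdefault(version, dict.fromkeys(COUNT_COLUMNS, 0))
        visitors[version].add(user["domain_userid"])
        for column in TYPE_TO_COLUMNS.get(event_type, ()):
            row[column] += 1
    for version, row in rows.items():
        row["total_visitors"] = len(visitors[version])  # count(distinct domain_userid)
    return rows


users = [
    {"domain_userid": "u1", "last_consent_version": "1.0", "last_consent_event_type": "allow_all"},
    {"domain_userid": "u2", "last_consent_version": "1.0", "last_consent_event_type": "allow_selected"},
    {"domain_userid": "u3", "last_consent_version": "1.0", "last_consent_event_type": "deny_all"},
    {"domain_userid": "u4", "last_consent_version": "2.0", "last_consent_event_type": "allow_all"},
    {"domain_userid": "u5", "last_consent_version": None, "last_consent_event_type": None},
]
totals = consent_totals(users)
```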
Depends On
Snowplow Web Consent Users
models/optional_modules/consent/snowplow_web_consent_users.sql
Description
Per-user consent statistics
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
cmp_events | The number of cmp_visible events the user has generated | number |
consent_events | The number of consent_preferences events the user has generated | number |
last_cmp_event_tstamp | The timestamp of the last cmp_visible event | timestamp_ntz |
last_consent_event_tstamp | The timestamp of the last consent event after the cmp_visible event happened | timestamp_ntz |
last_consent_event_type | The type of the last consent event after the cmp_visible event happened | text |
last_consent_scopes | The list of consent scopes in connection with the last consent event | text |
last_consent_version | The privacy policy version in connection with the last consent event | text |
last_consent_url | The privacy policy url in connection with the last consent event | text |
last_domains_applied | The domains for which the last consent event applies | text |
last_processed_event | The timestamp of the last processed event needed for the incremental logic | timestamp_ntz |
is_latest_version | A boolean indicating whether the user's last consent event was sent under the latest consent version | boolean |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
enabled=var("snowplow__enable_consent", false),
unique_key='domain_userid',
sort = 'last_consent_event_tstamp',
dist = 'domain_userid',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
{% if is_incremental() %}
{%- set lower_limit, upper_limit = snowplow_utils.return_limits_from_model(this,
'last_processed_event',
'last_processed_event') %}
{% endif %}
with base as (
select
domain_userid,
user_id,
geo_country,
max(load_tstamp) as last_processed_event,
count(case when event_name = 'cmp_visible' then 1 end) as cmp_events,
count(case when event_name = 'consent_preferences' then 1 end) as consent_events,
max(case when event_name = 'cmp_visible' then derived_tstamp end) as last_cmp_event_tstamp,
row_number() over(partition by domain_userid order by max(load_tstamp) desc) as latest_event_by_user_rank
from {{ ref('snowplow_web_consent_log') }}
{% if is_incremental() %} -- and it has not been processed yet
where load_tstamp > {{ upper_limit }}
{% endif %}
group by 1,2,3
)
, latest_consents as (
select
domain_userid,
derived_tstamp as last_consent_event_tstamp,
event_type as last_consent_event_type,
consent_scopes as last_consent_scopes,
consent_version as last_consent_version,
consent_url as last_consent_url,
domains_applied as last_domains_applied,
row_number() over(partition by domain_userid order by load_tstamp desc) as latest_consent_event_by_user_rank
from {{ ref('snowplow_web_consent_log') }}
where event_name = 'consent_preferences'
{% if is_incremental() %} -- and it has not been processed yet
and load_tstamp > {{ upper_limit }}
{% endif %}
)
{% if is_incremental() %}
select
b.domain_userid,
b.user_id,
b.geo_country,
coalesce(b.cmp_events, 0) + coalesce(t.cmp_events, 0) as cmp_events,
coalesce(b.consent_events, 0) + coalesce(t.consent_events, 0) as consent_events,
b.last_cmp_event_tstamp,
l.last_consent_event_tstamp,
l.last_consent_event_type,
l.last_consent_scopes,
l.last_consent_version,
l.last_consent_url,
l.last_domains_applied,
b.last_processed_event,
case when v.is_latest_version then True else False end as is_latest_version
from base b
left join latest_consents l
on b.domain_userid = l.domain_userid
left join {{ ref('snowplow_web_consent_versions')}} v
on v.consent_version = l.last_consent_version
left join {{ this }} t
on t.domain_userid = b.domain_userid
where (l.latest_consent_event_by_user_rank = 1 or l.domain_userid is null)
and b.latest_event_by_user_rank = 1
{% else %}
select
b.domain_userid,
b.user_id,
b.geo_country,
b.cmp_events,
b.consent_events,
b.last_cmp_event_tstamp,
l.last_consent_event_tstamp,
l.last_consent_event_type,
l.last_consent_scopes,
l.last_consent_version,
l.last_consent_url,
l.last_domains_applied,
b.last_processed_event,
case when v.is_latest_version then True else False end as is_latest_version
from base b
left join latest_consents l
on b.domain_userid = l.domain_userid
left join {{ ref('snowplow_web_consent_versions') }} v
on v.consent_version = l.last_consent_version
where (l.latest_consent_event_by_user_rank = 1 or l.domain_userid is null)
and b.latest_event_by_user_rank = 1
{% endif %}
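On incremental runs the model adds this run's `cmp_events` and `consent_events` to the values already stored for the user (`coalesce(b.col, 0) + coalesce(t.col, 0)`), then replaces the row via `unique_key='domain_userid'`. A Python sketch of that merge, assuming both inputs map `domain_userid` to a row dict (`merge_user_counts` is a hypothetical name). As in the SQL, every non-count column is taken from the new batch only.

```python
def merge_user_counts(existing, new_rows):
    """Combine this run's per-user rows with the rows already in the table."""
    merged = dict(existing)
    for user_id, new in new_rows.items():
        old = existing.get(user_id, {})
        row = dict(new)  # non-count columns come from the new batch
        for column in ("cmp_events", "consent_events"):
            # coalesce(b.col, 0) + coalesce(t.col, 0)
            row[column] = (new.get(column) or 0) + (old.get(column) or 0)
        merged[user_id] = row  # unique_key='domain_userid': replace the old row
    return merged


existing = {"u1": {"cmp_events": 2, "consent_events": 1, "last_consent_event_type": "allow_all"}}
new_rows = {
    "u1": {"cmp_events": 1, "consent_events": None, "last_consent_event_type": "deny_all"},
    "u2": {"cmp_events": 1, "consent_events": 0, "last_consent_event_type": None},
}
merged = merge_user_counts(existing, new_rows)
```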
Depends On
- Models
- Macros
Referenced By
Snowplow Web Consent Versions
models/optional_modules/consent/snowplow_web_consent_versions.sql
Description
Used to keep track of each consent version and its validity
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
consent_version | Version of the privacy policy related document | text |
version_start_tstamp | The timestamp of the first allow_all event related to a consent version | timestamp_ntz |
consent_scopes | The scopes allowed after the user finalized their selection of consent preferences e.g. ['analytics', 'functional', 'advertisement'] | text |
consent_url | URI of the privacy policy related document | text |
domains_applied | The domains to which this consent allows these preferences to persist | text |
is_latest_version | A boolean indicating whether this is the latest consent version, i.e. the one with the most recent version_start_tstamp | boolean |
last_allow_all_event | The load timestamp of the last allow_all event, used for the incremental update | timestamp_ntz |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
enabled=var("snowplow__enable_consent", false),
unique_key='consent_version',
sort = 'version_start_tstamp',
dist = 'consent_version',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
{% if is_incremental() %}
{%- set lower_limit, upper_limit = snowplow_utils.return_limits_from_model(this,
'last_allow_all_event',
'last_allow_all_event') %}
{% endif %}
with consent_versions as (
select
consent_version,
consent_scopes,
consent_url,
domains_applied,
min(derived_tstamp) as version_start_tstamp,
max(load_tstamp) as last_allow_all_event
from {{ ref('snowplow_web_consent_log') }}
where event_name <> 'cmp_visible' and event_type = 'allow_all'
{% if is_incremental() %} -- and it has not been processed yet
and load_tstamp > {{ upper_limit }}
{% endif %}
group by 1,2,3,4
)
, latest_version as (
select
consent_version,
version_start_tstamp
from consent_versions
order by 2 desc limit 1
)
{% if is_incremental() %}
select
v.consent_version,
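-- coalesce guards new versions that have no row in this yet: on some warehouses LEAST returns NULL if any argument is NULL
least(v.version_start_tstamp, coalesce(t.version_start_tstamp, v.version_start_tstamp)) as version_start_tstamp,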
v.consent_scopes,
v.consent_url,
v.domains_applied,
case when l.consent_version is not null then True else False end is_latest_version,
v.last_allow_all_event
from consent_versions v
left join latest_version l
on v.consent_version = l.consent_version
left join {{ this }} t
on t.consent_version = v.consent_version
{% else %}
select
v.consent_version,
v.version_start_tstamp,
v.consent_scopes,
v.consent_url,
v.domains_applied,
case when l.consent_version is not null then True else False end is_latest_version,
v.last_allow_all_event
from consent_versions v
left join latest_version l
on v.consent_version = l.consent_version
{% endif %}
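The model takes the earliest `allow_all` event per consent version as its start, flags the version with the most recent start as the latest, and on incremental runs keeps the earlier of the new and stored start times. The Python sketch below keys versions by `consent_version` only for brevity (the SQL groups by version, scopes, URL and domains); `consent_versions` is a hypothetical name. Skipping a missing stored value corresponds to a null-safe `least()`, which matters because in some warehouses, such as Snowflake and BigQuery, `LEAST` returns NULL when any argument is NULL. As in the SQL, the latest version is chosen from the current batch only.

```python
def consent_versions(events, existing=None):
    """Return consent_version -> {version_start_tstamp, is_latest_version}.

    existing maps consent_version -> version_start_tstamp already stored.
    """
    existing = existing or {}
    starts = {}
    for ev in events:
        # where event_name <> 'cmp_visible' and event_type = 'allow_all'
        if ev["event_name"] == "cmp_visible" or ev.get("event_type") != "allow_all":
            continue
        version, ts = ev["consent_version"], ev["derived_tstamp"]
        starts[version] = min(starts.get(version, ts), ts)  # min(derived_tstamp)
    if not starts:
        return {}
    # latest_version: order by version_start_tstamp desc limit 1
    latest = max(starts, key=starts.get)
    result = {}
    for version, start in starts.items():
        previous = existing.get(version)
        if previous is not None:
            start = min(start, previous)  # null-safe least(new, stored)
        result[version] = {"version_start_tstamp": start,
                           "is_latest_version": version == latest}
    return result
```

Integers work as stand-in timestamps for trying it out, since only ordering is used.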
Depends On
- Models
- Macros
Referenced By
Snowplow Web Incremental Manifest
models/base/manifest/snowplow_web_incremental_manifest.sql
Description
This incremental table is a manifest of the timestamp of the latest event consumed per model within the snowplow-web
package, as well as for any models leveraging the incremental framework provided by the package. The latest event's timestamp is based on collector_tstamp.
This table is used to determine what events should be processed in the next run of the model.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
model | The name of the model. | text |
last_success | The latest event consumed by the model, based on collector_tstamp | timestamp_ntz |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
full_refresh=snowplow_web.allow_refresh(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
}
)
}}
{% set incremental_manifest_query = snowplow_utils.base_create_snowplow_incremental_manifest() %}
{{ incremental_manifest_query }}
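Conceptually, the manifest stores one `last_success` timestamp per model, and a later run uses it to pick events whose `collector_tstamp` is newer. The Python sketch below shows only that core idea with hypothetical helpers; the package's actual run limits are computed by its incremental framework and involve additional settings, such as lookback and backfill windows, that are not modelled here.

```python
def update_manifest(manifest, model, consumed_events):
    """Record the latest collector_tstamp a model has consumed (hypothetical)."""
    if not consumed_events:
        return manifest  # nothing consumed: keep the previous last_success
    latest = max(ev["collector_tstamp"] for ev in consumed_events)
    previous = manifest.get(model)
    manifest[model] = latest if previous is None else max(previous, latest)
    return manifest


def events_to_process(manifest, model, events):
    """Select events newer than the model's last_success (hypothetical)."""
    last_success = manifest.get(model)
    return [ev for ev in events
            if last_success is None or ev["collector_tstamp"] > last_success]


manifest = update_manifest({}, "snowplow_web_sessions",
                           [{"collector_tstamp": 5}, {"collector_tstamp": 9}])
pending = events_to_process(manifest, "snowplow_web_sessions",
                            [{"collector_tstamp": 9}, {"collector_tstamp": 10}])
```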
Depends On
- Macros
Referenced By
- Models
- model.snowplow_media_player.snowplow_media_player_base
- model.snowplow_web.snowplow_web_base_new_event_limits
- model.snowplow_web.snowplow_web_base_sessions_lifecycle_manifest
- model.snowplow_web.snowplow_web_consent_events_this_run
- model.snowplow_web.snowplow_web_consent_log
- model.snowplow_web.snowplow_web_page_views
- model.snowplow_web.snowplow_web_sessions
- model.snowplow_web.snowplow_web_user_mapping
- model.snowplow_web.snowplow_web_users
- model.snowplow_web.snowplow_web_vital_events_this_run
- model.snowplow_web.snowplow_web_vitals
- model.snowplow_web.snowplow_web_vitals_this_run
Snowplow Web Page View Events
models/page_views/scratch/<adaptor>/snowplow_web_page_view_events.sql
Description
This is a staging table containing all the page view events for a given run of the Web model. It is the first step in the page views module and therefore does not contain metrics such as engaged time and scroll depth, which are calculated in subsequent models. It is also where page_view_ids are de-duplicated.
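The model's code numbers the page_view events for each page_view_id with `row_number()`, ordered by `derived_tstamp` and then `dvce_created_tstamp`. Assuming only the row numbered 1 is kept (that filter is not part of the excerpt shown on this page), the effect matches the Python sketch below; `dedupe_page_views` is a hypothetical name, timestamps are assumed non-null, and bot filtering is ignored.

```python
def dedupe_page_views(events):
    """Keep the earliest page_view event per page_view_id."""
    best = {}
    for ev in events:
        # where event_name = 'page_view' and page_view_id is not null
        if ev.get("event_name") != "page_view" or ev.get("page_view_id") is None:
            continue
        key = (ev["derived_tstamp"], ev["dvce_created_tstamp"])
        current = best.get(ev["page_view_id"])
        # order by derived_tstamp, dvce_created_tstamp: smaller key wins
        if current is None or key < (current["derived_tstamp"],
                                     current["dvce_created_tstamp"]):
            best[ev["page_view_id"]] = ev
    return list(best.values())
```

On an exact tie this sketch keeps the first event it sees, while `row_number()` gives no guaranteed order for ties.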
File Paths
- default
models/page_views/scratch/default/snowplow_web_page_view_events.sql
Details
Columns
Column Name | Description |
---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
Code
- default
{{
config(
sort='start_tstamp',
dist='page_view_id'
)
}}
with page_view_events as (
select
ev.page_view_id,
ev.event_id,
ev.app_id,
-- user fields
ev.user_id,
ev.domain_userid,
ev.network_userid,
-- session fields
ev.domain_sessionid,
ev.domain_sessionidx,
-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.derived_tstamp as start_tstamp,
ev.doc_width,
ev.doc_height,
ev.page_title,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,
ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,
ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,
ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,
ev.user_ipaddress,
ev.useragent,
ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,
row_number() over (partition by ev.page_view_id order by ev.derived_tstamp, ev.dvce_created_tstamp) as page_view_id_dedupe_index
from {{ ref('snowplow_web_base_events_this_run') }} as ev
where ev.event_name = 'page_view'
and ev.page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
)
select
pv.page_view_id,
pv.event_id,
pv.app_id,
-- user fields
pv.user_id,
pv.domain_userid,
pv.network_userid,
-- session fields
pv.domain_sessionid,
pv.domain_sessionidx,
-- timestamp fields
pv.dvce_created_tstamp,
pv.collector_tstamp,
pv.derived_tstamp,
pv.start_tstamp,
pv.doc_width,
pv.doc_height,
pv.page_title,
pv.page_url,
pv.page_urlscheme,
pv.page_urlhost,
pv.page_urlpath,
pv.page_urlquery,
pv.page_urlfragment,
pv.mkt_medium,
pv.mkt_source,
pv.mkt_term,
pv.mkt_content,
pv.mkt_campaign,
pv.mkt_clickid,
pv.mkt_network,
pv.page_referrer,
pv.refr_urlscheme,
pv.refr_urlhost,
pv.refr_urlpath,
pv.refr_urlquery,
pv.refr_urlfragment,
pv.refr_medium,
pv.refr_source,
pv.refr_term,
pv.geo_country,
pv.geo_region,
pv.geo_region_name,
pv.geo_city,
pv.geo_zipcode,
pv.geo_latitude,
pv.geo_longitude,
pv.geo_timezone,
pv.user_ipaddress,
pv.useragent,
pv.br_lang,
pv.br_viewwidth,
pv.br_viewheight,
pv.br_colordepth,
pv.br_renderengine,
pv.os_timezone,
row_number() over (partition by pv.domain_sessionid order by pv.derived_tstamp, pv.dvce_created_tstamp) as page_view_in_session_index --Moved to post dedupe, unlike V1 web model.
from page_view_events as pv
where page_view_id_dedupe_index = 1
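The final select numbers page views within each session only after duplicates have been removed (the "Moved to post dedupe" comment). The Python sketch below, assuming deduplicated rows are dicts with ISO timestamp strings, mirrors the page_view_in_session_index window and shows why the order of operations matters: if a duplicate page_view_id were still present, every later page view in that session would be pushed to a higher index. It is an illustration only, not the package's code.

```python
from itertools import groupby
from typing import Dict, List


def page_view_in_session_index(page_views: List[Dict[str, str]]) -> Dict[str, int]:
    """Return {page_view_id: index}, mirroring
    row_number() over (partition by domain_sessionid
                       order by derived_tstamp, dvce_created_tstamp).
    If a page_view_id appears more than once, the last index assigned wins."""
    ordered = sorted(
        page_views,
        key=lambda pv: (pv["domain_sessionid"], pv["derived_tstamp"], pv["dvce_created_tstamp"]),
    )
    indexes: Dict[str, int] = {}
    # groupby needs input sorted by the grouping key, which the sort above guarantees.
    for _, session_rows in groupby(ordered, key=lambda pv: pv["domain_sessionid"]):
        for position, pv in enumerate(session_rows, start=1):
            indexes[pv["page_view_id"]] = position
    return indexes
```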
Depends On
Snowplow Web Page Views
models/page_views/snowplow_web_page_views.sql
Description
This derived incremental table contains all historic page views and should be the end point for any analysis or BI tools.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
event_id | A UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
app_id | Application ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev. | text |
platform | Platform e.g. ‘web’ | text |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using a 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
original_domain_userid | True user ID set by Snowplow using a 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
stitched_user_id | The user_id (or domain_userid if no user_id is found during user stitching) when the snowplow__session_stitching or snowplow__page_view_stitching variable is enabled, otherwise NULL. The user_id field used for stitching can be overridden with var('snowplow__user_stitching_id'). | text |
network_userid | User ID set by Snowplow using a 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’ | text |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_sessionidx | A visit / session index e.g. 3 | number |
page_view_in_session_index | A page view index within a single session | number |
page_views_in_session | Distinct count of page_view_id within a session | number |
dvce_created_tstamp | Timestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’ | timestamp_ntz |
collector_tstamp | Timestamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’ | timestamp_ntz |
derived_tstamp | Timestamp making allowance for an inaccurate device clock e.g. ‘2013-11-26 00:02:04’ | timestamp_ntz |
start_tstamp | Timestamp for the start of the page view, based on derived_tstamp | timestamp_ntz |
end_tstamp | Timestamp for the end of the page view, based on derived_tstamp | timestamp_ntz |
model_tstamp | The current timestamp when the model processed this row. | timestamp_ntz |
engaged_time_in_s | Time spent by the user on the page calculated using page pings. | number |
absolute_time_in_s | The time in seconds between the start_tstamp and end_tstamp | number |
horizontal_pixels_scrolled | Distance the user scrolled horizontally in pixels | number |
vertical_pixels_scrolled | Distance the user scrolled vertically in pixels | number |
horizontal_percentage_scrolled | Percentage of page scrolled horizontally | float |
vertical_percentage_scrolled | Percentage of page scrolled vertically | float |
doc_width | The page’s width in pixels e.g. 1024 | number |
doc_height | The page’s height in pixels e.g. 3000 | number |
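content_group | A custom classification of the page based on its URL, title, etc., defined in the content_group_query macro. | text |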
page_title | Web page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’ | text |
page_url | The page URL e.g. ‘http://www.example.com’ | text |
page_urlscheme | Scheme aka protocol e.g. ‘https’ | text |
page_urlhost | Host aka domain e.g. ‘www.snowplow.io’ | text |
page_urlpath | Path to page e.g. ‘/product/index.html’ | text |
page_urlquery | Querystring e.g. ‘id=GTM-DLRG’ | text |
page_urlfragment | Fragment aka anchor e.g. ‘4-conclusion’ | text |
mkt_medium | Type of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’ | text |
mkt_source | The company / website where the traffic came from e.g. ‘Google’, ‘Facebook’ | text |
mkt_term | Any keywords associated with the referrer e.g. ‘new age tarot decks’ | text |
mkt_content | The content of the ad, or an ID so that it can be looked up, e.g. 13894723 | text |
mkt_campaign | The campaign ID e.g. ‘diageo-123’ | text |
mkt_clickid | The click ID e.g. ‘ac3d8e459’ | text |
mkt_network | The ad network to which the click ID belongs e.g. ‘DoubleClick’ | text |
default_channel_group | The channels by which users arrived at your site. | text |
page_referrer | URL of the referrer e.g. ‘http://www.referrer.com’ | text |
refr_urlscheme | Referer scheme e.g. ‘http’ | text |
refr_urlhost | Referer host e.g. ‘www.bing.com’ | text |
refr_urlpath | Referer page path e.g. ‘/images/search’ | text |
refr_urlquery | Referer URL querystring e.g. ‘q=psychic+oracle+cards’ | text |
refr_urlfragment | Referer URL fragment | text |
refr_medium | Type of referer e.g. ‘search’, ‘internal’ | text |
refr_source | Name of referer if recognised e.g. ‘Bing images’ | text |
refr_term | Keywords if source is a search engine e.g. ‘psychic oracle cards’ | text |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
geo_region | ISO 3166-2 code for the country region the visitor is in e.g. ‘I9’, ‘TX’ | text |
geo_region_name | Visitor region name e.g. ‘Florida’ | text |
geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
geo_zipcode | Postcode the visitor is in e.g. ‘94109’ | text |
geo_latitude | Visitor location latitude e.g. 37.443604 | float |
geo_longitude | Visitor location longitude e.g. -122.4124 | float |
geo_timezone | Visitor timezone name e.g. ‘Europe/London’ | text |
user_ipaddress | User IP address e.g. ‘92.231.54.234’ | text |
useragent | Raw useragent | text |
br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
br_viewwidth | Viewport width e.g. 1000 | number |
br_viewheight | Viewport height e.g. 1000 | number |
br_colordepth | Bit depth of the browser color palette e.g. 24 | text |
br_renderengine | Browser rendering engine e.g. ‘GECKO’ | text |
os_timezone | Client operating system timezone e.g. ‘Europe/London’ | text |
category | Category based on activity if the IP/UA is a spider or robot, BROWSER otherwise | text |
primary_impact | Whether the spider or robot would affect page impression measurement, ad impression measurement, both or none | text |
reason | Type of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwise | text |
spider_or_robot | True if the IP address or user agent checked against the list is a spider or robot, false otherwise | boolean |
useragent_family | Useragent family (browser) name | text |
useragent_major | Useragent major version | text |
useragent_minor | Useragent minor version | text |
useragent_patch | Useragent patch version | text |
useragent_version | Full version of the useragent | text |
os_family | Operating system family e.g. ‘Linux’ | text |
os_major | Operating system major version | text |
os_minor | Operating system minor version | text |
os_patch | Operating system patch version | text |
os_patch_minor | Operating system patch minor version | text |
os_version | Operating system full version | text |
device_family | Device type | text |
device_class | Class of device e.g. phone | text |
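device_category | Derived from device_class; classifies each device as one of Desktop, Mobile, Tablet or Other. | text |
screen_resolution | Screen resolution built by concatenating dvce_screenwidth and dvce_screenheight with an ‘x’, e.g. ‘1920x1080’ | text |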
agent_class | Class of agent e.g. browser | text |
agent_name | Name of agent e.g. Chrome | text |
agent_name_version | Name and version of agent e.g. Chrome 53.0.2785.124 | text |
agent_name_version_major | Name and major version of agent e.g. Chrome 53 | text |
agent_version | Version of agent e.g. 53.0.2785.124 | text |
agent_version_major | Major version of agent e.g. 53 | text |
device_brand | Brand of device e.g. Google | text |
device_name | Name of device e.g. Google Nexus 6 | text |
device_version | Version of device e.g. 6.0 | text |
layout_engine_class | Class of layout engine e.g. Browser | text |
layout_engine_name | Name of layout engine e.g. Blink | text |
layout_engine_name_version | Name and version of layout engine e.g. Blink 53.0 | text |
layout_engine_name_version_major | Name and major version of layout engine e.g. Blink 53 | text |
layout_engine_version | Version of layout engine e.g. 53.0 | text |
layout_engine_version_major | Major version of layout engine e.g. 53 | text |
operating_system_class | Class of the OS e.g. Mobile | text |
operating_system_name | Name of the OS e.g. Android | text |
operating_system_name_version | Name and version of the OS e.g. Android 7.0 | text |
operating_system_version | Version of the OS e.g. 7.0 | text |
v_collector | Collector version e.g. ‘ssc-2.1.0-kinesis’ | text |
event_id2 | | text |
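The stitched_user_id rule described above can be sketched in Python for a single row. This assumes a plain {domain_userid: user_id} dict standing in for the user mapping (presumably built by the snowplow_web_user_mapping model); the function and parameter names are hypothetical, and the real stitching runs in SQL as a post-hook on this table.

```python
from typing import Dict, Optional


def stitched_user_id(
    domain_userid: str,
    user_mapping: Dict[str, str],
    stitching_enabled: bool,
) -> Optional[str]:
    """Resolve stitched_user_id for one page view (simplified sketch).

    user_mapping is a hypothetical {domain_userid: user_id} lookup."""
    if not stitching_enabled:
        return None  # the column stays NULL when stitching is disabled
    # Start from domain_userid, then replace it with the mapped user_id when one
    # exists; otherwise domain_userid is kept as the fallback.
    return user_mapping.get(domain_userid, domain_userid)
```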
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
on_schema_change='append_new_columns',
unique_key='page_view_id',
upsert_date_key='start_tstamp',
sort='start_tstamp',
dist='page_view_id',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val = {
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_page_views(),
tags=["derived"],
post_hook="{{ snowplow_web.stitch_user_identifiers(
enabled=var('snowplow__page_view_stitching')
) }}",
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}
select *
{% if target.type in ['databricks', 'spark'] -%}
, DATE(start_tstamp) as start_tstamp_date
{%- endif %}
from {{ ref('snowplow_web_page_views_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
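With materialized='incremental' and unique_key='page_view_id', each run merges the rows selected from snowplow_web_page_views_this_run into this table by page_view_id, so a page view reprocessed in a later run replaces its earlier row instead of being duplicated. The Python sketch below models only that key-based upsert. It ignores upsert_date_key, partitioning and the is_run_with_new_events guard, and it is not how dbt or the Snowplow macros actually execute the merge.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def upsert_by_key(existing: List[Row], new_rows: List[Row], unique_key: str = "page_view_id") -> List[Row]:
    """Rows in new_rows replace existing rows with the same key; unseen keys are appended."""
    # dicts keep insertion order, so untouched rows stay where they were.
    merged: Dict[Any, Row] = {row[unique_key]: row for row in existing}
    for row in new_rows:
        merged[row[unique_key]] = row  # overwrite on key match, insert otherwise
    return list(merged.values())
```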
Depends On
- Models
- Macros
Snowplow Web Page Views This Run
models/page_views/scratch/<adaptor>/snowplow_web_page_views_this_run.sql
Description
This staging table contains all the page views for the given run of the Web model. It has the same columns as snowplow_web_page_views. If you are building a custom module that requires page view events, this is the table you should reference.
Type: Table
File Paths
- bigquery
- databricks
- default
- snowflake
models/page_views/scratch/bigquery/snowplow_web_page_views_this_run.sql
models/page_views/scratch/databricks/snowplow_web_page_views_this_run.sql
models/page_views/scratch/default/snowplow_web_page_views_this_run.sql
models/page_views/scratch/snowflake/snowplow_web_page_views_this_run.sql
Details
Columns
Column Name | Description | Type |
---|---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
event_id | A UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
app_id | Application ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev. | text |
platform | Platform e.g. ‘web’ | text |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using a 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
original_domain_userid | True user ID set by Snowplow using a 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
stitched_user_id | Set to domain_userid when the snowplow__page_view_stitching variable is enabled, otherwise NULL. The stitched value is applied later by a post-hook on the derived snowplow_web_page_views table. | text |
network_userid | User ID set by Snowplow using a 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’ | text |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_sessionidx | A visit / session index e.g. 3 | number |
page_view_in_session_index | A page view index within a single session | number |
page_views_in_session | Distinct count of page_view_id within a session | number |
dvce_created_tstamp | Timestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’ | timestamp_ntz |
collector_tstamp | Timestamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’ | timestamp_ntz |
derived_tstamp | Timestamp making allowance for an inaccurate device clock e.g. ‘2013-11-26 00:02:04’ | timestamp_ntz |
start_tstamp | Timestamp for the start of the page view, based on derived_tstamp | timestamp_ntz |
end_tstamp | Timestamp for the end of the page view, based on derived_tstamp | timestamp_ntz |
model_tstamp | The current timestamp when the model processed this row. | timestamp_ntz |
engaged_time_in_s | Time spent by the user on the page calculated using page pings. | number |
absolute_time_in_s | The time in seconds between the start_tstamp and end_tstamp | number |
horizontal_pixels_scrolled | Distance the user scrolled horizontally in pixels | number |
vertical_pixels_scrolled | Distance the user scrolled vertically in pixels | number |
horizontal_percentage_scrolled | Percentage of page scrolled horizontally | float |
vertical_percentage_scrolled | Percentage of page scrolled vertically | float |
doc_width | The page’s width in pixels e.g. 1024 | number |
doc_height | The page’s height in pixels e.g. 3000 | number |
content_group | A custom classification of the page based on its URL, title, etc., defined in the content_group_query macro. | text |
page_title | Web page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’ | text |
page_url | The page URL e.g. ‘http://www.example.com’ | text |
page_urlscheme | Scheme aka protocol e.g. ‘https’ | text |
page_urlhost | Host aka domain e.g. ‘www.snowplow.io’ | text |
page_urlpath | Path to page e.g. ‘/product/index.html’ | text |
page_urlquery | Querystring e.g. ‘id=GTM-DLRG’ | text |
page_urlfragment | Fragment aka anchor e.g. ‘4-conclusion’ | text |
mkt_medium | Type of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’ | text |
mkt_source | The company / website where the traffic came from e.g. ‘Google’, ‘Facebook’ | text |
mkt_term | Any keywords associated with the referrer e.g. ‘new age tarot decks’ | text |
mkt_content | The content of the ad, or an ID so that it can be looked up, e.g. 13894723 | text |
mkt_campaign | The campaign ID e.g. ‘diageo-123’ | text |
mkt_clickid | The click ID e.g. ‘ac3d8e459’ | text |
mkt_network | The ad network to which the click ID belongs e.g. ‘DoubleClick’ | text |
default_channel_group | The channels by which users arrived at your site. | text |
page_referrer | URL of the referrer e.g. ‘http://www.referrer.com’ | text |
refr_urlscheme | Referer scheme e.g. ‘http’ | text |
refr_urlhost | Referer host e.g. ‘www.bing.com’ | text |
refr_urlpath | Referer page path e.g. ‘/images/search’ | text |
refr_urlquery | Referer URL querystring e.g. ‘q=psychic+oracle+cards’ | text |
refr_urlfragment | Referer URL fragment | text |
refr_medium | Type of referer e.g. ‘search’, ‘internal’ | text |
refr_source | Name of referer if recognised e.g. ‘Bing images’ | text |
refr_term | Keywords if source is a search engine e.g. ‘psychic oracle cards’ | text |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
geo_region | ISO 3166-2 code for the country region the visitor is in e.g. ‘I9’, ‘TX’ | text |
geo_region_name | Visitor region name e.g. ‘Florida’ | text |
geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
geo_zipcode | Postcode the visitor is in e.g. ‘94109’ | text |
geo_latitude | Visitor location latitude e.g. 37.443604 | float |
geo_longitude | Visitor location longitude e.g. -122.4124 | float |
geo_timezone | Visitor timezone name e.g. ‘Europe/London’ | text |
user_ipaddress | User IP address e.g. ‘92.231.54.234’ | text |
useragent | Raw useragent | text |
br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
br_viewwidth | Viewport width e.g. 1000 | number |
br_viewheight | Viewport height e.g. 1000 | number |
br_colordepth | Bit depth of the browser color palette e.g. 24 | text |
br_renderengine | Browser rendering engine e.g. ‘GECKO’ | text |
os_timezone | Client operating system timezone e.g. ‘Europe/London’ | text |
category | Category based on activity if the IP/UA is a spider or robot, BROWSER otherwise | text |
primary_impact | Whether the spider or robot would affect page impression measurement, ad impression measurement, both or none | text |
reason | Type of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwise | text |
spider_or_robot | True if the IP address or user agent checked against the list is a spider or robot, false otherwise | boolean |
useragent_family | Useragent family (browser) name | text |
useragent_major | Useragent major version | text |
useragent_minor | Useragent minor version | text |
useragent_patch | Useragent patch version | text |
useragent_version | Full version of the useragent | text |
os_family | Operating system family e.g. ‘Linux’ | text |
os_major | Operating system major version | text |
os_minor | Operating system minor version | text |
os_patch | Operating system patch version | text |
os_patch_minor | Operating system patch minor version | text |
os_version | Operating system full version | text |
device_family | Device type | text |
device_class | Class of device e.g. phone | text |
device_category | Derived from device_class; classifies each device as one of Desktop, Mobile, Tablet or Other. | text |
screen_resolution | Screen resolution built by concatenating dvce_screenwidth and dvce_screenheight with an ‘x’, e.g. ‘1920x1080’ | text |
agent_class | Class of agent e.g. browser | text |
agent_name | Name of agent e.g. Chrome | text |
agent_name_version | Name and version of agent e.g. Chrome 53.0.2785.124 | text |
agent_name_version_major | Name and major version of agent e.g. Chrome 53 | text |
agent_version | Version of agent e.g. 53.0.2785.124 | text |
agent_version_major | Major version of agent e.g. 53 | text |
device_brand | Brand of device e.g. Google | text |
device_name | Name of device e.g. Google Nexus 6 | text |
device_version | Version of device e.g. 6.0 | text |
layout_engine_class | Class of layout engine e.g. Browser | text |
layout_engine_name | Name of layout engine e.g. Blink | text |
layout_engine_name_version | Name and version of layout engine e.g. Blink 53.0 | text |
layout_engine_name_version_major | Name and major version of layout engine e.g. Blink 53 | text |
layout_engine_version | Version of layout engine e.g. 53.0 | text |
layout_engine_version_major | Major version of layout engine e.g. 53 | text |
operating_system_class | Class of the OS e.g. Mobile | text |
operating_system_name | Name of the OS e.g. Android | text |
operating_system_name_version | Name and version of the OS e.g. Android 7.0 | text |
operating_system_version | Version of the OS e.g. 7.0 | text |
v_collector | Collector version e.g. ‘ssc-2.1.0-kinesis’ | text |
event_id2 | | text |
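Three of the derived columns above (device_category, screen_resolution and page_views_in_session) follow simple rules in the model code. The Python functions below are a minimal sketch of those rules, assuming plain Python values in place of warehouse columns; they restate the SQL for illustration and are not part of the package.

```python
from typing import Any, Dict, List, Optional


def device_category(device_class: Optional[str]) -> str:
    """Same mapping as the model's CASE expression; anything else (including NULL) is 'Other'."""
    if device_class is None:
        return "Other"
    mapping = {"Desktop": "Desktop", "Phone": "Mobile", "Tablet": "Tablet"}
    return mapping.get(device_class, "Other")


def screen_resolution(width: Optional[int], height: Optional[int]) -> Optional[str]:
    """dvce_screenwidth || 'x' || dvce_screenheight; standard SQL concatenation with NULL yields NULL."""
    if width is None or height is None:
        return None
    return f"{width}x{height}"


def page_views_in_session(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """max(page_view_in_session_index) over (partition by domain_sessionid)."""
    result: Dict[str, int] = {}
    for row in rows:
        session = row["domain_sessionid"]
        result[session] = max(result.get(session, 0), row["page_view_in_session_index"])
    return result
```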
Code
- bigquery
- databricks
- default
- snowflake
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
with prep as (
select
ev.page_view_id,
ev.event_id,
ev.app_id,
ev.platform,
-- user fields
ev.user_id,
ev.domain_userid,
ev.original_domain_userid,
{% if var('snowplow__page_view_stitching') %}
-- updated with mapping as part of post hook on derived page_views table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ type_string() }}) as stitched_user_id,
{% endif %}
ev.network_userid,
-- session fields
ev.domain_sessionid,
ev.original_domain_sessionid,
ev.domain_sessionidx,
-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.derived_tstamp as start_tstamp,
ev.doc_width,
ev.doc_height,
ev.page_title,
{{ content_group_query() }} as content_group,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,
-- marketing fields
ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,
{{ channel_group_query() }} as default_channel_group,
-- referrer fields
ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,
-- geo fields
ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,
ev.user_ipaddress,
ev.useragent,
ev.dvce_screenwidth || 'x' || ev.dvce_screenheight as screen_resolution,
ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields: set iab variable to true to enable
{{ snowplow_utils.get_optional_fields(
enabled=var('snowplow__enable_iab', false),
fields=iab_fields(),
col_prefix='contexts_com_iab_snowplow_spiders_and_robots_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='ev') }},
-- ua parser enrichment fields: set ua_parser variable to true to enable
{{ snowplow_utils.get_optional_fields(
enabled=var('snowplow__enable_ua', false),
fields=ua_fields(),
col_prefix='contexts_com_snowplowanalytics_snowplow_ua_parser_context_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='ev') }},
-- yauaa enrichment fields: set yauaa variable to true to enable
{{ snowplow_utils.get_optional_fields(
enabled=var('snowplow__enable_yauaa', false),
fields=yauaa_fields(),
col_prefix='contexts_nl_basjes_yauaa_context_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='ev') }}
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__page_view_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} as ev
left join {{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
where ev.event_name = 'page_view'
and ev.page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
qualify row_number() over (partition by ev.page_view_id order by ev.derived_tstamp, ev.dvce_created_tstamp) = 1
)
, page_view_events as (
select
ev.page_view_id,
ev.event_id,
ev.app_id,
ev.platform,
-- user fields
ev.user_id,
ev.domain_userid,
ev.original_domain_userid,
ev.stitched_user_id,
ev.network_userid,
-- session fields
ev.domain_sessionid,
ev.original_domain_sessionid,
ev.domain_sessionidx,
row_number() over (partition by ev.domain_sessionid order by ev.derived_tstamp, ev.dvce_created_tstamp, ev.event_id) AS page_view_in_session_index,
-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.start_tstamp,
coalesce(t.end_tstamp, ev.derived_tstamp) as end_tstamp, -- only page views with pings will have a row in table t
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
coalesce(t.engaged_time_in_s, 0) as engaged_time_in_s, -- where there are no pings, engaged time is 0.
{{ datediff('ev.derived_tstamp', 'coalesce(t.end_tstamp, ev.derived_tstamp)', 'second') }} as absolute_time_in_s,
sd.hmax as horizontal_pixels_scrolled,
sd.vmax as vertical_pixels_scrolled,
sd.relative_hmax as horizontal_percentage_scrolled,
sd.relative_vmax as vertical_percentage_scrolled,
ev.doc_width,
ev.doc_height,
ev.content_group,
ev.page_title,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,
ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,
ev.default_channel_group,
ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,
ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,
ev.user_ipaddress,
ev.useragent,
ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,
ev.category,
ev.primary_impact,
ev.reason,
ev.spider_or_robot,
ev.useragent_family,
ev.useragent_major,
ev.useragent_minor,
ev.useragent_patch,
ev.useragent_version,
ev.os_family,
ev.os_major,
ev.os_minor,
ev.os_patch,
ev.os_patch_minor,
ev.os_version,
ev.device_family,
ev.device_class,
case when ev.device_class = 'Desktop' then 'Desktop'
when ev.device_class = 'Phone' then 'Mobile'
when ev.device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
ev.screen_resolution,
ev.agent_class,
ev.agent_name,
ev.agent_name_version,
ev.agent_name_version_major,
ev.agent_version,
ev.agent_version_major,
ev.device_brand,
ev.device_name,
ev.device_version,
ev.layout_engine_class,
ev.layout_engine_name,
ev.layout_engine_name_version,
ev.layout_engine_name_version_major,
ev.layout_engine_version,
ev.layout_engine_version_major,
ev.operating_system_class,
ev.operating_system_name,
ev.operating_system_name_version,
ev.operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, ev.{{col}}
{%- endfor -%}
{%- endif %}
from prep ev
left join {{ ref('snowplow_web_pv_engaged_time') }} t
on ev.page_view_id = t.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and ev.domain_sessionid = t.domain_sessionid {% endif %}
left join {{ ref('snowplow_web_pv_scroll_depth') }} sd
on ev.page_view_id = sd.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and ev.domain_sessionid = sd.domain_sessionid {% endif %}
)
select
ev.page_view_id,
ev.event_id,
ev.app_id,
ev.platform,
-- user fields
ev.user_id,
ev.domain_userid,
ev.original_domain_userid,
ev.stitched_user_id,
ev.network_userid,
-- session fields
ev.domain_sessionid,
ev.original_domain_sessionid,
ev.domain_sessionidx,
ev.page_view_in_session_index,
max(ev.page_view_in_session_index) over (partition by ev.domain_sessionid) as page_views_in_session,
-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.start_tstamp,
ev.end_tstamp,
ev.model_tstamp,
ev.engaged_time_in_s,
ev.absolute_time_in_s,
ev.horizontal_pixels_scrolled,
ev.vertical_pixels_scrolled,
ev.horizontal_percentage_scrolled,
ev.vertical_percentage_scrolled,
ev.doc_width,
ev.doc_height,
ev.content_group,
ev.page_title,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,
ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,
ev.default_channel_group,
ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,
ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,
ev.user_ipaddress,
ev.useragent,
ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,
ev.category,
ev.primary_impact,
ev.reason,
ev.spider_or_robot,
ev.useragent_family,
ev.useragent_major,
ev.useragent_minor,
ev.useragent_patch,
ev.useragent_version,
ev.os_family,
ev.os_major,
ev.os_minor,
ev.os_patch,
ev.os_patch_minor,
ev.os_version,
ev.device_family,
ev.device_class,
ev.device_category,
ev.screen_resolution,
ev.agent_class,
ev.agent_name,
ev.agent_name_version,
ev.agent_name_version_major,
ev.agent_version,
ev.agent_version_major,
ev.device_brand,
ev.device_name,
ev.device_version,
ev.layout_engine_class,
ev.layout_engine_name,
ev.layout_engine_name_version,
ev.layout_engine_name_version_major,
ev.layout_engine_version,
ev.layout_engine_version_major,
ev.operating_system_class,
ev.operating_system_name,
ev.operating_system_name_version,
ev.operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, ev.{{col}}
{%- endfor -%}
{%- endif %}
from page_view_events ev
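In the page_view_events CTE above, a page view with no page pings has no row in snowplow_web_pv_engaged_time, so end_tstamp falls back to derived_tstamp, engaged_time_in_s falls back to 0, and absolute_time_in_s is the number of seconds between start and end. The Python sketch below reproduces those fallbacks for one page view, assuming the engaged-time lookup result is passed in as optional values (the function and parameter names are hypothetical). It uses whole elapsed seconds; a warehouse datediff may treat fractional seconds differently.

```python
from datetime import datetime
from typing import Optional, Tuple


def page_view_timings(
    derived_tstamp: datetime,
    ping_end_tstamp: Optional[datetime],
    ping_engaged_time_in_s: Optional[int],
) -> Tuple[datetime, int, int]:
    """Return (end_tstamp, engaged_time_in_s, absolute_time_in_s) for one page view."""
    # coalesce(t.end_tstamp, ev.derived_tstamp): no pings means the view ends where it starts.
    end_tstamp = ping_end_tstamp if ping_end_tstamp is not None else derived_tstamp
    # coalesce(t.engaged_time_in_s, 0): no pings means no engaged time.
    engaged_time_in_s = ping_engaged_time_in_s if ping_engaged_time_in_s is not None else 0
    # Whole seconds between start (derived_tstamp) and end.
    absolute_time_in_s = int((end_tstamp - derived_tstamp).total_seconds())
    return end_tstamp, engaged_time_in_s, absolute_time_in_s
```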
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
with prep as (
select
ev.page_view_id,
ev.event_id,
ev.app_id,
ev.platform,
-- user fields
ev.user_id,
ev.domain_userid,
ev.original_domain_userid,
{% if var('snowplow__page_view_stitching') %}
-- updated with mapping as part of post hook on derived page_views table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ type_string() }}) as stitched_user_id,
{% endif %}
ev.network_userid,
-- session fields
ev.domain_sessionid,
ev.original_domain_sessionid,
ev.domain_sessionidx,
-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.derived_tstamp as start_tstamp,
ev.doc_width,
ev.doc_height,
ev.page_title,
{{ content_group_query() }} as content_group,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,
-- marketing fields
ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,
{{ channel_group_query() }} as default_channel_group,
-- referrer fields
ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,
-- geo fields
ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,
ev.user_ipaddress,
ev.useragent,
ev.dvce_screenwidth || 'x' || ev.dvce_screenheight as screen_resolution,
ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields: set iab variable to true to enable
{{snowplow_web.get_iab_context_fields()}},
-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields()}},
-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields()}}
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__page_view_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} as ev
left join {{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
where ev.event_name = 'page_view'
and ev.page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
qualify row_number() over (partition by ev.page_view_id order by ev.derived_tstamp, ev.dvce_created_tstamp) = 1
)
, page_view_events as (
select
p.page_view_id,
p.event_id,
p.app_id,
p.platform,
-- user fields
p.user_id,
p.domain_userid,
p.original_domain_userid,
p.stitched_user_id,
p.network_userid,
-- session fields
p.domain_sessionid,
p.original_domain_sessionid,
p.domain_sessionidx,
row_number() over (partition by p.domain_sessionid order by p.derived_tstamp, p.dvce_created_tstamp, p.event_id) AS page_view_in_session_index,
-- timestamp fields
p.dvce_created_tstamp,
p.collector_tstamp,
p.derived_tstamp,
p.start_tstamp,
coalesce(t.end_tstamp, p.derived_tstamp) as end_tstamp, -- only page views with pings will have a row in table t
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
coalesce(t.engaged_time_in_s, 0) as engaged_time_in_s, -- where there are no pings, engaged time is 0.
{{ datediff('p.derived_tstamp', 'coalesce(t.end_tstamp, p.derived_tstamp)', 'second') }} as absolute_time_in_s,
sd.hmax as horizontal_pixels_scrolled,
sd.vmax as vertical_pixels_scrolled,
sd.relative_hmax as horizontal_percentage_scrolled,
sd.relative_vmax as vertical_percentage_scrolled,
p.doc_width,
p.doc_height,
p.content_group,
p.page_title,
p.page_url,
p.page_urlscheme,
p.page_urlhost,
p.page_urlpath,
p.page_urlquery,
p.page_urlfragment,
p.mkt_medium,
p.mkt_source,
p.mkt_term,
p.mkt_content,
p.mkt_campaign,
p.mkt_clickid,
p.mkt_network,
p.default_channel_group,
p.page_referrer,
p.refr_urlscheme,
p.refr_urlhost,
p.refr_urlpath,
p.refr_urlquery,
p.refr_urlfragment,
p.refr_medium,
p.refr_source,
p.refr_term,
p.geo_country,
p.geo_region,
p.geo_region_name,
p.geo_city,
p.geo_zipcode,
p.geo_latitude,
p.geo_longitude,
p.geo_timezone,
p.user_ipaddress,
p.useragent,
p.screen_resolution,
p.br_lang,
p.br_viewwidth,
p.br_viewheight,
p.br_colordepth,
p.br_renderengine,
p.os_timezone,
p.category,
p.primary_impact,
p.reason,
p.spider_or_robot,
p.useragent_family,
p.useragent_major,
p.useragent_minor,
p.useragent_patch,
p.useragent_version,
p.os_family,
p.os_major,
p.os_minor,
p.os_patch,
p.os_patch_minor,
p.os_version,
p.device_family,
p.device_class,
p.agent_class,
p.agent_name,
p.agent_name_version,
p.agent_name_version_major,
p.agent_version,
p.agent_version_major,
p.device_brand,
p.device_name,
p.device_version,
p.layout_engine_class,
p.layout_engine_name,
p.layout_engine_name_version,
p.layout_engine_name_version_major,
p.layout_engine_version,
p.layout_engine_version_major,
p.operating_system_class,
p.operating_system_name,
p.operating_system_name_version,
p.operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, p.{{col}}
{%- endfor -%}
{%- endif %}
from prep p
left join {{ ref('snowplow_web_pv_engaged_time') }} t
on p.page_view_id = t.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and p.domain_sessionid = t.domain_sessionid {% endif %}
left join {{ ref('snowplow_web_pv_scroll_depth') }} sd
on p.page_view_id = sd.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and p.domain_sessionid = sd.domain_sessionid {% endif %}
)
select
pve.page_view_id,
pve.event_id,
pve.app_id,
pve.platform,
-- user fields
pve.user_id,
pve.domain_userid,
pve.original_domain_userid,
pve.stitched_user_id,
pve.network_userid,
-- session fields
pve.domain_sessionid,
pve.original_domain_sessionid,
pve.domain_sessionidx,
pve.page_view_in_session_index,
max(pve.page_view_in_session_index) over (partition by pve.domain_sessionid) as page_views_in_session,
-- timestamp fields
pve.dvce_created_tstamp,
pve.collector_tstamp,
pve.derived_tstamp,
pve.start_tstamp,
pve.end_tstamp,
pve.model_tstamp,
pve.engaged_time_in_s,
pve.absolute_time_in_s,
pve.horizontal_pixels_scrolled,
pve.vertical_pixels_scrolled,
pve.horizontal_percentage_scrolled,
pve.vertical_percentage_scrolled,
pve.doc_width,
pve.doc_height,
pve.content_group,
pve.page_title,
pve.page_url,
pve.page_urlscheme,
pve.page_urlhost,
pve.page_urlpath,
pve.page_urlquery,
pve.page_urlfragment,
pve.mkt_medium,
pve.mkt_source,
pve.mkt_term,
pve.mkt_content,
pve.mkt_campaign,
pve.mkt_clickid,
pve.mkt_network,
pve.page_referrer,
pve.refr_urlscheme,
pve.refr_urlhost,
pve.refr_urlpath,
pve.refr_urlquery,
pve.refr_urlfragment,
pve.refr_medium,
pve.refr_source,
pve.refr_term,
pve.default_channel_group,
pve.geo_country,
pve.geo_region,
pve.geo_region_name,
pve.geo_city,
pve.geo_zipcode,
pve.geo_latitude,
pve.geo_longitude,
pve.geo_timezone,
pve.user_ipaddress,
pve.useragent,
pve.br_lang,
pve.br_viewwidth,
pve.br_viewheight,
pve.br_colordepth,
pve.br_renderengine,
pve.os_timezone,
pve.category,
pve.primary_impact,
pve.reason,
pve.spider_or_robot,
pve.useragent_family,
pve.useragent_major,
pve.useragent_minor,
pve.useragent_patch,
pve.useragent_version,
pve.os_family,
pve.os_major,
pve.os_minor,
pve.os_patch,
pve.os_patch_minor,
pve.os_version,
pve.device_family,
pve.device_class,
case when pve.device_class = 'Desktop' then 'Desktop'
when pve.device_class = 'Phone' then 'Mobile'
when pve.device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
pve.screen_resolution,
pve.agent_class,
pve.agent_name,
pve.agent_name_version,
pve.agent_name_version_major,
pve.agent_version,
pve.agent_version_major,
pve.device_brand,
pve.device_name,
pve.device_version,
pve.layout_engine_class,
pve.layout_engine_name,
pve.layout_engine_name_version,
pve.layout_engine_name_version_major,
pve.layout_engine_version,
pve.layout_engine_version_major,
pve.operating_system_class,
pve.operating_system_name,
pve.operating_system_name_version,
pve.operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, pve.{{col}}
{%- endfor -%}
{%- endif %}
from page_view_events pve
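The case expression in the final select derives device_category from the YAUAA device_class value alone. The minimal Python equivalent below is for illustration only, and the function name is hypothetical. It makes the fall-through explicit: any unmapped value becomes 'Other'. That includes NULL, for example when the optional YAUAA fields are not populated.

```python
from typing import Optional

# Mapping copied from the case expression in the final select
_DEVICE_CATEGORY = {"Desktop": "Desktop", "Phone": "Mobile", "Tablet": "Tablet"}


def device_category(device_class: Optional[str]) -> str:
    """Hypothetical helper: map a YAUAA device_class to device_category."""
    # Anything not listed, including None (SQL NULL), falls into the else branch
    return _DEVICE_CATEGORY.get(device_class, "Other")
```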
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
with prep as (
select
ev.page_view_id,
ev.event_id,
ev.app_id,
ev.platform,
-- user fields
ev.user_id,
ev.domain_userid,
ev.original_domain_userid,
{% if var('snowplow__page_view_stitching') %}
-- updated with mapping as part of post hook on derived page_views table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ snowplow_utils.type_max_string() }}) as stitched_user_id,
{% endif %}
ev.network_userid,
-- session fields
ev.domain_sessionid,
ev.original_domain_sessionid,
ev.domain_sessionidx,
-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.derived_tstamp as start_tstamp,
ev.doc_width,
ev.doc_height,
ev.page_title,
{{ content_group_query() }} as content_group,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,
-- marketing fields
ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,
{{ channel_group_query() }} as default_channel_group,
-- referrer fields
ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,
-- geo fields
ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,
ev.user_ipaddress,
ev.useragent,
ev.dvce_screenwidth || 'x' || ev.dvce_screenheight as screen_resolution,
ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields: set iab variable to true to enable
{{snowplow_web.get_iab_context_fields('ev')}},
-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields('ev')}},
-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields('ev')}},
row_number() over (partition by ev.page_view_id order by ev.derived_tstamp, ev.dvce_created_tstamp) as page_view_id_dedupe_index
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__page_view_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} as ev
left join {{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
where ev.event_name = 'page_view'
and ev.page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
)
, page_view_events as (
select
p.page_view_id,
p.event_id,
p.app_id,
p.platform,
-- user fields
p.user_id,
p.domain_userid,
p.original_domain_userid,
p.stitched_user_id,
p.network_userid,
-- session fields
p.domain_sessionid,
p.original_domain_sessionid,
p.domain_sessionidx,
row_number() over (partition by p.domain_sessionid order by p.derived_tstamp, p.dvce_created_tstamp, p.event_id) AS page_view_in_session_index,
-- timestamp fields
p.dvce_created_tstamp,
p.collector_tstamp,
p.derived_tstamp,
p.start_tstamp,
coalesce(t.end_tstamp, p.derived_tstamp) as end_tstamp, -- only page views with pings will have a row in table t
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
coalesce(t.engaged_time_in_s, 0) as engaged_time_in_s, -- where there are no pings, engaged time is 0.
{{ datediff('p.derived_tstamp', 'coalesce(t.end_tstamp, p.derived_tstamp)', 'second') }} as absolute_time_in_s,
sd.hmax as horizontal_pixels_scrolled,
sd.vmax as vertical_pixels_scrolled,
sd.relative_hmax as horizontal_percentage_scrolled,
sd.relative_vmax as vertical_percentage_scrolled,
p.doc_width,
p.doc_height,
p.content_group,
p.page_title,
p.page_url,
p.page_urlscheme,
p.page_urlhost,
p.page_urlpath,
p.page_urlquery,
p.page_urlfragment,
p.mkt_medium,
p.mkt_source,
p.mkt_term,
p.mkt_content,
p.mkt_campaign,
p.mkt_clickid,
p.mkt_network,
p.default_channel_group,
p.page_referrer,
p.refr_urlscheme,
p.refr_urlhost,
p.refr_urlpath,
p.refr_urlquery,
p.refr_urlfragment,
p.refr_medium,
p.refr_source,
p.refr_term,
p.geo_country,
p.geo_region,
p.geo_region_name,
p.geo_city,
p.geo_zipcode,
p.geo_latitude,
p.geo_longitude,
p.geo_timezone,
p.user_ipaddress,
p.useragent,
p.screen_resolution,
p.br_lang,
p.br_viewwidth,
p.br_viewheight,
p.br_colordepth,
p.br_renderengine,
p.os_timezone,
p.iab_category as category,
p.iab_primary_impact as primary_impact,
p.iab_reason as reason,
p.iab_spider_or_robot as spider_or_robot,
p.ua_useragent_family as useragent_family,
p.ua_useragent_major as useragent_major,
p.ua_useragent_minor as useragent_minor,
p.ua_useragent_patch as useragent_patch,
p.ua_useragent_version as useragent_version,
p.ua_os_family as os_family,
p.ua_os_major as os_major,
p.ua_os_minor as os_minor,
p.ua_os_patch as os_patch,
p.ua_os_patch_minor as os_patch_minor,
p.ua_os_version as os_version,
p.ua_device_family as device_family,
p.yauaa_device_class as device_class,
p.yauaa_agent_class as agent_class,
p.yauaa_agent_name as agent_name,
p.yauaa_agent_name_version as agent_name_version,
p.yauaa_agent_name_version_major as agent_name_version_major,
p.yauaa_agent_version as agent_version,
p.yauaa_agent_version_major as agent_version_major,
p.yauaa_device_brand as device_brand,
p.yauaa_device_name as device_name,
p.yauaa_device_version as device_version,
p.yauaa_layout_engine_class as layout_engine_class,
p.yauaa_layout_engine_name as layout_engine_name,
p.yauaa_layout_engine_name_version as layout_engine_name_version,
p.yauaa_layout_engine_name_version_major as layout_engine_name_version_major,
p.yauaa_layout_engine_version as layout_engine_version,
p.yauaa_layout_engine_version_major as layout_engine_version_major,
p.yauaa_operating_system_class as operating_system_class,
p.yauaa_operating_system_name as operating_system_name,
p.yauaa_operating_system_name_version as operating_system_name_version,
p.yauaa_operating_system_version as operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, p.{{col}}
{%- endfor -%}
{%- endif %}
from prep as p
left join {{ ref('snowplow_web_pv_engaged_time') }} t
on p.page_view_id = t.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and p.domain_sessionid = t.domain_sessionid {% endif %}
left join {{ ref('snowplow_web_pv_scroll_depth') }} sd
on p.page_view_id = sd.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and p.domain_sessionid = sd.domain_sessionid {% endif %}
where page_view_id_dedupe_index = 1
)
select
pve.page_view_id,
pve.event_id,
pve.app_id,
pve.platform,
-- user fields
pve.user_id,
pve.domain_userid,
pve.original_domain_userid,
pve.stitched_user_id,
pve.network_userid,
-- session fields
pve.domain_sessionid,
pve.original_domain_sessionid,
pve.domain_sessionidx,
pve.page_view_in_session_index,
max(pve.page_view_in_session_index) over (partition by pve.domain_sessionid) as page_views_in_session,
-- timestamp fields
pve.dvce_created_tstamp,
pve.collector_tstamp,
pve.derived_tstamp,
pve.start_tstamp,
pve.end_tstamp,
pve.model_tstamp,
pve.engaged_time_in_s,
pve.absolute_time_in_s,
pve.horizontal_pixels_scrolled,
pve.vertical_pixels_scrolled,
pve.horizontal_percentage_scrolled,
pve.vertical_percentage_scrolled,
pve.doc_width,
pve.doc_height,
pve.content_group,
pve.page_title,
pve.page_url,
pve.page_urlscheme,
pve.page_urlhost,
pve.page_urlpath,
pve.page_urlquery,
pve.page_urlfragment,
pve.mkt_medium,
pve.mkt_source,
pve.mkt_term,
pve.mkt_content,
pve.mkt_campaign,
pve.mkt_clickid,
pve.mkt_network,
pve.default_channel_group,
pve.page_referrer,
pve.refr_urlscheme,
pve.refr_urlhost,
pve.refr_urlpath,
pve.refr_urlquery,
pve.refr_urlfragment,
pve.refr_medium,
pve.refr_source,
pve.refr_term,
pve.geo_country,
pve.geo_region,
pve.geo_region_name,
pve.geo_city,
pve.geo_zipcode,
pve.geo_latitude,
pve.geo_longitude,
pve.geo_timezone,
pve.user_ipaddress,
pve.useragent,
pve.br_lang,
pve.br_viewwidth,
pve.br_viewheight,
pve.br_colordepth,
pve.br_renderengine,
pve.os_timezone,
pve.category,
pve.primary_impact,
pve.reason,
pve.spider_or_robot,
pve.useragent_family,
pve.useragent_major,
pve.useragent_minor,
pve.useragent_patch,
pve.useragent_version,
pve.os_family,
pve.os_major,
pve.os_minor,
pve.os_patch,
pve.os_patch_minor,
pve.os_version,
pve.device_family,
pve.device_class,
case when pve.device_class = 'Desktop' then 'Desktop'
when pve.device_class = 'Phone' then 'Mobile'
when pve.device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
pve.screen_resolution,
pve.agent_class,
pve.agent_name,
pve.agent_name_version,
pve.agent_name_version_major,
pve.agent_version,
pve.agent_version_major,
pve.device_brand,
pve.device_name,
pve.device_version,
pve.layout_engine_class,
pve.layout_engine_name,
pve.layout_engine_name_version,
pve.layout_engine_name_version_major,
pve.layout_engine_version,
pve.layout_engine_version_major,
pve.operating_system_class,
pve.operating_system_name,
pve.operating_system_name_version,
pve.operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, pve.{{col}}
{%- endfor -%}
{%- endif %}
from page_view_events pve
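This variant does not use qualify. It computes page_view_id_dedupe_index with row_number() in the prep CTE and keeps only rows where the index equals 1. The effect is the same: one row per page_view_id, the earliest by derived_tstamp and then dvce_created_tstamp. The Python sketch below restates that rule and assumes non-null timestamps. The class and function names are hypothetical. When both timestamps tie, SQL row_number() picks a row arbitrarily, whereas the sketch keeps the first row it sees.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PageViewEvent:
    page_view_id: str
    event_id: str
    derived_tstamp: datetime
    dvce_created_tstamp: datetime


def dedupe_page_views(events: Iterable[PageViewEvent]) -> List[PageViewEvent]:
    """Keep one event per page_view_id: the earliest by (derived_tstamp, dvce_created_tstamp)."""
    kept: Dict[str, PageViewEvent] = {}
    for ev in events:
        current = kept.get(ev.page_view_id)
        # strict "<" keeps the first row seen when both timestamps tie
        if current is None or (ev.derived_tstamp, ev.dvce_created_tstamp) < (
            current.derived_tstamp,
            current.dvce_created_tstamp,
        ):
            kept[ev.page_view_id] = ev
    return list(kept.values())
```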
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with prep as (
select
ev.page_view_id,
ev.event_id,
ev.app_id,
ev.platform,
-- user fields
ev.user_id,
ev.domain_userid,
ev.original_domain_userid,
{% if var('snowplow__page_view_stitching') %}
-- updated with mapping as part of post hook on derived page_views table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ type_string() }}) as stitched_user_id,
{% endif %}
ev.network_userid,
-- session fields
ev.domain_sessionid,
ev.original_domain_sessionid,
ev.domain_sessionidx,
-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.derived_tstamp as start_tstamp,
ev.doc_width,
ev.doc_height,
ev.page_title,
{{ content_group_query() }} as content_group,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,
-- marketing fields
ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,
{{ channel_group_query() }} as default_channel_group,
-- referrer fields
ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,
-- geo fields
ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,
ev.user_ipaddress,
ev.useragent,
ev.dvce_screenwidth || 'x' || ev.dvce_screenheight as screen_resolution,
ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields: set iab variable to true to enable
{{snowplow_web.get_iab_context_fields()}},
-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields()}},
-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields()}}
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__page_view_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} as ev
left join {{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
where ev.event_name = 'page_view'
and ev.page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
qualify row_number() over (partition by ev.page_view_id order by ev.derived_tstamp, ev.dvce_created_tstamp) = 1
)
, page_view_events as (
select
p.page_view_id,
p.event_id,
p.app_id,
p.platform,
-- user fields
p.user_id,
p.domain_userid,
p.original_domain_userid,
p.stitched_user_id,
p.network_userid,
-- session fields
p.domain_sessionid,
p.original_domain_sessionid,
p.domain_sessionidx,
row_number() over (partition by p.domain_sessionid order by p.derived_tstamp, p.dvce_created_tstamp, p.event_id) AS page_view_in_session_index,
-- timestamp fields
p.dvce_created_tstamp,
p.collector_tstamp,
p.derived_tstamp,
p.start_tstamp,
coalesce(t.end_tstamp, p.derived_tstamp) as end_tstamp, -- only page views with pings will have a row in table t
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
coalesce(t.engaged_time_in_s, 0) as engaged_time_in_s, -- where there are no pings, engaged time is 0.
{{ datediff('p.derived_tstamp', 'coalesce(t.end_tstamp, p.derived_tstamp)', 'second') }} as absolute_time_in_s,
sd.hmax as horizontal_pixels_scrolled,
sd.vmax as vertical_pixels_scrolled,
sd.relative_hmax as horizontal_percentage_scrolled,
sd.relative_vmax as vertical_percentage_scrolled,
p.doc_width,
p.doc_height,
p.content_group,
p.page_title,
p.page_url,
p.page_urlscheme,
p.page_urlhost,
p.page_urlpath,
p.page_urlquery,
p.page_urlfragment,
p.mkt_medium,
p.mkt_source,
p.mkt_term,
p.mkt_content,
p.mkt_campaign,
p.mkt_clickid,
p.mkt_network,
p.default_channel_group,
p.page_referrer,
p.refr_urlscheme,
p.refr_urlhost,
p.refr_urlpath,
p.refr_urlquery,
p.refr_urlfragment,
p.refr_medium,
p.refr_source,
p.refr_term,
p.geo_country,
p.geo_region,
p.geo_region_name,
p.geo_city,
p.geo_zipcode,
p.geo_latitude,
p.geo_longitude,
p.geo_timezone,
p.user_ipaddress,
p.useragent,
p.screen_resolution,
p.br_lang,
p.br_viewwidth,
p.br_viewheight,
p.br_colordepth,
p.br_renderengine,
p.os_timezone,
p.category,
p.primary_impact,
p.reason,
p.spider_or_robot,
p.useragent_family,
p.useragent_major,
p.useragent_minor,
p.useragent_patch,
p.useragent_version,
p.os_family,
p.os_major,
p.os_minor,
p.os_patch,
p.os_patch_minor,
p.os_version,
p.device_family,
p.device_class,
p.agent_class,
p.agent_name,
p.agent_name_version,
p.agent_name_version_major,
p.agent_version,
p.agent_version_major,
p.device_brand,
p.device_name,
p.device_version,
p.layout_engine_class,
p.layout_engine_name,
p.layout_engine_name_version,
p.layout_engine_name_version_major,
p.layout_engine_version,
p.layout_engine_version_major,
p.operating_system_class,
p.operating_system_name,
p.operating_system_name_version,
p.operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, p.{{col}}
{%- endfor -%}
{%- endif %}
from prep p
left join {{ ref('snowplow_web_pv_engaged_time') }} t
on p.page_view_id = t.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and p.domain_sessionid = t.domain_sessionid {% endif %}
left join {{ ref('snowplow_web_pv_scroll_depth') }} sd
on p.page_view_id = sd.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and p.domain_sessionid = sd.domain_sessionid {% endif %}
)
select
pve.page_view_id,
pve.event_id,
pve.app_id,
pve.platform,
-- user fields
pve.user_id,
pve.domain_userid,
pve.original_domain_userid,
pve.stitched_user_id,
pve.network_userid,
-- session fields
pve.domain_sessionid,
pve.original_domain_sessionid,
pve.domain_sessionidx,
pve.page_view_in_session_index,
max(pve.page_view_in_session_index) over (partition by pve.domain_sessionid) as page_views_in_session,
-- timestamp fields
pve.dvce_created_tstamp,
pve.collector_tstamp,
pve.derived_tstamp,
pve.start_tstamp,
pve.end_tstamp,
pve.model_tstamp,
pve.engaged_time_in_s,
pve.absolute_time_in_s,
pve.horizontal_pixels_scrolled,
pve.vertical_pixels_scrolled,
pve.horizontal_percentage_scrolled,
pve.vertical_percentage_scrolled,
pve.doc_width,
pve.doc_height,
pve.content_group,
pve.page_title,
pve.page_url,
pve.page_urlscheme,
pve.page_urlhost,
pve.page_urlpath,
pve.page_urlquery,
pve.page_urlfragment,
pve.mkt_medium,
pve.mkt_source,
pve.mkt_term,
pve.mkt_content,
pve.mkt_campaign,
pve.mkt_clickid,
pve.mkt_network,
pve.default_channel_group,
pve.page_referrer,
pve.refr_urlscheme,
pve.refr_urlhost,
pve.refr_urlpath,
pve.refr_urlquery,
pve.refr_urlfragment,
pve.refr_medium,
pve.refr_source,
pve.refr_term,
pve.geo_country,
pve.geo_region,
pve.geo_region_name,
pve.geo_city,
pve.geo_zipcode,
pve.geo_latitude,
pve.geo_longitude,
pve.geo_timezone,
pve.user_ipaddress,
pve.useragent,
pve.br_lang,
pve.br_viewwidth,
pve.br_viewheight,
pve.br_colordepth,
pve.br_renderengine,
pve.os_timezone,
pve.category,
pve.primary_impact,
pve.reason,
pve.spider_or_robot,
pve.useragent_family,
pve.useragent_major,
pve.useragent_minor,
pve.useragent_patch,
pve.useragent_version,
pve.os_family,
pve.os_major,
pve.os_minor,
pve.os_patch,
pve.os_patch_minor,
pve.os_version,
pve.device_family,
pve.device_class,
case when pve.device_class = 'Desktop' then 'Desktop'
when pve.device_class = 'Phone' then 'Mobile'
when pve.device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
pve.screen_resolution,
pve.agent_class,
pve.agent_name,
pve.agent_name_version,
pve.agent_name_version_major,
pve.agent_version,
pve.agent_version_major,
pve.device_brand,
pve.device_name,
pve.device_version,
pve.layout_engine_class,
pve.layout_engine_name,
pve.layout_engine_name_version,
pve.layout_engine_name_version_major,
pve.layout_engine_version,
pve.layout_engine_version_major,
pve.operating_system_class,
pve.operating_system_name,
pve.operating_system_name_version,
pve.operating_system_version
{%- if var('snowplow__page_view_passthroughs', []) -%}
{%- for col in passthrough_names %}
, pve.{{col}}
{%- endfor -%}
{%- endif %}
from page_view_events pve
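All variants number the deduplicated page views within each session the same way. page_view_in_session_index is a row_number() over domain_sessionid, ordered by derived_tstamp, dvce_created_tstamp and event_id. page_views_in_session is the max of that index over the session, which equals the number of page views in it. The Python sketch below reproduces both columns for illustration only. It assumes rows are dicts with comparable timestamp values, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def add_session_indexes(rows: Iterable[Row]) -> List[Row]:
    """Add page_view_in_session_index (1-based) and page_views_in_session to each row."""
    by_session: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        by_session[row["domain_sessionid"]].append(row)

    out: List[Row] = []
    for session_rows in by_session.values():
        # same ordering as the window function; event_id breaks remaining ties
        ordered = sorted(
            session_rows,
            key=lambda r: (r["derived_tstamp"], r["dvce_created_tstamp"], r["event_id"]),
        )
        for index, row in enumerate(ordered, start=1):
            # max(row_number) over the session is simply the session's row count
            out.append({**row, "page_view_in_session_index": index, "page_views_in_session": len(ordered)})
    return out
```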
Depends On
- Models
- Macros
- macro.dbt.datediff
- macro.dbt.type_string
- macro.snowplow_utils.current_timestamp_in_utc
- macro.snowplow_utils.get_optional_fields
- macro.snowplow_utils.set_query_tag
- macro.snowplow_web.channel_group_query
- macro.snowplow_web.content_group_query
- macro.snowplow_web.filter_bots
- macro.snowplow_web.get_iab_context_fields
- macro.snowplow_web.get_ua_context_fields
- macro.snowplow_web.get_yauaa_context_fields
- macro.snowplow_web.iab_fields
- macro.snowplow_web.ua_fields
- macro.snowplow_web.yauaa_fields
Referenced By
Snowplow Web Pv Engaged Time
models/page_views/scratch/snowplow_web_pv_engaged_time.sql
Description
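This model calculates the time a visitor spent engaged on a given page view, based on the page ping events received for that page view. It buckets pings into intervals of snowplow__heartbeat seconds (10 by default). The engaged time is the heartbeat multiplied by one less than the number of distinct intervals that contain a ping, plus snowplow__min_visit_length (5 by default) to account for the page view event itself.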
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_sessionid | | text |
end_tstamp | | timestamp_ntz |
engaged_time_in_s | | number |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
select
ev.page_view_id,
{% if var('snowplow__limit_page_views_to_session', true) %}
ev.domain_sessionid,
{% endif %}
max(ev.derived_tstamp) as end_tstamp,
-- aggregate pings:
-- divides epoch tstamps by snowplow__heartbeat to get distinct intervals
-- floor rounds down to the nearest integer, so duplicate pings within the same interval evaluate to the same number
-- count(distinct) counts duplicates only once
-- adding snowplow__min_visit_length accounts for the page view event itself.
{{ var("snowplow__heartbeat", 10) }} * (count(distinct(floor({{ snowplow_utils.to_unixtstamp('ev.dvce_created_tstamp') }}/{{ var("snowplow__heartbeat", 10) }}))) - 1) + {{ var("snowplow__min_visit_length", 5) }} as engaged_time_in_s
from {{ ref('snowplow_web_base_events_this_run') }} as ev
where ev.event_name = 'page_ping'
and ev.page_view_id is not null
group by 1 {% if var('snowplow__limit_page_views_to_session', true) %}, 2 {% endif %}
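The Python sketch below applies the same formula to a single page view, using the defaults visible in the code (snowplow__heartbeat = 10, snowplow__min_visit_length = 5). It is an illustration rather than package code. The function name is hypothetical, and timestamps are assumed to be timezone-aware. Take pings at 0, 3, 10 and 25 seconds past a 10-second epoch boundary. They fall into three distinct intervals, so the engaged time is 10 * (3 - 1) + 5 = 25 seconds.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Iterable


def engaged_time_in_s(ping_tstamps: Iterable[datetime], heartbeat: int = 10, min_visit_length: int = 5) -> int:
    """Engaged time for one page view from its page ping timestamps."""
    # floor(epoch / heartbeat) maps every ping in the same interval to the same bucket
    intervals = {math.floor(ts.timestamp() / heartbeat) for ts in ping_tstamps}
    if not intervals:
        # the model has no row for page views without pings
        raise ValueError("at least one page ping is required")
    return heartbeat * (len(intervals) - 1) + min_visit_length


base = datetime(2024, 1, 1, tzinfo=timezone.utc)  # epoch seconds divisible by 10
pings = [base + timedelta(seconds=s) for s in (0, 3, 10, 25)]
print(engaged_time_in_s(pings))  # 25
```

Page views with no pings have no row in this table. The page view events model falls back to coalesce(t.engaged_time_in_s, 0) for them.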
Depends On
- Models
- Macros
Referenced By
Snowplow Web Pv Iab
models/page_views/scratch/<adaptor>/snowplow_web_pv_iab.sql
Description
Redshift and Postgres only. This is a staging table containing context data generated by the IAB enrichment for the events in the given run. The model is disabled by default; to enable it, set the snowplow__enable_iab variable to true.
The IAB Spiders & Robots enrichment uses the IAB/ABC International Spiders and Bots List to determine whether an event was produced by a user or a robot/spider, based on its IP address and user agent.
File Paths
- default
models/page_views/scratch/default/snowplow_web_pv_iab.sql
Details
Columns
Column Name | Description |
---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
Code
- default
{{
config(
enabled=(var('snowplow__enable_iab', false) and target.type in ['redshift', 'postgres'] | as_bool())
)
}}
with base as (
select
pv.page_view_id,
iab.category,
iab.primary_impact,
iab.reason,
iab.spider_or_robot,
row_number() over (partition by pv.page_view_id order by pv.collector_tstamp) as dedupe_index
from {{ var('snowplow__iab_context') }} iab
inner join {{ ref('snowplow_web_page_view_events') }} pv
on iab.root_id = pv.event_id
and iab.root_tstamp = pv.collector_tstamp
where iab.root_tstamp >= (select lower_limit from {{ ref('snowplow_web_pv_limits') }})
and iab.root_tstamp <= (select upper_limit from {{ ref('snowplow_web_pv_limits') }})
)
select *
from base
where dedupe_index = 1
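The model joins each IAB context row to its page view event on root_id = event_id and root_tstamp = collector_tstamp. It keeps only context rows inside the run's limits, then keeps one row per page_view_id, the earliest by collector_tstamp. The rough Python sketch below follows the same steps. It assumes in-memory dicts with the column names used above, limits that have already been computed, and a join key that is unique among the page view events. The function name is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
IAB_FIELDS = ("category", "primary_impact", "reason", "spider_or_robot")


def iab_per_page_view(iab_rows: Iterable[Row], page_views: Iterable[Row], lower_limit: Any, upper_limit: Any) -> List[Row]:
    # index page view events by the join key (event_id, collector_tstamp)
    pv_by_key = {(pv["event_id"], pv["collector_tstamp"]): pv for pv in page_views}

    kept: Dict[str, Tuple[Any, Row]] = {}
    for iab in iab_rows:
        # inclusive window filter on the context table
        if not (lower_limit <= iab["root_tstamp"] <= upper_limit):
            continue
        pv = pv_by_key.get((iab["root_id"], iab["root_tstamp"]))
        if pv is None:
            continue  # inner join: unmatched context rows are dropped
        current = kept.get(pv["page_view_id"])
        # dedupe: earliest collector_tstamp wins; ties keep the first row seen
        if current is None or pv["collector_tstamp"] < current[0]:
            row = {"page_view_id": pv["page_view_id"], **{f: iab[f] for f in IAB_FIELDS}}
            kept[pv["page_view_id"]] = (pv["collector_tstamp"], row)
    return [row for _, row in kept.values()]
```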
Snowplow Web Pv Limits
models/page_views/scratch/<adaptor>/snowplow_web_pv_limits.sql
Description
This model calculates the lower and upper limits for the page view events in the given run. It takes the minimum and maximum collector_tstamp across all events in the run that have a page_view_id.
These limits are used to improve performance when selecting rows from the various context tables, such as the UA parser table.
File Paths
- default
models/page_views/scratch/default/snowplow_web_pv_limits.sql
Details
Code
- default
select
min(collector_tstamp) as lower_limit,
max(collector_tstamp) as upper_limit
from {{ ref('snowplow_web_base_events_this_run') }}
where page_view_id is not null
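The Python sketch below mirrors this query for illustration only; the function name is hypothetical. In SQL, min() and max() over zero qualifying rows return NULL, so the sketch returns (None, None) in that case instead of raising.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]


def pv_limits(events: Iterable[Row]) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (lower_limit, upper_limit) over events that have a page_view_id."""
    tstamps = [e["collector_tstamp"] for e in events if e["page_view_id"] is not None]
    if not tstamps:
        # SQL min()/max() over no rows yield NULL
        return None, None
    return min(tstamps), max(tstamps)
```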
Snowplow Web Pv Scroll Depth
models/page_views/scratch/snowplow_web_pv_scroll_depth.sql
Description
This model calculates the horizontal and vertical scroll depth of the visitor on a given page view. Such metrics are useful when assessing engagement on a page view.
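The model's code clamps hmin, hmax, vmin and vmax using coalesce, greatest and least. The minimal Python sketch below shows that clamping for one axis of one page view. It assumes the per-event offsets and document sizes have already been collected. The function name is hypothetical, and None stands in for SQL NULL.

```python
from typing import Optional, Sequence, Tuple


def clamp_axis(
    offsets_min: Sequence[Optional[int]],
    offsets_max: Sequence[Optional[int]],
    doc_sizes: Sequence[int],
) -> Tuple[int, int]:
    """Return (min_offset, max_offset) for one axis, e.g. (hmin, hmax).

    offsets_* hold pp_xoffset_* (or pp_yoffset_*) per event; page_view events carry None.
    doc_sizes holds doc_width (or doc_height) per event.
    """
    doc_max = max(doc_sizes)
    # coalesce(offset, 0): page_view events count as offset 0
    lowest = min(0 if v is None else v for v in offsets_min)
    highest = max(0 if v is None else v for v in offsets_max)
    # greatest(..., 0) removes negative outliers; least(..., doc_max) caps at the document size
    return min(max(lowest, 0), doc_max), min(max(highest, 0), doc_max)
```

The page view events model exposes the clamped maxima as horizontal_pixels_scrolled (hmax) and vertical_pixels_scrolled (vmax).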
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
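domain_sessionid | Visit / session UUID; only included when snowplow__limit_page_views_to_session is enabled | text |
doc_width | Maximum document width in pixels across the page view's page_view and page_ping events | number |
doc_height | Maximum document height in pixels across the page view's page_view and page_ping events | number |
br_viewwidth | Maximum browser viewport width in pixels | number |
br_viewheight | Maximum browser viewport height in pixels | number |
hmin | Minimum horizontal scroll offset, bounded to between 0 and doc_width | number |
hmax | Maximum horizontal scroll offset, bounded to between 0 and doc_width | number |
vmin | Minimum vertical scroll offset, bounded to between 0 and doc_height | number |
vmax | Maximum vertical scroll offset, bounded to between 0 and doc_height | number |
relative_hmin | hmin as a rounded percentage of doc_width | float |
relative_hmax | hmax plus br_viewwidth (capped at doc_width) as a rounded percentage of doc_width | float |
relative_vmin | vmin as a rounded percentage of doc_height | float |
relative_vmax | vmax plus br_viewheight (capped at doc_height) as a rounded percentage of doc_height; non-zero even without scrolling because it includes the viewport height | float |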
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with prep as (
select
ev.page_view_id,
{% if var('snowplow__limit_page_views_to_session', true) %}
ev.domain_sessionid,
{% endif %}
max(ev.doc_width) as doc_width,
max(ev.doc_height) as doc_height,
max(ev.br_viewwidth) as br_viewwidth,
max(ev.br_viewheight) as br_viewheight,
-- coalesce replaces null with 0 (because the page view event does not send an offset)
-- greatest prevents outliers (negative offsets)
-- least also prevents outliers (offsets greater than doc_width or doc_height)
least(greatest(min(coalesce(ev.pp_xoffset_min, 0)), 0), max(ev.doc_width)) as hmin, -- should be zero
least(greatest(max(coalesce(ev.pp_xoffset_max, 0)), 0), max(ev.doc_width)) as hmax,
least(greatest(min(coalesce(ev.pp_yoffset_min, 0)), 0), max(ev.doc_height)) as vmin, -- should be zero (edge case: not zero because the pv event is missing)
least(greatest(max(coalesce(ev.pp_yoffset_max, 0)), 0), max(ev.doc_height)) as vmax
from {{ ref('snowplow_web_base_events_this_run') }} as ev
where ev.event_name in ('page_view', 'page_ping')
and ev.page_view_id is not null
and ev.doc_height > 0 -- exclude problematic (but rare) edge case
and ev.doc_width > 0 -- exclude problematic (but rare) edge case
group by 1 {% if var('snowplow__limit_page_views_to_session', true) %}, 2 {% endif %}
)
select
page_view_id,
{% if var('snowplow__limit_page_views_to_session', true) %}
domain_sessionid,
{% endif %}
doc_width,
doc_height,
br_viewwidth,
br_viewheight,
hmin,
hmax,
vmin,
vmax,
cast(round(100*(greatest(hmin, 0)/cast(doc_width as {{ type_float() }}))) as {{ type_float() }}) as relative_hmin, -- brackets matter: because hmin is of type int, we need to divide before we multiply by 100 or we risk an overflow
cast(round(100*(least(hmax + br_viewwidth, doc_width)/cast(doc_width as {{ type_float() }}))) as {{ type_float() }}) as relative_hmax,
cast(round(100*(greatest(vmin, 0)/cast(doc_height as {{ type_float() }}))) as {{ type_float() }}) as relative_vmin,
cast(round(100*(least(vmax + br_viewheight, doc_height)/cast(doc_height as {{ type_float() }}))) as {{ type_float() }}) as relative_vmax -- not zero when a user hasn't scrolled because it includes the non-zero viewheight
from prep
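To make the clamping and percentage arithmetic concrete, here is a Python sketch of the same calculation. It assumes rows are dicts. It groups by page_view_id only, as when snowplow__limit_page_views_to_session is false, and it assumes br_viewwidth and br_viewheight are populated. It also rounds halves up, which may differ from a given warehouse on exact ties. All function names are hypothetical.

```python
import math
from collections import defaultdict


def _round_half_up(x):
    # Inputs here are non-negative; exact .5 ties may round differently per warehouse.
    return float(math.floor(x + 0.5))


def _offsets(evs, key):
    # coalesce(offset, 0): page_view events carry no page ping offsets
    return [e[key] if e.get(key) is not None else 0 for e in evs]


def _clamp(value, upper):
    # least(greatest(value, 0), upper)
    return min(max(value, 0), upper)


def scroll_depth(events):
    """Hypothetical sketch of snowplow_web_pv_scroll_depth, keyed by page_view_id."""
    groups = defaultdict(list)
    for e in events:
        if (e["event_name"] in ("page_view", "page_ping")
                and e.get("page_view_id") is not None
                and (e.get("doc_height") or 0) > 0
                and (e.get("doc_width") or 0) > 0):
            groups[e["page_view_id"]].append(e)

    results = {}
    for pv_id, evs in groups.items():
        doc_width = max(e["doc_width"] for e in evs)
        doc_height = max(e["doc_height"] for e in evs)
        br_viewwidth = max(e["br_viewwidth"] for e in evs)
        br_viewheight = max(e["br_viewheight"] for e in evs)

        hmin = _clamp(min(_offsets(evs, "pp_xoffset_min")), doc_width)
        hmax = _clamp(max(_offsets(evs, "pp_xoffset_max")), doc_width)
        vmin = _clamp(min(_offsets(evs, "pp_yoffset_min")), doc_height)
        vmax = _clamp(max(_offsets(evs, "pp_yoffset_max")), doc_height)

        # Divide before multiplying by 100, as the SQL does.
        results[pv_id] = {
            "hmin": hmin, "hmax": hmax, "vmin": vmin, "vmax": vmax,
            "relative_hmin": _round_half_up(100 * (max(hmin, 0) / doc_width)),
            "relative_hmax": _round_half_up(100 * (min(hmax + br_viewwidth, doc_width) / doc_width)),
            "relative_vmin": _round_half_up(100 * (max(vmin, 0) / doc_height)),
            "relative_vmax": _round_half_up(100 * (min(vmax + br_viewheight, doc_height) / doc_height)),
        }
    return results
```

For example, a 3000 px tall page viewed through a 600 px viewport with a deepest vertical offset of 1200 px gives relative_vmax = 100 × 1800 / 3000 = 60.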
Snowplow Web Pv Ua Parser
models/page_views/scratch/<adaptor>/snowplow_web_pv_ua_parser.sql
Description
Redshift and Postgres only. This is a staging table containing context data generated by the UA parser enrichment for the events in the given run. The model is disabled by default; to enable it, set snowplow__enable_ua to true.
File Paths
- default
models/page_views/scratch/default/snowplow_web_pv_ua_parser.sql
Details
Columns
Column Name | Description |
---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
Code
- default
{{
config(
enabled=(var('snowplow__enable_ua', false) and target.type in ['redshift', 'postgres'] | as_bool())
)
}}
with base as (
select
pv.page_view_id,
ua.useragent_family,
ua.useragent_major,
ua.useragent_minor,
ua.useragent_patch,
ua.useragent_version,
ua.os_family,
ua.os_major,
ua.os_minor,
ua.os_patch,
ua.os_patch_minor,
ua.os_version,
ua.device_family,
row_number() over (partition by pv.page_view_id order by pv.collector_tstamp) as dedupe_index
from {{ var('snowplow__ua_parser_context') }} as ua
inner join {{ ref('snowplow_web_page_view_events') }} pv
on ua.root_id = pv.event_id
and ua.root_tstamp = pv.collector_tstamp
where ua.root_tstamp >= (select lower_limit from {{ ref('snowplow_web_pv_limits') }})
and ua.root_tstamp <= (select upper_limit from {{ ref('snowplow_web_pv_limits') }})
)
select *
from base
where dedupe_index = 1
Snowplow Web Pv Yauaa
models/page_views/scratch/<adaptor>/snowplow_web_pv_yauaa.sql
Description
Redshift and Postgres only. This is a staging table containing context data generated by the YAUAA enrichment. The model is disabled by default; to enable it, set snowplow__enable_yauaa to true.
File Paths
- default
models/page_views/scratch/default/snowplow_web_pv_yauaa.sql
Details
Columns
Column Name | Description |
---|---|
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
Code
- default
{{
config(
enabled=(var('snowplow__enable_yauaa', false) and target.type in ['redshift', 'postgres'] | as_bool())
)
}}
with base as (
select
pv.page_view_id,
ya.device_class,
ya.agent_class,
ya.agent_name,
ya.agent_name_version,
ya.agent_name_version_major,
ya.agent_version,
ya.agent_version_major,
ya.device_brand,
ya.device_name,
ya.device_version,
ya.layout_engine_class,
ya.layout_engine_name,
ya.layout_engine_name_version,
ya.layout_engine_name_version_major,
ya.layout_engine_version,
ya.layout_engine_version_major,
ya.operating_system_class,
ya.operating_system_name,
ya.operating_system_name_version,
ya.operating_system_version,
row_number() over (partition by pv.page_view_id order by pv.collector_tstamp) as dedupe_index
from {{ var('snowplow__yauaa_context') }} ya
inner join {{ ref('snowplow_web_page_view_events') }} pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp
where ya.root_tstamp >= (select lower_limit from {{ ref('snowplow_web_pv_limits') }})
and ya.root_tstamp <= (select upper_limit from {{ ref('snowplow_web_pv_limits') }})
)
select *
from base
where dedupe_index = 1
Snowplow Web Sessions
models/sessions/snowplow_web_sessions.sql
Description
This derived incremental table contains all historic sessions and should be the end point for any analysis or BI tools.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
app_id | Application ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev. | text |
platform | Platform e.g. ‘web’ | text |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_sessionidx | A visit / session index e.g. 3 | number |
start_tstamp | Timestamp for the start of the session, based on derived_tstamp | timestamp_ntz |
end_tstamp | Timestamp for the end of the session, based on derived_tstamp | timestamp_ntz |
model_tstamp | The current timestamp when the model processed this row. | timestamp_ntz |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
original_domain_userid | True User ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
stitched_user_id | The user_id (or domain_userid if not found during user stitching) when the snowplow__session_stitching or snowplow__page_view_stitching variable is enabled, otherwise NULL. The user_id field used for stitching can be overridden with var('snowplow__user_stitching_id'). | text |
network_userid | User ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’ | text |
page_views | The number of distinct page views within a session | number |
engaged_time_in_s | The total time engaged by a user within a session | number |
event_counts | A JSON-type (warehouse-dependent) object that gives counts for all event_names of events within the session (note that there can be more page view events than true page_views, which are counted by their id) | variant |
total_events | Count of all events in the session | number |
is_engaged | Whether the session was engaged, defined as having 2 or more page views, engaged time greater than or equal to 2 heartbeat lengths, or any conversion event (if conversions are enabled) | boolean |
absolute_time_in_s | The time in seconds between the start_tstamp and end_tstamp | number |
first_page_title | The title of the first page visited within the session | text |
first_page_url | The url of the first page visited within the session | text |
first_page_urlscheme | The urlscheme of the first page visited within the session | text |
first_page_urlhost | The urlhost of the first page visited within the session | text |
first_page_urlpath | The urlpath of the first page visited within the session | text |
first_page_urlquery | The urlquery of the first page visited within the session | text |
first_page_urlfragment | The urlfragment of the first page visited within the session | text |
last_page_title | The title of the last page visited within the session | text |
last_page_url | The url of the last page visited within the session | text |
last_page_urlscheme | The urlscheme of the last page visited within the session | text |
last_page_urlhost | The urlhost of the last page visited within the session | text |
last_page_urlpath | The urlpath of the last page visited within the session | text |
last_page_urlquery | The urlquery of the last page visited within the session | text |
last_page_urlfragment | The urlfragment of the last page visited within the session | text |
referrer | The referrer associated with the first page view of the session | text |
refr_urlscheme | Referer scheme e.g. ‘http’ | text |
refr_urlhost | Referer host e.g. ‘www.bing.com’ | text |
refr_urlpath | Referer page path e.g. ‘/images/search’ | text |
refr_urlquery | Referer URL querystring e.g. ‘q=psychic+oracle+cards’ | text |
refr_urlfragment | Referer URL fragment | text |
refr_medium | Type of referer e.g. ‘search’, ‘internal’ | text |
refr_source | Name of referer if recognised e.g. ‘Bing images’ | text |
refr_term | Keywords if source is a search engine e.g. ‘psychic oracle cards’ | text |
mkt_medium | Type of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’ | text |
mkt_source | The company / website where the traffic came from e.g. ‘Google’, ‘Facebook’ | text |
mkt_term | Any keywords associated with the referrer e.g. ‘new age tarot decks’ | text |
mkt_content | The content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723 | text |
mkt_campaign | The campaign ID e.g. ‘diageo-123’ | text |
mkt_clickid | The click ID e.g. ‘ac3d8e459’ | text |
mkt_network | The ad network to which the click ID belongs e.g. ‘DoubleClick’ | text |
mkt_source_platform | Source platform based on the utm_source_platform parameter of the first page_url in the session. | text |
default_channel_group | The channels by which users arrived at your site. | text |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
geo_region | ISO 3166-2 code for the country region the visitor is in e.g. ‘I9’, ‘TX’ | text |
geo_region_name | Visitor region name e.g. ‘Florida’ | text |
geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
geo_zipcode | Postcode the visitor is in e.g. ‘94109’ | text |
geo_latitude | Visitor location latitude e.g. 37.443604 | float |
geo_longitude | Visitor location longitude e.g. -122.4124 | float |
geo_timezone | Visitor timezone name e.g. ‘Europe/London’ | text |
geo_country_name | Name of the country the visitor is located in | text |
geo_continent | Name of the continent the visitor is located in | text |
last_geo_country | ISO 3166-1 code for the country the visitor was located in on the last page view of the session e.g. ‘GB’, ‘US’ | text |
last_geo_region_name | Visitor region name on the last page view of the session e.g. ‘Florida’ | text |
last_geo_city | City the visitor was in on the last page view of the session e.g. ‘New York’, ‘London’ | text |
last_geo_country_name | Name of the country the visitor was located in on the last page view of the session | text |
last_geo_continent | Name of the continent the visitor was located in on the last page view of the session | text |
user_ipaddress | User IP address e.g. ‘92.231.54.234’ | text |
useragent | Raw useragent | text |
br_renderengine | Browser rendering engine e.g. ‘GECKO’ | text |
br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
br_lang_name | Full name of the language the browser is set to e.g. ‘English (United Kingdom)’ | text |
last_br_lang | Language the browser was set to on the last page view of the session e.g. ‘en-GB’ | text |
last_br_lang_name | Full name of the language the browser was set to on the last page view of the session e.g. ‘English (United Kingdom)’ | text |
os_timezone | Client operating system timezone e.g. ‘Europe/London’ | text |
category | Category based on activity if the IP/UA is a spider or robot, BROWSER otherwise | text |
primary_impact | Whether the spider or robot would affect page impression measurement, ad impression measurement, both or none | text |
reason | Type of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwise | text |
spider_or_robot | True if the IP address or user agent checked against the list is a spider or robot, false otherwise | boolean |
useragent_family | Useragent family (browser) name | text |
useragent_major | Useragent major version | text |
useragent_minor | Useragent minor version | text |
useragent_patch | Useragent patch version | text |
useragent_version | Full version of the useragent | text |
os_family | Operating system family e.g. ‘Linux’ | text |
os_major | Operating system major version | text |
os_minor | Operating system minor version | text |
os_patch | Operating system patch version | text |
os_patch_minor | Operating system patch minor version | text |
os_version | Operating system full version | text |
device_family | Device type | text |
device_class | Class of device e.g. phone | text |
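device_category | Derived from device_class; classifies devices as Desktop, Mobile, Tablet or Other. | text |
screen_resolution | dvce_screenwidth and dvce_screenheight combined as width x height, e.g. ‘1920x1080’ | text |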
agent_class | Class of agent e.g. browser | text |
agent_name | Name of agent e.g. Chrome | text |
agent_name_version | Name and version of agent e.g. Chrome 53.0.2785.124 | text |
agent_name_version_major | Name and major version of agent e.g. Chrome 53 | text |
agent_version | Version of agent e.g. 53.0.2785.124 | text |
agent_version_major | Major version of agent e.g. 53 | text |
device_brand | Brand of device e.g. Google | text |
device_name | Name of device e.g. Google Nexus 6 | text |
device_version | Version of device e.g. 6.0 | text |
layout_engine_class | Class of layout engine e.g. Browser | text |
layout_engine_name | Name of layout engine e.g. Blink | text |
layout_engine_name_version | Name and version of layout engine e.g. Blink 53.0 | text |
layout_engine_name_version_major | Name and major version of layout engine e.g. Blink 53 | text |
layout_engine_version | Version of layout engine e.g. 53.0 | text |
layout_engine_version_major | Major version of layout engine e.g. 53 | text |
operating_system_class | Class of the OS e.g. Mobile | text |
operating_system_name | Name of the OS e.g. Android | text |
operating_system_name_version | Name and version of the OS e.g. Android 7.0 | text |
operating_system_version | Version of the OS e.g. 7.0 | text |
cv_view_page_volume | | number |
cv_view_page_events | | array |
cv_view_page_values | | array |
cv_view_page_total | | float |
cv_view_page_first_conversion | | timestamp_ntz |
cv_view_page_converted | | boolean |
cv__all_volume | | number |
cv__all_total | | float |
event_id | | text |
event_id2 | | text |
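The is_engaged rule described in the table above can be written as a small predicate. In this Python sketch, `heartbeat_s` stands in for the page ping heartbeat length configured for your project. That mapping is an assumption, so check your project variables for the actual name and value. `conversion_events` should be 0 when conversions are not enabled, and the function name is hypothetical.

```python
def is_engaged(page_views: int, engaged_time_in_s: float, heartbeat_s: float,
               conversion_events: int = 0) -> bool:
    """Hypothetical sketch: 2+ page views, OR engaged time >= 2 heartbeats,
    OR at least one conversion event (only when conversions are enabled)."""
    return (page_views >= 2
            or engaged_time_in_s >= 2 * heartbeat_s
            or conversion_events > 0)
```

With a 10-second heartbeat, a single-page session counts as engaged once it accumulates 20 seconds of engaged time.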
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
on_schema_change='append_new_columns',
unique_key='domain_sessionid',
upsert_date_key='start_tstamp',
sort='start_tstamp',
dist='domain_sessionid',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_sessions(),
tags=["derived"],
post_hook="{{ snowplow_web.stitch_user_identifiers(
enabled=var('snowplow__session_stitching')
) }}",
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}
select *
{% if target.type in ['databricks', 'spark'] -%}
, DATE(start_tstamp) as start_tstamp_date
{%- endif %}
from {{ ref('snowplow_web_sessions_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
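With unique_key='domain_sessionid', each run's rows from snowplow_web_sessions_this_run replace any existing rows with the same session ID, and new session IDs are appended. The is_run_with_new_events condition skips the insert entirely when the run brought no new events. The Python below is only a conceptual model of that outcome, with a hypothetical function name and rows as dicts. It ignores upsert_date_key, the stitching post hook and each warehouse's actual merge strategy.

```python
def upsert_sessions(existing, this_run, run_has_new_events=True):
    """Hypothetical model of an incremental merge keyed on domain_sessionid."""
    if not run_has_new_events:
        # the where clause evaluates to false, so nothing is inserted
        return list(existing)
    merged = {row["domain_sessionid"]: row for row in existing}
    for row in this_run:
        merged[row["domain_sessionid"]] = row  # replace a matching key or append a new one
    return list(merged.values())
```

Because sessions can span runs, keying on domain_sessionid lets a later run overwrite a partially built session with its updated aggregates instead of creating a duplicate row.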
Snowplow Web Sessions This Run
models/sessions/scratch/<adaptor>/snowplow_web_sessions_this_run.sql
Description
This staging table contains all the sessions for the given run of the Web model. It has all the same columns as snowplow_web_sessions. If you are building a custom module that requires session-level data, this is the table you should reference.
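The first_page_* fields come from the session_firsts CTE. It ranks a session's page views and page pings by derived_tstamp, dvce_created_tstamp and event_id. The last_page_* fields (and the last_geo_* and last_br_lang* fields) come from session_lasts, which ranks only page_view events, with the timestamps descending. The code computes these ranks as page_event_in_session_index, but the filter to rank 1 falls outside this excerpt, so treat that step as an assumption. The Python sketch below uses a hypothetical function name, uses page_url to stand in for all the first/last fields, and omits bot filtering.

```python
def first_and_last_pages(events):
    """Hypothetical sketch: {domain_sessionid: (first_page_url, last_page_url)}.

    Timestamps can be any comparable values. Events without a page_view_id
    are assumed to have been removed already.
    """
    by_session = {}
    for e in events:
        by_session.setdefault(e["domain_sessionid"], []).append(e)

    result = {}
    for sid, evs in by_session.items():
        # session_firsts: page views and page pings, ascending by
        # derived_tstamp, dvce_created_tstamp, event_id
        firsts = sorted(
            (e for e in evs if e["event_name"] in ("page_view", "page_ping")),
            key=lambda e: (e["derived_tstamp"], e["dvce_created_tstamp"], e["event_id"]),
        )
        # session_lasts: page views only, timestamps descending, event_id ascending.
        # Sort by the tie-breaker first; Python's sort is stable (also with
        # reverse=True), so timestamp ties keep ascending event_id order.
        lasts = sorted((e for e in evs if e["event_name"] == "page_view"),
                       key=lambda e: e["event_id"])
        lasts.sort(key=lambda e: (e["derived_tstamp"], e["dvce_created_tstamp"]), reverse=True)
        result[sid] = (
            firsts[0]["page_url"] if firsts else None,
            lasts[0]["page_url"] if lasts else None,
        )
    return result
```

Note the asymmetry this reproduces: a trailing page ping can never supply the last page, because session_lasts only considers page_view events.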
Type: Table
File Paths
- bigquery
- databricks
- default
- snowflake
models/sessions/scratch/bigquery/snowplow_web_sessions_this_run.sql
models/sessions/scratch/databricks/snowplow_web_sessions_this_run.sql
models/sessions/scratch/default/snowplow_web_sessions_this_run.sql
models/sessions/scratch/snowflake/snowplow_web_sessions_this_run.sql
Details
Columns
Column Name | Description | Type |
---|---|---|
app_id | Application ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev. | text |
platform | Platform e.g. ‘web’ | text |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
domain_sessionidx | A visit / session index e.g. 3 | number |
start_tstamp | Timestamp for the start of the session, based on derived_tstamp | timestamp_ntz |
end_tstamp | Timestamp for the end of the session, based on derived_tstamp | timestamp_ntz |
model_tstamp | The current timestamp when the model processed this row. | timestamp_ntz |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
original_domain_userid | True User ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
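stitched_user_id | Set to domain_userid when snowplow__session_stitching is enabled (it is then updated with the user mapping by a post hook on the derived snowplow_web_sessions table), otherwise NULL | text |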
network_userid | User ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’ | text |
page_views | The number of distinct page views within a session | number |
engaged_time_in_s | The total time engaged by a user within a session | number |
event_counts | A JSON-type (warehouse-dependent) object that gives counts for all event_names of events within the session (note that there can be more page view events than true page_views, which are counted by their id) | variant |
total_events | Count of all events in the session | number |
is_engaged | Whether the session was engaged, defined as having 2 or more page views, engaged time greater than or equal to 2 heartbeat lengths, or any conversion event (if conversions are enabled) | boolean |
absolute_time_in_s | The time in seconds between the start_tstamp and end_tstamp | number |
first_page_title | The title of the first page visited within the session | text |
first_page_url | The url of the first page visited within the session | text |
first_page_urlscheme | The urlscheme of the first page visited within the session | text |
first_page_urlhost | The urlhost of the first page visited within the session | text |
first_page_urlpath | The urlpath of the first page visited within the session | text |
first_page_urlquery | The urlquery of the first page visited within the session | text |
first_page_urlfragment | The urlfragment of the first page visited within the session | text |
last_page_title | The title of the last page visited within the session | text |
last_page_url | The url of the last page visited within the session | text |
last_page_urlscheme | The urlscheme of the last page visited within the session | text |
last_page_urlhost | The urlhost of the last page visited within the session | text |
last_page_urlpath | The urlpath of the last page visited within the session | text |
last_page_urlquery | The urlquery of the last page visited within the session | text |
last_page_urlfragment | The urlfragment of the last page visited within the session | text |
referrer | The referrer associated with the first page view of the session | text |
refr_urlscheme | Referer scheme e.g. ‘http’ | text |
refr_urlhost | Referer host e.g. ‘www.bing.com’ | text |
refr_urlpath | Referer page path e.g. ‘/images/search’ | text |
refr_urlquery | Referer URL querystring e.g. ‘q=psychic+oracle+cards’ | text |
refr_urlfragment | Referer URL fragment | text |
refr_medium | Type of referer e.g. ‘search’, ‘internal’ | text |
refr_source | Name of referer if recognised e.g. ‘Bing images’ | text |
refr_term | Keywords if source is a search engine e.g. ‘psychic oracle cards’ | text |
mkt_medium | Type of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’ | text |
mkt_source | The company / website where the traffic came from e.g. ‘Google’, ‘Facebook’ | text |
mkt_term | Any keywords associated with the referrer e.g. ‘new age tarot decks’ | text |
mkt_content | The content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723 | text |
mkt_campaign | The campaign ID e.g. ‘diageo-123’ | text |
mkt_clickid | The click ID e.g. ‘ac3d8e459’ | text |
mkt_network | The ad network to which the click ID belongs e.g. ‘DoubleClick’ | text |
mkt_source_platform | Source platform based on the utm_source_platform parameter of the first page_url in the session. | text |
default_channel_group | The channels by which users arrived at your site. | text |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
geo_region | ISO 3166-2 code for the country region the visitor is in e.g. ‘I9’, ‘TX’ | text |
geo_region_name | Visitor region name e.g. ‘Florida’ | text |
geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
geo_zipcode | Postcode the visitor is in e.g. ‘94109’ | text |
geo_latitude | Visitor location latitude e.g. 37.443604 | float |
geo_longitude | Visitor location longitude e.g. -122.4124 | float |
geo_timezone | Visitor timezone name e.g. ‘Europe/London’ | text |
geo_country_name | Name of the country the visitor is located in | text |
geo_continent | Name of the continent the visitor is located in | text |
last_geo_country | ISO 3166-1 code for the country the visitor was located in on the last page view of the session e.g. ‘GB’, ‘US’ | text |
last_geo_region_name | Visitor region name on the last page view of the session e.g. ‘Florida’ | text |
last_geo_city | City the visitor was in on the last page view of the session e.g. ‘New York’, ‘London’ | text |
last_geo_country_name | Name of the country the visitor was located in on the last page view of the session | text |
last_geo_continent | Name of the continent the visitor was located in on the last page view of the session | text |
user_ipaddress | User IP address e.g. ‘92.231.54.234’ | text |
useragent | Raw useragent | text |
br_renderengine | Browser rendering engine e.g. ‘GECKO’ | text |
br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
br_lang_name | Full name of the language the browser is set to e.g. ‘English (United Kingdom)’ | text |
last_br_lang | Language the browser was set to on the last page view of the session e.g. ‘en-GB’ | text |
last_br_lang_name | Full name of the language the browser was set to on the last page view of the session e.g. ‘English (United Kingdom)’ | text |
os_timezone | Client operating system timezone e.g. ‘Europe/London’ | text |
category | Category based on activity if the IP/UA is a spider or robot, BROWSER otherwise | text |
primary_impact | Whether the spider or robot would affect page impression measurement, ad impression measurement, both or none | text |
reason | Type of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwise | text |
spider_or_robot | True if the IP address or user agent checked against the list is a spider or robot, false otherwise | boolean |
useragent_family | Useragent family (browser) name | text |
useragent_major | Useragent major version | text |
useragent_minor | Useragent minor version | text |
useragent_patch | Useragent patch version | text |
useragent_version | Full version of the useragent | text |
os_family | Operating system family e.g. ‘Linux’ | text |
os_major | Operating system major version | text |
os_minor | Operating system minor version | text |
os_patch | Operating system patch version | text |
os_patch_minor | Operating system patch minor version | text |
os_version | Operating system full version | text |
device_family | Device type | text |
device_class | Class of device e.g. phone | text |
device_category | Derived from device_class; classifies devices as Desktop, Mobile, Tablet or Other. | text |
screen_resolution | dvce_screenwidth and dvce_screenheight combined as width x height, e.g. ‘1920x1080’ | text |
agent_class | Class of agent e.g. browser | text |
agent_name | Name of agent e.g. Chrome | text |
agent_name_version | Name and version of agent e.g. Chrome 53.0.2785.124 | text |
agent_name_version_major | Name and major version of agent e.g. Chrome 53 | text |
agent_version | Version of agent e.g. 53.0.2785.124 | text |
agent_version_major | Major version of agent e.g. 53 | text |
device_brand | Brand of device e.g. Google | text |
device_name | Name of device e.g. Google Nexus 6 | text |
device_version | Version of device e.g. 6.0 | text |
layout_engine_class | Class of layout engine e.g. Browser | text |
layout_engine_name | Name of layout engine e.g. Blink | text |
layout_engine_name_version | Name and version of layout engine e.g. Blink 53.0 | text |
layout_engine_name_version_major | Name and major version of layout engine e.g. Blink 53 | text |
layout_engine_version | Version of layout engine e.g. 53.0 | text |
layout_engine_version_major | Major version of layout engine e.g. 53 | text |
operating_system_class | Class of the OS e.g. Mobile | text |
operating_system_name | Name of the OS e.g. Android | text |
operating_system_name_version | Name and version of the OS e.g. Android 7.0 | text |
operating_system_version | Version of the OS e.g. 7.0 | text |
cv_view_page_volume | | number |
cv_view_page_events | | array |
cv_view_page_values | | array |
cv_view_page_total | | float |
cv_view_page_first_conversion | | timestamp_ntz |
cv_view_page_converted | | boolean |
cv__all_volume | | number |
cv__all_total | | float |
event_id | | text |
event_id2 | | text |
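Two of the simpler derived columns in the table above are computed inline in the code below. mkt_source_platform comes from regexp_extract(page_urlquery, r'utm_source_platform=([^?&#]*)'); the raw-string pattern suggests the BigQuery variant is the one shown. screen_resolution comes from dvce_screenwidth || 'x' || dvce_screenheight. Below is a Python sketch of both, with hypothetical function names. It follows the SQL convention that a missing match or a NULL input gives NULL (None).

```python
import re

# Same pattern as the SQL: capture everything after the parameter name
# up to the next '?', '&' or '#'.
_UTM_SOURCE_PLATFORM = re.compile(r"utm_source_platform=([^?&#]*)")


def mkt_source_platform(page_urlquery):
    """Hypothetical helper mirroring regexp_extract: first capture group or None."""
    if page_urlquery is None:
        return None
    m = _UTM_SOURCE_PLATFORM.search(page_urlquery)
    return m.group(1) if m else None


def screen_resolution(width, height):
    """Hypothetical helper mirroring width || 'x' || height (NULL if either is NULL)."""
    if width is None or height is None:
        return None
    return f"{width}x{height}"
```

The value is returned as it appears in the query string; like the SQL, the sketch does not URL-decode it.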
Code
- bigquery
- databricks
- default
- snowflake
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
with session_firsts as (
select
-- app id
app_id as app_id,
platform,
-- session fields
domain_sessionid,
original_domain_sessionid,
domain_sessionidx,
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
-- user fields
user_id,
domain_userid,
original_domain_userid,
{% if var('snowplow__session_stitching') %}
-- updated with mapping as part of post hook on derived sessions table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ type_string() }}) as stitched_user_id,
{% endif %}
network_userid as network_userid,
-- first page fields
page_title as first_page_title,
page_url as first_page_url,
page_urlscheme as first_page_urlscheme,
page_urlhost as first_page_urlhost,
page_urlpath as first_page_urlpath,
page_urlquery as first_page_urlquery,
page_urlfragment as first_page_urlfragment,
-- referrer fields
page_referrer as referrer,
refr_urlscheme as refr_urlscheme,
refr_urlhost as refr_urlhost,
refr_urlpath as refr_urlpath,
refr_urlquery as refr_urlquery,
refr_urlfragment as refr_urlfragment,
refr_medium as refr_medium,
refr_source as refr_source,
refr_term as refr_term,
-- marketing fields
mkt_medium as mkt_medium,
mkt_source as mkt_source,
mkt_term as mkt_term,
mkt_content as mkt_content,
mkt_campaign as mkt_campaign,
mkt_clickid as mkt_clickid,
mkt_network as mkt_network,
regexp_extract(page_urlquery ,r'utm_source_platform=([^?&#]*)') as mkt_source_platform,
{{ channel_group_query() }} as default_channel_group,
-- geo fields
geo_country as geo_country,
geo_region as geo_region,
geo_region_name as geo_region_name,
geo_city as geo_city,
geo_zipcode as geo_zipcode,
geo_latitude as geo_latitude,
geo_longitude as geo_longitude,
geo_timezone as geo_timezone,
g.name as geo_country_name,
g.region as geo_continent,
-- ip address
user_ipaddress as user_ipaddress,
-- user agent
useragent as useragent,
dvce_screenwidth || 'x' || dvce_screenheight as screen_resolution,
br_renderengine as br_renderengine,
br_lang as br_lang,
l.name as br_lang_name,
os_timezone as os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields: set iab variable to true to enable
{{ snowplow_utils.get_optional_fields(
enabled=var('snowplow__enable_iab', false),
fields=iab_fields(),
col_prefix='contexts_com_iab_snowplow_spiders_and_robots_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='ev') }},
-- ua parser enrichment fields: set ua_parser variable to true to enable
{{ snowplow_utils.get_optional_fields(
enabled=var('snowplow__enable_ua', false),
fields=ua_fields(),
col_prefix='contexts_com_snowplowanalytics_snowplow_ua_parser_context_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='ev') }},
-- yauaa enrichment fields: set yauaa variable to true to enable
{{ snowplow_utils.get_optional_fields(
enabled=var('snowplow__enable_yauaa', false),
fields=yauaa_fields(),
col_prefix='contexts_nl_basjes_yauaa_context_1',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='ev') }},
row_number() over (partition by ev.domain_sessionid order by ev.derived_tstamp, ev.dvce_created_tstamp, ev.event_id) AS page_event_in_session_index,
event_name
{%- if var('snowplow__session_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__session_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(trim(ev.br_lang)) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
event_name in ('page_ping', 'page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
),
session_lasts as (
select
domain_sessionid,
page_title as last_page_title,
page_url as last_page_url,
page_urlscheme as last_page_urlscheme,
page_urlhost as last_page_urlhost,
page_urlpath as last_page_urlpath,
page_urlquery as last_page_urlquery,
page_urlfragment as last_page_urlfragment,
geo_country as last_geo_country,
geo_city as last_geo_city,
geo_region_name as last_geo_region_name,
g.name as last_geo_country_name,
g.region as last_geo_continent,
br_lang as last_br_lang,
l.name as last_br_lang_name,
row_number() over (partition by domain_sessionid order by derived_tstamp desc, dvce_created_tstamp desc, event_id) AS page_event_in_session_index
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(ev.br_lang) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
event_name in ('page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
),
session_aggs as (
select
domain_sessionid
, min(derived_tstamp) as start_tstamp
, max(derived_tstamp) as end_tstamp
{%- if var('snowplow__list_event_counts', false) %}
{% set event_names = dbt_utils.get_column_values(ref('snowplow_web_base_events_this_run'), 'event_name', order_by = 'event_name') %}
{# Loop over every event_name in this run, create a json string of the name and count ONLY if there are events with that name in the session (otherwise empty string),
then trim off the last comma (can't use loop.first/last because first/last entry may not have any events for that session)
#}
, '{' || RTRIM(
{%- for event_name in event_names %}
case when sum(case when event_name = '{{event_name}}' then 1 else 0 end) > 0 then '"{{event_name}}" :' || sum(case when event_name = '{{event_name}}' then 1 else 0 end) || ', ' else '' end ||
{%- endfor -%}
'', ', ') || '}' as event_counts_string
{% endif %}
, count(*) as total_events
-- engagement fields
, count(distinct case when event_name in ('page_ping', 'page_view') and page_view_id is not null then page_view_id else null end) as page_views
-- (hb * (# distinct page view/heartbeat-interval pairs on page pings - # distinct page view ids on page pings)) + (# distinct page view ids on page pings * min visit length)
, ({{ var("snowplow__heartbeat", 10) }} * (
-- number of page pings (unique per heartbeat increment) following a page ping (gap of heartbeat)
count(distinct case
when event_name = 'page_ping' and page_view_id is not null then
-- need to get a unique list of floored time PER page view, so create a dummy surrogate key...
{{ dbt.concat(['page_view_id', "cast(floor("~snowplow_utils.to_unixtstamp('dvce_created_tstamp')~"/"~var('snowplow__heartbeat', 10)~") as "~dbt.type_string()~")" ]) }}
else
null end) -
count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end)
)) +
-- number of page pings following a page view (or no event) (gap of min visit length)
(count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end) * {{ var("snowplow__min_visit_length", 5) }}) as engaged_time_in_s
, {{ snowplow_utils.timestamp_diff('min(derived_tstamp)', 'max(derived_tstamp)', 'second') }} as absolute_time_in_s
{%- if var('snowplow__conversion_events', none) %}
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def)}}
{%- endfor %}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }}
where
1 = 1
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
group by
domain_sessionid
)
select
-- app id
a.app_id,
a.platform,
-- session fields
a.domain_sessionid,
a.original_domain_sessionid,
a.domain_sessionidx,
-- when the session starts with a ping, subtract the min visit length from the first timestamp to estimate when the session actually started
case when a.event_name = 'page_ping' then
{{ snowplow_utils.timestamp_add(datepart="second", interval=-var("snowplow__min_visit_length", 5), tstamp="c.start_tstamp") }}
else c.start_tstamp end as start_tstamp,
c.end_tstamp,
a.model_tstamp,
-- user fields
a.user_id,
a.domain_userid,
a.original_domain_userid,
a.stitched_user_id,
a.network_userid,
-- engagement fields
c.page_views,
c.engaged_time_in_s,
{%- if var('snowplow__list_event_counts', false) %}
safe.parse_json(c.event_counts_string) as event_counts,
{% endif %}
c.total_events,
{{ engaged_session() }} as is_engaged,
-- when the session starts with a ping, add the min visit length to account for the time before the first ping
c.absolute_time_in_s + case when a.event_name = 'page_ping' then {{ var("snowplow__min_visit_length", 5) }} else 0 end as absolute_time_in_s,
-- first page fields
a.first_page_title,
a.first_page_url,
a.first_page_urlscheme,
a.first_page_urlhost,
a.first_page_urlpath,
a.first_page_urlquery,
a.first_page_urlfragment,
-- only take the first value when the last is genuinely missing (based on url, as it must always be populated)
case when b.last_page_url is null then coalesce(b.last_page_title, a.first_page_title) else b.last_page_title end as last_page_title,
case when b.last_page_url is null then coalesce(b.last_page_url, a.first_page_url) else b.last_page_url end as last_page_url,
case when b.last_page_url is null then coalesce(b.last_page_urlscheme, a.first_page_urlscheme) else b.last_page_urlscheme end as last_page_urlscheme,
case when b.last_page_url is null then coalesce(b.last_page_urlhost, a.first_page_urlhost) else b.last_page_urlhost end as last_page_urlhost,
case when b.last_page_url is null then coalesce(b.last_page_urlpath, a.first_page_urlpath) else b.last_page_urlpath end as last_page_urlpath,
case when b.last_page_url is null then coalesce(b.last_page_urlquery, a.first_page_urlquery) else b.last_page_urlquery end as last_page_urlquery,
case when b.last_page_url is null then coalesce(b.last_page_urlfragment, a.first_page_urlfragment) else b.last_page_urlfragment end as last_page_urlfragment,
-- referrer fields
a.referrer,
a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,
a.refr_medium,
a.refr_source,
a.refr_term,
-- marketing fields
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.mkt_clickid,
a.mkt_network,
a.mkt_source_platform,
a.default_channel_group,
-- geo fields
a.geo_country,
a.geo_region,
a.geo_zipcode,
a.geo_latitude,
a.geo_longitude,
a.geo_timezone,
a.geo_region_name,
a.geo_city,
a.geo_country_name,
a.geo_continent,
case when b.last_geo_country is null then coalesce(b.last_geo_country, a.geo_country) else b.last_geo_country end as last_geo_country,
case when b.last_geo_country is null then coalesce(b.last_geo_region_name, a.geo_region_name) else b.last_geo_region_name end as last_geo_region_name,
case when b.last_geo_country is null then coalesce(b.last_geo_city, a.geo_city) else b.last_geo_city end as last_geo_city,
case when b.last_geo_country is null then coalesce(b.last_geo_country_name, a.geo_country_name) else b.last_geo_country_name end as last_geo_country_name,
case when b.last_geo_country is null then coalesce(b.last_geo_continent, a.geo_continent) else b.last_geo_continent end as last_geo_continent,
-- ip address
a.user_ipaddress,
-- user agent
a.useragent,
a.br_renderengine,
a.br_lang,
a.br_lang_name,
case when b.last_br_lang is null then coalesce(b.last_br_lang, a.br_lang) else b.last_br_lang end as last_br_lang,
case when b.last_br_lang is null then coalesce(b.last_br_lang_name, a.br_lang_name) else b.last_br_lang_name end as last_br_lang_name,
a.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields
a.category,
a.primary_impact,
a.reason,
a.spider_or_robot,
-- ua parser enrichment fields
a.useragent_family,
a.useragent_major,
a.useragent_minor,
a.useragent_patch,
a.useragent_version,
a.os_family,
a.os_major,
a.os_minor,
a.os_patch,
a.os_patch_minor,
a.os_version,
a.device_family,
-- yauaa enrichment fields
a.device_class,
case when a.device_class = 'Desktop' then 'Desktop'
when a.device_class = 'Phone' then 'Mobile'
when a.device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
a.screen_resolution,
a.agent_class,
a.agent_name,
a.agent_name_version,
a.agent_name_version_major,
a.agent_version,
a.agent_version_major,
a.device_brand,
a.device_name,
a.device_version,
a.layout_engine_class,
a.layout_engine_name,
a.layout_engine_name_version,
a.layout_engine_name_version_major,
a.layout_engine_version,
a.layout_engine_version_major,
a.operating_system_class,
a.operating_system_name,
a.operating_system_name_version,
a.operating_system_version
-- conversion fields
{%- if var('snowplow__conversion_events', none) %}
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def, names_only = true)}}
{%- endfor %}
{% if var('snowplow__total_all_conversions', false) %}
,{%- for conv_def in var('snowplow__conversion_events') %}{{'cv_' ~ conv_def['name'] ~ '_volume'}}{%- if not loop.last %} + {% endif -%}{%- endfor %} as cv__all_volume
{# Use 0 in case of no conversions having a value field #}
,0 {%- for conv_def in var('snowplow__conversion_events') %}{%- if conv_def.get('value') %} + {{'cv_' ~ conv_def['name'] ~ '_total'}}{% endif -%}{%- endfor %} as cv__all_total
{% endif %}
{%- endif %}
-- passthrough fields
{%- if var('snowplow__session_passthroughs', []) -%}
{%- for col in passthrough_names %}
, a.{{col}}
{%- endfor -%}
{%- endif %}
from
session_firsts a
left join
session_lasts b on a.domain_sessionid = b.domain_sessionid and b.page_event_in_session_index = 1
left join
session_aggs c on a.domain_sessionid = c.domain_sessionid
where
a.page_event_in_session_index = 1
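The `engaged_time_in_s` expression in `session_aggs` is easier to follow with concrete numbers. Below is a minimal Python sketch of the same arithmetic, not part of the package: it assumes events are already deduplicated and bot-filtered, and that `dvce_created_tstamp` has been converted to Unix seconds. The `Event` type and `engaged_time_in_s` function are hypothetical, and the sketch uses a tuple key where the SQL concatenates `page_view_id` with the floored interval.

```python
import math
from typing import Iterable, NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical, simplified event row holding only the fields the formula needs."""
    event_name: str
    page_view_id: Optional[str]
    dvce_created_unix: float  # dvce_created_tstamp as Unix seconds (assumption)


def engaged_time_in_s(events: Iterable[Event], heartbeat: int = 10, min_visit_length: int = 5) -> int:
    # Only page pings attached to a page view contribute (mirrors the CASE filters).
    pings = [e for e in events if e.event_name == "page_ping" and e.page_view_id is not None]
    # Distinct (page view, heartbeat interval) pairs; the SQL builds this key by
    # concatenating page_view_id with floor(unix_ts / heartbeat).
    buckets = {(e.page_view_id, math.floor(e.dvce_created_unix / heartbeat)) for e in pings}
    # Distinct page views that received at least one ping.
    pinged_views = {e.page_view_id for e in pings}
    # Each pinged page view is credited min_visit_length for its first interval,
    # plus one heartbeat for every further distinct interval.
    return heartbeat * (len(buckets) - len(pinged_views)) + len(pinged_views) * min_visit_length
```

With the defaults (heartbeat 10, minimum visit length 5), pings at 5 s, 15 s, 16 s and 25 s on one page view fall into three distinct intervals, so the result is 10 * (3 - 1) + 1 * 5 = 25 seconds. The duplicate ping at 16 s adds nothing because it shares an interval with the ping at 15 s.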
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
with session_firsts as (
select
-- app id
app_id as app_id,
platform,
-- session fields
domain_sessionid,
original_domain_sessionid,
domain_sessionidx,
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
-- user fields
user_id,
domain_userid,
original_domain_userid,
{% if var('snowplow__session_stitching') %}
-- updated with mapping as part of post hook on derived sessions table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ type_string() }}) as stitched_user_id,
{% endif %}
network_userid as network_userid,
-- first page fields
page_title as first_page_title,
page_url as first_page_url,
page_urlscheme as first_page_urlscheme,
page_urlhost as first_page_urlhost,
page_urlpath as first_page_urlpath,
page_urlquery as first_page_urlquery,
page_urlfragment as first_page_urlfragment,
-- referrer fields
page_referrer as referrer,
refr_urlscheme as refr_urlscheme,
refr_urlhost as refr_urlhost,
refr_urlpath as refr_urlpath,
refr_urlquery as refr_urlquery,
refr_urlfragment as refr_urlfragment,
refr_medium as refr_medium,
refr_source as refr_source,
refr_term as refr_term,
-- marketing fields
mkt_medium as mkt_medium,
mkt_source as mkt_source,
mkt_term as mkt_term,
mkt_content as mkt_content,
mkt_campaign as mkt_campaign,
mkt_clickid as mkt_clickid,
mkt_network as mkt_network,
nullif(regexp_extract(page_urlquery, r'utm_source_platform=([^?&#]*)'), '') as mkt_source_platform,
{{ channel_group_query() }} as default_channel_group,
-- geo fields
geo_country as geo_country,
geo_region as geo_region,
geo_region_name as geo_region_name,
geo_city as geo_city,
geo_zipcode as geo_zipcode,
geo_latitude as geo_latitude,
geo_longitude as geo_longitude,
geo_timezone as geo_timezone,
g.name as geo_country_name,
g.region as geo_continent,
-- ip address
user_ipaddress as user_ipaddress,
-- user agent
useragent as useragent,
dvce_screenwidth || 'x' || dvce_screenheight as screen_resolution,
br_renderengine as br_renderengine,
br_lang as br_lang,
l.name as br_lang_name,
os_timezone as os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields
{{snowplow_web.get_iab_context_fields()}},
-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields()}},
-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields()}},
event_name
{%- if var('snowplow__session_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__session_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(ev.br_lang) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
event_name in ('page_ping', 'page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
qualify row_number() over (partition by domain_sessionid order by derived_tstamp, dvce_created_tstamp, event_id) = 1
),
session_lasts as (
select
domain_sessionid,
page_title as last_page_title,
page_url as last_page_url,
page_urlscheme as last_page_urlscheme,
page_urlhost as last_page_urlhost,
page_urlpath as last_page_urlpath,
page_urlquery as last_page_urlquery,
page_urlfragment as last_page_urlfragment,
geo_country as last_geo_country,
geo_city as last_geo_city,
geo_region_name as last_geo_region_name,
g.name as last_geo_country_name,
g.region as last_geo_continent,
br_lang as last_br_lang,
l.name as last_br_lang_name
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(ev.br_lang) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
event_name in ('page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
qualify row_number() over (partition by domain_sessionid order by derived_tstamp desc, dvce_created_tstamp desc, event_id) = 1
),
session_aggs as (
select
domain_sessionid
, min(derived_tstamp) as start_tstamp
, max(derived_tstamp) as end_tstamp
{%- if var('snowplow__list_event_counts', false) %}
{% set event_names = dbt_utils.get_column_values(ref('snowplow_web_base_events_this_run'), 'event_name', order_by = 'event_name') %}
{# Loop over every event_name in this run, create a map of the name and count, later filter for only events with that name in the session #}
,map(
{%- for event_name in event_names %}
'{{event_name}}', sum(case when event_name = '{{event_name}}' then 1 else 0 end){% if not loop.last %},{% endif %}
{%- endfor -%}
) as event_counts
{%- endif %}
, count(*) as total_events
-- engagement fields
, count(distinct case when event_name in ('page_ping', 'page_view') and page_view_id is not null then page_view_id else null end) as page_views
-- (hb * (# distinct page view/heartbeat-interval pairs on page pings - # distinct page view ids on page pings)) + (# distinct page view ids on page pings * min visit length)
, ({{ var("snowplow__heartbeat", 10) }} * (
-- number of page pings (unique per heartbeat increment) following a page ping (gap of heartbeat)
count(distinct case
when event_name = 'page_ping' and page_view_id is not null then
-- need to get a unique list of floored time PER page view, so create a dummy surrogate key...
{{ dbt.concat(['page_view_id', "cast(floor("~snowplow_utils.to_unixtstamp('dvce_created_tstamp')~"/"~var('snowplow__heartbeat', 10)~") as "~dbt.type_string()~")" ]) }}
else
null end) -
count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end)
)) +
-- number of page pings following a page view (or no event) (gap of min visit length)
(count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end) * {{ var("snowplow__min_visit_length", 5) }}) as engaged_time_in_s
, {{ snowplow_utils.timestamp_diff('min(derived_tstamp)', 'max(derived_tstamp)', 'second') }} as absolute_time_in_s
{%- if var('snowplow__conversion_events', none) %}
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def)}}
{%- endfor %}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }}
where
1 = 1
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
group by
domain_sessionid
)
select
-- app id
a.app_id,
a.platform,
-- session fields
a.domain_sessionid,
a.original_domain_sessionid,
a.domain_sessionidx,
-- when the session starts with a ping, subtract the min visit length from the first timestamp to estimate when the session actually started
case when a.event_name = 'page_ping' then
{{ snowplow_utils.timestamp_add(datepart="second", interval=-var("snowplow__min_visit_length", 5), tstamp="c.start_tstamp") }}
else c.start_tstamp end as start_tstamp,
c.end_tstamp,
a.model_tstamp,
-- user fields
a.user_id,
a.domain_userid,
a.original_domain_userid,
a.stitched_user_id,
a.network_userid,
-- engagement fields
c.page_views,
c.engaged_time_in_s,
{%- if var('snowplow__list_event_counts', false) %}
map_filter(c.event_counts, (k, v) -> v > 0) as event_counts,
{%- endif %}
c.total_events,
{{ engaged_session() }} as is_engaged,
-- when the session starts with a ping, add the min visit length to account for the time before the first ping
c.absolute_time_in_s + case when a.event_name = 'page_ping' then {{ var("snowplow__min_visit_length", 5) }} else 0 end as absolute_time_in_s,
-- first page fields
a.first_page_title,
a.first_page_url,
a.first_page_urlscheme,
a.first_page_urlhost,
a.first_page_urlpath,
a.first_page_urlquery,
a.first_page_urlfragment,
-- only take the first value when the last is genuinely missing (based on url, as it must always be populated)
case when b.last_page_url is null then coalesce(b.last_page_title, a.first_page_title) else b.last_page_title end as last_page_title,
case when b.last_page_url is null then coalesce(b.last_page_url, a.first_page_url) else b.last_page_url end as last_page_url,
case when b.last_page_url is null then coalesce(b.last_page_urlscheme, a.first_page_urlscheme) else b.last_page_urlscheme end as last_page_urlscheme,
case when b.last_page_url is null then coalesce(b.last_page_urlhost, a.first_page_urlhost) else b.last_page_urlhost end as last_page_urlhost,
case when b.last_page_url is null then coalesce(b.last_page_urlpath, a.first_page_urlpath) else b.last_page_urlpath end as last_page_urlpath,
case when b.last_page_url is null then coalesce(b.last_page_urlquery, a.first_page_urlquery) else b.last_page_urlquery end as last_page_urlquery,
case when b.last_page_url is null then coalesce(b.last_page_urlfragment, a.first_page_urlfragment) else b.last_page_urlfragment end as last_page_urlfragment,
-- referrer fields
a.referrer,
a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,
a.refr_medium,
a.refr_source,
a.refr_term,
-- marketing fields
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.mkt_clickid,
a.mkt_network,
a.mkt_source_platform,
a.default_channel_group,
-- geo fields
a.geo_country,
a.geo_region,
a.geo_region_name,
a.geo_city,
a.geo_zipcode,
a.geo_latitude,
a.geo_longitude,
a.geo_timezone,
a.geo_country_name,
a.geo_continent,
case when b.last_geo_country is null then coalesce(b.last_geo_country, a.geo_country) else b.last_geo_country end as last_geo_country,
case when b.last_geo_country is null then coalesce(b.last_geo_region_name, a.geo_region_name) else b.last_geo_region_name end as last_geo_region_name,
case when b.last_geo_country is null then coalesce(b.last_geo_city, a.geo_city) else b.last_geo_city end as last_geo_city,
case when b.last_geo_country is null then coalesce(b.last_geo_country_name, a.geo_country_name) else b.last_geo_country_name end as last_geo_country_name,
case when b.last_geo_country is null then coalesce(b.last_geo_continent, a.geo_continent) else b.last_geo_continent end as last_geo_continent,
-- ip address
a.user_ipaddress,
-- user agent
a.useragent,
a.br_renderengine,
a.br_lang,
a.br_lang_name,
case when b.last_br_lang is null then coalesce(b.last_br_lang, a.br_lang) else b.last_br_lang end as last_br_lang,
case when b.last_br_lang is null then coalesce(b.last_br_lang_name, a.br_lang_name) else b.last_br_lang_name end as last_br_lang_name,
a.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields
a.category,
a.primary_impact,
a.reason,
a.spider_or_robot,
-- ua parser enrichment fields
a.useragent_family,
a.useragent_major,
a.useragent_minor,
a.useragent_patch,
a.useragent_version,
a.os_family,
a.os_major,
a.os_minor,
a.os_patch,
a.os_patch_minor,
a.os_version,
a.device_family,
-- yauaa enrichment fields
a.device_class,
case when a.device_class = 'Desktop' then 'Desktop'
when a.device_class = 'Phone' then 'Mobile'
when a.device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
a.screen_resolution,
a.agent_class,
a.agent_name,
a.agent_name_version,
a.agent_name_version_major,
a.agent_version,
a.agent_version_major,
a.device_brand,
a.device_name,
a.device_version,
a.layout_engine_class,
a.layout_engine_name,
a.layout_engine_name_version,
a.layout_engine_name_version_major,
a.layout_engine_version,
a.layout_engine_version_major,
a.operating_system_class,
a.operating_system_name,
a.operating_system_name_version,
a.operating_system_version
-- conversion fields
{%- if var('snowplow__conversion_events', none) %}
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def, names_only = true)}}
{%- endfor %}
{% if var('snowplow__total_all_conversions', false) %}
,{%- for conv_def in var('snowplow__conversion_events') %}{{'cv_' ~ conv_def['name'] ~ '_volume'}}{%- if not loop.last %} + {% endif -%}{%- endfor %} as cv__all_volume
{# Use 0 in case of no conversions having a value field #}
,0 {%- for conv_def in var('snowplow__conversion_events') %}{%- if conv_def.get('value') %} + {{'cv_' ~ conv_def['name'] ~ '_total'}}{% endif -%}{%- endfor %} as cv__all_total
{% endif %}
{%- endif %}
-- passthrough fields
{%- if var('snowplow__session_passthroughs', []) -%}
{%- for col in passthrough_names %}
, a.{{col}}
{%- endfor -%}
{%- endif %}
from
session_firsts a
left join
session_lasts b on a.domain_sessionid = b.domain_sessionid
left join
session_aggs c on a.domain_sessionid = c.domain_sessionid
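When `snowplow__list_event_counts` is enabled, the Databricks model builds `event_counts` with `map` and drops zero entries with `map_filter`. The BigQuery and default (Postgres/Redshift) models instead assemble a JSON string in SQL and parse it afterwards. The rough Python sketch below shows that string-building step. It assumes `run_event_names` stands in for the values returned by `dbt_utils.get_column_values`. Both function names are hypothetical and are not part of the package.

```python
import json
from collections import Counter
from typing import Iterable


def event_counts_string(session_events: Iterable[str], run_event_names: Iterable[str]) -> str:
    counts = Counter(session_events)
    # One '"name" :count, ' fragment per event name seen in the run, emitted only
    # when the session has at least one event of that name (otherwise '').
    body = "".join(
        f'"{name}" :{counts[name]}, ' for name in sorted(set(run_event_names)) if counts[name] > 0
    )
    # Like SQL RTRIM(str, ', '), str.rstrip removes any trailing ',' or ' ' characters.
    return "{" + body.rstrip(", ") + "}"


def parse_event_counts(counts_string: str) -> dict:
    # Stands in for safe.parse_json / json_parse / cast(... as json) in the warehouse.
    return json.loads(counts_string)
```

Fragments are emitted only for non-zero counts, so the template cannot use `loop.last` to omit the final comma: the last event name in the run may have no events in a given session. Trimming the trailing `, ` characters afterwards works no matter which name ends up last, and a session with no matching events produces `{}`.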
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"]
)
}}
with session_firsts as (
select
-- app id
app_id as app_id,
platform,
-- session fields
domain_sessionid,
original_domain_sessionid,
domain_sessionidx,
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
-- user fields
user_id,
domain_userid,
original_domain_userid,
{% if var('snowplow__session_stitching') %}
-- updated with mapping as part of post hook on derived sessions table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ snowplow_utils.type_max_string() }}) as stitched_user_id,
{% endif %}
network_userid as network_userid,
-- first page fields
page_title as first_page_title,
page_url as first_page_url,
page_urlscheme as first_page_urlscheme,
page_urlhost as first_page_urlhost,
page_urlpath as first_page_urlpath,
page_urlquery as first_page_urlquery,
page_urlfragment as first_page_urlfragment,
-- referrer fields
page_referrer as referrer,
refr_urlscheme as refr_urlscheme,
refr_urlhost as refr_urlhost,
refr_urlpath as refr_urlpath,
refr_urlquery as refr_urlquery,
refr_urlfragment as refr_urlfragment,
refr_medium as refr_medium,
refr_source as refr_source,
refr_term as refr_term,
-- marketing fields
mkt_medium as mkt_medium,
mkt_source as mkt_source,
mkt_term as mkt_term,
mkt_content as mkt_content,
mkt_campaign as mkt_campaign,
mkt_clickid as mkt_clickid,
mkt_network as mkt_network,
{% if target.type in ['postgres'] %}
nullif((regexp_match(page_urlquery, 'utm_source_platform=([^?&#]*)'))[1], '') as mkt_source_platform,
{% else %}
nullif(regexp_substr(page_urlquery, 'utm_source_platform=([^?&#]*)', 1, 1, 'e'), '') as mkt_source_platform,
{% endif %}
{{ channel_group_query() }} as default_channel_group,
-- geo fields
geo_country as geo_country,
geo_region as geo_region,
geo_region_name as geo_region_name,
geo_city as geo_city,
geo_zipcode as geo_zipcode,
geo_latitude as geo_latitude,
geo_longitude as geo_longitude,
geo_timezone as geo_timezone,
g.name as geo_country_name,
g.region as geo_continent,
-- ip address
user_ipaddress as user_ipaddress,
-- user agent
useragent as useragent,
dvce_screenwidth || 'x' || dvce_screenheight as screen_resolution,
br_renderengine as br_renderengine,
br_lang as br_lang,
l.name as br_lang_name,
os_timezone as os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields: set snowplow__enable_iab to true to enable
{{snowplow_web.get_iab_context_fields('ev')}},
-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields('ev')}},
-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields('ev')}},
row_number() over (partition by ev.domain_sessionid order by ev.derived_tstamp, ev.dvce_created_tstamp, ev.event_id) AS page_event_in_session_index,
event_name
{%- if var('snowplow__session_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__session_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(ev.br_lang) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
ev.event_name in ('page_ping', 'page_view')
and ev.page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
),
session_lasts as (
select
domain_sessionid,
page_title as last_page_title,
page_url as last_page_url,
page_urlscheme as last_page_urlscheme,
page_urlhost as last_page_urlhost,
page_urlpath as last_page_urlpath,
page_urlquery as last_page_urlquery,
page_urlfragment as last_page_urlfragment,
geo_country as last_geo_country,
geo_city as last_geo_city,
geo_region_name as last_geo_region_name,
g.name as last_geo_country_name,
g.region as last_geo_continent,
br_lang as last_br_lang,
l.name as last_br_lang_name,
row_number() over (partition by ev.domain_sessionid order by ev.derived_tstamp desc, ev.dvce_created_tstamp desc, ev.event_id) AS page_event_in_session_index
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(ev.br_lang) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
event_name in ('page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
),
session_aggs as (
select
domain_sessionid
, min(derived_tstamp) as start_tstamp
, max(derived_tstamp) as end_tstamp
{%- if var('snowplow__list_event_counts', false) %}
{% set event_names = dbt_utils.get_column_values(ref('snowplow_web_base_events_this_run'), 'event_name', order_by = 'event_name') %}
{# Loop over every event_name in this run, create a json string of the name and count ONLY if there are events with that name in the session (otherwise empty string),
then trim off the last comma (can't use loop.first/last because first/last entry may not have any events for that session)
#}
, '{' || rtrim(
{%- for event_name in event_names %}
case when sum(case when event_name = '{{event_name}}' then 1 else 0 end) > 0 then '"{{event_name}}" :' || sum(case when event_name = '{{event_name}}' then 1 else 0 end) || ', ' else '' end ||
{%- endfor -%}
'', ', ') || '}' as event_counts_string
{% endif %}
, count(*) as total_events
-- engagement fields
, count(distinct case when event_name in ('page_ping', 'page_view') and page_view_id is not null then page_view_id else null end) as page_views
-- (hb * (# distinct page view/heartbeat-interval pairs on page pings - # distinct page view ids on page pings)) + (# distinct page view ids on page pings * min visit length)
, ({{ var("snowplow__heartbeat", 10) }} * (
-- number of page pings (unique per heartbeat increment) following a page ping (gap of heartbeat)
count(distinct case
when event_name = 'page_ping' and page_view_id is not null then
-- need to get a unique list of floored time PER page view, so create a dummy surrogate key...
{{ dbt.concat(['page_view_id', "cast(floor("~snowplow_utils.to_unixtstamp('dvce_created_tstamp')~"/"~var('snowplow__heartbeat', 10)~") as "~snowplow_utils.type_max_string()~")" ]) }}
else
null end) -
count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end)
)) +
-- number of page pings following a page view (or no event) (gap of min visit length)
(count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end) * {{ var("snowplow__min_visit_length", 5) }}) as engaged_time_in_s
, {{ snowplow_utils.timestamp_diff('min(derived_tstamp)', 'max(derived_tstamp)', 'second') }} as absolute_time_in_s
from {{ ref('snowplow_web_base_events_this_run') }}
where
1 = 1
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
group by
domain_sessionid
)
{# Redshift doesn't allow listagg and other aggregations in the same CTE #}
{%- if var('snowplow__conversion_events', none) %}
,session_convs as (
select
domain_sessionid
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def)}}
{%- endfor %}
from {{ ref('snowplow_web_base_events_this_run') }}
where
1 = 1
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
group by
domain_sessionid
)
{%- endif %}
select
-- app id
a.app_id,
a.platform,
-- session fields
a.domain_sessionid,
a.original_domain_sessionid,
a.domain_sessionidx,
-- when the session starts with a ping, subtract the min visit length from the first timestamp to estimate when the session actually started
case when a.event_name = 'page_ping' then
{{ snowplow_utils.timestamp_add(datepart="second", interval=-var("snowplow__min_visit_length", 5), tstamp="c.start_tstamp") }}
else c.start_tstamp end as start_tstamp,
c.end_tstamp,
a.model_tstamp,
-- user fields
a.user_id,
a.domain_userid,
a.original_domain_userid,
a.stitched_user_id,
a.network_userid,
-- engagement fields
c.page_views,
c.engaged_time_in_s,
{%- if var('snowplow__list_event_counts', false) %}
{% if target.type in ['postgres'] %}
cast(event_counts_string as json) as event_counts,
{% elif target.type in ['redshift'] %}
json_parse(event_counts_string) as event_counts,
{% endif %}
{% endif %}
c.total_events,
{{ engaged_session() }} as is_engaged,
-- when the session starts with a ping we need to add the min visit length to get when the session actually started
c.absolute_time_in_s + case when a.event_name = 'page_ping' then {{ var("snowplow__min_visit_length", 5) }} else 0 end as absolute_time_in_s,
-- first page fields
a.first_page_title,
a.first_page_url,
a.first_page_urlscheme,
a.first_page_urlhost,
a.first_page_urlpath,
a.first_page_urlquery,
a.first_page_urlfragment,
-- only take the first value when the last is genuinely missing (based on url, as it must always be populated)
case when b.last_page_url is null then coalesce(b.last_page_title, a.first_page_title) else b.last_page_title end as last_page_title,
case when b.last_page_url is null then coalesce(b.last_page_url, a.first_page_url) else b.last_page_url end as last_page_url,
case when b.last_page_url is null then coalesce(b.last_page_urlscheme, a.first_page_urlscheme) else b.last_page_urlscheme end as last_page_urlscheme,
case when b.last_page_url is null then coalesce(b.last_page_urlhost, a.first_page_urlhost) else b.last_page_urlhost end as last_page_urlhost,
case when b.last_page_url is null then coalesce(b.last_page_urlpath, a.first_page_urlpath) else b.last_page_urlpath end as last_page_urlpath,
case when b.last_page_url is null then coalesce(b.last_page_urlquery, a.first_page_urlquery) else b.last_page_urlquery end as last_page_urlquery,
case when b.last_page_url is null then coalesce(b.last_page_urlfragment, a.first_page_urlfragment) else b.last_page_urlfragment end as last_page_urlfragment,
-- referrer fields
a.referrer,
a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,
a.refr_medium,
a.refr_source,
a.refr_term,
-- marketing fields
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.mkt_clickid,
a.mkt_network,
a.mkt_source_platform,
a.default_channel_group,
-- geo fields
a.geo_country,
a.geo_region,
a.geo_region_name,
a.geo_city,
a.geo_zipcode,
a.geo_latitude,
a.geo_longitude,
a.geo_timezone,
a.geo_country_name,
a.geo_continent,
case when b.last_geo_country is null then coalesce(b.last_geo_country, a.geo_country) else b.last_geo_country end as last_geo_country,
case when b.last_geo_country is null then coalesce(b.last_geo_region_name, a.geo_region_name) else b.last_geo_region_name end as last_geo_region_name,
case when b.last_geo_country is null then coalesce(b.last_geo_city, a.geo_city) else b.last_geo_city end as last_geo_city,
case when b.last_geo_country is null then coalesce(b.last_geo_country_name,a.geo_country_name) else b.last_geo_country_name end as last_geo_country_name,
case when b.last_geo_country is null then coalesce(b.last_geo_continent, a.geo_continent) else b.last_geo_continent end as last_geo_continent,
-- ip address
a.user_ipaddress,
-- user agent
a.useragent,
a.br_renderengine,
a.br_lang,
a.br_lang_name,
case when b.last_br_lang is null then coalesce(b.last_br_lang, a.br_lang) else b.last_br_lang end as last_br_lang,
case when b.last_br_lang is null then coalesce(b.last_br_lang_name, a.br_lang_name) else b.last_br_lang_name end as last_br_lang_name,
a.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields
a.iab_category as category,
a.iab_primary_impact as primary_impact,
a.iab_reason as reason,
a.iab_spider_or_robot as spider_or_robot,
-- ua parser enrichment fields
a.ua_useragent_family as useragent_family,
a.ua_useragent_major as useragent_major,
a.ua_useragent_minor as useragent_minor,
a.ua_useragent_patch as useragent_patch,
a.ua_useragent_version as useragent_version,
a.ua_os_family as os_family,
a.ua_os_major as os_major,
a.ua_os_minor as os_minor,
a.ua_os_patch as os_patch,
a.ua_os_patch_minor as os_patch_minor,
a.ua_os_version as os_version,
a.ua_device_family as device_family,
-- yauaa enrichment fields
a.yauaa_device_class as device_class,
case when a.yauaa_device_class = 'Desktop' THEN 'Desktop'
when a.yauaa_device_class = 'Phone' then 'Mobile'
when a.yauaa_device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
a.screen_resolution,
a.yauaa_agent_class as agent_class,
a.yauaa_agent_name as agent_name,
a.yauaa_agent_name_version as agent_name_version,
a.yauaa_agent_name_version_major as agent_name_version_major,
a.yauaa_agent_version as agent_version,
a.yauaa_agent_version_major as agent_version_major,
a.yauaa_device_brand as device_brand,
a.yauaa_device_name as device_name,
a.yauaa_device_version as device_version,
a.yauaa_layout_engine_class as layout_engine_class,
a.yauaa_layout_engine_name as layout_engine_name,
a.yauaa_layout_engine_name_version as layout_engine_name_version,
a.yauaa_layout_engine_name_version_major as layout_engine_name_version_major,
a.yauaa_layout_engine_version as layout_engine_version,
a.yauaa_layout_engine_version_major as layout_engine_version_major,
a.yauaa_operating_system_class as operating_system_class,
a.yauaa_operating_system_name as operating_system_name,
a.yauaa_operating_system_name_version as operating_system_name_version,
a.yauaa_operating_system_version as operating_system_version
-- conversion fields
{%- if var('snowplow__conversion_events', none) %}
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def, names_only = true)}}
{%- endfor %}
{% if var('snowplow__total_all_conversions', false) %}
,{%- for conv_def in var('snowplow__conversion_events') %}{{'cv_' ~ conv_def['name'] ~ '_volume'}}{%- if not loop.last %} + {% endif -%}{%- endfor %} as cv__all_volume
{# Use 0 in case of no conversions having a value field #}
,0 {%- for conv_def in var('snowplow__conversion_events') %}{%- if conv_def.get('value') %} + {{'cv_' ~ conv_def['name'] ~ '_total'}}{% endif -%}{%- endfor %} as cv__all_total
{% endif %}
{%- endif %}
-- passthrough fields
{%- if var('snowplow__session_passthroughs', []) -%}
{%- for col in passthrough_names %}
, a.{{col}}
{%- endfor -%}
{%- endif %}
from
session_firsts a
left join
session_lasts b on a.domain_sessionid = b.domain_sessionid and b.page_event_in_session_index = 1
left join
session_aggs c on a.domain_sessionid = c.domain_sessionid
{%- if var('snowplow__conversion_events', none) %}
left join
session_convs d on a.domain_sessionid = d.domain_sessionid
{%- endif %}
where
a.page_event_in_session_index = 1
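The engaged_time_in_s expression in session_aggs is easier to follow outside of SQL. Below is a minimal Python sketch of the same arithmetic for a single session, assuming the events are already filtered to one domain_sessionid and that dvce_created_tstamp has been converted to Unix seconds. The Event tuple and the function name are hypothetical; the defaults mirror snowplow__heartbeat (10) and snowplow__min_visit_length (5). The SQL builds its per-page-view heartbeat key by string concatenation, while this sketch uses a tuple for the same purpose.

```python
from math import floor
from typing import Iterable, NamedTuple, Optional


class Event(NamedTuple):
    event_name: str
    page_view_id: Optional[str]
    dvce_created_unix_s: float  # dvce_created_tstamp as Unix seconds (assumption)


def engaged_time_in_s(events: Iterable[Event], heartbeat: int = 10, min_visit_length: int = 5) -> int:
    # only page pings that carry a page_view_id contribute to engaged time
    pings = [e for e in events if e.event_name == "page_ping" and e.page_view_id is not None]
    # distinct (page view, heartbeat bucket) pairs, like the concat surrogate key in the SQL
    buckets = {(e.page_view_id, floor(e.dvce_created_unix_s / heartbeat)) for e in pings}
    # distinct page views that received at least one ping
    pinged_views = {e.page_view_id for e in pings}
    # first ping per page view is worth min_visit_length, each further bucket one heartbeat
    return heartbeat * (len(buckets) - len(pinged_views)) + len(pinged_views) * min_visit_length
```

Counting distinct heartbeat buckets rather than raw pings means duplicate or closely spaced pings inside one heartbeat window are only credited once.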
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with session_firsts as (
select
-- app id
app_id as app_id,
platform,
-- session fields
domain_sessionid,
original_domain_sessionid,
domain_sessionidx,
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
-- user fields
user_id,
domain_userid,
original_domain_userid,
{% if var('snowplow__session_stitching') %}
-- updated with mapping as part of post hook on derived sessions table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ type_string() }}) as stitched_user_id,
{% endif %}
network_userid as network_userid,
-- first page fields
page_title as first_page_title,
page_url as first_page_url,
page_urlscheme as first_page_urlscheme,
page_urlhost as first_page_urlhost,
page_urlpath as first_page_urlpath,
page_urlquery as first_page_urlquery,
page_urlfragment as first_page_urlfragment,
-- referrer fields
page_referrer as referrer,
refr_urlscheme as refr_urlscheme,
refr_urlhost as refr_urlhost,
refr_urlpath as refr_urlpath,
refr_urlquery as refr_urlquery,
refr_urlfragment as refr_urlfragment,
refr_medium as refr_medium,
refr_source as refr_source,
refr_term as refr_term,
-- marketing fields
mkt_medium as mkt_medium,
mkt_source as mkt_source,
mkt_term as mkt_term,
mkt_content as mkt_content,
mkt_campaign as mkt_campaign,
mkt_clickid as mkt_clickid,
mkt_network as mkt_network,
regexp_substr(page_urlquery, 'utm_source_platform=([^?&#]*)', 1, 1, 'e') as mkt_source_platform,
{{ channel_group_query() }} as default_channel_group,
-- geo fields
geo_country as geo_country,
geo_region as geo_region,
geo_region_name as geo_region_name,
geo_city as geo_city,
geo_zipcode as geo_zipcode,
geo_latitude as geo_latitude,
geo_longitude as geo_longitude,
geo_timezone as geo_timezone,
g.name as geo_country_name,
g.region as geo_continent,
-- ip address
user_ipaddress as user_ipaddress,
-- user agent
useragent as useragent,
dvce_screenwidth || 'x' || dvce_screenheight as screen_resolution,
br_renderengine as br_renderengine,
br_lang as br_lang,
l.name as br_lang_name,
os_timezone as os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields
{{snowplow_web.get_iab_context_fields()}},
-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields()}},
-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields()}},
-- event name for use later
event_name
{%- if var('snowplow__session_passthroughs', []) -%}
{%- set passthrough_names = [] -%}
{%- for identifier in var('snowplow__session_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- do passthrough_names.append(identifier['alias']) -%}
{%- else -%}
,ev.{{identifier}}
{%- do passthrough_names.append(identifier) -%}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__ga4_categories_seed')) }} c on lower(trim(ev.mkt_source)) = lower(c.source)
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(ev.br_lang) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
event_name in ('page_ping', 'page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
qualify row_number() over (partition by domain_sessionid order by derived_tstamp, dvce_created_tstamp, event_id) = 1
)
, session_lasts as (
select
domain_sessionid,
page_title as last_page_title,
page_url as last_page_url,
page_urlscheme as last_page_urlscheme,
page_urlhost as last_page_urlhost,
page_urlpath as last_page_urlpath,
page_urlquery as last_page_urlquery,
page_urlfragment as last_page_urlfragment,
geo_country as last_geo_country,
geo_city as last_geo_city,
geo_region_name as last_geo_region_name,
g.name as last_geo_country_name,
g.region as last_geo_continent,
br_lang as last_br_lang,
l.name as last_br_lang_name
from {{ ref('snowplow_web_base_events_this_run') }} ev
left join
{{ ref(var('snowplow__rfc_5646_seed')) }} l on lower(ev.br_lang) = lower(l.lang_tag)
left join
{{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(ev.geo_country) = lower(g.alpha_2)
where
event_name = 'page_view'
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
qualify row_number() over (partition by domain_sessionid order by derived_tstamp desc, dvce_created_tstamp desc, event_id) = 1
)
, session_aggs as (
select
domain_sessionid
, min(derived_tstamp) as start_tstamp
, max(derived_tstamp) as end_tstamp
{%- if var('snowplow__list_event_counts', false) %}
{% set event_names = dbt_utils.get_column_values(ref('snowplow_web_base_events_this_run'), 'event_name', order_by = 'event_name') %}
{# Loop over every event_name in this run, create a json string of the name and count ONLY if there are events with that name in the session (otherwise empty string),
then trim off the last comma (can't use loop.first/last because first/last entry may not have any events for that session)
#}
, '{' || rtrim(
{%- for event_name in event_names %}
case when sum(case when event_name = '{{event_name}}' then 1 else 0 end) > 0 then '"{{event_name}}" :' || sum(case when event_name = '{{event_name}}' then 1 else 0 end) || ', ' else '' end ||
{%- endfor -%}
'', ', ') || '}' as event_counts_string
{%- endif %}
, count(*) as total_events
-- engagement fields
, count(distinct case when event_name in ('page_ping', 'page_view') and page_view_id is not null then page_view_id else null end) as page_views
-- (hb * (# distinct heartbeat buckets of page pings per page view - # distinct page view ids ON page pings)) + (# distinct page view ids ON page pings * min visit length)
, ({{ var("snowplow__heartbeat", 10) }} * (
-- number of (unique in heartbeat increment) page pings following a page ping (gap of heartbeat)
count(distinct case
when event_name = 'page_ping' and page_view_id is not null then
-- need to get a unique list of floored time PER page view, so create a dummy surrogate key...
{{ dbt.concat(['page_view_id', "cast(floor("~snowplow_utils.to_unixtstamp('dvce_created_tstamp')~"/"~var('snowplow__heartbeat', 10)~") as "~dbt.type_string()~")" ]) }}
else
null end) -
count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end)
)) +
-- number of page pings following a page view (or no event) (gap of min visit length)
(count(distinct case when event_name = 'page_ping' and page_view_id is not null then page_view_id else null end) * {{ var("snowplow__min_visit_length", 5) }}) as engaged_time_in_s
, {{ snowplow_utils.timestamp_diff('min(derived_tstamp)', 'max(derived_tstamp)', 'second') }} as absolute_time_in_s
{%- if var('snowplow__conversion_events', none) %}
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def)}}
{%- endfor %}
{%- endif %}
from {{ ref('snowplow_web_base_events_this_run') }}
where
1 = 1
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
group by
domain_sessionid
)
select
-- app id
a.app_id,
a.platform,
-- session fields
a.domain_sessionid,
a.original_domain_sessionid,
a.domain_sessionidx,
-- when the session starts with a ping we need to subtract the min visit length from the first timestamp to get when the session actually started
case when a.event_name = 'page_ping' then
{{ snowplow_utils.timestamp_add(datepart="second", interval=-var("snowplow__min_visit_length", 5), tstamp="c.start_tstamp") }}
else c.start_tstamp end as start_tstamp,
c.end_tstamp,
a.model_tstamp,
-- user fields
a.user_id,
a.domain_userid,
a.original_domain_userid,
a.stitched_user_id,
a.network_userid,
-- engagement fields
c.page_views,
c.engaged_time_in_s,
{%- if var('snowplow__list_event_counts', false) %}
try_parse_json(c.event_counts_string) as event_counts,
{%- endif %}
c.total_events,
{{ engaged_session() }} as is_engaged,
-- when the session starts with a ping we need to add the min visit length to get when the session actually started
c.absolute_time_in_s + case when a.event_name = 'page_ping' then {{ var("snowplow__min_visit_length", 5) }} else 0 end as absolute_time_in_s,
-- first page fields
a.first_page_title,
a.first_page_url,
a.first_page_urlscheme,
a.first_page_urlhost,
a.first_page_urlpath,
a.first_page_urlquery,
a.first_page_urlfragment,
-- only take the first value when the last is genuinely missing (based on url, as it must always be populated)
case when b.last_page_url is null then coalesce(b.last_page_title, a.first_page_title) else b.last_page_title end as last_page_title,
case when b.last_page_url is null then coalesce(b.last_page_url, a.first_page_url) else b.last_page_url end as last_page_url,
case when b.last_page_url is null then coalesce(b.last_page_urlscheme, a.first_page_urlscheme) else b.last_page_urlscheme end as last_page_urlscheme,
case when b.last_page_url is null then coalesce(b.last_page_urlhost, a.first_page_urlhost) else b.last_page_urlhost end as last_page_urlhost,
case when b.last_page_url is null then coalesce(b.last_page_urlpath, a.first_page_urlpath) else b.last_page_urlpath end as last_page_urlpath,
case when b.last_page_url is null then coalesce(b.last_page_urlquery, a.first_page_urlquery) else b.last_page_urlquery end as last_page_urlquery,
case when b.last_page_url is null then coalesce(b.last_page_urlfragment, a.first_page_urlfragment) else b.last_page_urlfragment end as last_page_urlfragment,
-- referrer fields
a.referrer,
a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,
a.refr_medium,
a.refr_source,
a.refr_term,
-- marketing fields
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.mkt_clickid,
a.mkt_network,
a.mkt_source_platform,
a.default_channel_group,
-- geo fields
a.geo_country,
a.geo_region,
a.geo_region_name,
a.geo_city,
a.geo_zipcode,
a.geo_latitude,
a.geo_longitude,
a.geo_timezone,
a.geo_country_name,
a.geo_continent,
case when b.last_geo_country is null then coalesce(b.last_geo_country, a.geo_country) else b.last_geo_country end as last_geo_country,
case when b.last_geo_country is null then coalesce(b.last_geo_region_name, a.geo_region_name) else b.last_geo_region_name end as last_geo_region_name,
case when b.last_geo_country is null then coalesce(b.last_geo_city, a.geo_city) else b.last_geo_city end as last_geo_city,
case when b.last_geo_country is null then coalesce(b.last_geo_country_name,a.geo_country_name) else b.last_geo_country_name end as last_geo_country_name,
case when b.last_geo_country is null then coalesce(b.last_geo_continent, a.geo_continent) else b.last_geo_continent end as last_geo_continent,
-- ip address
a.user_ipaddress,
-- user agent
a.useragent,
a.br_renderengine,
a.br_lang,
a.br_lang_name,
case when b.last_br_lang is null then coalesce(b.last_br_lang, a.br_lang) else b.last_br_lang end as last_br_lang,
case when b.last_br_lang is null then coalesce(b.last_br_lang_name, a.br_lang_name) else b.last_br_lang_name end as last_br_lang_name,
a.os_timezone,
-- optional fields, only populated if enabled.
-- iab enrichment fields
a.category,
a.primary_impact,
a.reason,
a.spider_or_robot,
-- ua parser enrichment fields
a.useragent_family,
a.useragent_major,
a.useragent_minor,
a.useragent_patch,
a.useragent_version,
a.os_family,
a.os_major,
a.os_minor,
a.os_patch,
a.os_patch_minor,
a.os_version,
a.device_family,
-- yauaa enrichment fields
a.device_class,
case when a.device_class = 'Desktop' THEN 'Desktop'
when a.device_class = 'Phone' then 'Mobile'
when a.device_class = 'Tablet' then 'Tablet'
else 'Other' end as device_category,
a.screen_resolution,
a.agent_class,
a.agent_name,
a.agent_name_version,
a.agent_name_version_major,
a.agent_version,
a.agent_version_major,
a.device_brand,
a.device_name,
a.device_version,
a.layout_engine_class,
a.layout_engine_name,
a.layout_engine_name_version,
a.layout_engine_name_version_major,
a.layout_engine_version,
a.layout_engine_version_major,
a.operating_system_class,
a.operating_system_name,
a.operating_system_name_version,
a.operating_system_version
-- conversion fields
{%- if var('snowplow__conversion_events', none) %}
{%- for conv_def in var('snowplow__conversion_events') %}
{{ snowplow_web.get_conversion_columns(conv_def, names_only = true)}}
{%- endfor %}
{% if var('snowplow__total_all_conversions', false) %}
,{%- for conv_def in var('snowplow__conversion_events') %}{{'cv_' ~ conv_def['name'] ~ '_volume'}}{%- if not loop.last %} + {% endif -%}{%- endfor %} as cv__all_volume
{# Use 0 in case of no conversions having a value field #}
,0 {%- for conv_def in var('snowplow__conversion_events') %}{%- if conv_def.get('value') %} + {{'cv_' ~ conv_def['name'] ~ '_total'}}{% endif -%}{%- endfor %} as cv__all_total
{% endif %}
{%- endif %}
-- passthrough fields
{%- if var('snowplow__session_passthroughs', []) -%}
{%- for col in passthrough_names %}
, a.{{col}}
{%- endfor -%}
{%- endif %}
from
session_firsts a
left join
session_lasts b on a.domain_sessionid = b.domain_sessionid
left join
session_aggs c on a.domain_sessionid = c.domain_sessionid
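When snowplow__list_event_counts is enabled, session_aggs assembles event_counts_string by looping over every event_name seen in the run, emitting a "name" :count pair only for names present in the session, and trimming the trailing separator with rtrim. A rough Python sketch of that string building, plus a tolerant parser in the spirit of try_parse_json, is shown below; both function names are hypothetical and the run's event names are assumed to be supplied already ordered.

```python
import json
from collections import Counter
from typing import Iterable, List, Optional


def event_counts_string(session_event_names: Iterable[str], run_event_names: List[str]) -> str:
    """Build the '{"name" :count, ...}' string for one session."""
    counts = Counter(session_event_names)
    parts = ""
    # run_event_names stands in for dbt_utils.get_column_values: every event_name in the run
    for name in run_event_names:
        # only emit names that actually occur in this session
        if counts[name] > 0:
            parts += f'"{name}" :{counts[name]}, '
    # like rtrim(..., ', '): strip trailing comma and space characters
    return "{" + parts.rstrip(", ") + "}"


def parse_event_counts(text: str) -> Optional[dict]:
    """Rough analogue of try_parse_json: return None instead of raising on bad input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Trimming at the end, rather than using loop.first/loop.last, matters because the first or last event name in the run may have no events in a given session.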
Depends On
- Models
- Macros
- macro.dbt.concat
- macro.dbt.type_string
- macro.dbt_utils.get_column_values
- macro.snowplow_utils.current_timestamp_in_utc
- macro.snowplow_utils.get_optional_fields
- macro.snowplow_utils.set_query_tag
- macro.snowplow_utils.timestamp_add
- macro.snowplow_utils.timestamp_diff
- macro.snowplow_utils.to_unixtstamp
- macro.snowplow_utils.type_max_string
- macro.snowplow_web.channel_group_query
- macro.snowplow_web.engaged_session
- macro.snowplow_web.filter_bots
- macro.snowplow_web.get_conversion_columns
- macro.snowplow_web.get_iab_context_fields
- macro.snowplow_web.get_ua_context_fields
- macro.snowplow_web.get_yauaa_context_fields
- macro.snowplow_web.iab_fields
- macro.snowplow_web.ua_fields
- macro.snowplow_web.yauaa_fields
Referenced By
Snowplow Web User Mapping
models/user_mapping/snowplow_web_user_mapping.sql
Description
A mapping table between domain_userid
and user_id
.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using a first-party cookie e.g. ‘bc2e92ec6c204a14’ | text |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
end_tstamp | The collector_tstamp when the user was last active | timestamp_ntz |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
unique_key='domain_userid',
sort='end_tstamp',
dist='domain_userid',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "end_tstamp",
"data_type": "timestamp"
}),
tags=["derived"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
select distinct
domain_userid,
last_value({{ var('snowplow__user_stitching_id', 'user_id') }}) over(
partition by domain_userid
order by collector_tstamp
rows between unbounded preceding and unbounded following
) as user_id,
max(collector_tstamp) over (partition by domain_userid) as end_tstamp
from {{ ref('snowplow_web_base_events_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
and {{ var('snowplow__user_stitching_id', 'user_id') }} is not null
and domain_userid is not null
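The mapping query keeps, for each domain_userid, the latest non-null stitching ID by collector_tstamp and the latest collector_tstamp among those rows. A minimal Python sketch of the same idea follows, assuming events arrive as (domain_userid, user_id, collector_tstamp) tuples and the stitching column is the default user_id; the function name is hypothetical. Ties on collector_tstamp are not deterministic in the SQL window either, so the sketch simply keeps whichever tied row sorts last.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def build_user_mapping(
    events: Iterable[Tuple[Optional[str], Optional[str], datetime]],
) -> Dict[str, Tuple[str, datetime]]:
    mapping: Dict[str, Tuple[str, datetime]] = {}
    # mirror the where clause: both identifiers must be present
    kept = (e for e in events if e[0] is not None and e[1] is not None)
    # ascending by collector_tstamp, so later rows overwrite earlier ones
    for domain_userid, user_id, tstamp in sorted(kept, key=lambda e: e[2]):
        # last write per key gives last_value(user_id) and max(collector_tstamp)
        mapping[domain_userid] = (user_id, tstamp)
    return mapping
```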
Depends On
- Models
- Macros
Referenced By
Snowplow Web Users
models/users/snowplow_web_users.sql
Description
This derived incremental table contains all historic user data and should be the end point for any analysis or BI tool.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using a first-party cookie e.g. ‘bc2e92ec6c204a14’ | text |
original_domain_userid | True user ID set by Snowplow using a first-party cookie e.g. ‘bc2e92ec6c204a14’ | text |
network_userid | User ID set by Snowplow using a third-party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’ | text |
start_tstamp | Timestamp for the start of the user's lifecycle, based on derived_tstamp | timestamp_ntz |
end_tstamp | Timestamp for the last time the user was seen, based on derived_tstamp | timestamp_ntz |
model_tstamp | The current timestamp when the model processed this row. | timestamp_ntz |
page_views | The total page views by the user | number |
sessions | The total sessions by the user | number |
engaged_time_in_s | The total engaged time in seconds by the user | number |
first_page_title | The title of the first page visited by the user | text |
first_page_url | The url of the first page visited by the user | text |
first_page_urlscheme | The urlscheme of the first page visited by the user | text |
first_page_urlhost | The urlhost of the first page visited by the user | text |
first_page_urlpath | The urlpath of the first page visited by the user | text |
first_page_urlquery | The urlquery of the first page visited by the user | text |
first_page_urlfragment | The urlfragment of the first page visited by the user | text |
first_geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
first_geo_country_name | Name of the country the visitor is located in | text |
first_geo_continent | Name of the continent the visitor is located in | text |
first_geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
first_geo_region_name | Visitor region name e.g. ‘Florida’ | text |
first_br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
first_br_lang_name | Full name of the language the browser is set to e.g. ‘English (United Kingdom)’ | text |
last_page_title | The title of the last page visited by the user | text |
last_page_url | The url of the last page visited by the user | text |
last_page_urlscheme | The urlscheme of the last page visited by the user | text |
last_page_urlhost | The urlhost of the last page visited by the user | text |
last_page_urlpath | The urlpath of the last page visited by the user | text |
last_page_urlquery | The urlquery of the last page visited by the user | text |
last_page_urlfragment | The urlfragment of the last page visited by the user | text |
last_geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
last_geo_country_name | Name of the country the visitor is located in | text |
last_geo_continent | Name of the continent the visitor is located in | text |
last_geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
last_geo_region_name | Visitor region name e.g. ‘Florida’ | text |
last_br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
last_br_lang_name | Full name of the language the browser is set to e.g. ‘English (United Kingdom)’ | text |
referrer | The referrer associated with the first page view of the user | text |
refr_urlscheme | Referer scheme e.g. ‘http’ | text |
refr_urlhost | Referer host e.g. ‘www.bing.com’ | text |
refr_urlpath | Referer page path e.g. ‘/images/search’ | text |
refr_urlquery | Referer URL querystring e.g. ‘q=psychic+oracle+cards’ | text |
refr_urlfragment | Referer URL fragment | text |
refr_medium | Type of referer e.g. ‘search’, ‘internal’ | text |
refr_source | Name of referer if recognised e.g. ‘Bing images’ | text |
refr_term | Keywords if source is a search engine e.g. ‘psychic oracle cards’ | text |
mkt_medium | Type of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’ | text |
mkt_source | The company / website where the traffic came from e.g. ‘Google’, ‘Facebook’ | text |
mkt_term | Any keywords associated with the referrer e.g. ‘new age tarot decks’ | text |
mkt_content | The content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723 | text |
mkt_campaign | The campaign ID e.g. ‘diageo-123’ | text |
mkt_clickid | The click ID e.g. ‘ac3d8e459’ | text |
mkt_network | The ad network to which the click ID belongs e.g. ‘DoubleClick’ | text |
mkt_source_platform | Source platform based on the utm_source_platform parameter of the first page_url in the session. | text |
default_channel_group | The channels by which users arrived at your site. | text |
first_event_id | | text |
first_event_id2 | | text |
last_event_id | | text |
last_event_id2 | | text |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='incremental',
on_schema_change='append_new_columns',
unique_key='domain_userid',
upsert_date_key='start_tstamp',
disable_upsert_lookback=true,
sort='start_tstamp',
dist='domain_userid',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_users(),
tags=["derived"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}
select *
{% if target.type in ['databricks', 'spark'] -%}
, DATE(start_tstamp) as start_tstamp_date
{%- endif %}
from {{ ref('snowplow_web_users_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
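Because the model is incremental with unique_key='domain_userid', a user row produced in this run replaces any existing row with the same domain_userid, and users not touched by the run are left alone. The Python sketch below models only that replace-by-key behaviour and the "no new events, no rows" guard; it deliberately ignores the Snowplow-specific upsert_date_key and disable_upsert_lookback settings, and the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable


def upsert_users(
    existing: Dict[str, dict],
    this_run: Iterable[dict],
    run_has_new_events: bool = True,
) -> Dict[str, dict]:
    merged = dict(existing)  # copy so the caller's table is not mutated
    if not run_has_new_events:
        # is_run_with_new_events returned false: the select yields no rows
        return merged
    for row in this_run:
        # rows sharing the unique key are replaced by the freshly computed user row
        merged[row["domain_userid"]] = row
    return merged
```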
Depends On
- Models
- Macros
Snowplow Web Users Aggs
models/users/scratch/snowplow_web_users_aggs.sql
Description
This model aggregates various metrics derived from sessions to the user level.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using a first-party cookie e.g. ‘bc2e92ec6c204a14’ | text |
start_tstamp | | timestamp_ntz |
end_tstamp | | timestamp_ntz |
first_domain_sessionid | | text |
last_domain_sessionid | | text |
page_views | | number |
sessions | | number |
engaged_time_in_s | | number |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}),
cluster_by=snowplow_utils.get_value_by_target_type(bigquery_val=["domain_userid"]),
sort='domain_userid',
dist='domain_userid',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
select
domain_userid,
-- time
user_start_tstamp as start_tstamp,
user_end_tstamp as end_tstamp,
-- first/last session. Max to resolve edge case with multiple sessions with the same start/end tstamp
max(case when start_tstamp = user_start_tstamp then domain_sessionid end) as first_domain_sessionid,
max(case when end_tstamp = user_end_tstamp then domain_sessionid end) as last_domain_sessionid,
-- engagement
sum(page_views) as page_views,
count(distinct domain_sessionid) as sessions,
sum(engaged_time_in_s) as engaged_time_in_s
from {{ ref('snowplow_web_users_sessions_this_run') }}
group by 1,2,3
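The aggregation above rolls sessions up to one row per domain_userid and uses max() to pick a single session ID when several sessions share the user's first start or last end timestamp. A minimal Python sketch of the same roll-up follows, assuming each session is a dict with the columns used in the query; it also assumes user_start_tstamp and user_end_tstamp equal the min session start and max session end per user, which the sketch computes directly. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_users(sessions: Iterable[dict]) -> Dict[str, dict]:
    by_user: Dict[str, List[dict]] = defaultdict(list)
    for s in sessions:
        by_user[s["domain_userid"]].append(s)
    out: Dict[str, dict] = {}
    for user, rows in by_user.items():
        start = min(r["start_tstamp"] for r in rows)  # assumed equal to user_start_tstamp
        end = max(r["end_tstamp"] for r in rows)  # assumed equal to user_end_tstamp
        out[user] = {
            "start_tstamp": start,
            "end_tstamp": end,
            # max() resolves ties when several sessions share the boundary timestamp
            "first_domain_sessionid": max(r["domain_sessionid"] for r in rows if r["start_tstamp"] == start),
            "last_domain_sessionid": max(r["domain_sessionid"] for r in rows if r["end_tstamp"] == end),
            "page_views": sum(r["page_views"] for r in rows),
            "sessions": len({r["domain_sessionid"] for r in rows}),
            "engaged_time_in_s": sum(r["engaged_time_in_s"] for r in rows),
        }
    return out
```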
Depends On
- Models
- Macros
Referenced By
Snowplow Web Users Lasts
models/users/scratch/snowplow_web_users_lasts.sql
Description
This model identifies the last page view for a user and returns various dimensions associated with that page view.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using a first-party cookie e.g. ‘bc2e92ec6c204a14’ | text |
last_page_title | | text |
last_page_url | | text |
last_page_urlscheme | | text |
last_page_urlhost | | text |
last_page_urlpath | | text |
last_page_urlquery | | text |
last_page_urlfragment | | text |
last_geo_country | | text |
last_geo_country_name | | text |
last_geo_continent | | text |
last_geo_city | | text |
last_geo_region_name | | text |
last_br_lang | | text |
last_br_lang_name | | text |
last_event_id | | text |
last_event_id2 | | text |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
select
a.domain_userid,
a.last_page_title,
a.last_page_url,
a.last_page_urlscheme,
a.last_page_urlhost,
a.last_page_urlpath,
a.last_page_urlquery,
a.last_page_urlfragment,
a.last_geo_country,
a.last_geo_country_name,
a.last_geo_continent,
a.last_geo_city,
a.last_geo_region_name,
a.last_br_lang,
a.last_br_lang_name
{%- if var('snowplow__user_last_passthroughs', []) -%}
{%- for identifier in var('snowplow__user_last_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- else -%}
,a.{{identifier}} as last_{{identifier}}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_users_sessions_this_run') }} a
inner join {{ ref('snowplow_web_users_aggs') }} b
on a.domain_sessionid = b.last_domain_sessionid
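The join above keeps only each user's last session and reads the `last_*` dimensions from it. The following minimal Python sketch shows that inner join as a hash join. `users_lasts` and `LAST_COLUMNS` are hypothetical names, rows are plain dicts, and only a subset of the `last_*` columns is carried for brevity.

```python
# A subset of the last_* columns, for brevity (hypothetical selection).
LAST_COLUMNS = ["last_page_title", "last_page_url", "last_geo_country", "last_br_lang"]


def users_lasts(sessions, aggs):
    """Inner-join sessions (a) to user aggregates (b) on a.domain_sessionid = b.last_domain_sessionid."""
    aggs_by_last_session = {}
    for agg in aggs:
        aggs_by_last_session.setdefault(agg["last_domain_sessionid"], []).append(agg)

    rows = []
    for s in sessions:
        # Sessions with no matching aggregate row produce nothing, as in an inner join.
        for _ in aggs_by_last_session.get(s["domain_sessionid"], []):
            row = {"domain_userid": s["domain_userid"]}
            row.update({col: s[col] for col in LAST_COLUMNS})
            rows.append(row)
    return rows


sessions = [
    {"domain_userid": "u1", "domain_sessionid": "s1", "last_page_title": "Home",
     "last_page_url": "https://example.com/", "last_geo_country": "GB", "last_br_lang": "en-GB"},
    {"domain_userid": "u1", "domain_sessionid": "s2", "last_page_title": "Pricing",
     "last_page_url": "https://example.com/pricing", "last_geo_country": "US", "last_br_lang": "en-US"},
]
aggs = [{"domain_userid": "u1", "last_domain_sessionid": "s2"}]
lasts = users_lasts(sessions, aggs)
```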
Depends On
- Models
- Macros
Referenced By
Snowplow Web Users Sessions This Run
models/users/scratch/snowplow_web_users_sessions_this_run.sql
Description
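This model contains every session, taken from the configured sessions table, for each user who appears in the given run of the Web model, along with each user's first and last timestamps across those sessions.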
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
app_id | | text |
platform | | text |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ | text |
original_domain_sessionid | | text |
domain_sessionidx | | number |
start_tstamp | | timestamp_ntz |
end_tstamp | | timestamp_ntz |
model_tstamp | | timestamp_ntz |
user_id | | text |
domain_userid | | text |
original_domain_userid | | text |
stitched_user_id | | text |
network_userid | | text |
page_views | | number |
engaged_time_in_s | | number |
event_counts | | variant |
total_events | | number |
is_engaged | | boolean |
absolute_time_in_s | | number |
first_page_title | | text |
first_page_url | | text |
first_page_urlscheme | | text |
first_page_urlhost | | text |
first_page_urlpath | | text |
first_page_urlquery | | text |
first_page_urlfragment | | text |
last_page_title | | text |
last_page_url | | text |
last_page_urlscheme | | text |
last_page_urlhost | | text |
last_page_urlpath | | text |
last_page_urlquery | | text |
last_page_urlfragment | | text |
referrer | | text |
refr_urlscheme | | text |
refr_urlhost | | text |
refr_urlpath | | text |
refr_urlquery | | text |
refr_urlfragment | | text |
refr_medium | | text |
refr_source | | text |
refr_term | | text |
mkt_medium | | text |
mkt_source | | text |
mkt_term | | text |
mkt_content | | text |
mkt_campaign | | text |
mkt_clickid | | text |
mkt_network | | text |
mkt_source_platform | | text |
default_channel_group | | text |
geo_country | | text |
geo_region | | text |
geo_region_name | | text |
geo_city | | text |
geo_zipcode | | text |
geo_latitude | | float |
geo_longitude | | float |
geo_timezone | | text |
geo_country_name | | text |
geo_continent | | text |
last_geo_country | | text |
last_geo_region_name | | text |
last_geo_city | | text |
last_geo_country_name | | text |
last_geo_continent | | text |
user_ipaddress | | text |
useragent | | text |
br_renderengine | | text |
br_lang | | text |
br_lang_name | | text |
last_br_lang | | text |
last_br_lang_name | | text |
os_timezone | | text |
category | | text |
primary_impact | | text |
reason | | text |
spider_or_robot | | boolean |
useragent_family | | text |
useragent_major | | text |
useragent_minor | | text |
useragent_patch | | text |
useragent_version | | text |
os_family | | text |
os_major | | text |
os_minor | | text |
os_patch | | text |
os_patch_minor | | text |
os_version | | text |
device_family | | text |
device_class | | text |
device_category | | text |
screen_resolution | | text |
agent_class | | text |
agent_name | | text |
agent_name_version | | text |
agent_name_version_major | | text |
agent_version | | text |
agent_version_major | | text |
device_brand | | text |
device_name | | text |
device_version | | text |
layout_engine_class | | text |
layout_engine_name | | text |
layout_engine_name_version | | text |
layout_engine_name_version_major | | text |
layout_engine_version | | text |
layout_engine_version_major | | text |
operating_system_class | | text |
operating_system_name | | text |
operating_system_name_version | | text |
operating_system_version | | text |
cv_view_page_volume | | number |
cv_view_page_events | | array |
cv_view_page_values | | array |
cv_view_page_total | | float |
cv_view_page_first_conversion | | timestamp_ntz |
cv_view_page_converted | | boolean |
cv__all_volume | | number |
cv__all_total | | float |
event_id | | text |
event_id2 | | text |
user_start_tstamp | | timestamp_ntz |
user_end_tstamp | | timestamp_ntz |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
select
a.*,
min(a.start_tstamp) over(partition by a.domain_userid) as user_start_tstamp,
max(a.end_tstamp) over(partition by a.domain_userid) as user_end_tstamp
from {{ var('snowplow__sessions_table') }} a
where exists (select 1 from {{ ref('snowplow_web_base_sessions_this_run') }} b where a.domain_userid = b.user_identifier)
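Because the `exists` filter matches on the user rather than the session, a returning user's older sessions from the sessions table are kept too. The window functions therefore see the user's full history in that table. The following minimal Python sketch illustrates this. `users_sessions_this_run` is a hypothetical name, `history` stands in for the table configured via `snowplow__sessions_table`, and the list of run users stands in for `user_identifier` from `snowplow_web_base_sessions_this_run`.

```python
from datetime import datetime


def users_sessions_this_run(all_sessions, run_user_identifiers):
    """Keep every stored session for users seen in this run, then add per-user window values."""
    run_users = set(run_user_identifiers)
    # WHERE EXISTS (...): keep a session if its user appears in this run's base sessions.
    kept = [dict(s) for s in all_sessions if s["domain_userid"] in run_users]

    # min()/max() OVER (PARTITION BY domain_userid), evaluated over the kept rows.
    user_start, user_end = {}, {}
    for s in kept:
        u = s["domain_userid"]
        user_start[u] = min(user_start.get(u, s["start_tstamp"]), s["start_tstamp"])
        user_end[u] = max(user_end.get(u, s["end_tstamp"]), s["end_tstamp"])

    for s in kept:
        s["user_start_tstamp"] = user_start[s["domain_userid"]]
        s["user_end_tstamp"] = user_end[s["domain_userid"]]
    return kept


history = [
    {"domain_userid": "u1", "domain_sessionid": "s1",
     "start_tstamp": datetime(2023, 6, 1), "end_tstamp": datetime(2023, 6, 1, 1)},
    {"domain_userid": "u1", "domain_sessionid": "s2",
     "start_tstamp": datetime(2024, 1, 2), "end_tstamp": datetime(2024, 1, 2, 1)},
    {"domain_userid": "u2", "domain_sessionid": "s3",
     "start_tstamp": datetime(2024, 1, 2), "end_tstamp": datetime(2024, 1, 2, 2)},
]
rows = users_sessions_this_run(history, ["u1"])
```

Here user `u1` is only seen in the current run, but the sketch still assigns its 2023 session as `user_start_tstamp`.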
Depends On
- Models
- Macros
Referenced By
Snowplow Web Users This Run
models/users/scratch/snowplow_web_users_this_run.sql
Description
This staging table contains all the users for the given run of the Web model. It has all the same columns as snowplow_web_users. If building a custom module that requires user level data, this is the table you should reference.
Type: Table
Details
Columns
Column Name | Description | Type |
---|---|---|
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ | text |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
original_domain_userid | True User ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ | text |
network_userid | User ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’ | text |
start_tstamp | Timestamp for the start of the user's lifecycle, based on derived_tstamp | timestamp_ntz |
end_tstamp | Timestamp for the last time the user was seen, based on derived_tstamp | timestamp_ntz |
model_tstamp | The current timestamp when the model processed this row. | timestamp_ntz |
page_views | The total page views by the user | number |
sessions | The total sessions by the user | number |
engaged_time_in_s | The total engaged time in seconds by the user | number |
first_page_title | The title of the first page visited by the user | text |
first_page_url | The url of the first page visited by the user | text |
first_page_urlscheme | The urlscheme of the first page visited by the user | text |
first_page_urlhost | The urlhost of the first page visited by the user | text |
first_page_urlpath | The urlpath of the first page visited by the user | text |
first_page_urlquery | The urlquery of the first page visited by the user | text |
first_page_urlfragment | The urlfragment of the first page visited by the user | text |
first_geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
first_geo_country_name | Name of the country the visitor is located in | text |
first_geo_continent | Name of the continent the visitor is located in | text |
first_geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
first_geo_region_name | Visitor region name e.g. ‘Florida’ | text |
first_br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
first_br_lang_name | Full name of the language the browser is set to e.g. ‘English (United Kingdom)’ | text |
last_page_title | The title of the last page visited by the user | text |
last_page_url | The url of the last page visited by the user | text |
last_page_urlscheme | The urlscheme of the last page visited by the user | text |
last_page_urlhost | The urlhost of the last page visited by the user | text |
last_page_urlpath | The urlpath of the last page visited by the user | text |
last_page_urlquery | The urlquery of the last page visited by the user | text |
last_page_urlfragment | The urlfragment of the last page visited by the user | text |
last_geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ | text |
last_geo_country_name | Name of the country the visitor is located in | text |
last_geo_continent | Name of the continent the visitor is located in | text |
last_geo_city | City the visitor is in e.g. ‘New York’, ‘London’ | text |
last_geo_region_name | Visitor region name e.g. ‘Florida’ | text |
last_br_lang | Language the browser is set to e.g. ‘en-GB’ | text |
last_br_lang_name | Full name of the language the browser is set to e.g. ‘English (United Kingdom)’ | text |
referrer | The referrer associated with the first page view of the user | text |
refr_urlscheme | Referer scheme e.g. ‘http’ | text |
refr_urlhost | Referer host e.g. ‘www.bing.com’ | text |
refr_urlpath | Referer page path e.g. ‘/images/search’ | text |
refr_urlquery | Referer URL querystring e.g. ‘q=psychic+oracle+cards’ | text |
refr_urlfragment | Referer URL fragment | text |
refr_medium | Type of referer e.g. ‘search’, ‘internal’ | text |
refr_source | Name of referer if recognised e.g. ‘Bing images’ | text |
refr_term | Keywords if source is a search engine e.g. ‘psychic oracle cards’ | text |
mkt_medium | Type of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’ | text |
mkt_source | The company / website where the traffic came from e.g. ‘Google’, ‘Facebook’ | text |
mkt_term | Any keywords associated with the referrer e.g. ‘new age tarot decks’ | text |
mkt_content | The content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723 | text |
mkt_campaign | The campaign ID e.g. ‘diageo-123’ | text |
mkt_clickid | The click ID e.g. ‘ac3d8e459’ | text |
mkt_network | The ad network to which the click ID belongs e.g. ‘DoubleClick’ | text |
mkt_source_platform | Source platform based off the utm_source_platform parameter of the first page_url in the session. | text |
default_channel_group | The channels by which users arrived at your site. | text |
first_event_id | | text |
first_event_id2 | | text |
last_event_id | | text |
last_event_id2 | | text |
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
select
-- user fields
a.user_id,
a.domain_userid,
a.original_domain_userid,
a.network_userid,
b.start_tstamp,
b.end_tstamp,
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,
-- engagement fields
b.page_views,
b.sessions,
b.engaged_time_in_s,
-- first page fields
a.first_page_title,
a.first_page_url,
a.first_page_urlscheme,
a.first_page_urlhost,
a.first_page_urlpath,
a.first_page_urlquery,
a.first_page_urlfragment,
a.geo_country as first_geo_country,
a.geo_country_name as first_geo_country_name,
a.geo_continent as first_geo_continent,
a.geo_city as first_geo_city,
a.geo_region_name as first_geo_region_name,
a.br_lang as first_br_lang,
a.br_lang_name as first_br_lang_name,
c.last_page_title,
c.last_page_url,
c.last_page_urlscheme,
c.last_page_urlhost,
c.last_page_urlpath,
c.last_page_urlquery,
c.last_page_urlfragment,
c.last_geo_country,
c.last_geo_country_name,
c.last_geo_continent,
c.last_geo_city,
c.last_geo_region_name,
c.last_br_lang,
c.last_br_lang_name,
-- referrer fields
a.referrer,
a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,
a.refr_medium,
a.refr_source,
a.refr_term,
-- marketing fields
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.mkt_clickid,
a.mkt_network,
a.mkt_source_platform,
a.default_channel_group
{%- if var('snowplow__user_first_passthroughs', []) -%}
{%- for identifier in var('snowplow__user_first_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,{{identifier['sql']}} as {{identifier['alias']}}
{%- else -%}
,a.{{identifier}} as first_{{identifier}}
{%- endif -%}
{% endfor -%}
{%- endif %}
{%- if var('snowplow__user_last_passthroughs', []) -%}
{%- for identifier in var('snowplow__user_last_passthroughs', []) %}
{# Check if it's a simple column or a sql+alias #}
{%- if identifier is mapping -%}
,c.{{identifier['alias']}}
{%- else -%}
,c.last_{{identifier}}
{%- endif -%}
{% endfor -%}
{%- endif %}
from {{ ref('snowplow_web_users_aggs') }} as b
inner join {{ ref('snowplow_web_users_sessions_this_run') }} as a
on a.domain_sessionid = b.first_domain_sessionid
inner join {{ ref('snowplow_web_users_lasts') }} c
on b.domain_userid = c.domain_userid
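The two passthrough loops above name their columns differently. A plain column name gets a `first_` or `last_` prefix. A mapping with `sql` and `alias` keys uses the alias as given. On the "last" side, `snowplow_web_users_lasts` has already applied the prefix or alias, so this model just selects it from `c`. The following Python sketch approximates the rendered SQL, ignoring Jinja whitespace control. The function names and example variable values are hypothetical.

```python
def render_first_passthroughs(passthroughs):
    """Approximate the SQL emitted for snowplow__user_first_passthroughs (whitespace ignored)."""
    parts = []
    for identifier in passthroughs:
        if isinstance(identifier, dict):  # Jinja: `identifier is mapping`
            parts.append(f",{identifier['sql']} as {identifier['alias']}")
        else:
            parts.append(f",a.{identifier} as first_{identifier}")
    return "".join(parts)


def render_last_passthroughs(passthroughs):
    """Approximate the columns this model selects from snowplow_web_users_lasts (alias c)."""
    parts = []
    for identifier in passthroughs:
        if isinstance(identifier, dict):
            parts.append(f",c.{identifier['alias']}")  # aliased inside users_lasts
        else:
            parts.append(f",c.last_{identifier}")  # users_lasts named it last_<column>
    return "".join(parts)


# Hypothetical variable values, for illustration only.
first_sql = render_first_passthroughs(
    ["os_timezone", {"sql": "lower(a.br_lang)", "alias": "first_br_lang_lower"}])
last_sql = render_last_passthroughs(
    ["os_timezone", {"sql": "lower(a.br_lang)", "alias": "last_br_lang_lower"}])
```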
Depends On
- Models
- Macros
Referenced By
Snowplow Web Vital Events This Run
models/optional_modules/core_web_vitals/scratch/<adaptor>/snowplow_web_vital_events_this_run.sql
Description
An upstream scratch table extracting all the relevant fields that could be used to model core web vital metrics.
File Paths
- bigquery
- databricks
- default
- snowflake
models/optional_modules/core_web_vitals/scratch/bigquery/snowplow_web_vital_events_this_run.sql
models/optional_modules/core_web_vitals/scratch/databricks/snowplow_web_vital_events_this_run.sql
models/optional_modules/core_web_vitals/scratch/default/snowplow_web_vital_events_this_run.sql
models/optional_modules/core_web_vitals/scratch/snowflake/snowplow_web_vital_events_this_run.sql
Details
Columns
Column Name | Description |
---|---|
event_id | A UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
event_name | Event name e.g. ‘link_click’ |
app_id | Application ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev. |
platform | Platform e.g. ‘web’ |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ |
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
collector_tstamp | Time stamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’ |
derived_tstamp | Timestamp making allowance for an inaccurate device clock e.g. ‘2013-11-26 00:02:04’ |
dvce_created_tstamp | Timestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’ |
load_tstamp | The timestamp at which the event landed in the data warehouse. |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ |
page_url | The page URL e.g. ‘http://www.example.com’ |
page_title | Web page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’ |
useragent | Raw useragent |
device_class | Class of device e.g. phone |
agent_class | Class of agent e.g. browser |
agent_name | Name of agent e.g. Chrome |
agent_name_version | Name and version of agent e.g. Chrome 53.0.2785.124 |
agent_name_version_major | Name and major version of agent e.g. Chrome 53 |
agent_version | Version of agent e.g. 53.0.2785.124 |
agent_version_major | Major version of agent e.g. 53 |
device_brand | Brand of device e.g. Google |
device_name | Name of device e.g. Google Nexus 6 |
device_version | Version of device e.g. 6.0 |
layout_engine_class | Class of layout engine e.g. Browser |
layout_engine_name | Name of layout engine e.g. Blink |
layout_engine_name_version | Name and version of layout engine e.g. Blink 53.0 |
layout_engine_name_version_major | Name and major version of layout engine e.g. Blink 53 |
layout_engine_version | Version of layout engine e.g. 53.0 |
layout_engine_version_major | Major version of layout engine e.g. 53 |
operating_system_class | Class of the OS e.g. Mobile |
operating_system_name | Name of the OS e.g. Android |
operating_system_name_version | Name and version of the OS e.g. Android 7.0 |
operating_system_version | Version of the OS e.g. 7.0 |
lcp | A metric for measuring perceived load speed because it marks the point in the page load timeline when the page's main content has likely loaded. The tracker reports it in milliseconds; this model divides it by 1000, so the column is in seconds. For more information https://web.dev/lcp/. |
fcp | A metric for measuring the time from when the page starts loading to when any part of the page's content is rendered on the screen. |
fid | A metric for measuring load responsiveness because it quantifies the experience users feel when trying to interact with unresponsive pages. Measured in milliseconds. For more information https://web.dev/fid/. |
cls | A unitless metric for measuring visual stability because it helps quantify how often users experience unexpected layout shifts. For more information https://web.dev/cls/. |
inp | A metric that assesses responsiveness. INP observes the latency of all interactions a user has made with the page, and reports a single value that all (or nearly all) interactions were below. For more information https://web.dev/inp/. |
ttfb | A DOMHighResTimeStamp referring to the time in milliseconds between the browser requesting a page and when it receives the first byte of information from the server. For more information https://web.dev/ttfb/. |
navigation_type | The navigation type recognised from the Navigation Timing API https://www.w3.org/TR/navigation-timing-2/. E.g. 'navigate', 'reload', 'back-forward', 'back-forward-cache', 'prerender', 'restore'. |
original_domain_sessionid | True domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
original_domain_userid | True User ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’ |
Code
- bigquery
- databricks
- default
- snowflake
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_cwv", false) and target.type == 'bigquery' | as_bool(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with prep as (
select
e.event_id,
e.event_name,
e.app_id,
e.platform,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.collector_tstamp,
e.derived_tstamp,
e.dvce_created_tstamp,
e.load_tstamp,
e.geo_country,
e.page_url,
e.page_title,
e.useragent,
{{ snowplow_utils.get_optional_fields(
enabled=true,
fields=yauaa_fields(),
col_prefix='contexts_nl_basjes_yauaa_context_1_0_0',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='e') }},
{{ snowplow_utils.get_optional_fields(
enabled= true,
fields=[{'field': 'lcp', 'dtype': 'string'}, {'field': 'fcp', 'dtype': 'string'}, {'field': 'fid', 'dtype': 'string'}, {'field': 'cls', 'dtype': 'string'}, {'field': 'inp', 'dtype': 'string'}, {'field': 'ttfb', 'dtype': 'string'}, {'field': 'navigation_type', 'dtype': 'string'}],
col_prefix='unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1_0_0',
relation=ref('snowplow_web_base_events_this_run'),
relation_alias='e') }}
from {{ ref("snowplow_web_base_events_this_run") }} as e
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
and event_name = 'web_vitals'
and page_view_id is not null
-- exclude bot traffic
{% if var('snowplow__enable_iab', false) %}
and not {{ snowplow_utils.get_field(column_name = 'contexts_com_iab_snowplow_spiders_and_robots_1_0_0',
field_name = 'spider_or_robot',
table_alias = 'e',
type = 'boolean',
array_index = 0)}} = True
{% endif %}
{{ filter_bots() }}
)
select
event_id,
event_name,
app_id,
platform,
domain_userid,
original_domain_userid,
user_id,
page_view_id,
domain_sessionid,
original_domain_sessionid,
collector_tstamp,
derived_tstamp,
dvce_created_tstamp,
load_tstamp,
geo_country,
page_url,
page_title,
useragent,
lower(device_class) as device_class,
agent_class,
agent_name,
agent_name_version,
agent_name_version_major,
agent_version,
agent_version_major,
device_brand,
device_name,
device_version,
layout_engine_class,
layout_engine_name,
layout_engine_name_version,
layout_engine_name_version_major,
layout_engine_version,
layout_engine_version_major,
operating_system_class,
operating_system_name,
operating_system_name_version,
operating_system_version,
ceil(cast(lcp as decimal)) /1000 as lcp,
ceil(cast(fcp as decimal) * 1000) /1000 as fcp,
ceil(safe_cast(fid as decimal) * 1000) /1000 as fid,
ceil(cast(cls as decimal) * 1000) /1000 as cls,
ceil(cast(inp as decimal) * 1000) /1000 as inp,
ceil(cast(ttfb as decimal) * 1000) /1000 as ttfb,
navigation_type
from prep p
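The `where` clause in the `prep` CTE keeps only `web_vitals` events that have a `page_view_id`. When `snowplow__enable_iab` is set, it also drops events flagged as spiders or robots. The following minimal Python sketch models those three conditions only. It does not model `filter_bots()` or `is_run_with_new_events()`, whose internals are not shown here. `select_web_vital_events` is a hypothetical name, and the flat `spider_or_robot` key stands in for the value extracted from the IAB context.

```python
def select_web_vital_events(events, enable_iab=False):
    """Sketch of the prep CTE filter (filter_bots() and the new-events check are not modelled)."""
    kept = []
    for e in events:
        if e.get("event_name") != "web_vitals":
            continue
        if e.get("page_view_id") is None:
            continue
        # `not spider_or_robot = True` evaluates to NULL when the flag is NULL,
        # so under SQL three-valued logic only an explicit FALSE survives.
        if enable_iab and e.get("spider_or_robot") is not False:
            continue
        kept.append(e)
    return kept


events = [
    {"event_id": "e1", "event_name": "web_vitals", "page_view_id": "pv1", "spider_or_robot": False},
    {"event_id": "e2", "event_name": "web_vitals", "page_view_id": None, "spider_or_robot": False},
    {"event_id": "e3", "event_name": "page_view", "page_view_id": "pv1", "spider_or_robot": False},
    {"event_id": "e4", "event_name": "web_vitals", "page_view_id": "pv2", "spider_or_robot": True},
    {"event_id": "e5", "event_name": "web_vitals", "page_view_id": "pv3", "spider_or_robot": None},
]
```

Note the effect on event `e5`. With the IAB filter enabled, an event whose flag could not be extracted is dropped along with the known bots.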
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_cwv", false) and target.type in ('databricks', 'spark') | as_bool(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with prep as (
select
e.event_id,
e.event_name,
e.app_id,
e.platform,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.collector_tstamp,
e.derived_tstamp,
e.dvce_created_tstamp,
e.load_tstamp,
e.geo_country,
e.page_url,
e.page_title,
e.useragent,
{{snowplow_web.get_yauaa_context_fields()}},
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1.lcp::decimal(14,4)) /1000 as lcp,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1.fcp::decimal(14,4), 3) as fcp,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1.fid::decimal(14,4), 3) as fid,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1.cls::decimal(14,4), 3) as cls,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1.inp::decimal(14,4), 3) as inp,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1.ttfb::decimal(14,4), 3) as ttfb,
e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1.navigation_type::varchar(128) as navigation_type
from {{ ref("snowplow_web_base_events_this_run") }} as e
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
and event_name = 'web_vitals'
and page_view_id is not null
-- exclude bot traffic
{% if var('snowplow__enable_iab', false) %}
and not {{ snowplow_utils.get_field(column_name = 'contexts_com_iab_snowplow_spiders_and_robots_1',
field_name = 'spider_or_robot',
table_alias = 'e',
type = 'boolean',
array_index = 0)}} = True
{% endif %}
{{ filter_bots() }}
)
select
event_id,
event_name,
app_id,
platform,
domain_userid,
original_domain_userid,
user_id,
page_view_id,
domain_sessionid,
original_domain_sessionid,
collector_tstamp,
derived_tstamp,
dvce_created_tstamp,
load_tstamp,
geo_country,
page_url,
page_title,
useragent,
lower(device_class) as device_class,
agent_class,
agent_name,
agent_name_version,
agent_name_version_major,
agent_version,
agent_version_major,
device_brand,
device_name,
device_version,
layout_engine_class,
layout_engine_name,
layout_engine_name_version,
layout_engine_name_version_major,
layout_engine_version,
layout_engine_version_major,
operating_system_class,
operating_system_name,
operating_system_name_version,
operating_system_version,
lcp,
fcp,
fid,
cls,
inp,
ttfb,
navigation_type
from prep p
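The adapters express the same rounding in two ways. `ceil(x, 3)` in the Databricks and Snowflake variants and `ceil(x * 1000) / 1000` in the BigQuery and Redshift/Postgres variants both round a value up to three decimal places. The Databricks `lcp` line instead rounds up to a whole millisecond and then divides by 1000 to give seconds. The following minimal Python sketch reproduces both behaviours with `decimal`. It ignores the intermediate cast to `decimal(14,4)`, and the function names are hypothetical.

```python
from decimal import ROUND_CEILING, Decimal


def ceil_to_scale(value, scale=3):
    """Round up to `scale` decimal places, like ceil(x * 10**scale) / 10**scale."""
    quantum = Decimal(1).scaleb(-scale)  # Decimal('0.001') when scale=3
    return Decimal(value).quantize(quantum, rounding=ROUND_CEILING)


def lcp_ms_to_seconds(value):
    """Round LCP up to a whole millisecond, then express it in seconds."""
    return Decimal(value).quantize(Decimal(1), rounding=ROUND_CEILING) / 1000
```

For example, a CLS of `0.01234` rounds up to `0.013`, and an LCP of `2345.6` ms becomes `2.346` seconds.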
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_cwv", false) and target.type in ('redshift', 'postgres') | as_bool()
)
}}
with prep as (
select
e.event_id,
e.event_name,
e.app_id,
e.platform,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.collector_tstamp,
e.derived_tstamp,
e.dvce_created_tstamp,
e.load_tstamp,
e.geo_country,
e.page_url,
e.page_title,
e.useragent,
{{snowplow_web.get_yauaa_context_fields()}},
ceil(cast(cwv_lcp/1000 as decimal(14,4))*1000) /1000 as lcp,
ceil(cast(cwv_fcp as decimal(14,4))*1000) /1000 as fcp,
ceil(cast(cwv_fid as decimal(14,4))*1000) /1000 as fid,
ceil(cast(cwv_cls as decimal(14,4))*1000) /1000 as cls,
ceil(cast(cwv_inp as decimal(14,4))*1000) /1000 as inp,
ceil(cast(cwv_ttfb as decimal(14,4))*1000) /1000 as ttfb,
cast(cwv_navigation_type as {{ dbt.type_string() }}) as navigation_type
from {{ ref("snowplow_web_base_events_this_run") }} as e
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
and event_name = 'web_vitals'
and page_view_id is not null
-- exclude bot traffic
{% if var('snowplow__enable_iab', false) %}
and not e.iab_spider_or_robot = True
{% endif %}
{{ filter_bots() }}
)
select
event_id,
event_name,
app_id,
platform,
domain_userid,
original_domain_userid,
user_id,
page_view_id,
domain_sessionid,
original_domain_sessionid,
collector_tstamp,
derived_tstamp,
dvce_created_tstamp,
load_tstamp,
geo_country,
page_url,
page_title,
useragent,
lower(yauaa_device_class) as device_class,
yauaa_agent_class as agent_class,
yauaa_agent_name as agent_name,
yauaa_agent_name_version as agent_name_version,
yauaa_agent_name_version_major as agent_name_version_major,
yauaa_agent_version as agent_version,
yauaa_agent_version_major as agent_version_major,
yauaa_device_brand as device_brand,
yauaa_device_name as device_name,
yauaa_device_version as device_version,
yauaa_layout_engine_class as layout_engine_class,
yauaa_layout_engine_name as layout_engine_name,
yauaa_layout_engine_name_version as layout_engine_name_version,
yauaa_layout_engine_name_version_major as layout_engine_name_version_major,
yauaa_layout_engine_version as layout_engine_version,
yauaa_layout_engine_version_major as layout_engine_version_major,
yauaa_operating_system_class as operating_system_class,
yauaa_operating_system_name as operating_system_name,
yauaa_operating_system_name_version as operating_system_name_version,
yauaa_operating_system_version as operating_system_version,
lcp,
fcp,
fid,
cls,
inp,
ttfb,
navigation_type
from prep p
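In the Redshift/Postgres variant, the YAUAA fields arrive from `get_yauaa_context_fields()` as `yauaa_`-prefixed columns. The final `select` renames them to the unprefixed names the other adapters produce and lower-cases `device_class`. The following Python sketch shows that renaming on a single dict row. `rename_yauaa_columns` is a hypothetical helper, not part of the package.

```python
def rename_yauaa_columns(row):
    """Strip the yauaa_ prefix from column names and lower-case device_class."""
    prefix = "yauaa_"
    out = {}
    for key, value in row.items():
        name = key[len(prefix):] if key.startswith(prefix) else key
        if name == "device_class" and value is not None:
            value = value.lower()  # lower(yauaa_device_class); NULL stays NULL
        out[name] = value
    return out
```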
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
tags=["this_run"],
enabled=var("snowplow__enable_cwv", false) and target.type == 'snowflake' | as_bool(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with prep as (
select
e.event_id,
e.event_name,
e.app_id,
e.platform,
e.domain_userid,
e.original_domain_userid,
e.user_id,
e.page_view_id,
e.domain_sessionid,
e.original_domain_sessionid,
e.collector_tstamp,
e.derived_tstamp,
e.dvce_created_tstamp,
e.load_tstamp,
e.geo_country,
e.page_url,
e.page_title,
e.useragent,
{{snowplow_web.get_yauaa_context_fields()}},
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1:lcp::decimal(14,4), 3) /1000 as lcp,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1:fcp::decimal(14,4), 3) as fcp,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1:fid::decimal(14,4), 3) as fid,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1:cls::decimal(14,4), 3) as cls,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1:inp::decimal(14,4), 3) as inp,
ceil(e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1:ttfb::decimal(14,4), 3) as ttfb,
e.unstruct_event_com_snowplowanalytics_snowplow_web_vitals_1:navigationType::varchar as navigation_type
from {{ ref("snowplow_web_base_events_this_run") }} as e
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
and event_name = 'web_vitals'
and page_view_id is not null
-- exclude bot traffic
{% if var('snowplow__enable_iab', false) %}
and not {{ snowplow_utils.get_field(column_name = 'contexts_com_iab_snowplow_spiders_and_robots_1',
field_name = 'spiderOrRobot',
table_alias = 'e',
type = 'boolean',
array_index = 0)}} = True
{% endif %}
{{ filter_bots() }}
)
select
event_id,
event_name,
app_id,
platform,
domain_userid,
original_domain_userid,
user_id,
page_view_id,
domain_sessionid,
original_domain_sessionid,
collector_tstamp,
derived_tstamp,
dvce_created_tstamp,
load_tstamp,
geo_country,
page_url,
page_title,
useragent,
lower(device_class) as device_class,
agent_class,
agent_name,
agent_name_version,
agent_name_version_major,
agent_version,
agent_version_major,
device_brand,
device_name,
device_version,
layout_engine_class,
layout_engine_name,
layout_engine_name_version,
layout_engine_name_version_major,
layout_engine_version,
layout_engine_version_major,
operating_system_class,
operating_system_name,
operating_system_name_version,
operating_system_version,
lcp,
fcp,
fid,
cls,
inp,
ttfb,
navigation_type
from prep p
Depends On
- Models
- Macros
- macro.dbt.type_string
- macro.snowplow_utils.get_field
- macro.snowplow_utils.get_optional_fields
- macro.snowplow_utils.get_sde_or_context
- macro.snowplow_utils.is_run_with_new_events
- macro.snowplow_utils.set_query_tag
- macro.snowplow_web.filter_bots
- macro.snowplow_web.get_yauaa_context_fields
- macro.snowplow_web.yauaa_fields
Referenced By
Snowplow Web Vital Measurements
models/optional_modules/core_web_vitals/<adaptor>/snowplow_web_vital_measurements.sql
Description
A table intended for visualisations that reports core web vital measurements at a user-specified percentile, which defaults to 75.
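A percentile such as the default 75th can be computed in several ways. The following Python sketch uses linear interpolation between the two nearest ranks, in the style of SQL `PERCENTILE_CONT`. This is an assumption: the adapter-specific models may use a different or approximate percentile function, so results can differ slightly. `percentile_cont` is a hypothetical helper name.

```python
import math


def percentile_cont(values, p=0.75):
    """Linear-interpolation percentile, in the style of SQL PERCENTILE_CONT."""
    if not values:
        raise ValueError("percentile of an empty set is undefined")
    ordered = sorted(values)
    position = (len(ordered) - 1) * p
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction
```

For four LCP values `[1, 2, 3, 4]`, the 75th percentile falls a quarter of the way between 3 and 4, giving 3.25.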
File Paths
- bigquery
- databricks
- default
- snowflake
models/optional_modules/core_web_vitals/bigquery/snowplow_web_vital_measurements.sql
models/optional_modules/core_web_vitals/databricks/snowplow_web_vital_measurements.sql
models/optional_modules/core_web_vitals/default/snowplow_web_vital_measurements.sql
models/optional_modules/core_web_vitals/snowflake/snowplow_web_vital_measurements.sql
Details
Columns
Column Name | Description |
---|---|
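compound_key | A compound key for the table: page_url, device_class, geo_country and time_period concatenated with '-'. |
measurement_type | The dimensions the measurement is grouped by. One of 'overall', 'by_url_and_device', 'by_device', 'by_day', 'by_day_and_device', 'by_country' or 'by_country_and_device'. |
page_url | The page URL e.g. ‘http://www.example.com’, or 'all' when the measurement is not split by page URL. |
device_class | Class of device e.g. phone, or 'all' when the measurement is not split by device. |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’, or 'all' when the measurement is not split by country. |
country | Name of the country the visitor is located in, looked up from the geo mapping seed; 'all' when there is no match, including measurements not split by country. |
time_period | The day measured, for the 'by_day' and 'by_day_and_device' measurement types; otherwise 'last N days', where N is the value of snowplow__cwv_days_to_measure. |
page_view_count | The number of page views within the measured range. |
lcp_75p | The LCP value at the given percentile point. The column suffix follows snowplow__cwv_percentile (75 by default). |
fid_75p | The FID value at the given percentile point. |
cls_75p | The CLS value at the given percentile point. |
ttfb_75p | The TTFB value at the given percentile point. |
inp_75p | The INP value at the given percentile point. |
lcp_result | The evaluation of the metric in question. One of 'good' / 'needs improvement' / 'poor', depending on the thresholds defined in macro core_web_vital_results_query(). |
fid_result | The evaluation of the metric in question. One of 'good' / 'needs improvement' / 'poor', depending on the thresholds defined in macro core_web_vital_results_query(). |
cls_result | The evaluation of the metric in question. One of 'good' / 'needs improvement' / 'poor', depending on the thresholds defined in macro core_web_vital_results_query(). |
ttfb_result | The evaluation of the metric in question. One of 'good' / 'needs improvement' / 'poor', depending on the thresholds defined in macro core_web_vital_results_query(). |
inp_result | The evaluation of the metric in question. One of 'good' / 'needs improvement' / 'poor', depending on the thresholds defined in macro core_web_vital_results_query(). |
passed | Evaluation that passes only when all of the LCP, FID and CLS results pass. |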
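The `*_result` and `passed` columns can be pictured as the following Python sketch. It rests on three assumptions:

- The threshold values are Google's published 'good' / 'poor' boundaries for each metric, with inclusive upper bounds. The values actually used are defined in `core_web_vital_results_query()` and may differ.
- `passed` follows the description above.
- A metric counts as passing when its result is 'good'.

```python
from typing import Optional

# Assumed thresholds: (upper bound of 'good', upper bound of 'needs improvement').
THRESHOLDS = {
    "lcp": (2500, 4000),   # ms
    "fid": (100, 300),     # ms
    "cls": (0.1, 0.25),    # unitless
    "ttfb": (800, 1800),   # ms
    "inp": (200, 500),     # ms
}


def evaluate(metric: str, value: Optional[float]) -> Optional[str]:
    """Classify one percentile value; None (NULL) in, None out."""
    if value is None:
        return None
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


def passed(results: dict) -> bool:
    """True only when the LCP, FID and CLS results are all 'good' (assumed rule)."""
    return all(results.get(m) == "good" for m in ("lcp", "fid", "cls"))


row = {"lcp": 2300.0, "fid": 80.0, "cls": 0.12, "ttfb": 950.0, "inp": 180.0}
results = {m: evaluate(m, v) for m, v in row.items()}
print(results["cls"], passed(results))  # needs improvement False
```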
Code
- bigquery
- databricks
- default
- snowflake
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='table',
enabled=var("snowplow__enable_cwv", false) and target.type == 'bigquery' | as_bool()
)
}}
with by_url_and_device as (
select distinct
page_url,
device_class,
'all' as geo_country,
'last {{var("snowplow__cwv_days_to_measure")|string }} days' as time_period,
count(*) over (partition by page_url, device_class) as page_view_count,
percentile_cont(lcp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by page_url, device_class) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(fid, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by page_url, device_class) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(cls, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by page_url, device_class) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(ttfb, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by page_url, device_class) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(inp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by page_url, device_class) as inp_{{ var('snowplow__cwv_percentile') }}p,
'by_url_and_device' as measurement_type
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, overall as (
select distinct
'all' as page_url,
'all' as device_class,
'all' as geo_country,
'last {{var("snowplow__cwv_days_to_measure")|string }} days' as time_period,
count(*) over() as page_view_count,
percentile_cont(lcp, 0.{{ var('snowplow__cwv_percentile') }}) over() as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(fid, 0.{{ var('snowplow__cwv_percentile') }}) over() as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(cls, 0.{{ var('snowplow__cwv_percentile') }}) over() as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(ttfb, 0.{{ var('snowplow__cwv_percentile') }}) over() as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(inp, 0.{{ var('snowplow__cwv_percentile') }}) over() as inp_{{ var('snowplow__cwv_percentile') }}p,
'overall' as measurement_type
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, by_device as (
select distinct
'all' as page_url,
device_class,
'all' as geo_country,
'last {{var("snowplow__cwv_days_to_measure")|string }} days' as time_period,
count(*) over (partition by device_class) as page_view_count,
percentile_cont(lcp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by device_class) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(fid, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by device_class) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(cls, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by device_class) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(ttfb, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by device_class) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(inp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by device_class) as inp_{{ var('snowplow__cwv_percentile') }}p,
'by_device' as measurement_type
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, by_day as (
select distinct
'all' as page_url,
'all' as device_class,
'all' as geo_country,
cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }}) as time_period,
count(*) over (partition by cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }})) as page_view_count,
percentile_cont(lcp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(fid, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(cls, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(ttfb, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(inp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}) as inp_{{ var('snowplow__cwv_percentile') }}p,
'by_day' as measurement_type
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, by_day_and_device as (
select distinct
'all' as page_url,
device_class,
'all' as geo_country,
cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }}) as time_period,
count(*) over (partition by device_class, cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }})) as page_view_count,
percentile_cont(lcp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}, device_class) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(fid, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}, device_class) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(cls, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}, device_class) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(ttfb, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}, device_class) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(inp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by {{ dbt.date_trunc('day', 'derived_tstamp') }}, device_class) as inp_{{ var('snowplow__cwv_percentile') }}p,
'by_day_and_device' as measurement_type
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, by_country as (
select distinct
'all' as page_url,
'all' as device_class,
geo_country,
'last {{var("snowplow__cwv_days_to_measure")|string }} days' as time_period,
count(*) over (partition by geo_country) as page_view_count,
percentile_cont(lcp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(fid, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(cls, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(ttfb, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(inp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country) as inp_{{ var('snowplow__cwv_percentile') }}p,
'by_country' as measurement_type
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, by_country_and_device as (
select distinct
'all' as page_url,
device_class,
geo_country,
'last {{var("snowplow__cwv_days_to_measure")|string }} days' as time_period,
count(*) over (partition by geo_country, device_class) as page_view_count,
percentile_cont(lcp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country, device_class) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(fid, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country, device_class) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(cls, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country, device_class) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(ttfb, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country, device_class) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(inp, 0.{{ var('snowplow__cwv_percentile') }}) over (partition by geo_country, device_class) as inp_{{ var('snowplow__cwv_percentile') }}p,
'by_country_and_device' as measurement_type
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, measurements as (
select *,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from by_url_and_device
union all
select *,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from overall
union all
select *,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from by_device
union all
select *,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from by_day
union all
select *,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from by_day_and_device
union all
select *,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from by_country
union all
select *,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from by_country_and_device
)
, coalesce as (
select
m.measurement_type,
m.page_url,
m.device_class,
m.geo_country,
coalesce(g.name, 'all') as country,
m.time_period,
m.page_view_count,
ceil(cast(m.lcp_{{ var('snowplow__cwv_percentile') }}p as decimal) * 1000) /1000 as lcp_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(m.fid_{{ var('snowplow__cwv_percentile') }}p as decimal) * 1000) /1000 as fid_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(m.cls_{{ var('snowplow__cwv_percentile') }}p as decimal) * 1000) /1000 as cls_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(m.ttfb_{{ var('snowplow__cwv_percentile') }}p as decimal) * 1000) /1000 as ttfb_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(m.inp_{{ var('snowplow__cwv_percentile') }}p as decimal) * 1000) /1000 as inp_{{ var('snowplow__cwv_percentile') }}p,
m.lcp_result,
m.fid_result,
m.cls_result,
m.ttfb_result,
m.inp_result,
{{ snowplow_web.core_web_vital_pass_query() }} as passed
from measurements m
left join {{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(m.geo_country) = lower(g.alpha_2)
order by 1
)
select
{{ dbt.concat(['page_url', "'-'" , 'device_class', "'-'" , 'geo_country', "'-'" , 'time_period' ]) }} compound_key,
*
from coalesce
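The final `coalesce` CTE rounds each percentile up to three decimal places. The BigQuery variant above and the default (Redshift/Postgres) variant use `ceil(cast(x as decimal) * 1000) / 1000`. The Databricks and Snowflake variants use `ceil(x, 3)`.

The `*_result` columns are computed earlier, in the `measurements` CTE, from the unrounded values.

Below is a Python sketch of the same round-up. It uses `decimal` to sidestep binary floating-point artefacts, much as the SQL cast to decimal does; the function name is hypothetical.

```python
from decimal import ROUND_CEILING, Decimal
from typing import Optional


def ceil_3dp(value: Optional[float]) -> Optional[Decimal]:
    """Round up (towards +infinity) to 3 decimal places; None stays None."""
    if value is None:
        return None
    # str() gives the shortest repr, e.g. 0.1 -> '0.1', not its binary expansion
    return Decimal(str(value)).quantize(Decimal("0.001"), rounding=ROUND_CEILING)


print(ceil_3dp(0.12341))  # 0.124
print(ceil_3dp(2600.0))   # 2600.000
```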
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='table',
enabled=var("snowplow__enable_cwv", false) and target.type in ('databricks', 'spark') | as_bool()
)
}}
with measurements as (
select
page_url,
device_class,
geo_country,
cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }}) as time_period,
count(*) as page_view_count,
grouping_id() as grouping_ids,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by lcp) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by fid) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by cls) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by ttfb) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by inp) as inp_{{ var('snowplow__cwv_percentile') }}p
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
group by cube(page_url, device_class,cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }}), geo_country)
)
, measurement_type as (
select
*,
case when grouping_ids = 15 then 'overall'
when grouping_ids = 3 then 'by_url_and_device'
when grouping_ids = 9 then 'by_day_and_device'
when grouping_ids = 10 then 'by_country_and_device'
when grouping_ids = 14 then 'by_country'
when grouping_ids = 11 then 'by_device'
when grouping_ids = 13 then 'by_day'
end as measurement_type,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from measurements
)
, coalesce as (
select
m.measurement_type,
coalesce(m.page_url, 'all') as page_url,
coalesce(m.device_class, 'all') as device_class,
coalesce(m.geo_country, 'all') as geo_country,
coalesce(g.name, 'all') as country,
coalesce(m.time_period, 'last {{var("snowplow__cwv_days_to_measure")|string }} days') as time_period,
m.page_view_count,
ceil(m.lcp_{{ var('snowplow__cwv_percentile') }}p, 3) as lcp_{{ var('snowplow__cwv_percentile') }}p,
ceil(m.fid_{{ var('snowplow__cwv_percentile') }}p, 3) as fid_{{ var('snowplow__cwv_percentile') }}p,
ceil(m.cls_{{ var('snowplow__cwv_percentile') }}p, 3) as cls_{{ var('snowplow__cwv_percentile') }}p,
ceil(m.ttfb_{{ var('snowplow__cwv_percentile') }}p, 3) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
ceil(m.inp_{{ var('snowplow__cwv_percentile') }}p, 3) as inp_{{ var('snowplow__cwv_percentile') }}p,
m.lcp_result,
m.fid_result,
m.cls_result,
m.ttfb_result,
m.inp_result,
{{ snowplow_web.core_web_vital_pass_query() }} as passed
from measurement_type m
left join {{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(m.geo_country) = lower(g.alpha_2)
where measurement_type is not null
order by 1
)
select
{{ dbt.concat(['page_url', "'-'" , 'device_class', "'-'" , 'geo_country', "'-'" , 'time_period' ]) }} compound_key,
*
from coalesce
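The Databricks variant above computes every dimension combination at once with `group by cube(...)` and identifies each row by `grouping_id()`. The Spark SQL documentation defines `grouping_id()` as `(grouping(c1) << (n-1)) + ... + grouping(cn)`. That is one bit per cube column, first column most significant, set to 1 when the column is rolled up.

The Python sketch below reproduces the seven numbers in the `case` expression. The other nine of the cube's sixteen combinations map to no measurement type and are removed by `where measurement_type is not null`. `time_period` stands in for the day-truncated timestamp expression.

```python
# Order matters: it must match the column order inside cube(...)
CUBE_COLUMNS = ["page_url", "device_class", "time_period", "geo_country"]

# Columns that are grouped (not rolled up) for each measurement type
MEASUREMENT_TYPES = {
    "overall": set(),
    "by_url_and_device": {"page_url", "device_class"},
    "by_day_and_device": {"time_period", "device_class"},
    "by_country_and_device": {"geo_country", "device_class"},
    "by_country": {"geo_country"},
    "by_device": {"device_class"},
    "by_day": {"time_period"},
}


def grouping_id(grouped: set) -> int:
    """Spark-style grouping_id(): bit is 1 for each rolled-up column, MSB first."""
    n = len(CUBE_COLUMNS)
    gid = 0
    for i, col in enumerate(CUBE_COLUMNS):
        if col not in grouped:
            gid |= 1 << (n - 1 - i)
    return gid


for name, cols in MEASUREMENT_TYPES.items():
    print(name, grouping_id(cols))
```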
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='table',
enabled=var("snowplow__enable_cwv", false) and target.type in ('redshift', 'postgres') | as_bool(),
)
}}
{% if target.type == 'redshift'%}
{% set grouping_function = 'grouping_id' %}
{% else %}
{% set grouping_function = 'grouping' %}
{% endif %}
with prep as (
select
page_url,
device_class,
geo_country,
concat(cast({{ date_trunc('day', 'derived_tstamp') }} as {{ type_string() }}),'.000') as time_period,
lcp,
fid,
cls,
ttfb,
inp
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
)
, lcp_measurements as (
select
{{ dbt_utils.generate_surrogate_key(['page_url', 'device_class', 'geo_country', 'time_period' ]) }} surrogate_key,
page_url,
device_class,
geo_country,
time_period,
count(*) as page_view_count,
{{ grouping_function }}(page_url, device_class) as id_url_and_device,
{{ grouping_function }}(device_class) as id_device,
{{ grouping_function }}(time_period) as id_period,
{{ grouping_function }}(time_period, device_class) as id_period_and_device,
{{ grouping_function }}(geo_country) as id_country,
{{ grouping_function }}(geo_country, device_class) as id_country_and_device,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by lcp) as lcp_{{ var('snowplow__cwv_percentile') }}p
from prep
group by grouping sets ((), (page_url, device_class), (device_class), (time_period), (time_period, device_class), (geo_country), (geo_country, device_class))
)
, fid_measurements as (
select
{{ dbt_utils.generate_surrogate_key(['page_url', 'device_class', 'geo_country', 'time_period' ]) }} surrogate_key,
page_url,
device_class,
geo_country,
time_period,
count(*) as page_view_count,
{{ grouping_function }}(page_url, device_class) as id_url_and_device,
{{ grouping_function }}(device_class) as id_device,
{{ grouping_function }}(time_period) as id_period,
{{ grouping_function }}(time_period, device_class) as id_period_and_device,
{{ grouping_function }}(geo_country) as id_country,
{{ grouping_function }}(geo_country, device_class) as id_country_and_device,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by fid) as fid_{{ var('snowplow__cwv_percentile') }}p
from prep
group by grouping sets ((), (page_url, device_class), (device_class), (time_period), (time_period, device_class), (geo_country), (geo_country, device_class))
)
, cls_measurements as (
select
{{ dbt_utils.generate_surrogate_key(['page_url', 'device_class', 'geo_country', 'time_period' ]) }} surrogate_key,
page_url,
device_class,
geo_country,
time_period,
count(*) as page_view_count,
{{ grouping_function }}(page_url, device_class) as id_url_and_device,
{{ grouping_function }}(device_class) as id_device,
{{ grouping_function }}(time_period) as id_period,
{{ grouping_function }}(time_period, device_class) as id_period_and_device,
{{ grouping_function }}(geo_country) as id_country,
{{ grouping_function }}(geo_country, device_class) as id_country_and_device,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by cls) as cls_{{ var('snowplow__cwv_percentile') }}p
from prep
group by grouping sets ((), (page_url, device_class), (device_class), (time_period), (time_period, device_class), (geo_country), (geo_country, device_class))
)
, ttfb_measurements as (
select
{{ dbt_utils.generate_surrogate_key(['page_url', 'device_class', 'geo_country', 'time_period' ]) }} surrogate_key,
page_url,
device_class,
geo_country,
time_period,
count(*) as page_view_count,
{{ grouping_function }}(page_url, device_class) as id_url_and_device,
{{ grouping_function }}(device_class) as id_device,
{{ grouping_function }}(time_period) as id_period,
{{ grouping_function }}(time_period, device_class) as id_period_and_device,
{{ grouping_function }}(geo_country) as id_country,
{{ grouping_function }}(geo_country, device_class) as id_country_and_device,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by ttfb) as ttfb_{{ var('snowplow__cwv_percentile') }}p
from prep
group by grouping sets ((), (page_url, device_class), (device_class), (time_period), (time_period, device_class), (geo_country), (geo_country, device_class))
)
, inp_measurements as (
select
{{ dbt_utils.generate_surrogate_key(['page_url', 'device_class', 'geo_country', 'time_period' ]) }} surrogate_key,
page_url,
device_class,
geo_country,
time_period,
count(*) as page_view_count,
{{ grouping_function }}(page_url, device_class) as id_url_and_device,
{{ grouping_function }}(device_class) as id_device,
{{ grouping_function }}(time_period) as id_period,
{{ grouping_function }}(time_period, device_class) as id_period_and_device,
{{ grouping_function }}(geo_country) as id_country,
{{ grouping_function }}(geo_country, device_class) as id_country_and_device,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by inp) as inp_{{ var('snowplow__cwv_percentile') }}p
from prep
group by grouping sets ((), (page_url, device_class), (device_class), (time_period), (time_period, device_class), (geo_country), (geo_country, device_class))
)
, measurements as (
select
l.*,
f.fid_{{ var('snowplow__cwv_percentile') }}p,
c.cls_{{ var('snowplow__cwv_percentile') }}p,
t.ttfb_{{ var('snowplow__cwv_percentile') }}p,
i.inp_{{ var('snowplow__cwv_percentile') }}p
from lcp_measurements l
left join fid_measurements f on l.surrogate_key = f.surrogate_key
left join cls_measurements c on l.surrogate_key = c.surrogate_key
left join ttfb_measurements t on l.surrogate_key = t.surrogate_key
left join inp_measurements i on l.surrogate_key = i.surrogate_key
)
, measurement_type as (
select
*,
case when id_url_and_device <> 0 and id_device <> 0 and id_period <> 0 and id_period_and_device <> 0 and id_country <> 0 and id_country_and_device <> 0 then 'overall'
when id_url_and_device = 0 then 'by_url_and_device'
when id_period_and_device = 0 then 'by_day_and_device'
when id_country_and_device = 0 then 'by_country_and_device'
when id_country = 0 then 'by_country'
when id_device = 0 then 'by_device'
when id_period = 0 then 'by_day'
end as measurement_type,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from measurements
)
, coalesce as (
select
measurement_type,
coalesce(m.page_url, 'all') as page_url,
coalesce(m.device_class, 'all') as device_class,
coalesce(m.geo_country, 'all') as geo_country,
coalesce(g.name, 'all') as country,
coalesce(time_period, 'last {{var("snowplow__cwv_days_to_measure")|string }} days') as time_period,
page_view_count,
ceil(cast(lcp_{{ var('snowplow__cwv_percentile') }}p as decimal(14,4))*1000) /1000 as lcp_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(fid_{{ var('snowplow__cwv_percentile') }}p as decimal(14,4))*1000) /1000 as fid_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(cls_{{ var('snowplow__cwv_percentile') }}p as decimal(14,4))*1000) /1000 as cls_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(ttfb_{{ var('snowplow__cwv_percentile') }}p as decimal(14,4))*1000) /1000 as ttfb_{{ var('snowplow__cwv_percentile') }}p,
ceil(cast(inp_{{ var('snowplow__cwv_percentile') }}p as decimal(14,4))*1000) /1000 as inp_{{ var('snowplow__cwv_percentile') }}p,
m.lcp_result,
m.fid_result,
m.cls_result,
m.ttfb_result,
m.inp_result,
{{ snowplow_web.core_web_vital_pass_query() }} as passed
from measurement_type m
left join {{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(m.geo_country) = lower(g.alpha_2)
order by 1
)
select
{{ dbt.concat(['page_url', "'-'" , 'device_class', "'-'" , 'geo_country', "'-'" , 'time_period' ]) }} compound_key,
measurement_type,
page_url,
device_class,
geo_country,
country,
time_period,
page_view_count,
lcp_{{ var('snowplow__cwv_percentile') }}p,
fid_{{ var('snowplow__cwv_percentile') }}p,
cls_{{ var('snowplow__cwv_percentile') }}p,
ttfb_{{ var('snowplow__cwv_percentile') }}p,
inp_{{ var('snowplow__cwv_percentile') }}p,
lcp_result,
fid_result,
cls_result,
ttfb_result,
inp_result,
passed
from coalesce
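The Redshift/Postgres variant above takes a different route from a single cube. It computes each metric's percentile in its own CTE over the same grouping sets and joins those CTEs on `dbt_utils.generate_surrogate_key`. Because grouping sets leave rolled-up columns NULL, the join lines up only if NULLs hash consistently.

As we understand dbt_utils (check your installed version), the macro works in three steps:

- It casts each field to a string and replaces NULL with the placeholder `_dbt_utils_surrogate_key_null_`.
- It joins the parts with `-`.
- It takes an MD5 hash.

Below is a Python sketch under those assumptions.

```python
import hashlib
from typing import Optional

# Assumed dbt_utils default placeholder for NULL fields
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(*fields: Optional[str]) -> str:
    """MD5 of the '-'-joined fields, NULLs replaced by a fixed placeholder."""
    parts = [NULL_PLACEHOLDER if f is None else str(f) for f in fields]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


# (page_url, device_class, geo_country, time_period) as produced by grouping sets
overall_row = (None, None, None, None)
by_device_row = (None, "phone", None, None)

# The lcp and fid CTEs emit the same 'overall' key, so the left join matches it
assert surrogate_key(*overall_row) == surrogate_key(*overall_row)
assert surrogate_key(*overall_row) != surrogate_key(*by_device_row)
```

One consequence worth checking in your data: a genuine NULL in a grouping column hashes like a rolled-up column. For example, a 'by_country' row for page views with no geo_country gets the same key as the 'overall' row, so such rows can match more than one row in the join.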
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
config(
materialized='table',
enabled=var("snowplow__enable_cwv", false) and target.type == 'snowflake' | as_bool(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}
with measurements as (
select
page_url,
device_class,
geo_country,
cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }}) as time_period,
count(*) as page_view_count,
grouping_id(page_url, device_class) as id_url_and_device,
grouping_id(device_class) as id_device,
grouping_id(cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }})) as id_period,
grouping_id(cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }}), device_class) as id_period_and_device,
grouping_id(geo_country) as id_country,
grouping_id(geo_country, device_class) as id_country_and_device,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by lcp) as lcp_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by fid) as fid_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by cls) as cls_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by ttfb) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
percentile_cont(0.{{ var('snowplow__cwv_percentile') }}) within group (order by inp) as inp_{{ var('snowplow__cwv_percentile') }}p
from {{ ref('snowplow_web_vitals') }}
where cast(derived_tstamp as date) >= {{ dateadd('day', '-'+var('snowplow__cwv_days_to_measure')|string, date_trunc('day', snowplow_utils.current_timestamp_in_utc())) }}
group by grouping sets ((), (page_url, device_class), (device_class), (cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }})), (cast( {{ dbt.date_trunc('day', 'derived_tstamp') }} as {{ dbt.type_string() }}), device_class), (geo_country), (geo_country, device_class))
)
, measurement_type as (
select
*,
case when id_url_and_device <> 0 and id_device <> 0 and id_period <> 0 and id_period_and_device <> 0 and id_country <> 0 and id_country_and_device <> 0 then 'overall'
when id_url_and_device = 0 then 'by_url_and_device'
when id_period_and_device = 0 then 'by_day_and_device'
when id_country_and_device = 0 then 'by_country_and_device'
when id_country = 0 then 'by_country'
when id_device = 0 then 'by_device'
when id_period = 0 then 'by_day'
end as measurement_type,
{{ snowplow_web.core_web_vital_results_query('_' + var('snowplow__cwv_percentile') | string + 'p') }}
from measurements
)
, coalesce as (
select
m.measurement_type,
coalesce(m.page_url, 'all') as page_url,
coalesce(m.device_class, 'all') as device_class,
coalesce(m.geo_country, 'all') as geo_country,
coalesce(g.name, 'all') as country,
coalesce(time_period, 'last {{var("snowplow__cwv_days_to_measure")|string }} days') as time_period,
page_view_count,
ceil(lcp_{{ var('snowplow__cwv_percentile') }}p, 3) as lcp_{{ var('snowplow__cwv_percentile') }}p,
ceil(fid_{{ var('snowplow__cwv_percentile') }}p, 3) as fid_{{ var('snowplow__cwv_percentile') }}p,
ceil(cls_{{ var('snowplow__cwv_percentile') }}p, 3) as cls_{{ var('snowplow__cwv_percentile') }}p,
ceil(ttfb_{{ var('snowplow__cwv_percentile') }}p, 3) as ttfb_{{ var('snowplow__cwv_percentile') }}p,
ceil(inp_{{ var('snowplow__cwv_percentile') }}p, 3) as inp_{{ var('snowplow__cwv_percentile') }}p,
m.lcp_result,
m.fid_result,
m.cls_result,
m.ttfb_result,
m.inp_result,
{{ snowplow_web.core_web_vital_pass_query() }} as passed
from measurement_type m
left join {{ ref(var('snowplow__geo_mapping_seed')) }} g on lower(m.geo_country) = lower(g.alpha_2)
order by 1
)
select
{{ dbt.concat(['page_url', "'-'" , 'device_class', "'-'" , 'geo_country', "'-'" , 'time_period' ]) }} compound_key,
*
from coalesce
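All four variants restrict their input with the same filter: `cast(derived_tstamp as date)` must fall on or after the start of the current UTC day minus `snowplow__cwv_days_to_measure` days. The comparison is by date, so the window spans the N previous whole days plus the partial current day: N + 1 calendar dates.

Below is a minimal Python sketch of that cutoff. The function names are hypothetical, and it assumes derived_tstamp is a naive UTC timestamp.

```python
from datetime import date, datetime, timedelta, timezone


def cwv_window_start(now_utc: datetime, days_to_measure: int) -> date:
    """First date kept: the current UTC day minus `days_to_measure` days."""
    today = now_utc.astimezone(timezone.utc).date()
    return today - timedelta(days=days_to_measure)


def in_cwv_window(derived_tstamp: datetime, now_utc: datetime, days_to_measure: int) -> bool:
    """Mirror of `cast(derived_tstamp as date) >= <window start>` (naive UTC input)."""
    return derived_tstamp.date() >= cwv_window_start(now_utc, days_to_measure)


now = datetime(2024, 3, 10, 15, 30, tzinfo=timezone.utc)
print(cwv_window_start(now, 28))  # 2024-02-11
```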
Depends On
- Models
- Macros
- macro.dbt.concat
- macro.dbt.date_trunc
- macro.dbt.dateadd
- macro.dbt.type_string
- macro.dbt_utils.generate_surrogate_key
- macro.snowplow_utils.current_timestamp_in_utc
- macro.snowplow_utils.set_query_tag
- macro.snowplow_web.core_web_vital_pass_query
- macro.snowplow_web.core_web_vital_results_query
Snowplow Web Vitals
models/optional_modules/core_web_vitals/snowplow_web_vitals.sql
Description
An incremental table used as a base for storing core web vital events (the first event per page view).
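The table keeps only the first web vitals event per page view, and that de-duplication can be sketched as below. Ordering by `derived_tstamp`, with `event_id` as a tie-breaker, is an assumption made for illustration; the model's SQL defines the actual ordering. Being incremental, the model must also account for page views loaded in earlier runs, which this sketch ignores.

```python
from datetime import datetime


def first_event_per_page_view(events: list) -> list:
    """Keep one event per page_view_id: the earliest by (derived_tstamp, event_id)."""
    first = {}
    for event in events:
        key = event["page_view_id"]
        rank = (event["derived_tstamp"], event["event_id"])
        current = first.get(key)
        if current is None or rank < (current["derived_tstamp"], current["event_id"]):
            first[key] = event
    return list(first.values())


events = [
    {"event_id": "e2", "page_view_id": "pv1", "derived_tstamp": datetime(2024, 3, 1, 10, 0, 5)},
    {"event_id": "e1", "page_view_id": "pv1", "derived_tstamp": datetime(2024, 3, 1, 10, 0, 1)},
    {"event_id": "e3", "page_view_id": "pv2", "derived_tstamp": datetime(2024, 3, 1, 11, 0, 0)},
]
print(sorted(e["event_id"] for e in first_event_per_page_view(events)))  # ['e1', 'e3']
```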
Details
Columns
Column Name | Description |
---|---|
event_id | A UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
event_name | Event name; always ‘web_vitals’ in this table. |
app_id | Application ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev. |
platform | Platform e.g. ‘web’ |
domain_userid | User identifier specified in your project variables. By default this is the true domain_userid, a user ID set by Snowplow using a first-party cookie e.g. ‘bc2e92ec6c204a14’ |
user_id | Unique ID set by business e.g. ‘jon.doe@email.com’ |
page_view_id | A UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
domain_sessionid | Session identifier specified in your project variables. By default this is the true domain_sessionid i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
collector_tstamp | Time stamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’ |
derived_tstamp | Timestamp making allowance for an inaccurate device clock e.g. ‘2013-11-26 00:02:04’ |
load_tstamp | The timestamp of the event landing in the data warehouse. |
geo_country | ISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’ |
page_url | The page URL e.g. ‘http://www.example.com’ |
page_title | Web page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’ |
useragent | Raw useragent |
device_class | Class of device e.g. phone |
device_name | Name of device e.g. Google Nexus 6 |
agent_name | Name of agent e.g. Chrome |
agent_version | Version of agent e.g. 53.0.2785.124 |
operating_system_name | Name of the OS e.g. Android |
lcp | Largest Contentful Paint (LCP), a metric for measuring perceived load speed: it marks the point in the page load timeline when the page's main content has likely loaded. Measured in milliseconds. For more information see https://web.dev/lcp/. |
fcp | First Contentful Paint (FCP), a metric measuring the time from when the page starts loading to when any part of the page's content is rendered on the screen. Measured in milliseconds. |
fid | First Input Delay (FID), a metric for measuring load responsiveness: it quantifies the experience users feel when trying to interact with unresponsive pages. Measured in milliseconds. For more information see https://web.dev/fid/. |
cls | Cumulative Layout Shift (CLS), a unitless metric for measuring visual stability: it quantifies how often users experience unexpected layout shifts. For more information see https://web.dev/cls/. |
inp | Interaction to Next Paint (INP), a metric that assesses responsiveness. INP observes the latency of all interactions a user has made with the page and reports a single value that all (or nearly all) interactions fall below. Measured in milliseconds. For more information see https://web.dev/inp/. |
ttfb | A DOMHighResTimeStamp referring to the time in milliseconds between the browser requesting a page and when it receives the first byte of information from the server. For more information https://web.dev/ttfb/. |
navigation_type | The navigation type recognised from the Navigation Timing API https://www.w3.org/TR/navigation-timing-2/. E.g. 'navigate', 'reload', 'back-forward', 'back-forward-cache', 'prerender', 'restore'. |
lcp_result | The evaluation of the lcp metric: one of 'good', 'needs improvement' or 'poor', depending on the thresholds defined in the core_web_vital_results_query() macro. |
fid_result | The evaluation of the fid metric: one of 'good', 'needs improvement' or 'poor', depending on the thresholds defined in the core_web_vital_results_query() macro. |
cls_result | The evaluation of the cls metric: one of 'good', 'needs improvement' or 'poor', depending on the thresholds defined in the core_web_vital_results_query() macro. |
ttfb_result | The evaluation of the ttfb metric: one of 'good', 'needs improvement' or 'poor', depending on the thresholds defined in the core_web_vital_results_query() macro. |
original_domain_sessionid | The true domain_sessionid, i.e. a visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’ |
original_domain_userid | The true domain_userid, i.e. the user ID set by Snowplow using a first-party cookie e.g. ‘bc2e92ec6c204a14’ |
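The *_result columns sort each raw metric into 'good', 'needs improvement' or 'poor'. The plain-Python sketch below shows that bucketing. It is not the package's code. The thresholds are an assumption, taken from the boundaries published on web.dev: a value at or below the lower bound is good, and a value above the upper bound is poor. The real core_web_vital_results_query() macro may use different values, different comparison operators or its own label for missing measurements. The names THRESHOLDS, classify and evaluate are hypothetical.

```python
from typing import Optional

# ASSUMPTION: boundaries published on web.dev, not values read from
# core_web_vital_results_query(). Each entry is
# (upper bound for 'good', upper bound for 'needs improvement').
THRESHOLDS = {
    "lcp": (2500.0, 4000.0),  # milliseconds
    "fid": (100.0, 300.0),    # milliseconds
    "cls": (0.1, 0.25),       # unitless
    "ttfb": (800.0, 1800.0),  # milliseconds
}


def classify(metric: str, value: Optional[float]) -> Optional[str]:
    """Map a raw metric value to 'good' / 'needs improvement' / 'poor'.

    Returns None for a missing value (an assumption; the real macro may
    use a different label).
    """
    if value is None:
        return None
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


def evaluate(row: dict) -> dict:
    """Return a copy of row with <metric>_result keys added for each metric."""
    results = {f"{m}_result": classify(m, row.get(m)) for m in THRESHOLDS}
    return {**row, **results}
```

For example, evaluate({"lcp": 2300, "fid": 150, "cls": 0.3, "ttfb": None}) yields lcp_result 'good', fid_result 'needs improvement', cls_result 'poor' and ttfb_result None.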
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
  config(
    materialized='incremental',
    enabled=var("snowplow__enable_cwv", false) | as_bool(),
    unique_key='page_view_id',
    upsert_date_key='derived_tstamp',
    sort='derived_tstamp',
    dist='page_view_id',
    tags=["derived"],
    partition_by = snowplow_utils.get_value_by_target_type(bigquery_val = {
      "field": "derived_tstamp",
      "data_type": "timestamp"
    }, databricks_val = 'derived_tstamp_date'),
    cluster_by=snowplow_web.web_cluster_by_fields_cwv(),
    sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
    tblproperties={
      'delta.autoOptimize.optimizeWrite' : 'true',
      'delta.autoOptimize.autoCompact' : 'true'
    },
    snowplow_optimize= true
  )
}}
select
  *
  {% if target.type in ['databricks', 'spark'] -%}
    , DATE(derived_tstamp) as derived_tstamp_date
  {%- endif %}
from {{ ref('snowplow_web_vitals_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
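With unique_key='page_view_id', each incremental run should leave at most one row per page view. When a page_view_id that already exists arrives again, its row is replaced. The Python sketch below models those upsert semantics in memory. It is a simplification under stated assumptions, not how dbt or snowplow_utils actually execute the merge. It ignores upsert_date_key, which (as we understand snowplow_optimize) limits how much of the existing table is scanned. It also reduces is_run_with_new_events() to a boolean flag. The function name upsert_by_page_view is hypothetical.

```python
from typing import Dict, Iterable


def upsert_by_page_view(
    existing: Dict[str, dict],
    batch: Iterable[dict],
    run_has_new_events: bool,
) -> Dict[str, dict]:
    """Simplified model of one incremental run keyed on page_view_id.

    existing: current table contents, keyed by page_view_id.
    batch: rows produced by snowplow_web_vitals_this_run for this run.
    run_has_new_events: stands in for is_run_with_new_events(); when False
        the model's WHERE clause filters out every row, so nothing changes.
    """
    result = dict(existing)  # shallow copy: the caller's dict is left untouched
    if not run_has_new_events:
        return result
    for row in batch:
        # Same page_view_id -> replace the old row; new page_view_id -> insert.
        result[row["page_view_id"]] = row
    return result
```

Keying the dictionary on page_view_id mirrors the unique_key setting: whatever arrives in the batch wins for that key, and rows for untouched page views are carried over unchanged.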
Depends On
- Models
- Macros
Referenced By
Snowplow Web Vitals This Run
models/optional_modules/core_web_vitals/scratch/snowplow_web_vitals_this_run.sql
Description
A scratch table used as the base for the main incremental core web vitals table. It keeps only the first event per page view.
Details
Code
- default
{#
Copyright (c) 2020-present Snowplow Analytics Ltd. All rights reserved.
This program is licensed to you under the Snowplow Community License Version 1.0,
and you may not use this file except in compliance with the Snowplow Community License Version 1.0.
You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0
#}
{{
  config(
    enabled=var("snowplow__enable_cwv", false) | as_bool(),
    tags=["this_run"],
    sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
  )
}}
with prep as (
  select
    e.event_id,
    e.event_name,
    e.app_id,
    e.platform,
    e.domain_userid,
    e.original_domain_userid,
    e.user_id,
    e.page_view_id,
    e.domain_sessionid,
    e.original_domain_sessionid,
    e.collector_tstamp,
    e.derived_tstamp,
    e.load_tstamp,
    coalesce(e.geo_country, 'unknown_geo_country') as geo_country,
    coalesce(e.page_url, 'unknown_page_url') as page_url,
    {{ core_web_vital_page_groups() }} as url_group,
    e.page_title,
    e.useragent,
    coalesce(e.device_class, 'unknown_device_class') as device_class,
    e.device_name,
    e.agent_name,
    e.agent_version,
    e.operating_system_name,
    e.lcp,
    e.fcp,
    e.fid,
    e.cls,
    e.inp,
    e.ttfb,
    e.navigation_type,
    -- rank events within each page view; 1 = earliest event
    row_number() over (partition by e.page_view_id order by e.derived_tstamp, e.dvce_created_tstamp, e.event_id) as dedupe_index
  from {{ ref("snowplow_web_vital_events_this_run") }} as e
  where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
)
select
  *,
  {{ snowplow_web.core_web_vital_results_query() }}
from prep p
-- keep only the first event per page view
where dedupe_index = 1
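Taken together, the prep CTE and the dedupe_index = 1 filter keep the earliest event per page view and fill in fallback labels for a few nullable columns. Below is a minimal Python sketch of that logic. It assumes each event is a dict whose timestamp values are non-null and compare correctly, for example datetime objects or uniformly formatted ISO-8601 strings. SQL warehouses differ in where they sort NULLs, and the sketch does not model that. It also skips url_group and the *_result columns, and the function name is hypothetical. The SQL applies the coalesce() defaults before ranking, while the sketch applies them afterwards. Those columns are not part of the ordering key, so the result is the same.

```python
from typing import Dict, Iterable, List, Tuple

# Fallback labels copied from the coalesce() calls in the prep CTE.
DEFAULTS = {
    "geo_country": "unknown_geo_country",
    "page_url": "unknown_page_url",
    "device_class": "unknown_device_class",
}


def first_event_per_page_view(events: Iterable[dict]) -> List[dict]:
    """Keep one event per page_view_id, like filtering row_number() = 1.

    The sort key mirrors the window's ORDER BY: derived_tstamp, then
    dvce_created_tstamp, then event_id as the final tie-breaker.
    """
    best: Dict[str, Tuple[tuple, dict]] = {}
    for e in events:
        key = (e["derived_tstamp"], e["dvce_created_tstamp"], e["event_id"])
        current = best.get(e["page_view_id"])
        # Keep the event with the smallest sort key for this page view.
        if current is None or key < current[0]:
            best[e["page_view_id"]] = (key, e)

    rows = []
    for _, event in best.values():
        row = dict(event)  # copy so the input events are not mutated
        for column, fallback in DEFAULTS.items():
            if row.get(column) is None:  # missing or null -> fallback label
                row[column] = fallback
        rows.append(row)
    return rows
```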
Depends On
- Models
- Macros