Skip to main content

Snowplow Normalize Models

caution

This page is auto-generated from our dbt packages, some information may be incomplete

Snowplow Normalize

Snowplow Normalize Base Events This Run

models/base/scratch/<adaptor>/snowplow_normalize_base_events_this_run.sql

Description

For any given run, this table contains all required events to be consumed by subsequent nodes in the Snowplow dbt normalize package. This is a cleaned, deduped dataset, containing all columns from the raw events table.

Note: This table should be used as the input to any custom modules that require event level data, rather than selecting straight from atomic.events

File Paths

models/base/scratch/bigquery/snowplow_normalize_base_events_this_run.sql

Details

Columns
note

Base event this run table column lists may be incomplete and is missing contexts/unstructs, please check your warehouse for a more accurate column list.

Column NameDescription
app_idApplication ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.
platformPlatform e.g. ‘web’
etl_tstampTimestamp event began ETL e.g. ‘2017-01-26 00:01:25.292’
collector_tstampTime stamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’
dvce_created_tstampTimestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’
eventThe type of event recorded e.g. ‘page_view’
event_idA UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
txn_idTransaction ID set client-side, used to de-dupe records e.g. 421828
name_trackerTracker namespace e.g. ‘sp1’
v_trackerTracker version e.g. ‘js-3.0.0’
v_collectorCollector version e.g. ‘ssc-2.1.0-kinesis’
v_etlETL version e.g. ‘snowplow-micro-1.1.0-common-1.4.2’
user_idUnique ID set by business e.g. ‘jon.doe@email.com
user_ipaddressUser IP address e.g. ‘92.231.54.234’
user_fingerprintA user fingerprint generated by looking at the individual browser features e.g. 2161814971
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’
domain_sessionidxA visit / session index e.g. 3
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’
geo_regionISO-3166-2 code for country region the visitor is in e.g. ‘I9’, ‘TX’
geo_cityCity the visitor is in e.g. ‘New York’, ‘London’
geo_zipcodePostcode the visitor is in e.g. ‘94109’
geo_latitudeVisitor location latitude e.g. 37.443604
geo_longitudeVisitor location longitude e.g. -122.4124
geo_region_nameVisitor region name e.g. ‘Florida’
ip_ispVisitor’s ISP e.g. ‘FDN Communications’
ip_organizationOrganization associated with the visitor’s IP address – defaults to ISP name if none is found e.g. ‘Bouygues Telecom’
ip_domainSecond level domain name associated with the visitor’s IP address e.g. ‘nuvox.net’
ip_netspeedVisitor’s connection type e.g. ‘Cable/DSL’
page_urlThe page URL e.g. ‘http://www.example.com’
page_titleWeb page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’
page_referrerURL of the referrer e.g. ‘http://www.referrer.com’
page_urlschemeScheme aka protocol e.g. ‘https’
page_urlhostHost aka domain e.g. ‘“www.snowplow.io’
page_urlportPort if specified, 80 if not 80
page_urlpathPath to page e.g. ‘/product/index.html’
page_urlqueryQuerystring e.g. ‘id=GTM-DLRG’
page_urlfragmentFragment aka anchor e.g. ‘4-conclusion’
refr_urlschemeReferer scheme e.g. ‘http’
refr_urlhostReferer host e.g. ‘www.bing.com’
refr_urlportReferer port e.g. 80
refr_urlpathReferer page path e.g. ‘/images/search’
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’
refr_urlfragmentReferer URL fragment
refr_mediumType of referer e.g. ‘search’, ‘internal’
refr_sourceName of referer if recognised e.g. ‘Bing images’
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723
mkt_campaignThe campaign ID e.g. ‘diageo-123’
se_categoryCategory of event e.g. ‘ecomm’, ‘video’
se_actionAction performed / event name e.g. ‘add-to-basket’, ‘play-video’
se_labelThe object of the action e.g. the ID of the video played or SKU of the product added-to-basket e.g. ‘pbz00123’
se_propertyA property associated with the object of the action e.g. ‘HD’, ‘large’
se_valueA value associated with the event / action e.g. the value of goods added-to-basket e.g. 9.99
tr_orderidOrder ID e.g. ‘#134’
tr_affiliationTransaction affiliation (e.g. store where sale took place) e.g. ‘web’
tr_totalTotal transaction value e.g. 12.99
tr_taxTotal tax included in transaction value e.g. 3.00
tr_shippingDelivery cost charged e.g. 0.00
tr_cityDelivery address, city e.g. ‘London’
tr_stateDelivery address, state e.g. ‘Washington’
tr_countryDelivery address, country e.g. ‘France’
ti_orderidOrder ID e.g. ‘#134’
ti_skuProduct SKU e.g. ‘pbz00123’
ti_nameProduct name e.g. ‘Cone pendulum’
ti_categoryProduct category e.g. ‘New Age’
ti_priceProduct unit price e.g. 9.99
ti_quantityNumber of product in transaction e.g. 2
pp_xoffset_minMinimum page x offset seen in the last ping period e.g. 0
pp_xoffset_maxMaximum page x offset seen in the last ping period e.g. 100
pp_yoffset_minMinimum page y offset seen in the last ping period e.g. 0
pp_yoffset_maxMaximum page y offset seen in the last ping period e.g. 200
useragentRaw useragent
br_nameBrowser name e.g. ‘Firefox 12’
br_familyBrowser family e.g. ‘Firefox’
br_versionBrowser version e.g. ‘12.0’
br_typeBrowser type e.g. ‘Browser’
br_renderengineBrowser rendering engine e.g. ‘GECKO’
br_langLanguage the browser is set to e.g. ‘en-GB’
br_features_pdfWhether the browser recognizes PDFs e.g. True
br_features_flashWhether Flash is installed e.g. True
br_features_javaWhether Java is installed e.g. True
br_features_directorWhether Adobe Shockwave is installed e.g. True
br_features_quicktimeWhether QuickTime is installed e.g. True
br_features_realplayerWhether RealPlayer is installed e.g. True
br_features_windowsmediaWhether mplayer2 is installed e.g. True
br_features_gearsWhether Google Gears is installed e.g. True
br_features_silverlightWhether Microsoft Silverlight is installed e.g. True
br_cookiesWhether cookies are enabled e.g. True
br_colordepthBit depth of the browser color palette e.g. 24
br_viewwidthViewport width e.g. 1000
br_viewheightViewport height e.g. 1000
os_nameName of operating system e.g. ‘Android’
os_familyOperating system family e.g. ‘Linux’
os_manufacturerCompany responsible for OS e.g. ‘Apple’
os_timezoneClient operating system timezone e.g. ‘Europe/London’
dvce_typeType of device e.g. ‘Computer’
dvce_ismobileIs the device mobile? e.g. True
dvce_screenwidthScreen width in pixels e.g. 1900
dvce_screenheightScreen height in pixels e.g. 1024
doc_charsetThe page’s character encoding e.g. , ‘UTF-8’
doc_widthThe page’s width in pixels e.g. 1024
doc_heightThe page’s height in pixels e.g. 3000
tr_currencyCurrency e.g. ‘USD’
tr_total_baseTotal in base currency e.g. 12.99
tr_tax_baseTotal tax in base currency e.g. 3.00
tr_shipping_basedecimal Delivery cost in base currency e.g. 0.00
ti_currencyCurrency e.g. ‘EUR’
ti_price_basedecimal Price in base currency e.g. 9.99
base_currencyReporting currency e.g. ‘GBP’
geo_timezoneVisitor timezone name e.g. ‘Europe/London’
mkt_clickidThe click ID e.g. ‘ac3d8e459’
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’
etl_tagsJSON of tags for this ETL run e.g. “[‘prod’]
dvce_sent_tstampWhen the event was sent by the client device e.g. ‘2013-11-26 00:03:58.032’
refr_domain_useridThe Snowplow domain_userid of the referring website e.g. ‘bc2e92ec6c204a14’
refr_dvce_tstampThe time of attaching the domain_userid to the inbound link e.g. ‘2013-11-26 00:02:05’
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
derived_tstampTimestamp making allowance for innaccurate device clock e.g. ‘2013-11-26 00:02:04’
event_vendorWho defined the event e.g. ‘com.acme’
event_nameEvent name e.g. ‘link_click’
event_formatFormat for event e.g. ‘jsonschema’
event_versionVersion of event schema e.g. ‘1-0-2’
event_fingerprintHash client-set event fields e.g. AADCE520E20C2899F4CED228A79A3083
true_tstampUser-set “true timestamp” for the event e.g. ‘2013-11-26 00:02:04’
page_view_id
Code
Source
{{
config(
tags=["this_run"]
)
}}

{%- set lower_limit, upper_limit, session_start_limit = snowplow_utils.return_base_new_event_limits(ref('snowplow_normalize_base_new_event_limits')) %}

-- without downstream joins, it's safe to dedupe by picking the first event_id found.
select
array_agg(e order by e.collector_tstamp limit 1)[offset(0)].*

from (

select
a.*

from {{ var('snowplow__events') }} as a

where
{# dvce_sent_tstamp is an optional field and not all trackers/webhooks populate it, this means this filter needs to be optional #}
{% if var("snowplow__days_late_allowed") == -1 %}
1 = 1
{% else %}
a.dvce_sent_tstamp <= {{ snowplow_utils.timestamp_add('day', var("snowplow__days_late_allowed", 3), 'a.dvce_created_tstamp') }}
{% endif %}
and a.collector_tstamp >= {{ lower_limit }}
and a.collector_tstamp <= {{ upper_limit }}
{% if var('snowplow__derived_tstamp_partitioned', true) and target.type == 'bigquery' | as_bool() %}
and a.derived_tstamp >= {{ snowplow_utils.timestamp_add('hour', -1, lower_limit) }}
and a.derived_tstamp <= {{ upper_limit }}
{% endif %}
and {{ snowplow_utils.app_id_filter(var("snowplow__app_id",[])) }}

) e
group by e.event_id

Depends On

Snowplow Normalize Base New Event Limits

models/base/scratch/snowplow_normalize_base_new_event_limits.sql

Description

This table contains the lower and upper timestamp limits for the given run of the normalize model. These limits are used to select new events from the events table.

Type: Table

Details

Columns
Column NameDescriptionType
lower_limitThe lower collector_tstamp limit for the runtimestamp
upper_limitThe upper collector_tstamp limit for the runtimestamp
Code
Source
{{ config(
post_hook=["{{snowplow_utils.print_run_limits(this)}}"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}


{%- set models_in_run = snowplow_utils.get_enabled_snowplow_models('snowplow_normalize') -%}

{% set min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models = snowplow_utils.get_incremental_manifest_status(ref('snowplow_normalize_incremental_manifest'), models_in_run) -%}


{% set run_limits_query = snowplow_utils.get_run_limits(min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models,
var("snowplow__start_date","2020-01-01")) -%}


{{ run_limits_query }}

Depends On

Referenced By

Snowplow Normalize Incremental Manifest

models/base/manifest/snowplow_normalize_incremental_manifest.sql

Description

This incremental table is a manifest of the timestamp of the latest event consumed per model within the snowplow-normalize package as well as any models leveraging the incremental framework provided by the package. The latest event's timestamp is based off collector_tstamp. This table is used to determine what events should be processed in the next run of the model.

Type: Table

Details

Columns
Column NameDescriptionType
modelThe name of the model.string
last_successThe latest event consumed by the model, based on collector_tstamptimestamp
Code
Source
{{
config(
materialized='incremental',
full_refresh=snowplow_normalize.allow_refresh()
)
}}

-- Boilerplate to generate table.
-- Table updated as part of end-run hook

with prep as (
select
cast(null as {{ snowplow_utils.type_max_string() }}) model,
cast('1970-01-01' as {{ type_timestamp() }}) as last_success
)

select *

from prep
where false

Depends On

Referenced By

Was this page helpful?