Skip to main content

Understanding the structure of Snowplow data

Overviewโ€‹

In order to analyze Snowplow data, it is important to understand how it is structured. We have tried to make the structure of Snowplow data as simple, logical, and easy-to-query as possible.

Each line represents one eventโ€‹

Each line in the Snowplow events table represents a single event, be that a page view, add to basket, play video, like etc.

Structured dataโ€‹

Snowplow data is structured: individual fields are stored in their own columns, making writing sophisticated queries on the data easy, and making it straightforward for analysts to plugin any kind of analysis tool into their Snowplow data to compose and execute queries.

Extensible schemaโ€‹

Snowplow started life as a web analytics data warehousing platform, and has a basic schema suitable for performing web analytics, with a wide range of web-specific dimensions (related to page URLs, browsers, operating systems, devices, IP addresses, cookie IDs) and web-specific events (page views, page pings, transactions). All of these fields can be found in the atomic.events table, which is a "fat" (many columns) table.

As Snowplow has evolved into a general purpose event analytics platform, we've enabled Snowplow users to define additional event types (we call these self describing events) and define their own entities (we call these custom entities) so that they can extend the schema to suit their own businesses.

For Snowplow users running Amazon Redshift, each type of self-describing event and each type of entity will be stored in their own dedicated tables. These additional tables can be joined back to the core atomic.events table, by joining on the root_id field in the self-describing event / entity table with the event_id in the atomic.events table, and the root_tstamp and collector_tstamp field in the respective tables. For users on other warehouses these will be additional columns in the atomic.events table.

Single tableโ€‹

All the events are effectively stored in a single table, making running queries across the data very easy. Even if you're running Snowplow with Redshift and have extended the schema as described above, you can still query the data as if it were in a single fat table. This is because:

  • The joins from the additional tables to the core atomic.events table are one-to-one.
  • The field joined on is the distribution key for both tables, so queries are as fast as if the data were in a single table.

Immutable logโ€‹

The Snowplow data table is designed to be immutable: the data in each line should not change over time. Data points that we would expect to change over time (e.g. what cohort a particular user belongs to, how we classify a particular visitor) can be derived from Snowplow data. However, our recommendation is that these derived fields should be defined and calculated at analysis time, stored in a separate table and joined to the Snowplow events table when performing any analysis.

Canonical event modelโ€‹

The sections below go over the standard fields found in all Snowplow events. We provide a Source for each field, however sometimes a field may have multiple sources e.g. a value originally set by a tracker but is overwritten by a later enrichment; in this case we have tended to classify this as the earliest source.

Common fields (platform and event independent)โ€‹

Application fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
app_idtextApplication IDYes'angry-birds'Tracking
platformtextPlatform, limited to specific valuesYes'web'Tracking

The application ID is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.

The platform ID is used to distinguish the same app running on different platforms, e.g. iOS vs web.

Date / time fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
collector_tstamptimestampTimestamp for the event recorded by the collectorYes'2013-11-26 00:02:05'Pipeline
dvce_created_tstamptimestampTimestamp for the event recorded on the client deviceNo'2013-11-26 00:03:57.885'Tracking
dvce_sent_tstamptimestampWhen the event was actually sent by the client deviceNo'2013-11-26 00:03:58.032'Tracking
etl_tstamptimestampTimestamp for when the event was validated and enriched. Note: the name is historical and does not mean that the event is loaded at this point (this is further downstream).No'2017-01-26 00:01:25.292'Pipeline
os_timezone (not available in BDP Cloud)textClient operating system timezoneNo'Europe/London'Tracking Timezone Plugin
derived_tstamptimestampTimestamp making allowance for inaccurate device clockNo'2013-11-26 00:02:04'Default Enrichment
true_tstamptimestampUser-set "true timestamp" for the eventNo'2013-11-26 00:02:04'Tracking
load_tstamptimestampTimestamp for when the data was loaded into the warehouse, best choice for incremental processingNo'2013-11-26 00:02:04'Pipeline

Event / transaction fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
eventtextThe type of event recordedYes'page_view'Tracking
event_idtextA UUID for each eventYes'c6ef3124-b53a-4b13-a233-0088f79dcbcb'Tracking (or enrichment if empty)
txn_idintTransaction ID set client-side, used to de-dupe recordsNo421828Tracking (Deprecated)
event_fingerprinttextHash client-set event fields, used to de-dupe recordsNoAADCE520E20C2899F4CED228A79A3083Event Fingerprint Enrichment

A complete list of event types is given here.

Snowplow version fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
v_trackertextTracker versionYes'js-3.0.0'Tracking
v_collectortextCollector versionYes'ssc-2.1.0-kinesis'Pipeline
v_etltextETL versionYes'snowplow-micro-1.1.0-common-1.4.2'Default Enrichment
name_trackertextTracker namespaceNo'sp1'Tracking
etl_tags (not available in BDP Cloud)textJSON of tags for this ETL runNo"['prod']"Deprecated

Some Snowplow Trackers allow the user to name each specific Tracker instance. name_tracker corresponds to this name, and can be used to distinguish which tracker generated which events.

FieldTypeDescriptionReqd?ExampleSource
user_idtextUnique ID set by businessNo'c94f860b-1266-4dad-ae57-3a36a414a521'Tracking
domain_useridtextUser ID set by Snowplow using 1st party client-set cookieNo'4b0dfa75-9a8c-46a1-9691-01add9db4200'Tracking
network_useridtextUser ID set by Snowplow using server-set cookie, which may be 1st or 3rd party, depending on collector configuration.No'ecdff4d0-9175-40ac-a8bb-325c49733607'Tracking or Pipeline
user_ipaddresstextUser IP address, can be overwritten with the IP Anonymization EnrichmentNo'92.231.54.234'Tracking or Pipeline
domain_sessionidxintA visit / session indexNo3Tracking
domain_sessionidtextA visit / session identifierNo'c6ef3124-b53a-4b13-a233-0088f79dcbcb'Tracking

domain_sessionidx is the number of the current user session. For example, an event occurring during a user's first session would have domain_sessionidx set to 1. The JavaScript Tracker calculates this field by storing a visit count in a first-party cookie. Whenever the Tracker fires an event, if more than 30 minutes have elapsed since the last event, the visitor count is increased by 1. (Whenever an event is fired, a "session cookie" is created and set to expire in 30 minutes. This is how the Tracker can tell whether the visit count should be incremented.) Thirty minutes is the default value and can be changed using the sessionCookieTimeout configuration option in the tracker.

Device and operating system fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
useragenttextRaw useragentNoTracking or Pipeline
dvce_type (not available in BDP Cloud)textType of deviceNo'Computer'Deprecated
dvce_ismobile (not available in BDP Cloud)booleanIs the device mobile?No1Deprecated
dvce_screenheightintScreen height in pixelsNo1024Tracking
dvce_screenwidthintScreen width in pixelsNo1900Tracking
os_name (not available in BDP Cloud)textName of operating systemNo'Android'Deprecated
os_family (not available in BDP Cloud)textOperating system familyNo'Linux'Deprecated
os_manufacturer (not available in BDP Cloud)textCompany responsible for OSNo'Apple'Deprecated

Location fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
geo_countrytextISO 3166-1 code for the country the visitor is located inNo'GB', 'US'IP Enrichment
geo_regiontextISO-3166-2 code for country region the visitor is inNo'I9', 'TX'IP Enrichment
geo_citytextCity the visitor is inNo'New York', 'London'IP Enrichment
geo_zipcodetextPostcode the visitor is inNo'94109'IP Enrichment
geo_latitudetextVisitor location latitudeNo37.443604IP Enrichment
geo_longitudetextVisitor location longitudeNo-122.4124IP Enrichment
geo_region_nametextVisitor region nameNo'Florida'IP Enrichment
geo_timezonetextVisitor timezone nameNo'Europe/London'IP Enrichment

IP address-based fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
ip_isptextVisitor's ISPNo'FDN Communications'IP Enrichment
ip_organizationtextOrganization associated with the visitor's IP address - defaults to ISP name if none is foundNo'Bouygues Telecom'IP Enrichment
ip_domaintextSecond level domain name associated with the visitor's IP addressNo'nuvox.net'IP Enrichment
ip_netspeedtextVisitor's connection typeNo'Cable/DSL'IP Enrichment

Metadata fieldsโ€‹

Fields containing information about the event type.

FieldTypeDescriptionReqd?ExampleSource
event_vendortextWho defined the eventYes'com.acme'Default Enrich (event schema)
event_nametextEvent nameYes'link_click'Default Enrich (event schema)
event_formattextFormat for eventYes'jsonschema'Default Enrich (event schema)
event_versiontextVersion of event schemaYes'1-0-2'Default Enrich (event schema)

Platform-specific fieldsโ€‹

Web-specific fieldsโ€‹

FieldTypeDescriptionReqd?ExampleSource
Page fields
page_urltextThe page URLNo'http://www.example.com'Tracking
page_urlschemetextScheme aka protocolNo'https'Default Enrichment (url)
page_urlhosttextHost aka domainNo'โ€œwww.snowplowanalytics.com'Default Enrichment (url)
page_urlportintPort if specified, scheme dependent if not (443 for https, 80 for http)No80Default Enrichment (url)
page_urlpathtextPath to pageNo'/product/index.html'Default Enrichment (url)
page_urlquerytextQuerystringNo'id=GTM-DLRG'Default Enrichment (url)
page_urlfragmenttextFragment aka anchorNo'4-conclusion'Default Enrichment (url)
page_referrertextURL of the referrerNo'http://www.referrer.com'Tracking
page_titletextWeb page titleNo'Snowplow Docs - Understanding the structure of Snowplow data'Tracking
refr_urlschemetextReferrer schemeNo'http'Default Enrichment (referer)
refr_urlhosttextReferrer hostNo'www.bing.com'Default Enrichment (referer)
refr_urlportintReferrer portNo80Default Enrichment (referer)
refr_urlpathtextReferrer page pathNo'/images/search'Default Enrichment (referer)
refr_urlquerytextReferrer URL querystringNo'q=psychic+oracle+cards'Default Enrichment (referer)
refr_urlfragmenttextReferrer URL fragmentNoDefault Enrichment (referer)
refr_mediumtextType of referrerNo'search', 'internal'Referrer Parser Enrichment
refr_sourcetextName of referrer if recognizedNo'Bing images'Referrer Parser Enrichment
refr_termtextKeywords if source is a search engineNo'psychic oracle cards'Referrer Parser Enrichment
refr_domain_useridtextThe Snowplow domain_userid of the referring websiteNo'bc2e92ec6c204a14'Default Enrichment (cross_domain)
refr_dvce_tstamptimestampThe time of attaching the domain_userid to the inbound linkNo'2013-11-26 00:02:05'Default Enrichment (cross_domain)
Document fields
doc_charsettextThe pageโ€™s character encodingNo, 'UTF-8'Tracking
doc_widthintThe page's width in pixelsNo1024Tracking
doc_heightintThe page's height in pixelsNo3000Tracking
Marketing / traffic source fields
mkt_mediumtextType of traffic sourceNo'cpc', 'affiliate', 'organic', 'social'Campaign Attribution Enrichment
mkt_sourcetextThe company / website where the traffic came fromNo'Google', 'Facebook'Campaign Attribution Enrichment
mkt_termtextAny keywords associated with the referrerNo'new age tarot decks'Campaign Attribution Enrichment
mkt_contenttextThe content of the ad. (Or an ID so that it can be looked up.)No13894723Campaign Attribution Enrichment
mkt_campaigntextThe campaign IDNo'diageo-123'Campaign Attribution Enrichment
mkt_clickidtextThe click IDNo'ac3d8e459'Campaign Attribution Enrichment
mkt_networktextThe ad network to which the click ID belongsNo'DoubleClick'Campaign Attribution Enrichment
Browser fields
user_fingerprint (not available in BDP Cloud)intA user fingerprint generated by looking at the individual browser featuresNo2161814971Tracking (Deprecated)
br_name (not available in BDP Cloud)textBrowser nameNo'Firefox 12'Deprecated
br_version (not available in BDP Cloud)textBrowser versionNo'12.0'Deprecated
br_family (not available in BDP Cloud)textBrowser familyNo'Firefox'Deprecated
br_type (not available in BDP Cloud)textBrowser typeNo'Browser'Deprecated
br_renderengine (not available in BDP Cloud)textBrowser rendering engineNo'GECKO'Deprecated
br_langtextLanguage the browser is set toNo'en-GB'Tracking
br_features_pdf (not available in BDP Cloud)booleanWhether the browser recognizes PDFsNo1Tracking (browser_features plugin)
br_features_flash (not available in BDP Cloud)booleanWhether Flash is installedNo1Tracking (browser_features plugin)
br_features_java (not available in BDP Cloud)booleanWhether Java is installedNo1Tracking (browser_features plugin)
br_features_director (not available in BDP Cloud)booleanWhether Adobe Shockwave is installedNo1Tracking (browser_features plugin)
br_features_quicktime (not available in BDP Cloud)booleanWhether QuickTime is installedNo1Tracking (browser_features plugin)
br_features_realplayer (not available in BDP Cloud)booleanWhether RealPlayer is installedNo1Tracking (browser_features plugin)
br_features_windowsmedia (not available in BDP Cloud)booleanWhether mplayer2 is installedNo1Tracking (browser_features plugin)
br_features_gears (not available in BDP Cloud)booleanWhether Google Gears is installedNo1Tracking (browser_features plugin)
br_features_silverlight (not available in BDP Cloud)booleanWhether Microsoft Silverlight is installedNo1Tracking (browser_features plugin)
br_cookiesbooleanWhether cookies are enabledNo1Tracking
br_colordepthintBit depth of the browser color paletteNo24Tracking
br_viewheightintViewport heightNo1000Tracking
br_viewwidthintViewport widthNo1000Tracking

Event-specific fieldsโ€‹

Snowplow includes specific fields to capture data associated with specific events.

Note that to date, all event types have been defined by Snowplow. Also note that event_vendor values follow the Java package naming convention.

Snowplow currently supports the following event types:

Event typeValue of event field in model
Page views'page_view'
Page pings'page_ping'
E-commerce transactions'transaction' and 'transaction_item'
Structured events'struct'
Self-describing events'unstruct' (for legacy reasons)

Details of which fields are available for which events are given below. In some cases these events will store values in the atomic.events columns, which are listed below.

Page viewsโ€‹

There are currently no fields that are specific to page_view events: all the fields that are required are part of the standard fields available for any web-based event e.g. page_urlscheme, page_title.

Page pingsโ€‹

There are four additional fields included with page pings that indicate how a user has scrolled over a web page since the last page ping:

FieldTypeDescriptionReqd?Example
pp_xoffset_minintegerMinimum page x offset seen in the last ping periodNo10
pp_xoffset_maxintegerMaximum page x offset seen in the last ping periodNo100
pp_yoffset_minintegerMinimum page y offset seen in the last ping periodNo5
pp_yoffset_maxintegerMaximum page y offset seen in the last ping periodNo200

E-commerce transactionsโ€‹

There are a large number of fields specifically for transaction events.

Fields that start tr_ relate to the transaction as a whole. Fields that start ti_ refer to the specific item included in the transaction. (E.g. a product in the basket.) Single transactions typically span multiple lines of data: there will be a single line where event = transaction, where the tr_ fields are set, and multiple lines (one for each product included) where event = transaction_item and the ti_ fields are set.

FieldTypeDescriptionReqd?Example
tr_orderidtextOrder IDYes'#134'
tr_affiliationtextTransaction affiliation (e.g. store where sale took place)No'web'
tr_totaldecimalTotal transaction valueYes12.99
tr_taxdecimalTotal tax included in transaction valueNo3.00
tr_shippingdecimalDelivery cost chargedNo0.00
tr_total_base*decimalTotal in base currencyNo12.99
tr_tax_base*decimalTotal tax in base currencyNo3.00
tr_shipping_base*decimalDelivery cost in base currencyNo0.00
tr_citytextDelivery address, cityNo'London'
tr_statetextDelivery address, stateNo'Washington'
tr_countrytextDelivery address, countryNo'France'
tr_currencytextCurrencyNo'USD'
ti_orderidtextOrder IDYes'#134'
ti_skutextProduct SKUYes'pbz00123'
ti_nametextProduct nameNo'Cone pendulum'
ti_categorytextProduct categoryNo'New Age'
ti_pricedecimalProduct unit priceYes9.99
ti_price_base*decimalPrice in base currencyNo9.99
ti_quantityintegerNumber of product in transactionYes2
ti_currencytextCurrencyNo'EUR'
base_currency*textReporting currencyNo'GBP'

* Set exclusively by the Currency conversion enrichment.

Structured eventsโ€‹

Structured events allow you to send your own custom data, as long as it fits in the following 5 fields:

FieldTypeDescriptionReqd?Example
se_categorytextCategory of eventYes*'ecomm', 'video'
se_actiontextAction performed / event nameYes*'add-to-basket', 'play-video'
se_labeltextThe object of the action e.g. the ID of the video played or SKU of the product added-to-basketNo'pbz00123'
se_propertytextA property associated with the object of the actionNo'HD', 'large'
se_valuedecimalA value associated with the event / action e.g. the value of goods added-to-basketNo9.99

* These fields are only required for struct events.

Self-describing eventsโ€‹

Self-describing events can contain their own set of fields, defined by their schema.

For each type of self-describing event, there will be a dedicated column (or table, in case of Redshift and Postgres) that holds the event-specific fields.

See querying data for more details on the structure and how to query it in different warehouses. You might also want to check how schema definitions translate to the warehouse.

Entitiesโ€‹

Entities (also known as contexts) provide extra information about the event, such as data describing a product or a user.

For each type of entity, there will be a dedicated column (or table, in case of Redshift and Postgres) that holds entity-specific fields. Note that an event can have any number of entities attached, including multiple entities of the same type. For this reason, the data inside the entity columns is an array.

See querying data for more details on the structure and how to query it in different warehouses. You might also want to check how schema definitions translate to the warehouse.

Out-of-the-box self-describing events and entitiesโ€‹

These are also a variety of self-describing events and entities defined by Snowplow. You can find their schemas here.

Was this page helpful?