
E-commerce

Package Configuration Variables

This package uses a set of variables that are preconfigured with recommended values for optimal model performance. Depending on your use case, you may want to override these values by adding them to your dbt_project.yml file.

caution

All variables in Snowplow packages start with snowplow__, but the tables below omit this prefix for brevity.

Warehouse and tracker

| Variable Name | Description | Default |
| --- | --- | --- |
| atomic_schema | The schema (dataset for BigQuery) that contains your atomic events table. | atomic |
| categories_separator | The separator used to split your subcategories out from your main category. For example, if your category field is populated as books/fiction/magical-fiction, specify '/' as the separator so that the subcolumns are parsed properly. | '/' |
| database | The database that contains your atomic events table. | target.database |
| dev_target_name | The target name of your development environment as defined in your profiles.yml file. See the 'Manifest Tables' section for more details. | dev |
| events | Used internally by the package to reference your events table based on other variable values. It should not be changed. | events |
| number_category_levels | The maximum number of levels (depth) of subcategories that exist on your website for products. These subcategories are recorded in the category field of the product context and should be separated using the separator defined by categories_separator. For example, books/fiction/magical-fiction has 3 levels. The value is the number of category columns generated in the product tables created by this Snowplow dbt package. Note that some products can have fewer than the maximum number of categories specified. | 4 |
| number_checkout_steps | The index of the checkout step that represents a completed transaction. This is required for checkout funnel analysis to work. | 4 |
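
As a rough illustration of overriding the variables in this table, the fragment below is a minimal sketch of a dbt_project.yml entry. It assumes the variables are scoped under the snowplow_ecommerce package key and written with their full snowplow__ prefix. The schema name and the chosen values are hypothetical and should be replaced with whatever matches your own tracking setup.

```yaml
# dbt_project.yml (fragment): merge into your existing vars block.
# All values below are illustrative assumptions, not recommendations.
vars:
  snowplow_ecommerce:
    # Hypothetical schema holding the atomic events table
    snowplow__atomic_schema: my_atomic_schema
    # Category field looks like "books>fiction>magical-fiction"
    snowplow__categories_separator: '>'
    # Deepest category path on the site has three levels
    snowplow__number_category_levels: 3
    # Step 5 of the checkout flow marks a completed transaction
    snowplow__number_checkout_steps: 5
```

Any variable left out of this block keeps the default listed in the table.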

Operation and logic

| Variable Name | Description | Default |
| --- | --- | --- |
| allow_refresh | Used as the default value to return from the allow_refresh() macro. This macro determines whether the manifest tables can be refreshed, depending on your environment. See the 'Manifest Tables' section for more details. | false |
| backfill_limit_days | The maximum number of days of new data to be processed since the latest event processed. Please refer to the incremental logic section for more details. | 30 |
| days_late_allowed | The maximum allowed number of days between the event creation and it being sent to the collector. Exists to reduce lengthy table scans that can occur as a result of late-arriving data. | 3 |
| ecommerce_event_names | The list of event names that the Snowplow e-commerce package filters on when extracting events from your atomic events table. If you track any custom e-commerce events, add their event names to this list to include them in your data models. | ['snowplow_ecommerce_action'] |
| enable_mobile_events | Whether to use the mobile contexts for mobile e-commerce events in the processing (based on the client session and screen view context). | false |
| lookback_window_hours | The number of hours to look back before the latest event processed, to account for late-arriving data that comes out of order. | 6 |
| max_session_days | The maximum allowed session length in days. For a session exceeding this length, events after this limit are not processed. Exists to reduce lengthy table scans that can occur due to long sessions, which are usually a result of bots. | 3 |
| session_lookback_days | Number of days to limit the scan on the snowplow_ecommerce_base_sessions_lifecycle_manifest manifest table. Exists to improve model performance when there are a lot of sessions. Should be set to as large a number as practical. | 730 |
| start_date | The date to start processing events from on the first run or a full refresh, based on collector_tstamp. | '2020-01-01' |
| upsert_lookback_days | Number of days to look back over the incremental derived tables during the upsert. Where performance is not a concern, this should be set to as long a value as possible. Too short a period can result in duplicates. Please see the Snowplow Optimized Materialization section for more details. | 30 |
| use_product_quantity | Whether the product_quantity field in the product context should be used to sum up the total number of products in a transaction. If set to false, the number_products field in your transaction tables is instead calculated by counting the number of product entities within the transaction, i.e. treating each product as having a quantity of 1. | false |
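
The fragment below sketches how a few of these operational variables might be overridden together. It again assumes the snowplow_ecommerce scope and the full snowplow__ prefix. The custom event name my_custom_ecommerce_event is a hypothetical placeholder, and the dates and limits are illustrative only.

```yaml
# dbt_project.yml (fragment): merge into your existing vars block.
# All values below are illustrative assumptions.
vars:
  snowplow_ecommerce:
    # Only process events collected from this date on first run / full refresh
    snowplow__start_date: '2023-01-01'
    # Process at most 7 days of new data per run
    snowplow__backfill_limit_days: 7
    # Keep the default event and add a hypothetical custom one
    snowplow__ecommerce_event_names: ['snowplow_ecommerce_action', 'my_custom_ecommerce_event']
    # Sum product_quantity instead of counting product entities
    snowplow__use_product_quantity: true
```

A smaller backfill_limit_days is one way to keep individual runs short while backfilling from an early start_date, at the cost of needing more runs to catch up.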

Contexts, filters, and logs

| Variable Name | Description | Default |
| --- | --- | --- |
| app_id | A list of app_ids to filter the events table on for processing within the package. | [ ] (no filter applied) |
| disable_ecommerce_carts | Flag to exclude the Snowplow E-commerce cart entity context in case this is disabled or not used in your tracking. | false |
| disable_ecommerce_checkouts | Flag to exclude the Snowplow E-commerce checkout entity context in case this is disabled or not used in your tracking. | false |
| disable_ecommerce_page_context | Flag to exclude the Snowplow E-commerce page entity context in case this is disabled or not used in your tracking. | false |
| disable_ecommerce_products | Flag to exclude the Snowplow E-commerce product entity context in case this is disabled or not used in your tracking. | false |
| disable_ecommerce_transactions | Flag to exclude the Snowplow E-commerce transaction entity context in case this is disabled or not used in your tracking. | false |
| disable_ecommerce_user_context | Flag to exclude the Snowplow E-commerce user entity context in case this is disabled or not used in your tracking. | false |
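
As a sketch of how the filter and the disable flags could be combined, the fragment below restricts processing to two app IDs and switches off one entity context. The app ID values are hypothetical, and the choice of which context to disable is only an example of a tracking setup that does not send that entity.

```yaml
# dbt_project.yml (fragment): merge into your existing vars block.
# App IDs are hypothetical placeholders.
vars:
  snowplow_ecommerce:
    # Only process events from these applications
    snowplow__app_id: ['my_shop_web', 'my_shop_ios']
    # Assumes the user entity context is not tracked in this setup
    snowplow__disable_ecommerce_user_context: true
```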

Warehouse Specific

| Variable Name | Description | Default |
| --- | --- | --- |
| databricks_catalog | The catalog your atomic events table is in. Depending on the use case, it should be either the catalog (for Unity Catalog users from databricks connector 1.1.1 onwards; defaults to hive_metastore) or the same value as your snowplow__atomic_schema ('atomic' unless changed). | |
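
For Databricks users, a minimal sketch of setting this variable might look like the following. The catalog name my_unity_catalog is a hypothetical placeholder for a Unity Catalog setup. Users not on Unity Catalog would instead use the value described in the table.

```yaml
# dbt_project.yml (fragment): merge into your existing vars block.
vars:
  snowplow_ecommerce:
    # Hypothetical Unity Catalog name that holds the atomic events table
    snowplow__databricks_catalog: my_unity_catalog
```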

Output Schemas

By default, all scratch/staging tables are created in the <target.schema>_scratch schema, derived tables in <target.schema>_derived, and manifest tables in <target.schema>_snowplow_manifest. Some of these schemas are only used by specific packages, so make sure you add the correct configuration for each package you are using. To change them, add the following to your dbt_project.yml file:

tip

If you want to use just your connection schema with no suffixes, set the +schema: values to null

models:
  snowplow_ecommerce:
    base:
      manifest:
        +schema: my_manifest_schema
      scratch:
        +schema: my_scratch_schema
    carts:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    checkouts:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    products:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    transactions:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    users:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
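
To illustrate the tip about dropping the schema suffixes, the sketch below sets +schema to null for the base and carts folders only. It assumes the same folder layout as the configuration above. The remaining folders (checkouts, products, transactions, users) would follow the same pattern.

```yaml
# dbt_project.yml (fragment): build models directly in the connection
# schema (target.schema) with no suffix. Only two folders are shown.
models:
  snowplow_ecommerce:
    base:
      manifest:
        +schema: null
      scratch:
        +schema: null
    carts:
      +schema: null
      scratch:
        +schema: null
```

With this setup, scratch, derived, and manifest tables for these folders all land in the same schema, so it is worth checking that none of your own models share a name with them.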

Config Generator

The online version of this page provides an interactive form, grouped under the same headings as the tables above, that generates the code to place in your dbt_project.yml file. Any values you do not specify keep their package defaults.

Project Variables:

vars:
  snowplow_ecommerce: null