Web Quickstart
Requirements
In addition to dbt being installed and a web events dataset being available in your database:
- Snowplow JavaScript tracker version 2 or later implemented.
- Web Page context enabled (enabled by default in v3+).
- Page view events implemented.
- From version v0.13.0 onwards you must be using RDB Loader v4.0.0 and above, or BigQuery Loader v1.0.0 and above. If you are not using these versions, or are using the Postgres loader, you will need to set snowplow__enable_load_tstamp to false in your dbt_project.yml and will not be able to use the consent models (see the example below).
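If that applies to you, the override follows the same pattern as the other variable examples in this guide, for instance:

vars:
  snowplow_web:
    snowplow__enable_load_tstamp: false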
In addition to the standard privileges required by dbt, our packages by default write to additional schemas beyond just your profile schema. If your connected user does not have create schema privileges, you will need to ensure that the following schemas exist in your warehouse and the user can create tables in them:
<profile_schema>_derived
<profile_schema>_scratch
<profile_schema>_snowplow_manifest
Alternatively, you can override the output schemas our models write to; see the relevant package configuration page for how to do this.
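As a rough sketch, these overrides are standard dbt custom schema configs in your dbt_project.yml; the exact model paths are package-specific, so treat the ones below as illustrative and check the package configuration page for the real ones:

models:
  snowplow_web:
    base:
      manifest:
        +schema: my_manifest_schema # illustrative path and schema name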
Snowflake:
grant create schema on database <database_name> to role <role_name>;
-- alternatively
create schema <profile_schema>_derived;
create schema <profile_schema>_scratch;
create schema <profile_schema>_snowplow_manifest;
grant usage on schema <profile_schema>_derived to role <role_name>;
grant usage on schema <profile_schema>_scratch to role <role_name>;
grant usage on schema <profile_schema>_snowplow_manifest to role <role_name>;
For more information, please refer to the Official Guide on setting up permissions.

BigQuery:
Please refer to the Official Guide on setting up permissions.

Databricks:
-- user with "use catalog" privilege on the catalog
grant create schema on catalog <catalog_name> to <principal_name>;
-- alternatively
create schema <profile_schema>_derived;
create schema <profile_schema>_scratch;
create schema <profile_schema>_snowplow_manifest;
grant usage on schema <profile_schema>_derived to <user_name>;
grant usage on schema <profile_schema>_scratch to <user_name>;
grant usage on schema <profile_schema>_snowplow_manifest to <user_name>;
For more options (e.g. granting to a service principal or a group instead of a user), please refer to the Official Guide on setting up permissions.

Redshift:
-- someone with superuser access
create schema authorization <user_name>;
-- alternatively
create schema <profile_schema>_derived;
create schema <profile_schema>_scratch;
create schema <profile_schema>_snowplow_manifest;
grant usage on schema <profile_schema>_derived to <user_name>;
grant usage on schema <profile_schema>_scratch to <user_name>;
grant usage on schema <profile_schema>_snowplow_manifest to <user_name>;
For more options (e.g. granting to a role or a group instead of a user), please refer to the Official Guide on setting up permissions.

Postgres:
-- someone with superuser access
create schema authorization <user_name>;
-- alternatively
create schema <profile_schema>_derived;
create schema <profile_schema>_scratch;
create schema <profile_schema>_snowplow_manifest;
grant usage on schema <profile_schema>_derived to <user_name>;
grant usage on schema <profile_schema>_scratch to <user_name>;
grant usage on schema <profile_schema>_snowplow_manifest to <user_name>;
For more information, please refer to the Official Guide on setting up permissions.
Installation
Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.
Make sure to run the dbt deps command after updating your packages.yml to ensure you have the specified version of each package installed in your project.
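For illustration, a packages.yml entry for the web package looks like the following; the version range shown is a placeholder, so take the current one from dbt Hub:

packages:
  - package: snowplow/snowplow_web
    version: [">=0.16.0", "<0.17.0"] # placeholder range; use the latest from dbt Hub

Running dbt deps after saving this pulls the package into your project.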
Setup
1. Override the dispatch order in your project
To take advantage of the optimized upsert that the Snowplow packages offer, you need to ensure that certain macros are called from snowplow_utils first, before dbt-core. This can be achieved by adding the following to the top level of your dbt_project.yml file:
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']
If you do not do this the package will still work, but the incremental upserts will become more costly over time.
2. Adding the selectors.yml file
Within the packages we have provided a suite of suggested selectors to run and test the models within the package together with the web model. This leverages dbt's selector flag. You can find out more about each selector in the YAML Selectors section.
These are defined in the selectors.yml file (source) within the package; however, in order to use these selections you will need to copy this file into your own dbt project directory. This is a top-level file and therefore should sit alongside your dbt_project.yml file. If you are using multiple packages in your project you will need to combine the contents of these into a single file.
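As a minimal sketch of the shape such a selector takes (the packaged file contains fuller definitions, so prefer copying that verbatim):

selectors:
  - name: snowplow_web
    description: Runs all models in the snowplow_web package # illustrative
    definition:
      method: package
      value: snowplow_web

You can then invoke it with dbt run --selector snowplow_web.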
3. Check source data
This package will by default assume your Snowplow events data is contained in the atomic schema of your target.database. In order to change this, please add the following to your dbt_project.yml file:
vars:
  snowplow_web:
    snowplow__atomic_schema: schema_with_snowplow_events
    snowplow__database: database_with_snowplow_events
Please note that your target.database is NULL if using Databricks. In Databricks, schemas and databases are used interchangeably, so the dbt implementation of Databricks always uses the schema value; adjust your snowplow__atomic_schema value if you need to.
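For example, on Databricks you would typically set only the schema-level variable, something like this (the schema name is illustrative):

vars:
  snowplow_web:
    snowplow__atomic_schema: atomic_events_schema # illustrative; point this at the schema holding your events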
4. Enable desired contexts
The web package has the option to join in data from the following 3 Snowplow enrichments: the IAB enrichment, the UA Parser enrichment, and the YAUAA enrichment.
By default these are all disabled in the web package. Assuming you have the enrichments turned on in your Snowplow pipeline, to enable the contexts within the package please add the following to your dbt_project.yml file:
vars:
  snowplow_web:
    snowplow__enable_iab: true
    snowplow__enable_ua: true
    snowplow__enable_yauaa: true
5. Filter your data set
You can specify both the start_date at which to start processing events and the app_ids to filter for. By default the start_date is set to 2020-01-01 and all app_ids are selected. To change this, please add the following to your dbt_project.yml file:
vars:
  snowplow_web:
    snowplow__start_date: 'yyyy-mm-dd'
    snowplow__app_id: ['my_app_1','my_app_2']
6. Verify page ping variables
The web package processes page ping events to calculate web page engagement times. If your tracker configuration for min_visit_length (default 5) and heartbeat (default 10) differs from the defaults provided in this package, you can override them by adding the following to your dbt_project.yml:
vars:
  snowplow_web:
    snowplow__min_visit_length: 5 # Default value
    snowplow__heartbeat: 10 # Default value
7. Additional vendor-specific configuration
BigQuery:
Verify which column your events table is partitioned on. It will likely be partitioned on collector_tstamp or derived_tstamp. If it is partitioned on collector_tstamp you should set snowplow__derived_tstamp_partitioned to false. This will ensure only the collector_tstamp column is used for partition pruning when querying the events table:
vars:
  snowplow_web:
    snowplow__derived_tstamp_partitioned: false
Databricks:
Add the following variable to your dbt project's dbt_project.yml file:
vars:
  snowplow_web:
    snowplow__databricks_catalog: 'hive_metastore'
Depending on the use case it should either be the catalog (for Unity Catalog users from databricks connector 1.1.1 onwards, defaulted to 'hive_metastore') or the same value as your snowplow__atomic_schema (unless changed, it should be 'atomic'). This is needed to handle the database property within models/base/src_base.yml.
A more detailed explanation for how to set up your Databricks configuration properly can be found in Unity Catalog support.
8. Run your model
You can now run your models for the first time by running the below commands (see the operation page for more information on operation of the package). As this package contains some seed files, you will need to seed these first:
dbt seed --select snowplow_web --full-refresh
dbt run --selector snowplow_web
9. Enable extras
The package comes with additional modules and functionality that you can enable, for more information see the consent tracking, conversions, and core web vitals documentation.
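As a sketch, these optional modules are switched on with variables in your dbt_project.yml; the flag names below follow the package's variable naming convention but should be checked against each module's documentation before use:

vars:
  snowplow_web:
    snowplow__enable_consent: true # check the consent tracking docs before enabling
    snowplow__enable_cwv: true # check the core web vitals docs before enabling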
For some common analytical queries to run on the derived web data, take a look at our page here!