
Model Selection

YAML Selectors

The Snowplow models in each package are designed to be run as a whole, which ensures all incremental tables are kept in sync. For this reason, run the models using:

dbt run --select snowplow_<package> tag:snowplow_<package>_incremental

The snowplow_<package> selection executes all nodes within the relevant Snowplow package, while the tag:snowplow_<package>_incremental selection executes any custom modules that you have created.

Given the verbose nature of this command, we suggest using the YAML selectors we have provided. The equivalent command using the selector flag would be:

dbt run --selector snowplow_<package>

Each package provides a suite of suggested selectors for running and testing its models. These leverage dbt's selector flag. For the web package, for example:

  • snowplow_web: Recommended way to run the package. This selection includes all models within the Snowplow Web package as well as any custom models you have created.
  • snowplow_web_lean_tests: Recommended way to test the models within the package. See the testing section for more details
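
As a rough sketch of how the two selectors above might be used together, the snippet below runs the package and then tests it only if the run succeeds. The function name run_and_test_snowplow_web is hypothetical and not part of the package, and the sketch assumes the selectors are already available in your project's selectors.yml.

```shell
# Hypothetical helper (not provided by the package): run the web package
# with its recommended selector, then test it with the lean test selector.
# Assumes both selectors are defined in your project's selectors.yml.
run_and_test_snowplow_web() {
  # '&&' means the tests are skipped if the run itself fails.
  dbt run --selector snowplow_web &&
    dbt test --selector snowplow_web_lean_tests
}
```

Calling run_and_test_snowplow_web from the project directory would then issue both commands in sequence.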

These selectors are defined in the selectors.yml file within each package. To use them, you will need to copy this file into your own dbt project directory. It is a top-level file and should therefore sit alongside your dbt_project.yml file. If you are using multiple packages in your project, you will need to combine the contents of their selectors.yml files into a single file.
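
The copy step could look something like the sketch below. It assumes packages are installed under dbt_packages/ in the project root (older projects may use dbt_modules/ or a custom install path), and the function name copy_snowplow_selectors is hypothetical. It deliberately refuses to overwrite an existing selectors.yml, since combining the files from several packages has to be done by hand.

```shell
# Hypothetical helper: copy a package's selectors.yml next to dbt_project.yml.
# Assumption: packages are installed under <project>/dbt_packages/.
copy_snowplow_selectors() {
  package="$1"            # package name, e.g. snowplow_web
  project_dir="${2:-.}"   # directory containing dbt_project.yml
  source_file="$project_dir/dbt_packages/$package/selectors.yml"
  target_file="$project_dir/selectors.yml"

  # Do not overwrite: with multiple packages the contents must be merged manually.
  if [ -e "$target_file" ]; then
    echo "selectors.yml already exists; merge $source_file into it manually" >&2
    return 1
  fi
  cp "$source_file" "$target_file"
}
```

For example, copy_snowplow_selectors snowplow_web run from the project root would place the file alongside dbt_project.yml.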

Specific Model Selection

You may wish to run the modules asynchronously, for instance running the page views module hourly but the sessions and users modules daily. You might expect this to be achievable using, for example:

dbt run --select +snowplow_web.page_views

Currently, however, it is not possible during a dbt job's start phase to deduce exactly which models are due to be executed from such a command. This means the package is unable to select the corresponding subset of models from the manifest. Instead, all models from the standard and custom modules are selected from the manifest, and the package will attempt to synchronize all of them. This makes the above command unsuitable for asynchronous runs.

However, we can leverage dbt's ls command in conjunction with shell substitution to explicitly state which models to run, allowing a subset of models to be selected from the manifest and thus run independently.

For example, to run just the page views module asynchronously:

dbt run --select +snowplow_web.page_views --vars "{'models_to_run': '$(dbt ls --select +snowplow_web.page_views --output name)'}"
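
If you run several modules on different schedules, the pattern above can be wrapped in a small shell function so that the selection is only written once. This is a minimal sketch rather than something shipped with the package: the function name run_snowplow_subset is hypothetical, and it assumes dbt is on your PATH and is invoked from the project directory.

```shell
# Hypothetical helper: run a subset of Snowplow models independently by
# passing the resolved model names to the package via 'models_to_run'.
run_snowplow_subset() {
  selection="$1"   # e.g. +snowplow_web.page_views

  # Resolve the selection to explicit model names first; stop if dbt ls fails.
  models="$(dbt ls --select "$selection" --output name)" || return 1

  # Run the same selection, telling the package exactly which models to expect.
  dbt run --select "$selection" --vars "{'models_to_run': '$models'}"
}
```

An hourly job could then call run_snowplow_subset +snowplow_web.page_views, while a daily job passes the selection for the sessions and users modules instead.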