Model Selection
YAML Selectors
The Snowplow models in each package are designed to be run as a whole, which ensures all incremental tables are kept in sync. As such, run the models using:
dbt run --select snowplow_<package> tag:snowplow_<package>_incremental
The snowplow_<package> selection will execute all nodes within the relevant Snowplow package, while the tag:snowplow_<package>_incremental selection will execute all custom modules that you may have created.
Given the verbose nature of this command, we suggest using the YAML selectors we have provided. The equivalent command using the selector flag would be:
dbt run --selector snowplow_<package>
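As a concrete illustration, the sketch below substitutes web for the <package> placeholder and prints both forms of the command rather than running them, so it can be tried without a dbt project. The variable names are illustrative only and are not part of the packages.

```shell
# Package suffix to substitute for <package>; "web" is just an example.
package="web"

# Verbose form: the package's own models plus custom models tagged as incremental.
select_cmd="dbt run --select snowplow_${package} tag:snowplow_${package}_incremental"

# Selector form: relies on a selectors.yml file in your own dbt project.
selector_cmd="dbt run --selector snowplow_${package}"

# Print the commands instead of executing them.
echo "$select_cmd"
echo "$selector_cmd"
```

Swapping the value of package for another suffix, such as mobile, gives the matching commands for that package.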
Each package provides a suite of suggested selectors to run and test its models. This leverages dbt's selector flag.
- Snowplow Web
  - snowplow_web: Recommended way to run the package. This selection includes all models within the Snowplow Web package as well as any custom models you have created.
  - snowplow_web_lean_tests: Recommended way to test the models within the package. See the testing section for more details.
- Snowplow Mobile
  - snowplow_mobile: Recommended way to run the package. This selection includes all models within the Snowplow Mobile package as well as any custom models you have created.
  - snowplow_mobile_lean_tests: Recommended way to test the models within the package. See the testing section for more details.
- Snowplow Media Player
  - snowplow_web: Recommended way to run the package. This selection includes all models within the Snowplow Web and Snowplow Media Player packages as well as any custom models you have created.
  - snowplow_web_lean_and_media_player_tests: Recommended way to test the models within the package. See the testing section for more details.
  - snowplow_media_player_tests: Runs all tests within the Snowplow Media Player package and any custom models tagged with snowplow_media_player.
  - snowplow_web_and_media_player_tests: Runs all tests within the Snowplow Web and Snowplow Media Player packages and any custom models tagged with snowplow_media_player or snowplow_web_incremental.
- Snowplow Normalize
  - snowplow_normalize: Recommended way to run the package. This selection includes all models within the Snowplow Normalize package as well as any custom models you have created.
- Snowplow E-commerce
  - snowplow_ecommerce: Recommended way to run the package. This selection includes all models within the Snowplow E-commerce package as well as any custom models you have created.
  - snowplow_ecommerce_lean_tests: Recommended way to test the models within the package. See the testing section for more details.
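A typical job pairs a run selector with its matching test selector. The sketch below assembles that pair for the Web package using dbt's standard --selector flag on both dbt run and dbt test; it only prints the resulting command, and the variable names are illustrative.

```shell
# Selector names taken from the Snowplow Web entries in the list of selectors.
run_selector="snowplow_web"
test_selector="snowplow_web_lean_tests"

# Run the package first, then test it only if the run succeeded.
job_cmd="dbt run --selector ${run_selector} && dbt test --selector ${test_selector}"

# Print the command instead of executing it.
echo "$job_cmd"
```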
These are defined in the selectors.yml file within each package. To use these selections, however, you will need to copy this file into your own dbt project directory. This is a top-level file and should therefore sit alongside your dbt_project.yml file. If you are using multiple packages in your project, you will need to combine the contents of their selectors.yml files into a single file.
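The copy step could look something like the sketch below. It assumes the package has been installed into the default dbt_packages directory (older dbt versions used dbt_modules) and that selectors.yml sits at the root of the package; copy_selectors is a hypothetical helper, not something the packages ship. It refuses to overwrite an existing file, since with multiple packages the contents have to be merged by hand.

```shell
# copy_selectors PACKAGE PROJECT_DIR
# Copies PACKAGE's selectors.yml next to dbt_project.yml, refusing to overwrite.
copy_selectors() {
  package="$1"
  project_dir="$2"
  # Assumed install location: <project>/dbt_packages/<package>/selectors.yml
  src="${project_dir}/dbt_packages/${package}/selectors.yml"
  dest="${project_dir}/selectors.yml"

  if [ ! -f "$src" ]; then
    echo "No selectors.yml found at ${src}" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "${dest} already exists; merge the selectors by hand" >&2
    return 2
  fi
  cp "$src" "$dest"
}
```

From the project root, a call such as copy_selectors snowplow_web . would place the file alongside dbt_project.yml.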
Specific Model Selection
You may wish to run the modules asynchronously, for instance running the page views module hourly but the sessions and users modules daily. You might assume this could be achieved using, for example:
dbt run --select +snowplow_web.page_views
Currently, however, it is not possible during a dbt job's start phase to deduce exactly which models are due to be executed from such a command. This means the package is unable to select the subset of models from the manifest. Instead, all models from the standard and custom modules are selected from the manifest, and the package will attempt to synchronize all models. This makes the above command unsuitable for asynchronous runs.
However, we can leverage dbt's ls command in conjunction with shell substitution to explicitly state what models to run, allowing a subset of models to be selected from the manifest and thus run independently.
For example, to run just the page views module asynchronously:
dbt run --select +snowplow_web.page_views --vars "{'models_to_run': '$(dbt ls --select +snowplow_web.page_views --output name)'}"
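If several modules are scheduled this way, the pattern can be wrapped in a small shell function so that the selection is typed only once. This is a minimal sketch: models_to_run_vars and run_subset are hypothetical helper names, and the function simply reproduces the dbt ls and dbt run calls from the command above.

```shell
# Wrap a list of model names in the models_to_run variable expected by the package.
models_to_run_vars() {
  printf "{'models_to_run': '%s'}" "$1"
}

# run_subset SELECTION
# Resolves SELECTION to model names with dbt ls, then runs only those models.
run_subset() {
  selection="$1"
  # Stop early if dbt ls fails, e.g. when not inside a dbt project.
  models="$(dbt ls --select "$selection" --output name)" || return 1
  dbt run --select "$selection" --vars "$(models_to_run_vars "$models")"
}
```

An hourly job could then call run_subset +snowplow_web.page_views, with other selections scheduled separately.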