Understanding the data loading process

The data loading process is engineered for large volumes of data. Our loader applications are also tailored to each data warehouse, so that Snowplow events are represented in the form that best suits it. For example, the loaders automatically adjust the column types for self-describing events and entities to match their schemas.
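To illustrate what "adjusting column types according to schemas" can mean, here is a minimal sketch that maps JSON Schema property definitions to Redshift column types. The mapping rules, the `column_type` and `columns_for` helpers, and the `VARCHAR(65535)` fallback are all hypothetical simplifications; they are not the actual rules used by the Snowplow loaders.

```python
# Hypothetical, simplified mapping from JSON Schema properties to Redshift
# column types. This is NOT the real RDB Loader type-mapping logic.

TEXT_FALLBACK = "VARCHAR(65535)"  # widest VARCHAR Redshift allows


def column_type(prop: dict) -> str:
    """Pick a Redshift column type for one JSON Schema property."""
    t = prop.get("type")
    types = t if isinstance(t, list) else [t]
    # Nullability does not change the column type; drop "null" and missing types.
    types = [x for x in types if x not in (None, "null")]
    if len(types) != 1:
        return TEXT_FALLBACK  # missing or mixed types: store as text
    kind = types[0]
    if kind == "boolean":
        return "BOOLEAN"
    if kind == "integer":
        lo, hi = prop.get("minimum"), prop.get("maximum")
        if lo is not None and hi is not None:
            # Use the narrowest integer type that fits the declared bounds.
            if -32768 <= lo and hi <= 32767:
                return "SMALLINT"
            if -2147483648 <= lo and hi <= 2147483647:
                return "INT"
        return "BIGINT"
    if kind == "number":
        return "DOUBLE PRECISION"
    if kind == "string":
        fmt = prop.get("format")
        if fmt == "date-time":
            return "TIMESTAMP"
        if fmt == "date":
            return "DATE"
        max_len = prop.get("maxLength")
        if isinstance(max_len, int) and 0 < max_len <= 65535:
            return f"VARCHAR({max_len})"
        return TEXT_FALLBACK
    return TEXT_FALLBACK  # objects, arrays and anything unrecognized


def columns_for(schema: dict) -> dict:
    """Map every top-level property of a schema to a column type."""
    return {name: column_type(prop) for name, prop in schema.get("properties", {}).items()}
```

For example, a schema declaring `{"qty": {"type": "integer", "minimum": 0, "maximum": 100}}` yields a `SMALLINT` column, while a nullable `number` property becomes `DOUBLE PRECISION`. The point of deriving types from the schema rather than storing everything as text is that the warehouse can then use native types for filtering and aggregation.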


We load data into Redshift using the RDB Loader.

On AWS, two modes are available: batching (recommended) and micro-batching.

At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to a format that Redshift supports, stores the result in an S3 bucket, and then instructs Redshift to load it.
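The convert, store, and load steps can be sketched as follows. This is a minimal illustration under several assumptions: events arrive as Python dicts, the target format is gzipped tab-separated text, and the S3 bucket is replaced by a hypothetical in-memory stand-in (`InMemoryBucket`). The key layout (`transformed/run=…/part-00000.tsv.gz`) and all helper names are invented for this example; only the `COPY … IAM_ROLE … DELIMITER … GZIP` syntax is standard Redshift.

```python
import gzip
from dataclasses import dataclass, field


@dataclass
class InMemoryBucket:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""
    name: str
    objects: dict = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


def to_tsv(events: list[dict], columns: list[str]) -> str:
    """Convert a batch of events into tab-separated rows in a fixed column order."""
    rows = []
    for event in events:
        # Missing fields become empty strings; tabs and newlines inside values
        # are replaced so they cannot break the row structure.
        values = [str(event.get(col, "")).replace("\t", " ").replace("\n", " ") for col in columns]
        rows.append("\t".join(values))
    return "\n".join(rows) + "\n"


def stage_batch(bucket: InMemoryBucket, run_id: str, events: list[dict], columns: list[str]) -> str:
    """Write the batch as one gzipped TSV object and return its S3 prefix."""
    prefix = f"transformed/run={run_id}/"
    body = gzip.compress(to_tsv(events, columns).encode("utf-8"))
    bucket.put(prefix + "part-00000.tsv.gz", body)
    return f"s3://{bucket.name}/{prefix}"


def copy_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Build a Redshift COPY command that loads every object under the prefix."""
    return f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' DELIMITER '\\t' GZIP;"
```

Usage: `stage_batch(bucket, "2024-01-01-00-00-00", events, ["event_id", "app_id"])` returns an `s3://` prefix, and `copy_statement("atomic.events", prefix, role_arn)` produces the SQL to run against Redshift. Because `COPY` loads every object whose key starts with the given prefix, staging a whole batch under one prefix lets a single statement load it.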

RDB Loader consists of two applications: the Transformer and the Loader. The following diagram illustrates how they interact with each other and with Redshift.
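One way to picture the split between the two applications is a producer and a consumer connected by a message channel: the Transformer stages a batch and announces it, and the Loader picks up the announcement and runs the load. The sketch below models the channel with Python's `queue.Queue`; the `BatchReady` message, the per-table folder layout, and the single-transaction load are assumptions made for illustration, not a description of RDB Loader's actual message format or internals.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BatchReady:
    """Hypothetical message the Transformer sends once a batch is staged in S3."""
    s3_prefix: str
    tables: tuple[str, ...]


def transformer_finish(q: "queue.Queue[BatchReady]", s3_prefix: str, tables: list[str]) -> None:
    # Called after the transformed files have been written under s3_prefix.
    q.put(BatchReady(s3_prefix, tuple(tables)))


def loader_step(q: "queue.Queue[BatchReady]", execute: Callable[[str], None], iam_role: str) -> BatchReady:
    """Consume one message and load all of its tables in a single transaction."""
    msg = q.get()
    try:
        execute("BEGIN;")
        for table in msg.tables:
            # Assumed layout: one sub-folder per target table under the batch prefix.
            execute(f"COPY {table} FROM '{msg.s3_prefix}{table}/' IAM_ROLE '{iam_role}' DELIMITER '\\t' GZIP;")
        execute("COMMIT;")
    except Exception:
        # Undo any partial load so the batch is either fully loaded or not at all.
        execute("ROLLBACK;")
        raise
    finally:
        q.task_done()
    return msg
```

Wrapping all `COPY` statements for a batch in one transaction is a design choice in this sketch: if any table fails to load, none of the batch becomes visible, which makes retrying the whole batch safe.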