
Transformer Kinesis configuration reference

The configuration reference on this page is written for Transformer Kinesis 5.7.3.

An example of the minimal required config for the Transformer Kinesis can be found here and a more detailed one here.
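As a rough illustration, a minimal configuration could look like the sketch below. It is assembled only from the parameters this page marks as required, assuming the AWS regions can be resolved through the region provider chain. The stream name, bucket path, and queue name are placeholders, and the choice of `sqs` and `widerow` is arbitrary.

```hocon
{
  "input": {
    # Placeholder: name of the enriched Kinesis stream
    "streamName": "enriched-events"
  }
  "output": {
    # Placeholder: S3 URI for the transformed output
    "path": "s3://example-bucket/transformed/"
  }
  "queue": {
    "type": "sqs"
    # Placeholder: the SQS queue needs to be FIFO
    "queueName": "example-queue.fifo"
  }
  "formats": {
    # Either "shred" or "widerow"
    "transformationType": "widerow"
  }
}
```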

| Parameter | Description |
| --- | --- |
| `input.appName` | Optional. KCL app name. The default is snowplow-rdb-transformer. |
| `input.streamName` | Required for kinesis. Enriched Kinesis stream name. |
| `input.region` | AWS region of the Kinesis stream. Optional if it can be resolved with the AWS region provider chain. |
| `input.position` | Optional. Kinesis position: LATEST or TRIM_HORIZON. The default is LATEST. |
| `windowing` | Optional. Frequency at which the shredding complete message is emitted. The default is 10 minutes. |
| `output.path` | Required. S3 URI of the transformed output. |
| `output.compression` | Optional. One of NONE or GZIP. The default is GZIP. |
| `output.region` | AWS region of the S3 bucket. Optional if it can be resolved with the AWS region provider chain. |
| `output.maxRecordsPerFile` | (since 5.4.0) Optional. The default is 10000. Maximum number of events per Parquet partition. |
| `output.bad.type` | (since 5.4.0) Optional. Either kinesis or file. The default is file. Type of the bad output sink. When file, bad rows are written as files under the URI configured in output.path. |
| `output.bad.streamName` | (since 5.4.0) Required if output.bad.type is kinesis. Name of the Kinesis stream to write to. |
| `output.bad.region` | (since 5.4.0) AWS region of the Kinesis stream. Optional if it can be resolved with the AWS region provider chain. |
| `output.bad.recordLimit` | (since 5.4.0) Optional. The default is 500. Limits the number of events in a single PutRecords Kinesis request. |
| `output.bad.byteLimit` | (since 5.4.0) Optional. The default is 5242880. Limits the number of bytes in a single PutRecords Kinesis request. |
| `output.bad.backoffPolicy.minBackoff` | (since 5.4.0) Optional. The default is 100 milliseconds. Minimum backoff before retrying when writing to Kinesis fails with internal errors. |
| `output.bad.backoffPolicy.maxBackoff` | (since 5.4.0) Optional. The default is 10 seconds. Maximum backoff before retrying when writing to Kinesis fails with internal errors. |
| `output.bad.backoffPolicy.maxRetries` | (since 5.4.0) Optional. The default is 10. Maximum number of retries for internal Kinesis errors. |
| `output.bad.throttledBackoffPolicy.minBackoff` | (since 5.4.0) Optional. The default is 100 milliseconds. Minimum backoff before retrying when writing to Kinesis fails because throughput was exceeded. |
| `output.bad.throttledBackoffPolicy.maxBackoff` | (since 5.4.0) Optional. The default is 10 seconds. Maximum backoff before retrying when writing to Kinesis fails because throughput was exceeded. Writing is retried forever. |
| `queue.type` | Required. Type of the message queue. Either sqs or sns. |
| `queue.queueName` | Required if queue type is sqs. Name of the SQS queue. The SQS queue needs to be FIFO. |
| `queue.topicArn` | Required if queue type is sns. ARN of the SNS topic. |
| `queue.region` | AWS region of the SQS queue or SNS topic. Optional if it can be resolved with the AWS region provider chain. |
| `formats.*` | Schema-specific format settings. |
| `formats.transformationType` | Required. Type of transformation, either `shred` or `widerow`. See Shredded data and Wide row format. |
| `formats.fileFormat` | Optional. The default is JSON. Output file format produced when the transformation is widerow. Either JSON or PARQUET. |
| `formats.default` | Optional. The default is TSV. Data format produced by default when the transformation is shred. Either TSV or JSON. TSV is recommended as it enables table autocreation, but requires an Iglu Server to be available with known schemas (including Snowplow schemas). JSON does not require an Iglu Server, but requires Redshift JSONPaths to be configured and does not support table autocreation. |
| `formats.tsv` | Optional. List of Iglu URIs. The default is an empty list []. If formats.default is set to JSON, the schemas in this list are still shredded into TSV. |
| `formats.json` | Optional. List of Iglu URIs. The default is an empty list []. If formats.default is set to TSV, the schemas in this list are still shredded into JSON. |
| `formats.skip` | Optional. List of Iglu URIs. The default is an empty list []. Schemas for which loading can be skipped. |
| `monitoring.metrics.*` | Send metrics to a StatsD server or stdout. |
| `monitoring.metrics.statsd.*` | Optional. For sending metrics (good and bad event counts) to a StatsD server. |
| `monitoring.metrics.statsd.hostname` | Required if the monitoring.metrics.statsd section is configured. The host name of the StatsD server. |
| `monitoring.metrics.statsd.port` | Required if the monitoring.metrics.statsd section is configured. Port of the StatsD server. |
| `monitoring.metrics.statsd.tags` | Optional. Tags used to annotate the StatsD metric with contextual information. |
| `monitoring.metrics.statsd.prefix` | Optional. Configures the prefix of StatsD metric names. The default is snowplow.transformer. |
| `monitoring.metrics.stdout.*` | Optional. For sending metrics to stdout. |
| `monitoring.metrics.stdout.prefix` | Optional. Overrides the default metric prefix. |
| `telemetry.disable` | Optional. Set to true to disable telemetry. |
| `telemetry.userProvidedId` | Optional. See here for more information. |
| `monitoring.sentry.dsn` | Optional. For tracking runtime exceptions. |
| `featureFlags.enableMaxRecordsPerFile` | (since 5.4.0) Optional. The default is true. When enabled, the output.maxRecordsPerFile configuration parameter is used. |
| `validations.*` | Optional. Criteria to validate events against. |
| `validations.minimumTimestamp` | This is currently the only validation criterion. It checks that no timestamp in the event is earlier than a specific point in time, e.g. 2021-11-18T11:00:00.00Z. |
| `featureFlags.*` | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. |
| `featureFlags.legacyMessageFormat` | Setting this to true allows you to use a new version of the transformer with an older version of the loader. |
| `featureFlags.truncateAtomicFields` | (since 5.4.0) Optional. The default is false. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. |
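
To show how the dotted parameter names above nest, here is a hedged sketch of a fuller configuration that routes bad rows to Kinesis and reports metrics to StatsD. Numeric values and durations repeat the documented defaults. Stream names, the bucket, the region, the topic ARN, the StatsD host, and port 8125 are illustrative placeholders, and the layout assumes each dot in a parameter name corresponds to one level of nesting.

```hocon
{
  "input": {
    "appName": "snowplow-rdb-transformer"   # documented default
    "streamName": "enriched-events"         # placeholder
    "region": "eu-central-1"                # placeholder
    "position": "LATEST"
  }
  "windowing": "10 minutes"
  "output": {
    "path": "s3://example-bucket/transformed/"   # placeholder
    "compression": "GZIP"
    "maxRecordsPerFile": 10000
    "bad": {
      "type": "kinesis"
      "streamName": "bad-rows"              # placeholder
      "recordLimit": 500
      "byteLimit": 5242880
      # Applied when Kinesis writes fail with internal errors
      "backoffPolicy": {
        "minBackoff": "100 milliseconds"
        "maxBackoff": "10 seconds"
        "maxRetries": 10
      }
      # Applied when throughput is exceeded; writes are retried forever
      "throttledBackoffPolicy": {
        "minBackoff": "100 milliseconds"
        "maxBackoff": "10 seconds"
      }
    }
  }
  "queue": {
    "type": "sns"
    "topicArn": "arn:aws:sns:eu-central-1:123456789012:example-topic"   # placeholder
  }
  "formats": {
    "transformationType": "widerow"
    "fileFormat": "PARQUET"
  }
  "monitoring": {
    "metrics": {
      "statsd": {
        "hostname": "localhost"             # placeholder
        "port": 8125                        # placeholder
      }
    }
  }
  "validations": {
    "minimumTimestamp": "2021-11-18T11:00:00.00Z"
  }
  "featureFlags": {
    "truncateAtomicFields": false
  }
}
```

Regions left out of the sketch (`output.region`, `output.bad.region`, `queue.region`) are assumed to resolve through the AWS region provider chain.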