Transformer Kafka configuration reference
The configuration reference on this page is written for Transformer Kafka 5.7.3.
An example of the minimal required config for Transformer Kafka can be found here and a more detailed one here.
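The HOCON fragment below is a rough sketch of a minimal configuration. It sets only the input, output.path and queue parameters, which the reference table below does not mark as optional. It assumes that each dotted parameter name maps to a nested object, so input.topicName becomes topicName inside an input block. The topic names, broker address and Azure storage account are hypothetical placeholders.

```hocon
# Minimal sketch: only the input, output and queue sections.
# All topic names, hosts and the storage account are hypothetical placeholders.
{
  "input": {
    # Kafka topic to read enriched events from
    "topicName": "enriched"
    "bootstrapServers": "localhost:9092"
  }

  "output": {
    # Azure Blob Storage path for the transformer output
    "path": "https://accountName.blob.core.windows.net/transformed/"
  }

  "queue": {
    # Kafka topic used to communicate with the loader
    "topicName": "loaderTopic"
    "bootstrapServers": "localhost:9092"
  }
}
```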
Parameter | Description |
---|---|
input.topicName | Name of the Kafka topic to read from |
input.bootstrapServers | A list of host:port pairs to use for establishing the initial connection to the Kafka cluster |
input.consumerConf | Optional. Kafka consumer configuration. See https://kafka.apache.org/documentation/#consumerconfigs for all properties |
output.path | Azure Blob Storage path to transformer output |
output.compression | Optional. One of NONE or GZIP. The default is GZIP. |
output.bad.type | Optional. Type of bad output sink: either kafka or file. The default is file. When file, bad rows are written as files under the URI configured in output.path. |
output.bad.topicName | Required if output.bad.type is kafka. Name of the Kafka topic that will receive the bad data. |
output.bad.bootstrapServers | Required if output.bad.type is kafka. A list of host:port pairs to use for establishing the initial connection to the Kafka cluster. |
output.producerConf | Optional. Kafka producer configuration. See https://kafka.apache.org/documentation/#producerconfigs for all properties |
queue.topicName | Name of the Kafka topic used to communicate with the loader. |
queue.bootstrapServers | A list of host:port pairs to use for establishing the initial connection to the Kafka cluster |
queue.producerConf | Optional. Kafka producer configuration. See https://kafka.apache.org/documentation/#producerconfigs for all properties |
monitoring.metrics.* | Send metrics to a StatsD server or stdout. |
monitoring.metrics.statsd.* | Optional. For sending metrics (good and bad event counts) to a StatsD server. |
monitoring.metrics.statsd.hostname | Required if monitoring.metrics.statsd section is configured. The host name of the StatsD server. |
monitoring.metrics.statsd.port | Required if monitoring.metrics.statsd section is configured. Port of the StatsD server. |
monitoring.metrics.statsd.tags | Optional. Tags which are used to annotate the StatsD metric with any contextual information. |
monitoring.metrics.statsd.prefix | Optional. Configures the prefix of StatsD metric names. The default is snowplow.transformer. |
monitoring.metrics.stdout.* | Optional. For sending metrics to stdout. |
monitoring.metrics.stdout.prefix | Optional. Overrides the default metric prefix. |
monitoring.sentry.dsn | Optional. For tracking runtime exceptions. |
telemetry.disable | Optional. Set to true to disable telemetry. |
telemetry.userProvidedId | Optional. See here for more information. |
validations.* | Optional. Criteria to validate events against. |
validations.minimumTimestamp | This is currently the only validation criterion. It checks that no timestamp in the event is earlier than a specific point in time, e.g. 2021-11-18T11:00:00.00Z. |
featureFlags.* | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. |
featureFlags.legacyMessageFormat | Optional. Setting this to true allows you to use a new version of the transformer with an older version of the loader. |
featureFlags.enableMaxRecordsPerFile (since 5.4.0) | Optional, default true. When enabled, the output.maxRecordsPerFile configuration parameter is used. |
featureFlags.truncateAtomicFields (since 5.4.0) | Optional, default false. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. |
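The sketch below shows how the optional sections from the table might fit together. It makes the same assumption that dotted parameter names map to nested HOCON objects. Every host, topic, DSN and identifier is a hypothetical placeholder, and so is the StatsD port, which uses the common default of 8125. The sketch sends bad rows to Kafka instead of files, reports metrics to StatsD and stdout, and sets the feature flags explicitly. The values for enableMaxRecordsPerFile and truncateAtomicFields match the defaults listed above.

```hocon
# Fuller sketch; all hosts, topics, DSNs and identifiers are hypothetical placeholders.
{
  "input": {
    "topicName": "enriched"
    "bootstrapServers": "localhost:9092"
  }

  "output": {
    "path": "https://accountName.blob.core.windows.net/transformed/"
    "compression": "GZIP"
    # Write bad rows to a Kafka topic instead of files under output.path
    "bad": {
      "type": "kafka"
      # Both fields below are required when bad.type is kafka
      "topicName": "bad"
      "bootstrapServers": "localhost:9092"
    }
  }

  "queue": {
    "topicName": "loaderTopic"
    "bootstrapServers": "localhost:9092"
  }

  "monitoring": {
    "metrics": {
      "statsd": {
        "hostname": "localhost"
        # Assumed port; use whatever your StatsD server listens on
        "port": 8125
        "prefix": "snowplow.transformer"
      }
      "stdout": {
        "prefix": "snowplow.transformer"
      }
    }
    "sentry": {
      "dsn": "https://public@sentry.example.com/1"
    }
  }

  "telemetry": {
    "disable": false
    "userProvidedId": "my-pipeline"
  }

  "validations": {
    # Reference point in time for the timestamp validation
    "minimumTimestamp": "2021-11-18T11:00:00.00Z"
  }

  "featureFlags": {
    "legacyMessageFormat": false
    "enableMaxRecordsPerFile": true
    "truncateAtomicFields": false
  }
}
```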