SchemaVer
This page is adapted from the Snowplow Analytics blog post "Introducing SchemaVer for semantic versioning of schemas".
Overview
With the advent of our new self-describing JSON Schemas, we needed a way to version those schemas so that they could evolve over time.
Our approach is based on semantic versioning (SemVer for short), which, as a reminder, takes the form MAJOR.MINOR.PATCH, where you increment:
- MAJOR when you make backwards-incompatible API changes
- MINOR when you add backwards-compatible functionality
- PATCH when you make backwards-compatible bug fixes
As it stands, SemVer is a poor fit for schema versioning: there is no such thing as a bug fix for a JSON Schema, and the idea of an API doesn't really translate to JSON Schemas either.
That's why we decided to introduce our own schema versioning scheme: SchemaVer.
SchemaVer takes the form MODEL-REVISION-ADDITION, where you increment:
- MODEL when you make a breaking schema change that will prevent interaction with any historical data
- REVISION when you introduce a schema change that may prevent interaction with some historical data
- ADDITION when you make a schema change that is compatible with all historical data
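As an illustration only, the MODEL-REVISION-ADDITION format can be modelled in Python. The SchemaVer class, its parse and bump methods, and the change names passed to bump are hypothetical, not part of any Snowplow library. Resetting the components to the right of the bumped one to zero is an assumption, though it matches the examples on this page (1-0-2 becomes 1-1-0, and 1-1-0 becomes 2-0-0). Because the separator is a hyphen, a SemVer-style string such as 1.0.0 is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SchemaVer:
    """Hypothetical MODEL-REVISION-ADDITION version; field order drives comparisons."""
    model: int
    revision: int
    addition: int

    @classmethod
    def parse(cls, text: str) -> "SchemaVer":
        # SchemaVer uses hyphens, so "1.0.0" splits into a single part and fails.
        parts = text.split("-")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"not a SchemaVer string: {text!r}")
        return cls(*(int(p) for p in parts))

    def bump(self, change: str) -> "SchemaVer":
        # Assumption: bumping a component resets every component to its right.
        if change == "MODEL":
            return SchemaVer(self.model + 1, 0, 0)
        if change == "REVISION":
            return SchemaVer(self.model, self.revision + 1, 0)
        if change == "ADDITION":
            return SchemaVer(self.model, self.revision, self.addition + 1)
        raise ValueError(f"unknown change type: {change!r}")

    def __str__(self) -> str:
        return f"{self.model}-{self.revision}-{self.addition}"


# Replay the changes made to the ad click schema in the examples on this page.
v = SchemaVer.parse("1-0-0")
for change in ["ADDITION", "ADDITION", "REVISION", "MODEL"]:
    v = v.bump(change)
    print(change, v)
# ADDITION 1-0-1
# ADDITION 1-0-2
# REVISION 1-1-0
# MODEL 2-0-0
```

Using a frozen, ordered dataclass means versions compare component by component, so 1-0-2 < 1-1-0 < 2-0-0 holds without any custom comparison code.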
Addition example
By way of example, if we were to modify an existing JSON Schema representing an ad click with version 1-0-0 defined as follows:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"bannerId": {
"type": "string"
}
},
"required": ["bannerId"],
"additionalProperties": false
}
and introduce a new impressionId property to obtain the following JSON Schema:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"bannerId": {
"type": "string"
},
"impressionId": {
"type": "string"
}
},
"required": ["bannerId"],
"additionalProperties": false
}
Because the new impressionId property is not required, and because additionalProperties was set to false in version 1-0-0 (so no historical event can contain an impressionId of some other type), any historical data that follows the 1-0-0 schema will also validate against this new schema.
According to our definition of SchemaVer, we are consequently looking at an ADDITION and the schema's version becomes 1-0-1.
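To make this argument concrete, the sketch below spot-checks a few events against both versions. The is_valid function is a hypothetical, deliberately tiny checker that understands only the keywords used on this page (properties with a string or number type, minimum, required and additionalProperties); it is not a general JSON Schema validator, and a passing spot-check on sample data is an illustration, not a proof. The event values are made up.

```python
def is_valid(instance: dict, schema: dict) -> bool:
    """Hypothetical checker for only the keywords used on this page.
    Not a general JSON Schema validator."""
    props = schema.get("properties", {})
    # Every required property must be present.
    if any(key not in instance for key in schema.get("required", [])):
        return False
    for key, value in instance.items():
        rule = props.get(key)
        if rule is None:
            # Undeclared property: rejected only when additionalProperties is false.
            if schema.get("additionalProperties", True) is False:
                return False
            continue
        if rule.get("type") == "string" and not isinstance(value, str):
            return False
        if rule.get("type") == "number":
            # JSON true/false are not numbers, but bool subclasses int in Python.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
            if "minimum" in rule and value < rule["minimum"]:
                return False
    return True


# The 1-0-0 and 1-0-1 ad click schemas from above, as Python dicts ($schema omitted).
V1_0_0 = {
    "type": "object",
    "properties": {"bannerId": {"type": "string"}},
    "required": ["bannerId"],
    "additionalProperties": False,
}
V1_0_1 = {
    "type": "object",
    "properties": {"bannerId": {"type": "string"}, "impressionId": {"type": "string"}},
    "required": ["bannerId"],
    "additionalProperties": False,
}

# Hypothetical historical events, all valid under 1-0-0.
history = [{"bannerId": "b-1"}, {"bannerId": "b-2"}]
print(all(is_valid(e, V1_0_1) for e in history))  # True

# 1-0-0 rejects any impressionId, so none can be lurking in historical data.
print(is_valid({"bannerId": "b-3", "impressionId": "i-1"}, V1_0_0))  # False
```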
Revision example
Continuing with the same example, we now set additionalProperties to true, giving the following schema:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"bannerId": {
"type": "string"
},
"impressionId": {
"type": "string"
}
},
"required": ["bannerId"],
"additionalProperties": true
}
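Loosening additionalProperties still accepts all historical data, so this is another ADDITION and we are now at version 1-0-2. Some time later, we decide to add a new cost property: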
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"bannerId": {
"type": "string"
},
"impressionId": {
"type": "string"
},
"cost": {
"type": "number",
"minimum": 0
}
},
"required": ["bannerId"],
"additionalProperties": true
}
The problem is that, because we set additionalProperties to true before adding the cost property, someone might have sent a cost field in the meantime that follows a different set of rules. For example, it could be an amount followed by a currency, such as 1.00$, whose effective type is string rather than number. We therefore cannot be sure that this new schema validates all historical data.
As a result, this new JSON Schema is a REVISION of the previous one, and its version becomes 1-1-0.
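The only difference between 1-0-2 and 1-1-0 is the new cost rule, so one way to see the risk is to scan events collected under 1-0-2 for records that break that rule. The satisfies_cost_rule function and the sample events below are hypothetical; the string-valued cost mirrors the 1.00$ scenario described above.

```python
def satisfies_cost_rule(record: dict) -> bool:
    """The rule added in 1-1-0: if present, cost must be a number >= 0."""
    if "cost" not in record:
        return True
    cost = record["cost"]
    # JSON true/false are not numbers, but bool subclasses int in Python.
    if isinstance(cost, bool) or not isinstance(cost, (int, float)):
        return False
    return cost >= 0


# Hypothetical events collected under 1-0-2, where additionalProperties was true,
# so any producer was free to send its own kind of "cost" field.
history_1_0_2 = [
    {"bannerId": "b-1"},
    {"bannerId": "b-2", "cost": 0.25},
    {"bannerId": "b-3", "cost": "1.00$"},  # amount followed by a currency
]

broken = [r for r in history_1_0_2 if not satisfies_cost_rule(r)]
print(broken)  # [{'bannerId': 'b-3', 'cost': '1.00$'}]
```

A single record like the third one is enough to show that the new schema may reject some historical data, which is exactly the REVISION case.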
Model example
Time goes by, and we choose to completely redesign our JSON Schema so that an ad click is identified only by a clickId property. Our schema becomes:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"clickId": {
"type": "string"
},
"cost": {
"type": "number",
"minimum": 0
}
},
"required": ["clickId"],
"additionalProperties": false
}
The change is so significant that we cannot realistically expect our historical data to interact with this new JSON Schema. Consequently, the MODEL changes and the schema's version becomes 2-0-0.
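Another important thing to notice is that we switched additionalProperties back to false in order to avoid unnecessary future revisions: with undeclared properties forbidden, adding a new optional property later can again be an ADDITION, as in the first example.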
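A key-level check is enough to show why this is a MODEL change. Every 1-x-x version requires bannerId, while 2-0-0 requires clickId and, with additionalProperties set to false, does not allow bannerId at all. The sketch below is hypothetical and checks only which keys are present (it skips type checks, since the key check already fails); the sample events are made up.

```python
# Properties declared by the 2-0-0 schema, and the subset it requires.
DECLARED_2_0_0 = {"clickId", "cost"}
REQUIRED_2_0_0 = {"clickId"}


def keys_fit_2_0_0(record: dict) -> bool:
    """Key-level check only: all required keys present and, because
    additionalProperties is false in 2-0-0, no undeclared keys."""
    keys = set(record)
    return REQUIRED_2_0_0 <= keys <= DECLARED_2_0_0


# Hypothetical events recorded under 1-x-x schemas; each carries bannerId,
# which every 1-x-x version requires.
history = [
    {"bannerId": "b-1"},
    {"bannerId": "b-2", "impressionId": "i-7", "cost": 0.25},
]
results = [keys_fit_2_0_0(r) for r in history]
print(results)  # [False, False]
```

Since bannerId is mandatory in every earlier version and forbidden in 2-0-0, no historical event can pass, whatever its contents.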
Additional differences
There are a few additional differences between our own SchemaVer and SemVer:
- we use hyphens instead of periods to separate the components that make up a SchemaVer
- versioning starts at 1-0-0 instead of 0.1.0
The design considerations behind those decisions can be found in the blog post on SchemaVer.