geoffwilliams@home:~$

Confluent schema evolution in development and production

Schema Evolution: The official word

TLDR

  • Free-for-all when solo developing
  • Practice your schema evolution when working as part of a team
  • Defined process for schema evolution in production

Schema evolution in development

In a development environment you have a lot of flexibility around schema evolution. If you're working on your own and don't care about any of the data, my favorite technique (when I don't feel like recreating the entire Kafka cluster) is to delete all schema versions for a subject using the REST API, e.g.:

curl -X DELETE http://localhost:8081/subjects/mytopic-value
  • Assumes Schema Registry is running on localhost, e.g. via docker compose; adjust as needed
  • If using TopicNameStrategy, the default schema subject (name) is the name of the topic with -value appended. Source
  • Deleting all schemas lets you register changes that would otherwise be rejected as incompatible under the default schema compatibility
  • Another approach: set the default schema compatibility to NONE
  • Automatic schema registration is true by default; you may want to set it to false to simulate production and test deployment procedures and pipelines
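
The NONE approach from the list above can also be applied at runtime through the REST API instead of reconfiguring the server. A minimal sketch, assuming the same localhost:8081 Schema Registry as the curl command above; the helper name and the SCHEMA_REGISTRY_URL variable are illustrative and not part of any Confluent tooling:

```shell
# Assumed endpoint; override by exporting SCHEMA_REGISTRY_URL
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: set the global (default) compatibility level
#   $1 = level, e.g. NONE or BACKWARD
set_global_compatibility() {
  curl -s -X PUT \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "{\"compatibility\": \"$1\"}" \
    "${SCHEMA_REGISTRY_URL}/config"
}

# Usage:
#   set_global_compatibility NONE       # turn compatibility checks off
#   set_global_compatibility BACKWARD   # back to Schema Registry's stock default
```

Unlike deleting the subject, this keeps the existing schema versions in place, so remember to switch the level back once you are done experimenting.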

Schema evolution in production

In a production environment, you don't normally have the luxury of being able to completely change the data schema. This is where schema evolution becomes necessary. If you experiment with evolution in the development environment, you will gain confidence in how to evolve schemas in environments you care about.
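
One way to practice is to ask Schema Registry whether a candidate schema would be accepted before trying to register it. The sketch below uses the REST API's compatibility endpoint; the helper name, the SCHEMA_REGISTRY_URL variable and the example subject are assumptions for illustration:

```shell
# Assumed endpoint; override by exporting SCHEMA_REGISTRY_URL
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: test a candidate schema against the latest registered
# version of a subject, without registering anything.
#   $1 = subject
#   $2 = JSON body whose "schema" field holds the schema as an escaped string
check_compatibility() {
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$2" \
    "${SCHEMA_REGISTRY_URL}/compatibility/subjects/$1/versions/latest"
}

# Usage:
#   check_compatibility mytopic-value '{"schema": "{\"type\": \"string\"}"}'
```

The response is a small JSON document with an is_compatible field, which makes it easy to try out a change in development before it goes anywhere near production.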

For production environments, Confluent recommends disabling automatic schema registration. The aim is to allow ops teams to take control over data schemas to safeguard correct app execution.

With this in mind, production schema evolution should look something like this:

See below for details.

Prevent automatic schema registration

  • Automatic schema registration is on by default
  • On the server side, preventing it is normally achieved through Confluent RBAC and the principle of least privilege
  • Java clients can be configured with auto.register.schemas=false
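
As a sketch of the client-side setting, the snippet below writes a properties file that a Java client could load. The file name client.properties and the registry URL are assumptions; only auto.register.schemas=false comes from the list above:

```shell
# Hypothetical file name; load it however your application reads its client config
cat > client.properties <<'EOF'
# Assumed local Schema Registry endpoint
schema.registry.url=http://localhost:8081
# Do not let this client register schemas on its own
auto.register.schemas=false
EOF
```

With this in place, the client has to find its schema already registered; it will not create new schema versions as a side effect of producing.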

Server-wide default schema compatibility

  • Reference
  • Config file variable: schema.compatibility.level
  • Docker environment variable: SCHEMA_REGISTRY_SCHEMA_COMPATIBILITY_LEVEL
  • Applies to every schema that does not explicitly set its own compatibility at the schema (subject) level; by default none do
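
To confirm which server-wide default is actually in effect, you can read it back over the REST API. A minimal sketch, assuming Schema Registry on localhost:8081; the helper name and the SCHEMA_REGISTRY_URL variable are illustrative:

```shell
# Assumed endpoint; override by exporting SCHEMA_REGISTRY_URL
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: read the global compatibility level.
# The JSON response carries it in the compatibilityLevel field.
get_global_compatibility() {
  curl -s "${SCHEMA_REGISTRY_URL}/config"
}

# Usage:
#   get_global_compatibility
```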

How to register schemas

With auto.register.schemas=false set, these are the options to register schemas:
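
One option that needs nothing beyond curl is the Schema Registry REST API. The sketch below registers a schema under a subject; the helper name, the SCHEMA_REGISTRY_URL variable and the example subject are assumptions, and your own pipeline may well use different tooling:

```shell
# Assumed endpoint; override by exporting SCHEMA_REGISTRY_URL
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: register a new schema version under a subject.
#   $1 = subject
#   $2 = JSON body whose "schema" field holds the schema as an escaped string
register_schema() {
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$2" \
    "${SCHEMA_REGISTRY_URL}/subjects/$1/versions"
}

# Usage:
#   register_schema mytopic-value '{"schema": "{\"type\": \"string\"}"}'
```

If the schema is incompatible with the subject's compatibility setting, the registry rejects the request, so this is also where a deployment pipeline would fail early.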

Per-schema compatibility

Sometimes you will need to special-case the schema compatibility mode for a given schema. This can be done using the same techniques used to register schemas, as above.
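
Over the REST API, a per-schema override is a PUT against the subject's config. A minimal sketch, assuming Schema Registry on localhost:8081; the helper name, the SCHEMA_REGISTRY_URL variable and the example values are illustrative:

```shell
# Assumed endpoint; override by exporting SCHEMA_REGISTRY_URL
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: override the compatibility level for a single subject
#   $1 = subject, $2 = level, e.g. FULL
set_subject_compatibility() {
  curl -s -X PUT \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "{\"compatibility\": \"$2\"}" \
    "${SCHEMA_REGISTRY_URL}/config/$1"
}

# Usage:
#   set_subject_compatibility mytopic-value FULL
```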
