Requirements
Schema rules are only available on Confluent Enterprise and Confluent Cloud with the Stream Governance
“Advanced” package.
Enable Schema Registry with the Advanced Stream Governance package (/cloud/current/stream-
governance/packages.html#features-by-package-type), as described in Packages, Features, and Limits
(/cloud/current/stream-governance/packages.html), and in the Choose a Stream Governance package and
enable Schema Registry for Confluent Cloud (/cloud/current/get-started/schema-registry.html#choose-a-
stream-governance-package-and-enable-sr-for-ccloud).
On Confluent Cloud, starting in Q3 2023, there will be an add-on charge for TRANSFORM rules, priced at
$0.10 per rule per hour. To learn more, see the details regarding Data Contracts as a part of the
Advanced package in Stream Governance Packages, Features, and Limits (/cloud/current/stream-
governance/packages.html).
https://docs.confluent.io/platform/current/schema-registry/fundamentals/data-contracts.html 1/22
5/18/23, 2:48 PM Data Contracts for Schema Registry | Confluent Documentation
To enable schema rules for Confluent Enterprise, add the following to the Schema Registry properties before
starting Schema Registry. The value of the property below can be a comma-separated list of class names:
resource.extension.class=io.confluent.kafka.schemaregistry.rulehandler.RuleSetResourceExtension
Limitations
Current limitations are:
A data contract may specify the following aspects:
Structure. This is the part of the contract that is covered by the schema, which defines the fields and their
types.
Integrity constraints. This includes declarative constraints or data quality rules on the domain values of
fields, such as the constraint that an age must be a positive integer.
Metadata. Metadata is additional information about the schema or its constituent parts, such as whether a
field contains sensitive information. Metadata can also include documentation for a data contract, such as who
created it.
Rules or policies. These data rules or policies can enforce that a field that contains sensitive information
must be encrypted, or that a message containing an invalid age must be sent to a dead letter queue.
Change or evolution. This implies that data contracts are versioned, and can support declarative migration
rules for how to transform data from one version to another, so that even changes that would normally break
downstream components can be easily accommodated.
Keeping in mind that a data contract is an agreement between an upstream component and a downstream
component, note that:
Data contracts are important because they provide transparency over dependencies and data usage in a
stream architecture. They also help to ensure the consistency, reliability, and quality of the data in motion.
The upstream component could be an Apache Kafka® producer, while the downstream component would be
the Kafka consumer. But the upstream component could also be a Kafka consumer, and the downstream
component would be the application in which the Kafka consumer resides. This differentiation is important in
schema evolution, where the producer may be using a newer version of the data contract, but the
downstream application still expects an older version. In this case the data contract is used by the Kafka
consumer to mediate between the Kafka producer and the downstream application, ensuring that the data
received by the application matches the older version of the data contract, possibly using declarative
transformation rules to massage the data into the desired form.
In the diagram above, the producer may be using version 2 of the data contract, yet the application in which
the consumer resides expects version 1 of the data contract. In this case the consumer will use a migration
rule to transform the data so that it conforms to version 1.
To support the five aspects of a data contract mentioned above, Schema Registry has been enhanced with
tags, metadata, and rules.
Tags
Tags are used to add annotations to a schema or its parts. Tags can either be inlined directly in a schema, or
can be specified externally to the schema.
For example, here are inline tags in an Avro schema:
{
"type":"record",
"name":"MyRecord",
"fields":[{
"name":"ssn",
"type":"string",
"confluent:tags": [ "PII", "PRIVATE" ]
}]
}
The same tags in a JSON Schema:
{
"type": "object",
"properties": {
"ssn": {
"type": "string",
"confluent:tags": [ "PII", "PRIVATE" ]
}
}
}
And in Protobuf:
syntax = "proto3";
import "confluent/meta.proto";
message MyRecord {
string ssn = 1 [
(confluent.field_meta).tags = "PII",
(confluent.field_meta).tags = "PRIVATE"
];
}
When sending requests to Schema Registry, any of the above schemas is passed as the value of a field
named schema in a JSON payload. Previously, the JSON payload had only three fields: schemaType , schema ,
and references . The JSON payload has been expanded to include two additional fields called metadata and
ruleSet .
{
"schemaType": "...",
"schema": "...",
"references": [...],
"metadata": {
"tags": {...},
"properties": {...}
},
"ruleSet": {
"migrationRules": [...],
"domainRules": [...]
}
}
The tags field of the metadata object allows external tags to be specified.
{
"schema": "...",
"metadata": {
"tags": {
"MyRecord.ssn": [ "PII" ]
}
},
"ruleSet": ...
}
One feature supported by external tags that is not available with inline tags is the support for path wildcards.
{
"schema": "...",
"metadata": {
"tags": {
"**.ssn": [ "PII" ]
}
},
"ruleSet": ...
}
Note that a path is a sequence of elements, separated by dots ( . ). Path wildcards support the following
characters:
* matches any number of characters in a path element; for example, a*.bob matches alice.bob
** matches any number of characters across multiple path elements; for example, a** matches alice.bob
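The wildcard matching can be sketched in plain Python. This is only an illustration of the semantics described above, not Confluent's implementation:

```python
import re

def path_matches(pattern: str, path: str) -> bool:
    """Sketch of the path-wildcard semantics: '*' matches within a single
    dot-separated element, '**' matches across elements."""
    regex = ""
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            regex += ".*"      # any characters, crossing '.' boundaries
            i += 2
        elif pattern[i] == "*":
            regex += "[^.]*"   # any characters within one element
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.fullmatch(regex, path) is not None
```

With this sketch, `path_matches("a*.bob", "alice.bob")` and `path_matches("**.ssn", "MyRecord.ssn")` both hold, while `path_matches("*.ssn", "a.b.ssn")` does not, since `*` cannot cross an element boundary.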
Note
On Confluent Cloud, any tags in the schema first must have their definitions created in the Stream Catalog
(/cloud/current/stream-governance/stream-catalog.html) beforehand.
For a schema with references, only the external tags at the root schema apply. This is different from inline
tags, which are always applicable, regardless of whether the inline tag is in the root schema or one of the
referenced schemas.
To learn more about tags, see Tagging Data and Schemas (/cloud/current/stream-governance/stream-
catalog.html#tagging-data-and-schemas) and Tag API Examples (/cloud/current/stream-
governance/stream-catalog-rest-apis.html#tags-api-examples).
Metadata Properties
A data contract can be enhanced with arbitrary metadata properties.
{
"schema": "...",
"metadata": {
"properties": {
"owner": "Bob Jones",
"email": "bob@acme.com"
}
},
"ruleSet": ...
}
Rules
Rules can be used to specify integrity constraints or data policies in a data contract. Rules have several
properties.
Every rule executor must implement a method named type() that returns a string. When a rule is specified in
a schema, the client, during serialization or deserialization, checks to determine if an appropriate rule
executor for the given rule type has been configured on the client. If so, the rule executor is used to run the
rule.
Note
On Confluent Cloud, starting in Q3 2023, there will be an add-on charge for TRANSFORM rules, priced at
$0.10 per rule per hour. To learn more, see the details regarding Data Contracts as a part of the
Advanced package in Stream Governance Packages, Features, and Limits (/cloud/current/stream-
governance/packages.html).
To use the CEL rule executor, include the following dependency:
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-schema-rules</artifactId>
<version>7.4.0</version>
</dependency>
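Next, configure the Java client with an executor for the rule. The configuration snippet is missing from this capture, but based on the rule.executors property referenced below and the executor class used in the Quick Start later in this document, it presumably looks like the following:

```properties
rule.executors=checkSsnLen
rule.executors.checkSsnLen.class=io.confluent.kafka.schemaregistry.rules.cel.CelExecutor
```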
The CEL rule executor will register itself using the “CEL” rule type. Next, you need to associate a CEL rule with
the schema. For example, suppose you want to validate that the ssn field is only 9 characters long before
serializing a message. You can use a schema with the rule below, where the message is passed to the rule as
a variable with the name message :
{
"schema": "...",
"ruleSet": {
"domainRules": [
{
"name": "checkSsnLen",
"kind": "CONDITION",
"type": "CEL",
"mode": "WRITE",
"expr": "size(message.ssn) == 9"
}
]
}
}
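In plain Python, the condition behaves like the following sketch; the actual check is evaluated by the CEL executor inside the serializer:

```python
class RuleConditionError(Exception):
    """Raised when a CONDITION rule evaluates to false."""

def serialize_with_rule(message: dict) -> dict:
    # Equivalent of the CEL expression "size(message.ssn) == 9":
    if len(message["ssn"]) != 9:
        raise RuleConditionError("rule checkSsnLen failed")
    return message  # a real serializer would produce bytes here
```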
Note that the example gives the rule the same name (checkSsnLen) as that passed to the rule.executors
property. If you don’t use the same name, the client will use any executor that is registered for the given rule
type. Using the same name is useful when you want to configure several rules of the same type (but with
different names) differently.
During registration, if the schema is omitted above, then the ruleset will be attached to the latest schema in
the subject. After registration, you need to specify use.latest.version=true on the client.
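For example, a client that relies on pre-registered schemas might use the following settings; auto.register.schemas=false is assumed here so that the client does not attempt to register schemas itself:

```properties
auto.register.schemas=false
use.latest.version=true
```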
Now that you’ve set up a data quality rule, whenever a message is sent that has an ssn field that is not 9
characters in length, the serializer will throw an exception. Alternatively, if the validation fails, you can have the
message sent to a dead-letter queue. Note that the action type “DLQ” is specified below. For the bidirectional
rules (WRITEREAD, UPDOWN), you can specify a comma-separated pair of action types if needed.
{
"schema": "...",
"ruleSet": {
"domainRules": [
{
"name": "checkSsnLen",
"kind": "CONDITION",
"type": "CEL",
"mode": "WRITE",
"expr": "size(message.ssn) == 9",
"onFailure": "DLQ"
}
]
}
}
Similar to rule executors, every rule action must implement a method named type() that returns a string.
The dead-letter queue is configured by specifying a rule action that returns the action type “DLQ”.
When using a dead-letter queue to validate a record value, if the validation fails, you will usually want to
capture both the record key and the record value. However, the record key is not passed to the serializer or
deserializer that is validating the record value. To capture the record key, you must specify a “wrapping”
serializer or deserializer for the record key that will capture the key and make it available to the serializer or
deserializer for the record value. So if the record key is a simple string, instead of specifying:
key.serializer=org.apache.kafka.common.serialization.StringSerializer
specify:
key.serializer=io.confluent.kafka.serializers.WrapperKeySerializer
wrapped.key.serializer=org.apache.kafka.common.serialization.StringSerializer
Field-Level Transforms
Field-level transforms can also be specified using the Google Common Expression Language (CEL). This is
accomplished with a rule executor that transforms field values in a message, using a rule executor of type
CEL_FIELD. To use this executor, include the following dependency:
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-schema-rules</artifactId>
<version>7.4.0</version>
</dependency>
Next, configure the Java client to use the CEL_FIELD rule executor:
rule.executors=populateFullName
rule.executors.populateFullName.class=io.confluent.kafka.schemaregistry.rules.cel.CelFieldExecutor
The CEL field-level rule executor will register itself using the “CEL_FIELD” rule type. The rule expression can be
applied to multiple fields, but those fields must be of the same type. To specify the applicable fields for the
rule expression, we use a guard, which is a CEL boolean expression. The complete rule expression is of the
form <guard> ; <expression> , as the examples below show.
Suppose you want to provide a default value for a field if it is empty. You can use a schema with the rule
below, where the field name is passed as name , the field value is passed as value , and the field type is passed
as typeName . Note that the expr below is only applied to the field named occupation .
{
"schema": "...",
"ruleSet": {
"domainRules": [
{
"name": "populateDefault",
"kind": "TRANSFORM",
"type": "CEL_FIELD",
"mode": "WRITE",
"expr": "name == 'occupation' ; value == '' ? 'unspecified' : value"
}
]
}
}
Alternatively, you can specify a guard of the form typeName == 'STRING' to apply the rule expression to all
string fields, such as in the following:
{
"schema": "...",
"ruleSet": {
"domainRules": [
{
"name": "populateDefault",
"kind": "TRANSFORM",
"type": "CEL_FIELD",
"mode": "WRITE",
"expr": "typeName == 'STRING' ; value == '' ? 'unspecified' : value"
}
]
}
}
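The guard/transform split can be sketched in plain Python. This illustrates the CEL_FIELD semantics, not the Confluent executor itself:

```python
def transform_fields(record, guard, transform):
    """Apply `transform` to every field selected by `guard`; other
    fields pass through unchanged."""
    result = {}
    for name, value in record.items():
        type_name = "STRING" if isinstance(value, str) else "OTHER"
        if guard(name, value, type_name):
            result[name] = transform(value)
        else:
            result[name] = value
    return result

# The rule above ("typeName == 'STRING' ; value == '' ? 'unspecified' : value"):
populated = transform_fields(
    {"name": "Alice", "occupation": ""},
    guard=lambda name, value, type_name: type_name == "STRING",
    transform=lambda value: "unspecified" if value == "" else value,
)
```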
Note
Only primitive types ( string , bytes , int , long , float , double , boolean ) are supported for field
transformations. Enums are not supported.
Event-Condition-Action Rules
When using conditions, one can model arbitrary Event-Condition-Action (ECA) rules by specifying an action
for onSuccess to use when the condition is true , and an action for onFailure to use when the condition is
false . The action type for onSuccess defaults to the built-in action type NONE, and the action type for
onFailure defaults to the built-in action type ERROR. You can explicitly set these values to NONE, ERROR, DLQ,
or a custom action type. For example, you might want to implement an action to send an email whenever an
event indicates a credit card is about to expire.
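The dispatch logic of an ECA rule can be sketched as follows; the action names here are illustrative stand-ins for the built-in NONE and ERROR actions:

```python
def run_condition_rule(condition, message, on_success, on_failure):
    """ECA dispatch: evaluate the condition, then run the success action
    when it holds, or the failure action when it does not."""
    action = on_success if condition(message) else on_failure
    return action(message)

def none_action(message):
    return message  # NONE: pass the message through unchanged

def error_action(message):
    raise ValueError("rule failed")  # ERROR: fail the (de)serialization
```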
To implement a custom action type, implement the RuleAction interface and then register the action as follows:
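The registration snippet is missing from this capture; by analogy with the rule.executors configuration, it presumably follows this pattern, where com.acme.MySendEmailAction is a hypothetical class implementing RuleAction:

```properties
rule.actions=sendEmail
rule.actions.sendEmail.class=com.acme.MySendEmailAction
```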
But what if you want to evolve the schema in an incompatible manner without breaking clients? Such a
scenario requires some form of data migration, and there are two possible solutions for a migration:
Inter-topic migrations: Set up a new Kafka topic and translate the topic data from the old format in the old
topic to the new format in the new topic.
Intra-topic migrations: Use transformations when consuming from a topic to translate the topic data from the
old format to the new format.
The complexity with the first approach is in switching consumers from reading the old topic to the new topic
in a way that does not skip data or cause duplicate processing. To do this properly, a correlation would need
to be established between the offsets of the two topics, and the consumer would need to start reading from
the correct correlation point in the new topic. In addition, if something goes wrong you might want the ability
to switch back to the old topic, again using the correct correlation point.
With the second option, the client does not need to be switched over to a different topic, but instead
continues to read from the same topic, and a set of declarative migration rules will massage the data into the
form that the consumer expects. With such declarative migration rules, we can support a system in which
producers are separately using versions 1, 2, and 3 of a schema, which are all incompatible, and consumers
that expect versions 1, 2, or 3 each see a message transformed to the desired version, regardless of which
version the producer sent. This is often a cleaner and simpler approach to complex schema evolution, and it
became available in Schema Registry with the release of Confluent Platform 7.4.
Before discussing how to use declarative migration rules, the next sections cover a few features that were
added to Schema Registry in order to support complex schema evolution.
For example, you can register a schema whose metadata captures the application major version:
{
"schema": "...",
"metadata": {
"properties": {
"application.major.version": "2"
}
},
"ruleSet": ...
}
You can then specify that a consumer only use the latest schema of a specific major version.
use.latest.with.metadata=application.major.version=2
latest.cache.ttl.sec=300
The above example also specifies that the client should check for a new latest version every 5 minutes. This
TTL configuration can also be used with the use.latest.version=true configuration.
Configuration Enhancements
The compatibility level in Schema Registry can be set at the global level, or at the level of an individual
subject. This setting is achieved by using a configuration object. Previously, the only field in the configuration
object was for the compatibility level. The configuration object has been enhanced with the following
properties:
compatibilityGroup - Only schemas that belong to the same compatibility group will be checked for
compatibility.
defaultMetadata - Default value for the metadata to be used during schema registration.
overrideMetadata - Override value for the metadata to be used during schema registration.
defaultRuleSet - Default value for the ruleSet to be used during schema registration.
overrideRuleSet - Override value for the ruleSet to be used during schema registration.
If the schema being registered does not specify metadata, it will inherit the metadata from the previous
version, if one exists.
By setting the “overrideMetadata” with the following value, we will ensure that every schema has an
“application.major.version” of “2” when it is registered. This allows us to not have to explicitly pass this
metadata value with every schema during registration. Also, by specifying below that the “compatibilityGroup”
is “application.major.version”, only schemas that have the same value for “application.major.version” will be
compared against one another during compatibility checks.
{
"compatibilityGroup": "application.major.version",
"overrideMetadata": {
"properties": {
"application.major.version": "2"
}
}
}
Migration Rules
With the above configurations, you can now pin a consumer to a specific application major version, and
ensure that compatibility checks only happen as long as the application major version does not change.
When you introduce a schema that belongs to a new major version, you must attach migration rules to the
schema to indicate how to transform the message from the previous major version to the new major version
(for new consumers reading old messages) and how to transform the message from the new major version
to the previous major version (for old consumers reading new messages). If a consumer encounters a
schema with a major version that is two or more versions away, such as going from major version 1 to 3 (or
from 3 to 1), then it will run migration rules transitively, in this case from 1 to 2, and then from 2 to 3.
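The transitive application of migration rules can be sketched in plain Python; this models the stepping behavior described above, not Confluent's client code:

```python
def migrate(message, from_version, to_version, upgrades, downgrades):
    """Apply migration rules transitively, one version step at a time.
    `upgrades[v]` transforms version v to v+1; `downgrades[v]`
    transforms version v to v-1."""
    v = from_version
    while v < to_version:
        message = upgrades[v](message)
        v += 1
    while v > to_version:
        message = downgrades[v](message)
        v -= 1
    return message
```

Going from version 1 to 3 runs the upgrade rules for 1→2 and then 2→3; going from 3 to 1 runs the downgrade rules in the opposite order.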
In the diagram above, the producer executes the rules applicable to WRITE mode, while the consumer first
runs rules in either UPGRADE or DOWNGRADE mode (depending on the schema version of the payload and
the desired version), and then executes rules applicable to READ mode. Note that the rules are executed in
reverse order during READ mode. This ensures that any actions performed during WRITE are undone in the
proper order during READ.
How do migration rules work? They take advantage of the fact that Avro, JSON Schema, and Protobuf all
specify a JSON serialization format for messages. The migration rules can be specified in any declarative
language that supports JSON transformations.
One such language is JSONata. To use the JSONata rule executor, include the following dependency and
configure the Java client as shown below:
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-schema-rules</artifactId>
<version>7.4.0</version>
</dependency>
rule.executors=myRuleName
rule.executors.myRuleName.class=io.confluent.kafka.schemaregistry.rules.jsonata.JsonataExecutor
The JSONata rule executor will register itself using the “JSONATA” rule type.
For example, suppose you start with the following schema, in which the field of interest is named ssn :
{
"type":"record",
"name":"MyRecord",
"fields":[{
"name":"ssn",
"type":"string",
"confluent:tags": [ "PII", "PRIVATE" ]
}]
}
We want to rename the field to “socialSecurityNumber”. Below is a set of migration rules to achieve this. In
this case, you can use the JSONata function called sift to remove the field with the old name, and then add
a JSON property to specify the new field.
{
"schema": "...",
"ruleSet": {
"migrationRules": [
{
"name": "changeSsnToSocialSecurityNumber",
"kind": "TRANSFORM",
"type": "JSONATA",
"mode": "UPGRADE",
        "expr": "$merge([$sift($, function($v, $k) {$k != 'ssn'}), {'socialSecurityNumber': $.'ssn'}])"
},
{
"name": "changeSocialSecurityNumberToSsn",
"kind": "TRANSFORM",
"type": "JSONATA",
"mode": "DOWNGRADE",
        "expr": "$merge([$sift($, function($v, $k) {$k != 'socialSecurityNumber'}), {'ssn': $.'socialSecurityNumber'}])"
}
]
}
}
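The effect of these JSONata expressions can be sketched in plain Python; this mirrors the rules above rather than executing JSONata:

```python
def upgrade(message: dict) -> dict:
    # Same effect as the UPGRADE rule: drop 'ssn' and add
    # 'socialSecurityNumber' with the same value.
    out = {k: v for k, v in message.items() if k != "ssn"}
    out["socialSecurityNumber"] = message["ssn"]
    return out

def downgrade(message: dict) -> dict:
    # Inverse transformation, as in the DOWNGRADE rule.
    out = {k: v for k, v in message.items() if k != "socialSecurityNumber"}
    out["ssn"] = message["socialSecurityNumber"]
    return out
```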
The UPGRADE rule allows new consumers to read old messages, while the DOWNGRADE rule allows old
consumers to read new messages. If you plan to upgrade all consumers, you can of course omit the
DOWNGRADE rule.
Custom Rules
As mentioned previously, a custom rule can be used by implementing the RuleExecutor interface:
For rules of type CONDITION, the transform method should return a Boolean value, and for rules of type
TRANSFORM, the transform method should return the transformed message.
Quick Start
This Quick Start shows an example of a data quality rule using an Avro schema. You’ll create a schema with
a single field “f1”, and add a data quality rule to ensure that the length of the field is always less than 10.
Requirements
Schema rules are only available on Confluent Enterprise (not the Community edition) and Confluent Cloud
with the Stream Governance “Advanced” package.
For Confluent Cloud Schema Registry, enable the Advanced Stream Governance package
(/cloud/current/stream-governance/packages.html#features-by-package-type), as described in Packages,
Features, and Limits (/cloud/current/stream-governance/packages.html), and in the Choose a Stream
Governance package and enable Schema Registry for Confluent Cloud (/cloud/current/get-
started/schema-registry.html#choose-a-stream-governance-package-and-enable-sr-for-ccloud).
For Confluent Platform, make sure you have Confluent Platform Enterprise
(/platform/7.4/installation/available_packages.html#cp-platform-packages) installed, and then add the
following to the Schema Registry properties before starting Schema Registry. The value of the property
below can be a comma-separated list of class names:
resource.extension.class=io.confluent.kafka.schemaregistry.rulehandler.RuleSetResourceExtension
Next, start a console producer with the schema and the data quality rule:
./bin/kafka-avro-console-producer \
--topic test \
--broker-list localhost:9092 \
--property schema.registry.url=http://localhost:8081 \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' \
--property value.rule.set='{ "domainRules":
[{ "name": "checkLen", "kind": "CONDITION", "type": "CEL",
"mode": "WRITE", "expr": "size(message.f1) < 10",
"onFailure": "ERROR"}]}' \
--property rule.executors=checkLen \
--property rule.executors.checkLen.class=io.confluent.kafka.schemaregistry.rules.cel.CelExecutor
Then type the following messages; the first satisfies the rule, while the second violates it and causes the
producer to throw an error:
{"f1": "success"}
{"f1": "this will fail"}
In a separate terminal, run the console consumer to read the message that passed the rule:
./bin/kafka-avro-console-consumer \
--topic test \
--bootstrap-server localhost:9092 \
--property schema.registry.url=http://localhost:8081
Reference
Related Content
Schema Evolution and Compatibility (avro.html#schema-evolution-and-compatibility)
Formats, Serializers, and Deserializers (/platform/7.4/schema-registry/serdes-develop/index.html)
Reference topics (/platform/7.4/schema-registry/develop/index.html)
Using Google Common Expression Language (CEL)
JSON query and transformation language (JSONata)