Schema Validator

The Schema Validator Component ensures incoming data adheres to defined validation rules, such as data types, value ranges, and nullability. It helps enforce data quality and consistency across pipelines by separating valid records from invalid ones.

The component produces two outputs:

  • Valid Records Event – Carries records that pass schema validation.

  • Bad Records Event – Captures rejected records with error details.
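Conceptually, the split into the two outputs can be sketched in Python. This is a hypothetical helper, not the component's actual implementation; the `validate` callback and the error-detail shape are assumptions for illustration:

```python
# Hypothetical sketch of the valid/bad split, not the component's code.
def split_records(records, validate):
    """Route each record to a valid or bad list using a validate() callback
    that returns a list of error messages (empty list means valid)."""
    valid, bad = [], []
    for record in records:
        errors = validate(record)
        if errors:
            # Bad records keep the original payload plus the failure details.
            bad.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, bad

# Example rule: Emp_id must be an integer.
def errors_for(record):
    return [] if isinstance(record.get("Emp_id"), int) else ["Emp_id must be int"]

ok, rejected = split_records([{"Emp_id": 248}, {"Emp_id": "x"}], errors_for)
```

Here `ok` receives the passing record and `rejected` receives the failing one together with its error message, mirroring the Valid Records and Bad Records events.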

Key Capabilities

  • Validate flat JSON records against a schema file.

  • Enforce constraints such as data types, ranges, and nullability.

  • Handle schema drift scenarios with configurable validation modes.

  • Separate valid and invalid records for downstream processing.

  • Support batch and real-time execution.

Note Schema Validator supports only flat JSON data. It does not validate nested JSON arrays/lists or objects.

Correct Example (Flat JSON):

{"Emp_id": 248, "Age": 20, "City": "Mumbai", "Dept": "Data_Science"}

Incorrect Example (Nested JSON):

{"Id": 248, "Name": "Smith", "Marks": [80,85,70,90,91], "Dept": {"dept_id":20,"dept_name":"data_science"}}
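The flatness restriction amounts to a simple rule: no field value may itself be an object or an array. A minimal Python sketch of that check (illustrative only):

```python
# A record is "flat" when no value is itself a dict (object) or list (array).
def is_flat(record):
    return all(not isinstance(v, (dict, list)) for v in record.values())

flat = {"Emp_id": 248, "Age": 20, "City": "Mumbai", "Dept": "Data_Science"}
nested = {"Id": 248, "Name": "Smith", "Marks": [80, 85, 70],
          "Dept": {"dept_id": 20, "dept_name": "data_science"}}
```

`is_flat(flat)` is true, while `is_flat(nested)` is false because of the `Marks` list and the `Dept` object.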

Configuration Overview

All Schema Validator configurations are organized into:

  • Basic Information

  • Meta Information

  • Resource Configuration

Steps to Configure the Schema Validator

  1. Drag and drop the Schema Validator Component into the Pipeline workflow canvas.

  2. Connect it to the required input and output events.

  3. Select the component to configure its properties in the tabs below.

Basic Information Tab

  • Invocation Type

    • Real-Time – Keeps the component active continuously, ready to consume data at any time.

    • Batch – Requires a trigger from the previous event; shuts down after processing is complete.

  • Batch Size

    • Maximum number of records processed in one cycle.

    • Useful for controlling load when processing large volumes of records.

  • Failover Event

    • Defines where data is routed if the component fails.

    • Includes failure cause and timestamp in the event.

  • Intelligent Scaling

    • Auto-scales instances based on traffic volume.

    • Spreads the processing load by scaling up during high data traffic.

Meta Information Tab

  • Schema File Name – Name of the uploaded schema file.

  • Choose File – Upload schema file (JSON format).

  • View Schema – Inspect the uploaded schema.

  • Remove File – Delete the uploaded schema file.
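The exact format of the schema file may vary by platform version; as an illustration only, a flat-JSON schema could declare a type, range, and nullability per field (the field names and keys below are hypothetical, not a documented format):

```json
{
  "Emp_id": {"type": "integer", "nullable": false},
  "Age":    {"type": "integer", "min": 18, "max": 65},
  "City":   {"type": "string",  "nullable": true},
  "Dept":   {"type": "string",  "nullable": false}
}
```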

Mode

Two schema validation modes are available:

  • Strict

    • Prevents unexpected schema behavior.

    • Throws exceptions or logs warnings when rules are violated.

    • Invalid records are sent to the Bad Records Event.

  • Allow Schema Drift

    • Tolerates slight schema changes in source data.

    • Supports scenarios where metadata (fields, columns, or types) changes dynamically.
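The difference between the two modes can be sketched as follows. This is a hypothetical simplification, not the product's code: it checks types for declared fields, and only Strict mode rejects fields that the schema does not declare:

```python
# Hypothetical sketch of the two validation modes (not the product's code).
def validate(record, schema, mode="strict"):
    """Return (is_valid, errors). Drift mode tolerates undeclared fields."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    if mode == "strict":
        # Strict mode: any field not declared in the schema is a violation.
        extras = set(record) - set(schema)
        errors += [f"unexpected field: {e}" for e in sorted(extras)]
    return (not errors), errors

schema = {"Emp_id": int, "Dept": str}
drifted = {"Emp_id": 248, "Dept": "Data_Science", "NewCol": 1}
```

With this sketch, `drifted` fails in Strict mode because of the undeclared `NewCol` field, but passes in drift mode.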

Bad Records Event

  • Automatically mapped to the second node of the Schema Validator component.

  • Captures rejected records with details on why validation failed.

Usage Notes

  • The Copy/Paste option is disabled for this component. Create new instances manually if needed.

  • Future Plan – Support for nested JSON data validation.

Example Use Cases

  • Validate incoming employee records to ensure numeric IDs and non-null department fields.

  • Enforce date formats and ranges in transaction datasets.

  • Route invalid records to a bad records event for remediation workflows.

  • Handle schema drift in log files where fields may appear or disappear over time.
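As a sketch of the transaction use case above, a combined date-format-and-range rule might look like this in Python (field semantics and the date window are assumptions for illustration):

```python
from datetime import date, datetime

# Hypothetical rule: transaction dates must parse as YYYY-MM-DD and fall
# within an allowed window. Returns an error string, or None when valid.
def check_txn_date(value, start=date(2020, 1, 1), end=date(2030, 12, 31)):
    try:
        d = datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return "invalid date format"
    if not (start <= d <= end):
        return "date out of range"
    return None
```

A record whose date fails either check would carry the returned message into the Bad Records Event for remediation.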