Synthetic Data Generator
The Synthetic Data Generator component creates artificial datasets based on a Draft-07 JSON schema. It enables users to generate realistic test data for pipelines without relying on sensitive or production datasets.
Key features:
Upload CSV/XLSX/JSON sample files to automatically generate a Draft-07 schema.
Directly edit or upload Draft-07 schema JSON.
Configure iterations, delays, and batch sizes for continuous data generation.
Use advanced schema rules: if-then-else conditions, weights, and mathematical calculations.
Configuration Sections
The component's configuration is organized into three sections:
Basic Information
Meta Information
Resource Configuration
Basic Information Tab
The Basic Information tab defines execution parameters.
Invocation Type (required): Select the execution mode: Real-Time.
Deployment Type (required): Displays the deployment type (pre-selected).
Container Image Version (required): Displays the Docker image version (pre-selected).
Failover Event (optional): Select a failover event.
Batch Size (required): Maximum number of records per cycle (minimum: 10).
Meta Information Tab
The Meta Information tab defines schema and data generation parameters.
Iteration (required): Number of iterations for producing synthetic data.
Delay (sec) (required): Delay between iterations, in seconds.
Batch Size (required): Number of records generated per iteration.
Upload Sample File (optional): Upload a CSV/XLSX/JSON file to auto-generate a Draft-07 schema.
Schema (required): Displays the Draft-07 schema. Can be edited directly.
Upload Schema (optional): Upload a Draft-07 schema in JSON format.
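To illustrate what generating a schema from a sample file could involve, here is a minimal Python sketch that infers a Draft-07 style schema from CSV text. It is not the component's actual inference logic: the helper names infer_type and infer_schema, the sample columns, and the mapping to the standard type names "integer", "number", and "string" are all assumptions made for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Pick a JSON Schema type name for a column of raw CSV strings (assumed mapping)."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    return "string"


def infer_schema(csv_text):
    """Build a Draft-07 style object schema from CSV text (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("sample file contains no data rows")
    properties = {
        column: {"type": infer_type([row[column] for row in rows])}
        for column in rows[0].keys()
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
    }


# Illustrative sample data, not taken from the product.
sample = "customer_id,name,balance\n1,Ana,10.5\n2,Raj,7\n"
schema = infer_schema(sample)
print(json.dumps(schema, indent=2))
```

The generated schema would still need review in the Schema field, for example to add formats, enums, or weights that cannot be inferred from a small sample.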
Draft-07 Schema Capabilities
Supported Data Types
String
Properties: maxLength, minLength, enum, weights, format
Formats: date, date-time, name, country, state, email, uri, address, current_datetime
Number
Properties: minimum, maximum, exclusiveMinimum, exclusiveMaximum, unique, start, enum, weights
Float
Properties: minimum, maximum
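As a rough illustration of how length and range properties constrain generated values, the following Python sketch produces a string bounded by minLength/maxLength and an integer bounded by minimum/maximum. The functions generate_string and generate_number and the default lengths are hypothetical, and the sketch deliberately ignores unique, start, and format, whose exact behavior is not described here.

```python
import random
import string


def generate_string(field, rng):
    """Random lowercase string whose length respects minLength/maxLength (defaults assumed)."""
    length = rng.randint(field.get("minLength", 1), field.get("maxLength", 10))
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_number(field, rng):
    """Random integer in the inclusive range [minimum, maximum]."""
    return rng.randint(field["minimum"], field["maximum"])


rng = random.Random(7)
code_field = {"type": "string", "minLength": 3, "maxLength": 8}
qty_field = {"type": "number", "minimum": 1, "maximum": 50}

codes = [generate_string(code_field, rng) for _ in range(100)]
quantities = [generate_number(qty_field, rng) for _ in range(100)]
print(codes[:3], quantities[:3])
```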
Conditional Rules (if-then-else)
Draft-07 schemas support the if, then, and else keywords, which apply logical conditions during validation and data generation.
Example: ensuring task_end_date ≥ task_start_date:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "task_start_date": { "type": "string", "format": "date" },
    "task_end_date": { "type": "string", "format": "date" }
  },
  "if": {
    "properties": {
      "task_end_date": { "type": "string", "format": "date" },
      "task_start_date": { "type": "string", "format": "date" }
    }
  },
  "then": {
    "properties": {
      "task_end_date": {
        "format": "date",
        "minimum": { "$data": "task_start_date" }
      }
    }
  }
}
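The effect of the rule above can be sketched in Python: pick a start date first, then derive an end date that is never earlier. This is only an illustration of the constraint, not the component's generator; generate_task_dates, the date window, and the maximum duration are assumptions.

```python
import random
from datetime import date, timedelta


def generate_task_dates(rng, earliest=date(2024, 1, 1), span_days=365, max_duration_days=30):
    """Return a record whose task_end_date is on or after task_start_date."""
    start = earliest + timedelta(days=rng.randrange(span_days))
    # Adding a non-negative duration guarantees end >= start.
    end = start + timedelta(days=rng.randint(0, max_duration_days))
    return {
        "task_start_date": start.isoformat(),
        "task_end_date": end.isoformat(),
    }


rng = random.Random(42)
records = [generate_task_dates(rng) for _ in range(5)]
for record in records:
    print(record)
```

Because ISO 8601 dates (YYYY-MM-DD) sort lexicographically in chronological order, the constraint can be checked by comparing the two strings directly.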
Weighted Values
Weights bias how often each enum value is generated. In the example below, the three weights correspond by position to the three enum values.
"age": {
  "type": "string",
  "enum": ["Young", "Middle", "Old"],
  "weights": [0.6, 0.2, 0.2]
}
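A minimal Python sketch of weighted selection, assuming each weight applies to the enum value at the same position, uses the standard library's random.choices. The helper sample_enum is hypothetical and only approximates what a weighted generator might do.

```python
import random
from collections import Counter

# Field definition copied from the "age" example.
field = {
    "type": "string",
    "enum": ["Young", "Middle", "Old"],
    "weights": [0.6, 0.2, 0.2],
}


def sample_enum(field_schema, rng, count):
    """Draw `count` values, pairing each enum entry with the weight at the same index."""
    return rng.choices(field_schema["enum"], weights=field_schema["weights"], k=count)


rng = random.Random(0)
values = sample_enum(field, rng, 10_000)
counts = Counter(values)
shares = {label: counts[label] / len(values) for label in field["enum"]}
print(shares)
```

Over a large number of records, the observed share of each value should approach its weight, so roughly 60% of records would carry "Young".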
Computed Fields
Define derived values with a calculation rule. The $eval expression references other fields in the same record through the data. prefix.
"number3": {
  "calculation": {
    "$eval": "data.number1 + data.number2 * 2"
  }
}
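For a concrete sense of the result, the Python sketch below mirrors the expression for one record. It assumes the expression follows standard arithmetic precedence (multiplication before addition), which this document does not confirm; compute_number3 and the input values are illustrative.

```python
def compute_number3(data):
    """Mirror of "data.number1 + data.number2 * 2", assuming standard operator precedence."""
    return data["number1"] + data["number2"] * 2


record = {"number1": 10, "number2": 3}
record["number3"] = compute_number3(record)
print(record)  # {'number1': 10, 'number2': 3, 'number3': 16}
```

Under that assumption the result is 10 + (3 * 2) = 16, not (10 + 3) * 2 = 26. If the sum should be doubled instead, add parentheses to the expression.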
Saving the Component Configuration
Configure Basic Information and Meta Information.
Click Save Component (Storage icon).
A confirmation message appears after saving.
Activate the pipeline to begin generating synthetic data.
Example Workflow
Upload a sample CSV file containing customer records.
The system generates a Draft-07 schema automatically.
Configure:
Iteration = 10
Delay = 5 seconds
Batch Size = 100
Save the component and activate the pipeline.
With these settings, the component generates 10 batches of 100 synthetic customer records (1,000 records in total), pausing 5 seconds between iterations, and feeds each batch downstream.
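The interaction of Iteration, Delay, and Batch Size in this workflow can be sketched as a simple loop. This is an illustration under stated assumptions rather than the component's implementation: run_generator, make_record, and emit are hypothetical names, and the sketch assumes no delay is applied after the final iteration.

```python
import time


def run_generator(iterations, delay_sec, batch_size, make_record, emit, sleep=time.sleep):
    """Emit `iterations` batches of `batch_size` records, sleeping between batches."""
    total = 0
    for iteration in range(iterations):
        batch = [make_record(iteration, index) for index in range(batch_size)]
        emit(batch)  # hand the batch to the downstream consumer
        total += len(batch)
        if iteration < iterations - 1:  # assumption: no wait after the last batch
            sleep(delay_sec)
    return total


batches = []
total = run_generator(
    iterations=10,
    delay_sec=5,
    batch_size=100,
    make_record=lambda iteration, index: {"customer_id": iteration * 100 + index},
    emit=batches.append,
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(len(batches), total)  # 10 1000
```

With the real 5-second delay, such a run would take about 45 seconds of waiting in addition to generation time.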