# Synthetic Data Generator

The **Synthetic Data Generator** component creates artificial datasets based on a **Draft-07 JSON schema**. It enables users to generate realistic test data for pipelines without relying on sensitive or production datasets.

Key features:

* Upload **CSV/XLSX/JSON sample files** to automatically generate a Draft-07 schema.
* Directly edit or upload Draft-07 schema JSON.
* Configure **iterations, delays, and batch sizes** for continuous data generation.
* Apply advanced schema rules: **if-then-else conditions**, **weights**, and **mathematical calculations**.

### Configuration Sections

The component's configuration is organized into three sections:

* **Basic Information**
* **Meta Information**
* **Resource Configuration**

### Basic Information Tab

The *Basic Information* tab defines execution parameters.

| Field                       | Description                                        | Required |
| --------------------------- | -------------------------------------------------- | -------- |
| **Invocation Type**         | Select execution mode: **Real-Time**.              | Yes      |
| **Deployment Type**         | Displays the deployment type (pre-selected).       | Yes      |
| **Container Image Version** | Displays the Docker image version (pre-selected).  | Yes      |
| **Failover Event**          | Select a failover event.                           | Optional |
| **Batch Size**              | Maximum number of records per cycle (minimum: 10). | Yes      |

### Meta Information Tab

The *Meta Information* tab defines schema and data generation parameters.

| Field                  | Description                                                   | Required |
| ---------------------- | ------------------------------------------------------------- | -------- |
| **Iteration**          | Number of iterations for producing synthetic data.            | Yes      |
| **Delay (sec)**        | Delay between iterations in seconds.                          | Yes      |
| **Batch Size**         | Number of records generated per iteration.                    | Yes      |
| **Upload Sample File** | Upload CSV/XLSX/JSON file to auto-generate a Draft-07 schema. | Optional |
| **Schema**             | Displays the Draft-07 schema. Can be edited directly.         | Yes      |
| **Upload Schema**      | Upload Draft-07 schema in JSON format.                        | Optional |
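
The way **Iteration**, **Delay (sec)**, and **Batch Size** interact can be sketched in Python. This is a minimal illustration of the semantics described in the table, not the component's actual implementation; `run_generator`, `make_record`, and the injectable `sleep` parameter are hypothetical names introduced here.

```python
import time
from typing import Callable, Dict, Iterator, List


def run_generator(
    iterations: int,
    delay_sec: float,
    batch_size: int,
    make_record: Callable[[int], Dict],
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[Dict]]:
    """Yield one batch per iteration, pausing between iterations."""
    for i in range(iterations):
        # Build batch_size records for this iteration.
        yield [make_record(i * batch_size + j) for j in range(batch_size)]
        # Pause between iterations, but not after the last one (an assumption).
        if i < iterations - 1:
            sleep(delay_sec)


# Usage: 3 iterations of 2 records each, with the sleep stubbed out.
batches = list(run_generator(3, 5, 2, lambda n: {"id": n}, sleep=lambda s: None))
print(len(batches), [len(b) for b in batches])
```

Under these assumptions, the total number of records is `iterations * batch_size`, and the run takes at least `(iterations - 1) * delay_sec` seconds.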

### Draft-07 Schema Capabilities

#### Supported Data Types

* **String**
  * Properties: `maxLength`, `minLength`, `enum`, `weights`, `format`
  * Formats: `date`, `date-time`, `name`, `country`, `state`, `email`, `uri`, `address`, `current_datetime`
* **Number**
  * Properties: `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`, `unique`, `start`, `enum`, `weights`
* **Float**
  * Properties: `minimum`, `maximum`
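
To make a few of these properties concrete, the sketch below generates values that respect `minLength`/`maxLength`, `minimum`/`maximum`, and `enum`. It is an illustration only: `generate_value` is a hypothetical helper, the lowercase type names and the default bounds are assumptions, and properties such as `format`, `unique`, and `start` are not modeled.

```python
import random
import string


def generate_value(spec: dict, rng: random.Random):
    """Return one value that satisfies a small subset of the properties above."""
    if "enum" in spec:
        # Without weights, every enum value is equally likely.
        return rng.choice(spec["enum"])
    if spec["type"] == "string":
        min_len = spec.get("minLength", 1)                # assumed default
        max_len = spec.get("maxLength", max(min_len, 10))  # assumed default
        length = rng.randint(min_len, max_len)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    if spec["type"] == "number":
        # Inclusive bounds, matching minimum/maximum semantics.
        return rng.randint(spec.get("minimum", 0), spec.get("maximum", 100))
    raise ValueError(f"unsupported type: {spec['type']}")


# Hypothetical field definitions used only for this example.
schema_properties = {
    "code": {"type": "string", "minLength": 4, "maxLength": 8},
    "quantity": {"type": "number", "minimum": 1, "maximum": 50},
    "status": {"type": "string", "enum": ["open", "closed"]},
}
rng = random.Random(0)
record = {name: generate_value(spec, rng) for name, spec in schema_properties.items()}
print(record)
```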

#### Conditional Rules (if-then-else)

Draft-07 schemas can express conditional logic with the `if`, `then`, and `else` keywords. These rules apply to both validation and generation.

Example: ensuring `task_end_date` ≥ `task_start_date`:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "task_start_date": { "type": "string", "format": "date" },
    "task_end_date": { "type": "string", "format": "date" }
  },
  "if": {
    "properties": {
      "task_end_date": { "type": "string", "format": "date" },
      "task_start_date": { "type": "string", "format": "date" }
    }
  },
  "then": {
    "properties": {
      "task_end_date": {
        "format": "date",
        "minimum": { "$data": "task_start_date" }
      }
    }
  }
}
```
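
The effect of this rule on generated records can be sketched in Python. This is one simple way to satisfy the constraint, not necessarily how the component resolves it; `generate_task_dates` and its default ranges are assumptions made for the example.

```python
import random
from datetime import date, timedelta


def generate_task_dates(
    rng: random.Random,
    earliest: date = date(2024, 1, 1),
    span_days: int = 365,
    max_duration_days: int = 30,
) -> dict:
    """Generate a start date, then an end date that never precedes it."""
    start = earliest + timedelta(days=rng.randint(0, span_days))
    # A non-negative offset guarantees task_end_date >= task_start_date.
    end = start + timedelta(days=rng.randint(0, max_duration_days))
    return {
        "task_start_date": start.isoformat(),
        "task_end_date": end.isoformat(),
    }


rng = random.Random(7)
records = [generate_task_dates(rng) for _ in range(1000)]
print(records[0])
```

Because ISO 8601 dates sort lexicographically, the constraint can be checked by comparing the two strings directly.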

#### Weighted Values

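The `weights` array biases how often each `enum` value is generated, with each weight applying to the value at the same position. In the example below, `Young` is expected in roughly 60% of records, and `Middle` and `Old` in roughly 20% each.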

```json
"age": {
  "type": "string",
  "enum": ["Young", "Middle", "Old"],
  "weights": [0.6, 0.2, 0.2]
}
```
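
Weighted selection of this kind can be reproduced with Python's standard library. The sketch below assumes weights are matched to `enum` values by position; `pick_weighted` is a hypothetical helper, not part of the component.

```python
import random
from collections import Counter

age_spec = {
    "type": "string",
    "enum": ["Young", "Middle", "Old"],
    "weights": [0.6, 0.2, 0.2],
}


def pick_weighted(spec: dict, rng: random.Random) -> str:
    """Pick one enum value, using the weights as relative probabilities."""
    return rng.choices(spec["enum"], weights=spec["weights"], k=1)[0]


rng = random.Random(42)
counts = Counter(pick_weighted(age_spec, rng) for _ in range(10_000))
# Print the observed share of each value for comparison with the weights.
for value in age_spec["enum"]:
    print(value, counts[value] / 10_000)
```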

#### Computed Fields

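You can define derived values with **calculation rules**. In the example below, the `$eval` expression computes `number3` from two other fields of the same record, referenced through `data`.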

```json
"number3": {
  "calculation": {
    "$eval": "data.number1 + data.number2 * 2"
  }
}
```
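
The expression above is equivalent to the following Python sketch, assuming the `$eval` expression follows standard arithmetic precedence (an assumption; the source does not specify the evaluator). `compute_number3` is a hypothetical name.

```python
def compute_number3(data: dict) -> float:
    """Mirror the expression data.number1 + data.number2 * 2."""
    # Multiplication binds tighter than addition: number1 + (number2 * 2).
    return data["number1"] + data["number2"] * 2


record = {"number1": 10, "number2": 4}
record["number3"] = compute_number3(record)
print(record)  # number3 is 10 + (4 * 2) = 18, not (10 + 4) * 2 = 28
```

If the intended result is `(number1 + number2) * 2`, the expression would need explicit parentheses.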

### Saving the Component Configuration

1. Configure **Basic Information** and **Meta Information**.
2. Click **Save Component** (Storage icon).
3. A confirmation message appears after saving.
4. Activate the pipeline to begin generating synthetic data.

### Example Workflow

1. Upload a **sample CSV file** containing customer records.
2. The system generates a Draft-07 schema automatically.
3. Configure:
   * **Iteration** = 10
   * **Delay** = 5 seconds
   * **Batch Size** = 100
4. Save the component and activate the pipeline.
5. The component generates 10 batches of 100 synthetic customer records each, pausing 5 seconds between batches, and feeds them downstream.
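
Steps 1 and 2 can be approximated with a short Python sketch that infers a Draft-07 schema from a CSV sample. The platform's actual inference logic is not documented here, so this is only an illustration: `infer_type`, `schema_from_csv`, and the sample data are hypothetical, the sketch uses the standard Draft-07 type names `integer`, `number`, and `string`, and it assumes the sample has at least one data row.

```python
import csv
import io
import json


def infer_type(values: list) -> str:
    """Return 'integer', 'number', or 'string' for a column of CSV strings."""
    for json_type, cast in (("integer", int), ("number", float)):
        try:
            for v in values:
                cast(v)
            return json_type  # every value parsed with this cast
        except (TypeError, ValueError):
            continue  # try the next, more permissive type
    return "string"


def schema_from_csv(text: str) -> dict:
    """Build a Draft-07 object schema with one property per CSV column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    properties = {
        column: {"type": infer_type([row[column] for row in rows])}
        for column in rows[0]
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
    }


sample = "customer_id,name,balance\n1,Ana,10.5\n2,Raj,7\n"
schema = schema_from_csv(sample)
print(json.dumps(schema, indent=2))
```

A schema produced this way is only a starting point; constraints such as `enum`, `weights`, or `format` would still be added by editing the schema directly.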

