# Synthetic Data Generator

The Synthetic Data Generator component generates data from a JSON Schema (Draft07) definition that describes the records to be produced.

The user can also upload sample data in CSV or XLSX format. The component then generates the Draft07 schema from that data.
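The following minimal Python sketch shows roughly what deriving a Draft07 schema from sample data involves. It is not the component's actual algorithm. The type rules are assumptions made for illustration: a column is `number` if every non-empty value parses as a float, otherwise `string`, and every column is marked `required`. The function names `infer_type` and `infer_draft07_schema` are hypothetical, and only CSV input is handled (not XLSX).

```python
import csv
import io
import json


def infer_type(values):
    """Map a column's non-empty CSV strings to a JSON Schema type name (assumed rule)."""
    if not values:
        return "string"  # empty column: fall back to string
    try:
        for value in values:
            float(value)
    except ValueError:
        return "string"
    return "number"


def infer_draft07_schema(csv_text):
    """Build a minimal Draft07 object schema from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    properties = {}
    for column in columns:
        values = [row.get(column) or "" for row in rows]
        properties[column] = {"type": infer_type([v for v in values if v != ""])}
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": columns,  # assumption: every column is required
    }


sample_csv = "Company,Lead Score\nTCS,7\nInfosys,4.5\n"
print(json.dumps(infer_draft07_schema(sample_csv), indent=2))
```

A real generator would likely also detect enums, dates, and value ranges, as the sample schema later on this page does.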

{% hint style="success" %}
*Check out the steps to create and use the Synthetic Data Generator component in a Pipeline workflow.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FARWwq0mHnUobVAJqtFtw%2Fuploads%2F1RfjK69OWphzf2TuG5jd%2F2023-07-20-10-37-53%20(online-video-cutter.com).mp4?alt=media&token=8ed5a41f-528e-4016-a0dc-5acf5e46ef5d>" %}
***Synthetic Data Generator***
{% endembed %}

## Drag and Drop the Component <a href="#drag-and-drop-the-component" id="drag-and-drop-the-component"></a>

* Drag and drop the Synthetic Data Generator component onto the Workflow Editor.

  ![](/files/ZrV24sMyi3fKOtQncOEi)

* Click the dragged Synthetic Data Generator component to open its property tabs.

## Basic Information Tab <a href="#basic-information-tab" id="basic-information-tab"></a>

Configure the **Basic Information** tab.

* **Invocation Type**: Select the running mode of the component from the drop-down menu. Select the **Real-Time** option.
* **Deployment Type**: Displays the deployment type for the component. This field is pre-selected.
* **Container Image Version**: Displays the image version for the Docker container. This field is pre-selected.
* **Failover Event**: Select a failover event from the drop-down menu.
* **Batch Size (min 10)**: Provide the maximum number of records to be processed in one execution cycle. The minimum value for this field is 10.

<figure><img src="/files/BKuGqxU2LSVFuDo1yt7f" alt=""><figcaption></figcaption></figure>

## Meta Information Tab

Configure the following information:

* **Iteration:** Number of iterations for which data is produced.
* **Delay (sec):** Delay between consecutive iterations, in seconds.
* **Batch Size:** Number of records produced in each iteration.
* **Upload Sample File:** Upload a file containing sample data. The supported formats are CSV, Excel (XLSX), and JSON. Once the file is uploaded, the Draft07 schema for its data is generated in the **Schema** tab.
* **Schema:** Displays the Draft07 schema in an editable format.
* **Upload Schema:** Upload a Draft07 schema in JSON format directly. Alternatively, paste the Draft07 schema into the **Schema** tab.

<figure><img src="/files/KVgum6FnmRt3rTot5IYi" alt=""><figcaption><p><em><strong>Meta Information for Schema Data Generator</strong></em></p></figcaption></figure>
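The following hypothetical Python producer loop sketches how **Iteration**, **Delay (sec)**, and **Batch Size** interact. It is not the component's code. The names `produce` and `make_record` are made up, and skipping the delay after the final iteration is an assumption.

```python
import time
from typing import Callable, Dict, Iterator, List


def produce(iterations: int, delay_sec: float, batch_size: int,
            make_record: Callable[[], Dict]) -> Iterator[List[Dict]]:
    """Yield `iterations` batches, each holding `batch_size` records (hypothetical helper)."""
    for i in range(iterations):
        yield [make_record() for _ in range(batch_size)]
        # Assumption: pause only between iterations, not after the last one.
        if i < iterations - 1:
            time.sleep(delay_sec)


# Usage with a delay of 0 so the example runs instantly.
batches = list(produce(iterations=3, delay_sec=0, batch_size=4,
                       make_record=lambda: {"Lead Score": 5}))
total = sum(len(batch) for batch in batches)
print(len(batches), total)  # 3 12  (Iteration x Batch Size = 3 x 4)
```

Using a generator lets a downstream consumer handle each batch as soon as it is produced, rather than waiting for all iterations to finish.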

## Saving the Component Configuration <a href="#saving-the-component-configuration" id="saving-the-component-configuration"></a>

* After completing the configuration, click the ***Save Component in Storage*** icon in the configuration panel to save the component.

<figure><img src="/files/ULjpx5jPku2img9kcT61" alt=""><figcaption></figcaption></figure>

* A notification message appears confirming that the component configuration has been saved.

<figure><img src="/files/2K9gkIBD01Y0fYBVQfM0" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note</mark>:* **Total number of generated records = Iteration × Batch Size** (both values as set in the Meta Information tab).
{% endhint %}
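For example, with the assumed values **Iteration** = 5 and **Batch Size** = 100, the formula gives:

$$
N_{\text{total}} = \text{Iteration} \times \text{Batch Size} = 5 \times 100 = 500
$$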

### Sample Schema File

A sample schema file is given below for exploring the component.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "Company": {
      "type": "string",
      "enum": ["NIKO RESOURCES LIMITED", "TCS", "Accenture", "ICICI Bank", "Cognizant", "HDFC Bank", "Infosys"]
    },
    "Lead Origin": {
      "type": "string",
      "enum": ["Campaign", "Walk-in", "Social Media", "Existing Account"]
    },
    "Mobile Number": {
      "type": "string",
      "pattern": "^\\+?\\d{1,3}[-.\\s]?\\(?(\\d{1,3})\\)?[-.\\s]?\\d{1,4}[-.\\s]?\\d{1,4}$"
    },
    "Lead Source": {
      "type": "string",
      "enum": ["Source A", "Source B", "Source C"]
    },
    "Source Medium": {
      "type": "string",
      "enum": ["Website", "Direct Calls", "Referral"]
    },
    "Source Campaign": {
      "type": "string",
      "enum": ["Campaign A", "Campaign B", "Campaign C"]
    },
    "Do Not Email": {
      "type": "string",
      "enum": ["Yes", "No"]
    },
    "Do Not Call": {
      "type": "string",
      "enum": ["Yes", "No"]
    },
    "Lead Stage": {
      "type": "string",
      "enum": ["Contact", "Lead", "Prospect", "Opportunity"]
    },
    "Lead Score": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "Order Value": {
      "type": "number",
      "minimum": 0,
      "maximum": 10000000
    },
    "Engagement Score": {
      "type": "number",
      "minimum": 0,
      "maximum": 100
    },
    "TotalVisits": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "Average Time Per Visit": {
      "type": "number",
      "minimum": 1,
      "maximum": 50
    },
    "Last Activity": {
      "type": "string",
      "enum": ["Page Visited on Website", "Email Opened", "Unreachable", "Converted to Lead"]
    },
    "Last Activity Date": {
      "type": "string",
      "format": "date",
      "minimum": "2020-01-01",
      "maximum": "2023-01-01"
    },
    "Created On": {
      "type": "string",
      "format": "date",
      "minimum": "2020-01-01",
      "maximum": "2023-01-01"
    },
    "Modified On": {
      "type": "string",
      "format": "date",
      "minimum": "2020-01-01",
      "maximum": "2023-01-01"
    },
    "Lead Conversion Date": {
      "type": "string",
      "format": "date",
      "minimum": "2020-01-01",
      "maximum": "2023-01-01"
    },
    "State": {
      "type": "string",
      "enum": ["State A", "State B", "State C"]
    },
    "Country": {
      "type": "string",
      "enum": ["Country A", "Country B", "Country C"]
    },
    "Specialization": {
      "type": "string"
    }
  },
  "required": [
    "Company",
    "Lead Origin",
    "Mobile Number",
    "Lead Source",
    "Source Medium",
    "Source Campaign",
    "Do Not Email",
    "Do Not Call",
    "Lead Stage",
    "Lead Score",
    "Order Value",
    "Engagement Score",
    "TotalVisits",
    "Average Time Per Visit",
    "Last Activity",
    "Last Activity Date",
    "Created On",
    "Modified On",
    "Lead Conversion Date",
    "State",
    "Country",
    "Specialization"
  ]
}
```
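The following minimal Python sketch shows how a schema like the one above could drive record generation for a few of its keywords. It is an illustration under stated assumptions, not the component's generator:

* `enum` values are picked uniformly.
* `number` values are drawn uniformly between `minimum` and `maximum`.
* `minimum`/`maximum` on `date` strings are treated as an inclusive date range, mirroring how the sample uses them.
* `pattern` is ignored, so plain strings get random letters.

The helper names `generate_value` and `generate_record` are hypothetical.

```python
import random
from datetime import date, timedelta


def generate_value(spec, rng):
    """Produce one value for a property spec (only a subset of the sample's keywords)."""
    if "enum" in spec:
        return rng.choice(spec["enum"])
    if spec.get("type") == "number":
        return rng.uniform(spec.get("minimum", 0), spec.get("maximum", 100))
    if spec.get("type") == "string" and spec.get("format") == "date":
        # Assumption: minimum/maximum on a date string define an inclusive date range.
        start = date.fromisoformat(spec["minimum"])
        end = date.fromisoformat(spec["maximum"])
        return (start + timedelta(days=rng.randint(0, (end - start).days))).isoformat()
    # Fallback for plain strings; the "pattern" keyword is ignored in this sketch.
    return "".join(rng.choice("abcdefghij") for _ in range(8))


def generate_record(schema, rng):
    """Build one record with a value for every property in the schema."""
    return {name: generate_value(spec, rng) for name, spec in schema["properties"].items()}


schema = {
    "type": "object",
    "properties": {
        "Lead Stage": {"type": "string", "enum": ["Contact", "Lead", "Prospect", "Opportunity"]},
        "Lead Score": {"type": "number", "minimum": 0, "maximum": 10},
        "Created On": {"type": "string", "format": "date",
                       "minimum": "2020-01-01", "maximum": "2023-01-01"},
        "Specialization": {"type": "string"},
    },
}
print(generate_record(schema, random.Random(42)))
```

Passing an explicit `random.Random` instance makes runs reproducible, which helps when comparing generated datasets.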

{% hint style="info" %}
Weights can be assigned to the values of an `enum`, one weight per value in the same order, to control the bias across the generated data:

***The weights must add up to exactly 1.***

`"age": { "type": "string", "enum": ["Young", "Middle", "Old"], "weights": [0.6, 0.2, 0.2] }`
{% endhint %}
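`weights` is not a standard JSON Schema Draft-07 keyword, so the component defines its exact semantics. The following Python sketch shows weighted sampling with `random.choices`, assuming one weight per `enum` value in the same order. The sum check mirrors the rule above and uses `math.isclose` to tolerate floating-point rounding. The function name `weighted_enum_sampler` is hypothetical.

```python
import math
import random
from collections import Counter


def weighted_enum_sampler(spec, rng):
    """Return a zero-argument function that draws from spec["enum"] using spec["weights"]."""
    values, weights = spec["enum"], spec["weights"]
    if len(values) != len(weights):
        raise ValueError("expected exactly one weight per enum value")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError(f"weights must add up to 1, got {sum(weights)}")
    return lambda: rng.choices(values, weights=weights, k=1)[0]


age = {"type": "string", "enum": ["Young", "Middle", "Old"], "weights": [0.6, 0.2, 0.2]}
draw_age = weighted_enum_sampler(age, random.Random(7))
counts = Counter(draw_age() for _ in range(10_000))
# Counts should come out roughly in the ratio 0.6 : 0.2 : 0.2.
print(counts)
```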


