Appendix B: Special cases

B.1. API Integration Using the BDB Platform to Receive Data

API (Application Programming Interface) integration enables different software systems to connect and share data automatically. It acts as a bridge between applications, allowing seamless communication and efficient data exchange. This helps streamline processes and ensures that systems remain synchronized.

Implementation of API Integration in BDB.ai — Overview and Steps

To implement API integration in BDB.ai, you need to build a pipeline that establishes the connection between systems. Such a pipeline can be used to save data from a dashboard into a database or to post customer data into the BDB system. Follow the steps below:

Create a New Pipeline

  • Open a new canvas to create the pipeline.

  • From the right-side menu, navigate to Consumers → API Ingestion, then drag and drop it onto the canvas.

  • This component receives data via an HTTP POST request sent to an automatically generated endpoint.

Configure the API Ingestion Component

  • Click on the API Ingestion component to open its configuration window.

  • This window has two tabs — Basic Information and Meta Information.

Fill in Basic Information

  • Choose the Invocation Type as Real-Time.

  • Set an appropriate Batch Size.

Fill in Meta Information

  • Set Ingestion Type to API Ingestion.

  • The Ingestion ID and Secret will be generated automatically.

  • Click the Save icon to update the pipeline.

  • Confirm the changes to generate the Component ID URL, and save your updates.

Add a Kafka Event

  • Right-click the created component and select Add Kafka Event.

  • A new window will appear where you must specify:

  • Event Name

  • Event Duration

  • Number of Partitions

  • Output Type

  • Click Add Kafka Event to complete the setup.

View the Updated Canvas and Configure a Writer Component

  • Once created, the canvas will update to reflect the new Kafka event. Any database writer can be connected to this Kafka event to save the received data to the database.

Add the API Trigger Script

  • Open the associated dashboard by clicking Designer from the sidebar.

  • On the component where you want to trigger the API, add the following script:
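The exact script is environment-specific; the sketch below is a minimal, illustrative version assembled from the explanation that follows. It assumes a jQuery-enabled dashboard environment, and the header names and payload fields are placeholders to adapt to your own setup.

```javascript
// Minimal sketch of the API trigger script (placeholder values and header names).
var url = "<Component ID URL generated by the API Ingestion component>";
var ingestionId = "<your ingestion ID>";
var ingestionSecret = "<your ingestion secret>";

// Settings object for the AJAX call: POST method, ingestion credentials
// and content type in the headers, and a JSON payload as the body.
var settings = {
  url: url,
  method: "POST",
  headers: {
    "ingestionId": ingestionId,         // header keys shown here are illustrative
    "ingestionSecret": ingestionSecret,
    "Content-Type": "application/json"
  },
  data: JSON.stringify([
    {
      id: 101,
      name: "Jane Doe",
      email: "jane.doe@example.com",
      contact_number: "9876543210",
      department: "Finance"
    }
  ])
};

// Send the request and log the response in the console.
$.ajax(settings).done(function (response) {
  console.log(response);
});
```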


Note: Replace url, ingestionId, and ingestionSecret with the values generated from your API Ingestion component.

Explanation of the Script

  • Defines variables: url, ingestionId, ingestionSecret.

  • Creates a settings object for the AJAX call:

  • Method: POST

  • Headers: ingestion credentials and content type

  • Data: JSON payload (id, name, email, contact number, department)

  • Sends the request using $.ajax(settings) and logs the response in the console.

Activate the Pipeline

  • Go to the Pipeline menu in the sidebar.

  • Activate the pipeline.

  • Once activated, a confirmation message appears and the pipeline status is updated.

Trigger and Preview the API Ingestion

  • Navigate to the Designer component.

  • Preview the dashboard and click the button to trigger the API ingestion.

  • Then, go to the Pipeline component → Event → Preview Tab to view the data generated through the API.

  • By connecting a writer to this event, you can save the received data to the database after performing the required transformations.

B.2. Sending Email using Python Script

The following section explains how to trigger an email from the platform based on a specific condition. This Python script enables you to send emails securely using SMTP (Simple Mail Transfer Protocol) with SSL encryption. It supports both plain text and HTML-formatted content, making it suitable for sending simple messages as well as rich, styled emails.

The script uses only Python’s built-in libraries — smtplib, ssl, and email.mime — so there are no external dependencies. You can easily configure it to work with your preferred SMTP server (e.g., Gmail, Outlook, or a corporate mail server).

Scenario: Send an email when a leave request is approved.

Goal: Automatically notify the user through an email upon approval of their leave request.

The script is added within a Python component that triggers the email-sending process when approved from the dashboard.

Pipeline

  • In the pipeline module, open a new canvas to create the pipeline.

  • From the right-side menu, navigate to Consumers → API Ingestion, then drag and drop it onto the canvas.

  • This component receives data via an HTTP POST request sent to an automatically generated endpoint. This endpoint will be used in the dashboard.

  • Add and connect an event to the API Ingestion component.

  • Drag and drop a Python Script component onto the canvas and add the script below.

Python code
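A minimal sketch consistent with the explanation that follows is shown here; the SMTP server, port, credentials, and email addresses are placeholders that must be replaced with actual values.

```python
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate


def send_mail(send_from, send_to, subject, server, port, username, password, html_content=None):
    try:
        # Build the email object and set the standard headers.
        msg = MIMEMultipart()
        msg["From"] = send_from
        msg["To"] = send_to
        msg["Date"] = formatdate(localtime=True)
        msg["Subject"] = subject

        # Attach an HTML body if provided, otherwise fall back to plain text.
        if html_content:
            msg.attach(MIMEText(html_content, "html"))
        else:
            msg.attach(MIMEText("Hi, your leave has been approved.", "plain"))

        # Connect to the SMTP server over SSL, log in, and send the message.
        context = ssl.create_default_context()
        with smtplib.SMTP_SSL(server, port, context=context) as smtp:
            smtp.login(username, password)
            smtp.sendmail(send_from, send_to, msg.as_string())
        print("Email sent successfully!")
    except Exception as e:
        print(f"Failed to send email: {e}")


# Replace the placeholders below with actual values before running the script.
send_mail(
    send_from="noreply@example.com",
    send_to="employee@example.com",
    subject="Leave Request Approved",
    server="smtp.example.com",
    port=465,
    username="noreply@example.com",
    password="<app password>",
)
```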

Code Explanation

  1. Imports Modules

    1. Uses smtplib, ssl, and email.mime to construct and send emails securely.

  2. Defines the send_mail Function

    1. Parameters: send_from, send_to, subject, server, port, username, password, and optional html_content.

  3. Creates the Email Object

    1. Builds a MIMEMultipart message and sets headers: From, To, Date, and Subject.

  4. Adds the Email Body

    1. If html_content is provided → attaches it as an HTML message.

    2. Else → attaches a plain-text message saying: “Hi, your leave has been approved.”

  5. Establishes a Secure Connection

    1. Connects to the SMTP server using SSL.

    2. Logs in with the provided credentials.

  6. Sends the Email

    1. Uses smtp.sendmail() to send the message.

    2. Prints “Email sent successfully!” upon completion.

  7. Error Handling

    1. Captures and prints any exceptions encountered during execution.

  8. Configuration

    1. Replace the placeholders (send_from, server, etc.) with actual values before running the script.

Dashboard

The above pipeline can be triggered from the dashboard. Internally, this action sends an API POST request to the pipeline, which executes the process of sending the email. (Refer to Appendix B.1. for how to configure the API ingestion from the dashboard.)

B.3. Simulating Data using SDG and Python

Data simulation is often required during proof-of-concept (POC) or testing stages when production data is unavailable or restricted. The Synthetic Data Generator (SDG) in the BDB platform, combined with the Python Faker library, enables the creation of realistic, controlled datasets that adhere to business rules, data types, and inter-column dependencies.

This approach helps teams validate pipelines, transformations, and dashboards before integrating live data sources.

  • The SDG component focuses on generating structured data aligned to business schemas and validation rules.

  • The Python Faker library adds realism by simulating user-level or location-specific data such as names, addresses, phone numbers, and emails.

Together, they form a flexible, controlled, and repeatable method for creating test data across analytical use cases.

Example Schema: Product Price Data

The following JSON schema defines data types, categorical distributions, patterns, and inter-field dependencies that can be simulated using the SDG component.
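The schema itself is project-specific, so the snippet below is only an illustrative sketch. All field names except CHARGING_PATTERN and RECURRING_TYPE are hypothetical, and the exact keywords the SDG component accepts for weights and conditional rules may differ from the JSON-Schema-style syntax shown here.

```json
{
  "type": "object",
  "properties": {
    "PRODUCT_ID": { "type": "string", "pattern": "^PRD-[0-9]{5}$" },
    "PRODUCT_CATEGORY": { "type": "string", "enum": ["Data Pack", "Voice Pack", "Combo Pack"], "weights": [0.5, 0.3, 0.2] },
    "PRICE": { "type": "number", "minimum": 49, "maximum": 999 },
    "CHARGING_PATTERN": { "type": "string", "enum": ["prepaid", "postpaid"], "weights": [0.7, 0.3] },
    "RECURRING_TYPE": { "type": "string", "enum": ["Recurring", "Non-Recurring"] },
    "EFFECTIVE_DATE": { "type": "string", "format": "date" }
  },
  "rules": [
    {
      "if": { "properties": { "CHARGING_PATTERN": { "const": "prepaid" } } },
      "then": { "properties": { "RECURRING_TYPE": { "const": "Non-Recurring" } } },
      "else": { "properties": { "RECURRING_TYPE": { "const": "Recurring" } } }
    }
  ]
}
```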

Key Schema Attributes

  • type: Defines the datatype of each field (e.g., string, number, integer, date).

  • pattern: Uses regular expressions to control value formats (e.g., ID patterns).

  • enum: Lists possible values for categorical fields with low cardinality.

  • weights: Defines the probability distribution among categorical values.

  • rules (if/else): Specifies inter-field dependencies or conditional logic.

Example Rule Definition

Logical consistency across columns can be achieved using conditional rules. For example:

  • If CHARGING_PATTERN = prepaid, then RECURRING_TYPE = Non-Recurring.

  • If CHARGING_PATTERN = postpaid, then RECURRING_TYPE = Recurring.

Such inter-field relationships can be expressed using if / else conditions within the JSON schema.

Python Faker Enrichment Example

When the generated dataset requires contextual or realistic values such as random names, fake addresses, phone numbers, or email IDs, the Python Faker library can be used in combination with the SDG-generated dataset.

Below is a sample enrichment function that can be applied post-SDG data generation.
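A minimal sketch of such an enrichment step is shown below. It assumes the SDG output arrives as a pandas DataFrame and uses Faker's en_IN locale for India-based contact details; the added column names and the enrich_with_faker function are illustrative, and the Faker package is an external dependency (pip install Faker).

```python
import pandas as pd
from faker import Faker

# India-locale provider for region-specific names, addresses, and phone numbers.
fake = Faker("en_IN")


def enrich_with_faker(df: pd.DataFrame) -> pd.DataFrame:
    """Add realistic, user-level columns to the SDG-generated dataset."""
    df = df.copy()
    n = len(df)
    df["CUSTOMER_NAME"] = [fake.name() for _ in range(n)]
    df["CUSTOMER_EMAIL"] = [fake.email() for _ in range(n)]
    df["CUSTOMER_PHONE"] = [fake.phone_number() for _ in range(n)]
    df["CUSTOMER_ADDRESS"] = [fake.address().replace("\n", ", ") for _ in range(n)]
    return df


# Example usage inside a Python Script component:
# enriched_df = enrich_with_faker(sdg_df)  # sdg_df is the DataFrame produced by the SDG step
```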

In summary, the SDG + Python Faker combination follows a two-step workflow:

Step 1 – Schema-Based Generation (SDG)

Use the SDG component to generate the base dataset with:

  • Schema-defined structure

  • Data type and format rules

  • Inter-field relationships and conditions

Step 2 – Data Enrichment (Python Faker)

Apply a Python script to enrich the SDG-generated data with realistic attributes such as:

  • Fake names, addresses, and email IDs

  • Derived columns based on logical dependencies

  • Localized or region-specific data (e.g., India-based contact information)

The workflow will look like the screenshot below. Any database writer can be connected to the final Kafka topic to store the generated data in a target database of your choice.
