On-Demand Python Job Execution using the BDB Platform

This workflow demonstrates how to create, configure, and execute on-demand Python jobs within the BDB Platform. It lets users run Python scripts at any time through a payload-based API trigger, providing real-time, flexible, and automated control over data workflows and making it well suited for just-in-time data processing.

Overview

The on-demand Python job functionality in the BDB Platform allows users to:

  • Execute Python scripts at any time using a payload-triggered API.

  • Handle real-time data ingestion, transformation, or computation.

  • Integrate seamlessly with the Data Science Lab (DS Lab) and Data Pipeline modules for production-grade deployment.

This capability is ideal for just-in-time data processing, event-driven automation, and real-time integration workflows.
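
The rest of this workflow builds on a single start function that receives the payload as its first argument. A minimal sketch of that contract is shown below; the function and parameter names are illustrative only, and the full ClickHouse example appears in Step 3.

# Minimal sketch of an on-demand start function (names are illustrative).
def payload(job_payload=None, **connection_args):
    # job_payload arrives as the JSON array supplied in the job's Payload Field.
    if not job_payload:
        job_payload = []  # fall back to an empty payload so a bare run is a no-op
    for record in job_payload:
        print(record)  # replace with ingestion, transformation, or computation logic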

Creating an On-Demand Python Job through DS Lab

Step 1: Create a Project in DS Lab

Procedure

  1. Navigate to the Apps Menu → Data Science Lab plugin.

  2. Click Create and configure the following project details:

    • Name: Job Workflow4

    • Algorithm: Classification and Regression

    • Environment: Python

    • Resource Allocation: Based on dataset size and compute requirements

  3. Click Save, then Activate the project.

Step 2: Create or Import the Python Notebook

Procedure

  1. Inside the project, open the Repo tab.

  2. Click the three-dot (⋮) menu and choose Import.

  3. Name the notebook (e.g., Workflow4_PythonJob).

  4. Upload the notebook file and click Save.

Step 3: Add and Register the Python Script

This script uses clickhouse_driver to:

  • Accept a payload (a list of dictionaries).

  • Connect to a ClickHouse database.

  • Create a table if it doesn’t exist.

  • Insert the payload data.

Script Example

from clickhouse_driver import Client

def payload(job_payload=None, host=None, port=None, user=None, password=None, database=None):
    # Fallback if no payload is provided
    if not job_payload:
        job_payload = [
            {"id": 101, "name": "jashika", "age": 20},
            {"id": 102, "name": "siya", "age": 40}
        ]
    
    # Convert list of dicts to tuples for ClickHouse insertion
    data_tuples = [(item["id"], item["name"], item["age"]) for item in job_payload]
    
    # Connect to ClickHouse
    client = Client(host=host, port=port, user=user, password=password, database=database)
    
    # Create table if it doesn’t exist
    client.execute("""
        CREATE TABLE IF NOT EXISTS Employees (
            id UInt32,
            name String,
            age UInt8
        ) ENGINE = MergeTree() ORDER BY id
    """)
    
    # Insert data into ClickHouse
    client.execute("INSERT INTO Employees (id, name, age) VALUES", data_tuples)

Procedure

  1. Paste the Script Example code into the notebook.

  2. Click the three-dot (⋮) icon in the notebook toolbar.

  3. Select Register.

  4. Choose the payload function.

  5. Click Next.

  6. You will get a validation message.

  7. Select Register as a Job option.

  8. Click Next.

  9. You are redirected to the Configure Job Info page.

  10. Click Finish.

Creating an On-Demand Python Job in Data Pipeline

You can quickly set up an on-demand Python job directly within the Data Pipeline module. The following steps detail the creation process:

Step 1: Access the Job section within the Data Pipeline

Procedure

  • Navigate to Apps Menu → Data Pipeline.

  • Click Create under the Jobs section.

  • Configure the job details:

    • Job Name: Workflow4

    • Description: Python On-Demand Job

    • Job Type: Python

    • Enable the On-Demand checkbox.

  • In the Payload Field, provide a JSON array as input, for example:

[
    {"id": 201, "name": "Emma", "age": 30},
    {"id": 202, "name": "Liam", "age": 28}
]

  • Click Save.
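
The Payload Field must contain valid JSON whose objects carry the fields the script expects (id, name, age). A quick local sanity check before saving might look like this (the raw string below mirrors the example payload):

# Confirm the payload is valid JSON and each record has the expected keys.
import json

raw_payload = '[{"id": 201, "name": "Emma", "age": 30}, {"id": 202, "name": "Liam", "age": 28}]'
records = json.loads(raw_payload)

for record in records:
    assert {"id", "name", "age"}.issubset(record), f"Missing fields in: {record}"
print(f"{len(records)} records look well-formed.")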

Step 2: Configure the Python Job Component

Procedure

  • On the job canvas, click the Python component.

  • Configure the following parameters:

    • Project: Workflow4

    • Script: Auto-loaded from DS Lab export

    • Start Function: payload

  • Enter the Input Arguments:

    • Host

    • Port

    • User

    • Password

    • Database

  • Click Save.
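
The input arguments are passed to the start function as the host, port, user, password, and database parameters used in the script. To confirm the values before activation, a standalone connection test (run outside the platform, with placeholder details shown here) can help:

# Standalone connectivity check with the same arguments the job component will pass.
from clickhouse_driver import Client

client = Client(host="clickhouse.example.com", port=9000,   # placeholders
                user="default", password="", database="default")
print(client.execute("SELECT version()"))  # prints the server version if the connection works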

Step 3: Activate and Monitor the Job

Procedure

  • Click the Activate icon to trigger the job.

  • Ensure associated pods are running.

  • Open the Logs tab to track job execution.

Verification: Check the "Employees" table in ClickHouse to confirm successful data insertion.
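
One way to confirm the insert is to query the table directly with clickhouse_driver (connection details below are placeholders):

# Verify that the payload rows landed in ClickHouse.
from clickhouse_driver import Client

client = Client(host="clickhouse.example.com", port=9000,   # placeholders
                user="default", password="", database="default")
rows = client.execute("SELECT id, name, age FROM Employees ORDER BY id")
for row in rows:
    print(row)  # expect the records supplied in the job's payload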

Results

  • The job executes successfully using a payload-triggered on-demand model.

  • The data defined in the JSON payload is written directly into the database.

  • The user gains real-time control over Python job execution without depending on a fixed schedule.

Notes and Recommendations

  • Use payload-based execution for dynamic or event-driven workflows.

  • Ensure the target table and database credentials are configured correctly before activation.

  • Monitor execution logs for validation and debugging.

  • Integrate the job with external triggers (e.g., API calls, message queues, or webhooks) for automated processing, as sketched below.
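
As an illustration of an external trigger, the hypothetical snippet below posts a payload to the job's trigger endpoint over HTTP. The URL, job identifier, and authorization header are placeholders; consult the BDB Platform API documentation for the actual endpoint and authentication scheme.

# Hypothetical external trigger: POST a payload to the on-demand job endpoint.
# The URL and Authorization header are placeholders, not real platform values.
import requests

TRIGGER_URL = "https://<bdb-host>/api/jobs/<job-id>/trigger"  # placeholder endpoint

payload_records = [
    {"id": 301, "name": "Noah", "age": 35}
]

response = requests.post(
    TRIGGER_URL,
    json=payload_records,
    headers={"Authorization": "Bearer <token>"},  # placeholder credentials
    timeout=30
)
response.raise_for_status()
print("Job triggered:", response.status_code)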

Best Situation to Use

Use On-Demand Python Jobs when:

  • You need real-time job execution triggered by user input or external systems.

  • Data workflows depend on dynamic payloads (e.g., API-driven data ingestion).

  • You want flexible job automation without rigid scheduling constraints.

  • You are integrating Python scripts with ClickHouse or similar databases for fast, event-driven updates.