On-Demand Python Job Execution using the BDB Platform

This workflow demonstrates how to create, configure, and execute on-demand Python jobs within the BDB Platform. It lets users run Python scripts at any time through a payload-based API trigger, providing real-time, flexible, and automated control over data workflows and making it well suited for just-in-time data processing.

Overview

The on-demand Python job functionality in the BDB Platform allows users to:

  • Execute Python scripts at any time using a payload-triggered API.

  • Handle real-time data ingestion, transformation, or computation.

  • Integrate seamlessly with the Data Science Lab (DS Lab) and Data Pipeline modules for production-grade deployment.

This capability is ideal for just-in-time data processing, event-driven automation, and real-time integration workflows.
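
The rest of this workflow builds on a single start function that receives the payload as its first argument. A minimal sketch of that contract is shown below; the function and parameter names are illustrative only, and the full ClickHouse example appears in Step 3.

# Minimal sketch of an on-demand start function (names are illustrative).
def payload(job_payload=None, **connection_args):
    # job_payload arrives as the JSON array supplied in the job's Payload Field.
    if not job_payload:
        job_payload = []  # fall back to an empty payload so a bare run is a no-op
    for record in job_payload:
        print(record)  # replace with ingestion, transformation, or computation logic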

Creating an On-Demand Python Job through DS Lab

Step 1: Create a Project in DS Lab

Procedure

  1. Navigate to the Apps Menu → Data Science Lab plugin.

  2. Click Create and configure the following project details:

    • Name: Job Workflow4

    • Algorithm: Classification and Regression

    • Environment: Python

    • Resource Allocation: Based on dataset size and compute requirements

  3. Click Save, then Activate the project.

Step 2: Create or Import the Python Notebook

Procedure

  1. Inside the project, open the Repo tab.

  2. Click the three-dot (⋮) menu and choose Import.

  3. Name the notebook (e.g., Workflow4_PythonJob).

  4. Upload the notebook file and click Save.

Step 3: Add and Register the Python Script

This script uses clickhouse_driver to:

  • Accept a payload (a list of dictionaries).

  • Connect to a ClickHouse database.

  • Create a table if it doesn’t exist.

  • Insert the payload data.

Script Example

from clickhouse_driver import Client

def payload(job_payload=None, host=None, port=None, user=None, password=None, database=None):
    # Fallback if no payload is provided
    if not job_payload:
        job_payload = [
            {"id": 101, "name": "jashika", "age": 20},
            {"id": 102, "name": "siya", "age": 40}
        ]
    
    # Convert list of dicts to tuples for ClickHouse insertion
    data_tuples = [(item["id"], item["name"], item["age"]) for item in job_payload]
    
    # Connect to ClickHouse
    client = Client(host=host, port=port, user=user, password=password, database=database)
    
    # Create table if it doesn’t exist
    client.execute("""
        CREATE TABLE IF NOT EXISTS Employees (
            id UInt32,
            name String,
            age UInt8
        ) ENGINE = MergeTree() ORDER BY id
    """)
    
    # Insert data into ClickHouse
    client.execute("INSERT INTO Employees (id, name, age) VALUES", data_tuples)

Procedure

  1. Paste the Script Example code into the notebook.

  2. Click the three-dot (⋮) icon in the notebook toolbar.

  3. Select Register.

  4. Choose the payload function.

  5. Click Next.

  6. You will get a validation message.

  7. Select Register as a Job option.

  8. Click Next.

  9. You are redirected to the Configure Job Info page.

  10. Click Finish.

Creating an On-Demand Python Job in Data Pipeline

You can quickly set up an on-demand Python job directly within the Data Pipeline module. The following steps detail the creation process:

Step 1: Access the Job section within the Data Pipeline

Procedure

  • Navigate to Apps Menu → Data Pipeline.

  • Click Create under the Jobs section.

  • Configure the job details:

    • Job Name: Workflow4

    • Description: Python On-Demand Job

    • Job Type: Python

    • Enable the On-Demand checkbox.

  • In the Payload Field, provide a JSON array as input, for example:

[
    {"id": 201, "name": "Emma", "age": 30},
    {"id": 202, "name": "Liam", "age": 28}
]

  • Click Save.
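
The Payload Field must contain valid JSON whose objects carry the fields the script expects (id, name, age). A quick local sanity check before saving might look like this (the raw string below mirrors the example payload):

# Confirm the payload is valid JSON and each record has the expected keys.
import json

raw_payload = '[{"id": 201, "name": "Emma", "age": 30}, {"id": 202, "name": "Liam", "age": 28}]'
records = json.loads(raw_payload)

for record in records:
    assert {"id", "name", "age"}.issubset(record), f"Missing fields in: {record}"
print(f"{len(records)} records look well-formed.")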

Step 2: Configure the Python Job Component

Procedure

  • On the job canvas, click the Python component.

  • Configure the following parameters:

    • Project: Workflow4

    • Script: Auto-loaded from DS Lab export

    • Start Function: payload

  • Enter the Input Arguments:

    • Host

    • Port

    • User

    • Password

    • Database

  • Click Save.
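
The input arguments are passed to the start function as the host, port, user, password, and database parameters used in the script. To confirm the values before activation, a standalone connection test (run outside the platform, with placeholder details shown here) can help:

# Standalone connectivity check with the same arguments the job component will pass.
from clickhouse_driver import Client

client = Client(host="clickhouse.example.com", port=9000,   # placeholders
                user="default", password="", database="default")
print(client.execute("SELECT version()"))  # prints the server version if the connection works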

Step 3: Activate and Monitor the Job

Procedure

  • Click the Activate icon to trigger the job.

  • Ensure associated pods are running.

  • Open the Logs tab to track job execution.

Verification: Check the "Employees" table in ClickHouse to confirm successful data insertion.
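
One way to confirm the insert is to query the table directly with clickhouse_driver (connection details below are placeholders):

# Verify that the payload rows landed in ClickHouse.
from clickhouse_driver import Client

client = Client(host="clickhouse.example.com", port=9000,   # placeholders
                user="default", password="", database="default")
rows = client.execute("SELECT id, name, age FROM Employees ORDER BY id")
for row in rows:
    print(row)  # expect the records supplied in the job's payload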

Results

  • The job executes successfully using a payload-triggered on-demand model.

  • The data defined in the JSON payload is written directly into the database.

  • The user gains real-time control over Python job execution without depending on a fixed schedule.

Notes and Recommendations

  • Use payload-based execution for dynamic or event-driven workflows.

  • Ensure the target table and database credentials are configured correctly before activation.

  • Monitor execution logs for validation and debugging.

  • Integrate the job with external triggers (e.g., API calls, message queues, or webhooks) for automated processing, as sketched below.
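
As an illustration of an external trigger, the hypothetical snippet below posts a payload to the job's trigger endpoint over HTTP. The URL, job identifier, and authorization header are placeholders; consult the BDB Platform API documentation for the actual endpoint and authentication scheme.

# Hypothetical external trigger: POST a payload to the on-demand job endpoint.
# The URL and Authorization header are placeholders, not real platform values.
import requests

TRIGGER_URL = "https://<bdb-host>/api/jobs/<job-id>/trigger"  # placeholder endpoint

payload_records = [
    {"id": 301, "name": "Noah", "age": 35}
]

response = requests.post(
    TRIGGER_URL,
    json=payload_records,
    headers={"Authorization": "Bearer <token>"},  # placeholder credentials
    timeout=30
)
response.raise_for_status()
print("Job triggered:", response.status_code)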

Best Situation to Use

Use On-Demand Python Jobs when:

  • You need real-time job execution triggered by user input or external systems.

  • Data workflows depend on dynamic payloads (e.g., API-driven data ingestion).

  • You want flexible job automation without rigid scheduling constraints.

  • You are integrating Python scripts with ClickHouse or similar databases for fast, event-driven updates.