Workflow 4

On-Demand Python Job Execution using BDB Platform

This Workflow highlights the on-demand Python job functionality, enabling users to execute Python scripts at any time through a payload-based API trigger. This dynamic capability provides precise control over data workflows, making it well-suited for real-time execution, automation, and just-in-time data processing within the BDB Platform.

Users start by creating a project in the Data Science Lab module using the Python environment. After uploading a notebook and writing the required code, the script can be registered and seamlessly exported to the Data Pipeline, where it is configured as a job for execution.

DS Lab – Project & Notebook Creation

From the Apps menu, navigate to the Data Science Lab plugin.

Create a new project with the following configurations:

  • Name: Job Workflow4

  • Algorithm: Classification and Regression

  • Environment: Python

  • Resource Allocation: Based on data needs

Save and activate the project.

Inside the project:

  • Navigate to the Repo tab.

  • Open the three-dot menu in the Repo tab.

  • Click the Import option.

  • Name the notebook.

  • Upload and save the notebook file.

The uploaded notebook script uses the clickhouse_driver library to:

  • Accept a payload (a list of dictionaries).

  • Connect to a ClickHouse database.

  • Create a table if it doesn’t exist.

  • Insert the payload data.

Python Script:

from clickhouse_driver import Client

def payload(job_payload=None, host=None, port=None, user=None, password=None, database=None):

    # Fallback if job_payload is not passed properly
    if not job_payload:
        job_payload = [
            {"id": 101, "name": "jashika", "age": 20},
            {"id": 102, "name": "siya", "age": 40}
        ]

    # Convert list of dicts to list of tuples for ClickHouse insertion
    data_tuples = [(item["id"], item["name"], item["age"]) for item in job_payload]

    # Connect to the ClickHouse database
    client = Client(host=host, port=port, user=user, password=password, database=database)

    # Create the target table if it doesn't exist
    client.execute("""
        CREATE TABLE IF NOT EXISTS Employees (
            id UInt32,
            name String,
            age UInt8
        ) ENGINE = MergeTree() ORDER BY id
    """)

    # Insert the payload data
    client.execute("INSERT INTO Employees (id, name, age) VALUES", data_tuples)

Register the notebook script:

  • Click the three-dot menu on the notebook.

  • Choose Register and select the function.

  • Click Next, then Validate.

  • Once validated, click Export to Pipeline to complete.

A success message confirms the export.

Create a Python On-Demand Job

From the Apps menu, open the Data Pipeline module.

Navigate to the Jobs tab, click Create, and enter:

  • Job Name: Workflow4

  • Description: Python On-Demand Job

  • Job Type: Python

  • Enable the On-Demand checkbox

In the Payload field, input a JSON array (copied from the prerequisites document). Example:

[
    {"id": 201, "name": "Emma", "age": 30},
    {"id": 202, "name": "Liam", "age": 28}
]

Click Save.

Configure Python Job Component

On the job canvas, click the Python component and configure:

  • Project: Workflow4

  • Script: Auto-loaded from DS Lab

  • Start Function: payload

Provide the following input arguments (from the prerequisites document); they map directly onto the start function's parameters, as sketched after this step:

  • Host

  • Port

  • User

  • Password

  • Database

Save the configuration.
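
Conceptually, when the job runs, the Data Pipeline calls the registered start function with the on-demand payload plus these connection arguments. The call below is only an illustration of that mapping, with angle-bracket placeholders standing in for the values entered in the job configuration; the actual invocation is handled internally by the platform.

# Illustrative mapping only -- the platform performs this call internally.
# <Host>, <Port>, <User>, <Password>, and <Database> stand in for the
# values entered in the job component configuration.
payload(
    job_payload=[
        {"id": 201, "name": "Emma", "age": 30},
        {"id": 202, "name": "Liam", "age": 28}
    ],
    host="<Host>",
    port="<Port>",
    user="<User>",
    password="<Password>",
    database="<Database>"
)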

Activate and Monitor the Job

Click the Activate icon.

Once activated:

  • Ensure the associated pods are running.

  • Click on Logs to track execution.

Upon success:

  • A confirmation message appears.

  • You can verify the data insertion in the ClickHouse database, as sketched below.

  • Check the Employees table to ensure the payload was inserted.
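
A quick way to confirm the insertion is to query the table with the same clickhouse_driver library. This is a minimal sketch; the connection values are placeholders and should be replaced with the details from the prerequisites document.

from clickhouse_driver import Client

# Placeholder connection details -- use the values from the prerequisites
# document for your ClickHouse instance.
client = Client(
    host="clickhouse.example.com",
    port=9000,
    user="default",
    password="",
    database="default"
)

# Fetch the rows inserted by the on-demand job.
rows = client.execute("SELECT id, name, age FROM Employees ORDER BY id")
for row in rows:
    print(row)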

Conclusion

In this Workflow, we explored how to dynamically trigger a Python job on demand using a payload, API call, or pipeline trigger. From notebook creation in DS Lab to final execution in the Data Pipeline, this approach gives users flexible, real-time job control.
