Workflow 4

On-Demand Python Job Execution using BDB Platform

This Workflow highlights the on-demand Python job functionality, enabling users to execute Python scripts at any time through a payload-based API trigger. This dynamic capability provides precise control over data workflows, making it well-suited for real-time execution, automation, and just-in-time data processing within the BDB Platform.

Users start by creating a project in the Data Science Lab module using the Python environment. After uploading a notebook and writing the required code, the script can be registered and seamlessly exported to the Data Pipeline, where it is configured as a job for execution.

DS Lab – Project & Notebook Creation

From the Apps menu, navigate to the Data Science Lab plugin.

Create a new project with the following configurations:

  • Name: Job Workflow4

  • Algorithm: Classification and Regression

  • Environment: Python

  • Resource Allocation: Based on data needs

Save and activate the project.

Inside the project:

  • Navigate to the Repo tab.

  • Open the three-dot menu in the Repo tab.

  • Click the Import option.

  • Name the notebook.

  • Upload and save the notebook file.

The uploaded notebook script uses the clickhouse_driver library to:

  • Accept a payload (a list of dictionaries).

  • Connect to a ClickHouse database.

  • Create a table if it doesn’t exist.

  • Insert the payload data.

Python Script:

from clickhouse_driver import Client

def payload(job_payload=None, host=None, port=None, user=None, password=None, database=None):

    # Fallback if job_payload is not passed properly
    if not job_payload:
        job_payload = [
            {"id": 101, "name": "jashika", "age": 20},
            {"id": 102, "name": "siya", "age": 40}
        ]

    # Convert list of dicts to list of tuples for ClickHouse insertion
    data_tuples = [(item["id"], item["name"], item["age"]) for item in job_payload]

    # Connect to the ClickHouse database
    client = Client(host=host, port=port, user=user, password=password, database=database)

    # Create the target table if it doesn't exist
    client.execute("""
        CREATE TABLE IF NOT EXISTS Employees (
            id UInt32,
            name String,
            age UInt8
        ) ENGINE = MergeTree() ORDER BY id
    """)

    # Insert the payload data
    client.execute("INSERT INTO Employees (id, name, age) VALUES", data_tuples)

Register the notebook script:

  • Click the three-dot menu on the notebook.

  • Choose Register and select the function.

  • Click Next, then Validate.

  • Once validated, click Export to Pipeline to complete.

A success message confirms the export.

Create a Python On-Demand Job

From the Apps menu, open the Data Pipeline module.

Navigate to the Jobs tab, click Create, and enter:

  • Job Name: Workflow4

  • Description: Python On-Demand Job

  • Job Type: Python

  • Enable the On-Demand checkbox

In the Payload field, input a JSON array (copied from the prerequisites document). Example:

[
    {"id": 201, "name": "Emma", "age": 30},
    {"id": 202, "name": "Liam", "age": 28}
]

Click Save.

Configure Python Job Component

On the job canvas, click the Python component and configure:

  • Project: Workflow4

  • Script: Auto-loaded from DS Lab

  • Start Function: payload

Provide the following input arguments (from the prerequisites document); they map directly onto the start function's parameters, as sketched after this step:

  • Host

  • Port

  • User

  • Password

  • Database

Save the configuration.
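
Conceptually, when the job runs, the Data Pipeline calls the registered start function with the on-demand payload plus these connection arguments. The call below is only an illustration of that mapping, with angle-bracket placeholders standing in for the values entered in the job configuration; the actual invocation is handled internally by the platform.

# Illustrative mapping only -- the platform performs this call internally.
# <Host>, <Port>, <User>, <Password>, and <Database> stand in for the
# values entered in the job component configuration.
payload(
    job_payload=[
        {"id": 201, "name": "Emma", "age": 30},
        {"id": 202, "name": "Liam", "age": 28}
    ],
    host="<Host>",
    port="<Port>",
    user="<User>",
    password="<Password>",
    database="<Database>"
)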

Activate and Monitor the Job

Click the Activate icon.

Once activated:

  • Ensure the associated pods are running.

  • Click on Logs to track execution.

Upon success:

  • A confirmation message appears.

  • You can verify the data insertion in the ClickHouse database, as sketched below.

  • Check the Employees table to ensure the payload was inserted.
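
A quick way to confirm the insertion is to query the table with the same clickhouse_driver library. This is a minimal sketch; the connection values are placeholders and should be replaced with the details from the prerequisites document.

from clickhouse_driver import Client

# Placeholder connection details -- use the values from the prerequisites
# document for your ClickHouse instance.
client = Client(
    host="clickhouse.example.com",
    port=9000,
    user="default",
    password="",
    database="default"
)

# Fetch the rows inserted by the on-demand job.
rows = client.execute("SELECT id, name, age FROM Employees ORDER BY id")
for row in rows:
    print(row)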

Conclusion

In this Workflow, we explored how to dynamically trigger a Python job on demand using a payload, API call, or pipeline trigger. From notebook creation in DS Lab to final execution in the Data Pipeline, this approach gives users flexible, real-time job control.
