Python Job (On-Demand)

An On-Demand Python Job enables you to trigger a Python script at any time using an API call or a pipeline Job Trigger component. Unlike scheduled jobs, On-Demand Jobs allow you to pass a dynamic payload at runtime, making them suitable for real-time or event-driven workloads.

Payloads are submitted as JSON arrays of objects and distributed across job instances for parallel processing.
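For reference, the snippet below shows one way such a payload could be submitted over HTTP from Python. This is a minimal sketch only: the endpoint URL, job ID, and authorization token are hypothetical placeholders rather than the platform's documented API, so consult your platform's API reference for the actual trigger endpoint.

import requests

# Hypothetical endpoint and credentials; replace with the values from your platform's API reference.
url = "https://<platform-host>/api/jobs/<job_id>/trigger"
headers = {"Authorization": "Bearer <token>"}

# JSON array of objects, as described above.
payload = [
    {"emp_id": 0, "department": "IT", "salary": 71357, "working_mode": "Hybrid"},
    {"emp_id": 1, "department": "Operations", "salary": 33411, "working_mode": "Onsite"},
]

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
print(response.status_code)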

Prerequisites

Before creating an On-Demand Python Job, ensure the following:

  • A project exists in the Data Science Lab under the Python Environment.

  • The project is activated, and a Notebook has been created.

  • The Notebook script has been saved and exported to the Data Pipeline module.

Create an On-Demand Python Job

Navigation path: Data Pipeline > Jobs > Create Job

  1. From the Data Pipeline homepage, click Create Job.

  2. In the New Job dialog:

    • Name: Enter a job name.

    • Description (Optional): Provide details about the job.

    • Job Base Info: Select Python Job from the drop-down menu.

    • On-Demand: Check the On-Demand option.

  3. Configure Docker Resources:

    • Select a resource tier: Low, Medium, or High.

    • Define resource allocation:

      • Limit = Maximum CPU/Memory available.

      • Request = Minimum CPU/Memory requested at job start.

      • Instances = Number of instances to run in parallel.

  4. Payload: Provide input data in JSON array format. Example:

    [
      {"emp_id": 0, "department": "IT", "salary": 71357, "working_mode": "Hybrid"},
      {"emp_id": 1, "department": "Operations", "salary": 33411, "working_mode": "Onsite"}
    ]
    • The payload is distributed as evenly as possible across instances.

    • Distribution formula:

      records per instance = ⌈payload size / instances⌉

      Example: Payload size = 10, Instances = 3 → the three instances receive 4, 4, and 2 records respectively (see the sketch following these steps).

  5. Trigger By: Choose whether the job should also be triggered by other jobs:

    • On Success: Start when a selected job completes successfully.

    • On Failure: Start if a selected job fails.

  6. Click Save.

A confirmation message appears, and the Job Editor page opens.
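The ceiling-division split described in step 4 can be illustrated with a short sketch. This is not the platform's internal code; it simply reproduces the 4, 4, 2 example above.

import math

def split_payload(payload, instances):
    """Split a payload list into per-instance chunks of at most ceil(len(payload) / instances) records."""
    chunk_size = math.ceil(len(payload) / instances)
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

# Payload size = 10, Instances = 3 -> the three instances receive 4, 4, and 2 records.
records = [{"emp_id": i} for i in range(10)]
print([len(chunk) for chunk in split_payload(records, 3)])  # [4, 4, 2]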

Topic Naming Convention

When an On-Demand Job runs, payload data is distributed across Kafka-style topics.

  • Topics are named using the format:

    <Job_ID>_<n>
  • n starts from 0 and goes up to instances - 1.

Example

  • Job ID: job_13464363406493

  • Instances: 3

Resulting topics:

  • job_13464363406493_0

  • job_13464363406493_1

  • job_13464363406493_2
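The same naming rule can be expressed in a line of Python, shown here only to make the convention concrete.

job_id = "job_13464363406493"
instances = 3

# Topic names follow <Job_ID>_<n> for n = 0 .. instances - 1.
topics = [f"{job_id}_{n}" for n in range(instances)]
print(topics)  # ['job_13464363406493_0', 'job_13464363406493_1', 'job_13464363406493_2']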

Writing the Notebook Script

When writing scripts for On-Demand Jobs in the Data Science Lab:

  • The first function argument must represent the job payload.

  • Example Notebook function:

import logging
from pymongo import MongoClient

def data(job_payload, conn_str, database, collection):
    """
    Insert data into MongoDB based on the provided job_payload.

    Parameters:
        job_payload (dict): JSON payload provided at job creation or via API.
        conn_str (str): MongoDB connection string.
        database (str): Target database name.
        collection (str): Target collection name.
    """
    logging.info(job_payload)
    client = MongoClient(conn_str)
    db = client[database]
    target_collection = db[collection]  # separate name avoids shadowing the collection parameter
    target_collection.insert_one(job_payload)
    logging.info(f"Data {job_payload} inserted successfully")

Configure Meta Information

Navigation path: Data Pipeline > Jobs > Job Editor > Meta Information

  1. Project Name: Select the Data Science Lab project containing your Notebook.

  2. Script Name: Choose the exported Notebook.

  3. External Library: Add required libraries (comma-separated). Example:

    pandas,numpy,scikit-learn
  4. Start Function: Select the entry function (e.g., data).

  5. Script: The exported script appears here.

  6. Input Data: Provide parameter key-value pairs if your function accepts additional arguments.
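For the data function shown earlier, the payload is always passed as the first argument, so Input Data only needs to cover the remaining parameters. The values below are placeholders.

conn_str   = mongodb://<user>:<password>@<host>:27017
database   = <target database name>
collection = <target collection name>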

Activate an On-Demand Python Job

On-Demand Jobs can be activated in two ways:

1. From the UI

  1. Open the Job Editor page.

  2. Enter the payload JSON array into the Payload field.

  3. Click the Activate icon.

Example Payload:

[
  {"emp_id": 10, "department": "Sales", "salary": 86318, "working_mode": "Onsite"},
  {"emp_id": 11, "department": "IT", "salary": 57910, "working_mode": "Hybrid"}
]

2. From the Job Trigger Component

  1. Create a pipeline that generates data as JSON output.

  2. Connect a Job Trigger component to the output event.

  3. In the Job Trigger component metadata, select the On-Demand Job to activate.

  4. The event data is automatically passed as the payload.

Example:

Event output contains:

[
  {"output": "jobs/ApportionedIdentifiers.csv"},
  {"output": "jobs/accounts.csv"},
  {"output": "jobs/glue.csv"},
  {"output": "jobs/census_2011.csv"}
]

This payload is sent directly to the On-Demand Python Job.
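Assuming, as in the MongoDB example above, that the start function receives payload records as individual dictionaries, a function consuming this trigger payload might read the file path from each record's output key. The body below is an illustrative placeholder for whatever processing your Notebook performs.

import logging

def process_output(job_payload):
    """Entry function: each payload record carries the path of a file produced by the upstream pipeline."""
    file_path = job_payload["output"]  # e.g. "jobs/accounts.csv"
    logging.info(f"Processing file: {file_path}")
    # ... read and process the file here ...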