Python Job (On-Demand)

An On-Demand Python Job enables you to trigger a Python script at any time using an API call or a pipeline Job Trigger component. Unlike scheduled jobs, On-Demand Jobs allow you to pass a dynamic payload at runtime, making them suitable for real-time or event-driven workloads.

Payloads are submitted as JSON arrays of objects and distributed across job instances for parallel processing.
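For reference, the snippet below shows one way such a payload could be submitted over HTTP from Python. This is a minimal sketch only: the endpoint URL, job ID, and authorization token are hypothetical placeholders rather than the platform's documented API, so consult your platform's API reference for the actual trigger endpoint.

import requests

# Hypothetical endpoint and credentials; replace with the values from your platform's API reference.
url = "https://<platform-host>/api/jobs/<job_id>/trigger"
headers = {"Authorization": "Bearer <token>"}

# JSON array of objects, as described above.
payload = [
    {"emp_id": 0, "department": "IT", "salary": 71357, "working_mode": "Hybrid"},
    {"emp_id": 1, "department": "Operations", "salary": 33411, "working_mode": "Onsite"},
]

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
print(response.status_code)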

Prerequisites

Before creating an On-Demand Python Job, ensure the following:

  • A project exists in the Data Science Lab under the Python Environment.

  • The project is activated, and a Notebook has been created.

  • The Notebook script has been saved and exported to the Data Pipeline module.

Create an On-Demand Python Job

Navigation path: Data Pipeline > Jobs > Create Job

  1. From the Data Pipeline homepage, click Create Job.

  2. In the New Job dialog:

    • Name: Enter a job name.

    • Description (Optional): Provide details about the job.

    • Job Base Info: Select Python Job from the drop-down menu.

    • On-Demand: Check the On-Demand option.

  3. Configure Docker Resources:

    • Select a resource tier: Low, Medium, or High.

    • Define resource allocation:

      • Limit = Maximum CPU/Memory available.

      • Request = Minimum CPU/Memory requested at job start.

      • Instances = Number of instances to run in parallel.

  4. Payload: Provide input data in JSON array format. Example:

    [
      {"emp_id": 0, "department": "IT", "salary": 71357, "working_mode": "Hybrid"},
      {"emp_id": 1, "department": "Operations", "salary": 33411, "working_mode": "Onsite"}
    ]
    • The payload is distributed as evenly as possible across instances.

    • Distribution formula:

      records per instance = ⌈payload size / instances⌉

      Example: Payload size = 10, Instances = 3 → the three instances receive 4, 4, and 2 records respectively (see the sketch following these steps).

  5. Trigger By: Choose whether the job should also be triggered by other jobs:

    • On Success: Start when a selected job completes successfully.

    • On Failure: Start if a selected job fails.

  6. Click Save.

A confirmation message appears, and the Job Editor page opens.
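The ceiling-division split described in step 4 can be illustrated with a short sketch. This is not the platform's internal code; it simply reproduces the 4, 4, 2 example above.

import math

def split_payload(payload, instances):
    """Split a payload list into per-instance chunks of at most ceil(len(payload) / instances) records."""
    chunk_size = math.ceil(len(payload) / instances)
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

# Payload size = 10, Instances = 3 -> the three instances receive 4, 4, and 2 records.
records = [{"emp_id": i} for i in range(10)]
print([len(chunk) for chunk in split_payload(records, 3)])  # [4, 4, 2]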

Topic Naming Convention

When an On-Demand Job runs, payload data is distributed across Kafka-style topics.

  • Topics are named using the format:

    <Job_ID>_<n>
  • n starts from 0 and goes up to instances - 1.

Example

  • Job ID: job_13464363406493

  • Instances: 3

Resulting topics:

  • job_13464363406493_0

  • job_13464363406493_1

  • job_13464363406493_2
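The same naming rule can be expressed in a line of Python, shown here only to make the convention concrete.

job_id = "job_13464363406493"
instances = 3

# Topic names follow <Job_ID>_<n> for n = 0 .. instances - 1.
topics = [f"{job_id}_{n}" for n in range(instances)]
print(topics)  # ['job_13464363406493_0', 'job_13464363406493_1', 'job_13464363406493_2']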

Writing the Notebook Script

When writing scripts for On-Demand Jobs in the Data Science Lab:

  • The first function argument must represent the job payload.

  • Example Notebook function:

import logging
from pymongo import MongoClient

def data(job_payload, conn_str, database, collection):
    """
    Insert data into MongoDB based on the provided job_payload.

    Parameters:
        job_payload (dict): JSON payload provided at job creation or via API.
        conn_str (str): MongoDB connection string.
        database (str): Target database name.
        collection (str): Target collection name.
    """
    logging.info(job_payload)
    client = MongoClient(conn_str)
    db = client[database]
    target_collection = db[collection]  # separate name avoids shadowing the collection parameter
    target_collection.insert_one(job_payload)
    logging.info(f"Data {job_payload} inserted successfully")

Configure Meta Information

Navigation path: Data Pipeline > Jobs > Job Editor > Meta Information

  1. Project Name: Select the Data Science Lab project containing your Notebook.

  2. Script Name: Choose the exported Notebook.

  3. External Library: Add required libraries (comma-separated). Example:

    pandas,numpy,scikit-learn
  4. Start Function: Select the entry function (e.g., data).

  5. Script: The exported script appears here.

  6. Input Data: Provide parameter key-value pairs if your function accepts additional arguments.
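For the data function shown earlier, the payload is always passed as the first argument, so Input Data only needs to cover the remaining parameters. The values below are placeholders.

conn_str   = mongodb://<user>:<password>@<host>:27017
database   = <target database name>
collection = <target collection name>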

Activate an On-Demand Python Job

On-Demand Jobs can be activated in two ways:

1. From the UI

  1. Open the Job Editor page.

  2. Enter the payload JSON array into the Payload field.

  3. Click the Activate icon.

Example Payload:

[
  {"emp_id": 10, "department": "Sales", "salary": 86318, "working_mode": "Onsite"},
  {"emp_id": 11, "department": "IT", "salary": 57910, "working_mode": "Hybrid"}
]

2. From the Job Trigger Component

  1. Create a pipeline that generates data as JSON output.

  2. Connect a Job Trigger component to the output event.

  3. In the Job Trigger component metadata, select the On-Demand Job to activate.

  4. The event data is automatically passed as the payload.

Example:

Event output contains:

[
  {"output": "jobs/ApportionedIdentifiers.csv"},
  {"output": "jobs/accounts.csv"},
  {"output": "jobs/glue.csv"},
  {"output": "jobs/census_2011.csv"}
]

This payload is sent directly to the On-Demand Python Job.
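Assuming, as in the MongoDB example above, that the start function receives payload records as individual dictionaries, a function consuming this trigger payload might read the file path from each record's output key. The body below is an illustrative placeholder for whatever processing your Notebook performs.

import logging

def process_output(job_payload):
    """Entry function: each payload record carries the path of a file produced by the upstream pipeline."""
    file_path = job_payload["output"]  # e.g. "jobs/accounts.csv"
    logging.info(f"Processing file: {file_path}")
    # ... read and process the file here ...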