Script Executor Job

A Script Executor Job enables you to run scripts written in multiple programming languages, such as Python, Go, and Julia, directly within the Data Pipeline module.

The job fetches code from a configured GitHub or GitLab repository and runs it seamlessly inside the pipeline. This feature is especially useful for:

  • Automating multi-language workflows.

  • Executing reusable scripts maintained in Git repositories.

  • Integrating custom code into data pipelines.

Prerequisites

Before creating a Script Executor Job:

  • Ensure your GitHub or GitLab credentials are configured in the platform.

  • Verify that your repository is accessible and contains the required script files.

  • Confirm that the correct branch and token authentication are set up.

Refer to Admin Settings: GitHub/GitLab Configuration for details on setting up credentials.

Create a Script Executor Job

Navigation path: Data Pipeline > Jobs > Create Job

  1. From the Data Pipeline homepage, click Create Job.

  2. In the right-hand panel:

    • Name: Enter a job name.

    • Description (Optional): Provide details about the job.

    • Job Base Info: Select Script Executor.

  3. Trigger By: Define when the job should execute:

    • On Success: Trigger if a selected job completes successfully.

    • On Failure: Trigger if a selected job fails.

  4. Scheduling:

    • Schedule the job for a specific UTC timestamp.

    • Or leave unscheduled for on-demand activation.

  5. Docker Configuration:

    • Choose a resource allocation profile: Low, Medium, or High.

    • Define the following (an illustrative example appears after this list):

      • Limit = Maximum CPU/Memory allocation.

      • Request = Minimum CPU/Memory requested at job start.

      • Instances = Number of parallel instances.
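
    For orientation only, a Medium profile might correspond to values like the example below, shown in the same JSON style as the other snippets in this guide. The field names and numbers are illustrative assumptions; the actual values behind each profile are defined by the platform.

      {"limit": {"cpu": "1000m", "memory": "2Gi"}, "request": {"cpu": "500m", "memory": "1Gi"}, "instances": 2}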

  6. Alerts: Configure Job Alerts to receive notifications.

  7. Click Save to create the job.

Once saved, you are redirected to the Job Editor workspace.

Configure Script Executor Metadata

Navigation path: Data Pipeline > Jobs > Job Editor > Meta Information

You can configure the Git source, script details, and execution parameters.

Git Config Options

  • Personal: Configure repository details per job.

    • Git URL: Repository URL (e.g., https://github.com/... or https://gitlab.com/...).

    • User Name: Git username.

    • Token: Access/API token for authentication.

    • Branch: The branch from which the script will be fetched.

  • Admin: Use centrally managed Git credentials.

    • Git configuration is done in Admin Settings (see below).

    • Only script-specific details need to be provided in the job.

Script Execution Parameters

  • Script Type: Select one of Python, Go, or Julia.

  • Start Script: Name of the script file (e.g., script_name.py, script_name.go).

  • Start Function: Entry function or method to execute.

  • Repository: Name of the Git repository.

  • Input Arguments: Optional parameters for dynamic script execution (a usage sketch follows the example below). Example:

    {"input_file": "data.csv", "threshold": 0.7}

If you select Admin Git Config, you must preconfigure repository access in the platform:

  1. Navigate to Admin > Configurations > Version Control.

  2. From the Version drop-down, select the Git provider (GitHub or GitLab).

  3. Choose DsLabs as the module.

  4. Provide the following:

    • Host: Git host (e.g., github.com, gitlab.com).

    • Token Key: Authentication token for Git.

    • Project: Select the Git project.

    • Branch: Specify the branch.

  5. Click Test to verify the credentials. If successful, click Save.

  6. Navigate to My Account > Configuration.

    • Enter your Git Token and Git Username.

    • Click Save.

Once configured, these credentials can be reused by all Script Executor Jobs that run in Admin Git Config mode.


Example Usage

Example: Python Script Execution

  • Script File: data_processor.py (a sketch follows this example)

  • Start Function: main

  • Arguments:

    {"input_path": "s3://data/input.csv", "output_path": "s3://data/output.csv"}

Example: Go Script Execution

  • Script File: process.go

  • Start Function: Execute

  • Arguments:

    {"batch_size": 100, "retry_count": 3}