Python Job

This feature allows users to write their own Python script and run it from the Jobs section of the Data Pipeline module.

Before creating a Python Job, the user has to create a project in the DS Lab module under the Python Environment. Please refer to the image below for reference:

After creating the project, the user needs to activate it and create a Notebook where they can write their own Python script. Once the script is written, the user must save it and export it so that it can be used in Python Jobs.
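For reference, the sketch below shows the general shape such a Notebook script can take: the logic lives in a top-level function that is later selected as the Job's start function. The function name and parameters used here (greet, name, count) are purely illustrative and not prescribed by the platform.

```python
# Hypothetical DS Lab Notebook script written under a Python Environment project.
# After writing it, save the Notebook and export it so it becomes available to Python Jobs.

def greet(name, count=1):
    """Illustrative entry point; its parameters can later be supplied
    through the Job's Input Data field."""
    message = " ".join([f"Hello, {name}!"] * int(count))
    print(message)  # printed output is assumed to show up in the Job logs
    return message
```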

Creating a Python Job

  1. Click on the Data Pipeline module from the homepage.

  2. Click on the Create Job icon on the List Pipelines page.

  3. The New Job dialog box appears, prompting the user to create a new Job.

  4. Enter a name for the new Job.

  5. Describe the Job (optional).

  6. Job Baseinfo: Select Python Job from the drop-down menu.

  7. Trigger By: There are two options for triggering the current job based on the outcome of another job:

    • Success Job: The current job is triggered when the selected job completes successfully.

    • Failure Job: The current job is triggered when the selected job fails.

  8. Is Scheduled?

    • A job can be scheduled to run at a particular timestamp, and it will be triggered at that same timestamp every time.

    • Jobs must be scheduled in UTC (see the sketch after this list for converting a local time to UTC).

  9. Docker Configuration

    • Select a resource allocation option using the radio button. The given choices are:

      1. Low

      2. Medium

      3. High

    • Provide the resources required to run the Python Job in the Limit and Request sections:

      1. Limit: Enter the maximum CPU and Memory allowed for the Python Job.

      2. Request: Enter the CPU and Memory required for the Job at startup.

      3. Instances: Enter the number of instances for the Python Job.

  10. Click the Save option to save the Python Job.

  11. The Python Job gets saved, and the user is redirected to the Job Editor workspace.
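Since the scheduler works in UTC, it helps to convert the intended local run time before scheduling the Job. The snippet below is a minimal sketch that uses only the Python standard library; the timezone (Asia/Kolkata) and run time are example values, not platform requirements.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Desired local run time: 09:30 in Asia/Kolkata (example values only).
local_run_time = datetime(2024, 1, 15, 9, 30, tzinfo=ZoneInfo("Asia/Kolkata"))

# Convert to UTC, which is what the Job scheduler expects.
utc_run_time = local_run_time.astimezone(timezone.utc)
print(utc_run_time.strftime("%Y-%m-%d %H:%M %Z"))  # 2024-01-15 04:00 UTC
```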

Check out the demonstration given below to configure a Python Job.

Configuring the Meta Information of a Python Job

Once the Python Job is created, follow the steps given below to configure its Meta Information tab.

  • Project Name: Select, from the drop-down menu, the Project in which the Notebook has been created.

  • Script Name: This field lists the Notebooks that have been exported from the Data Science Lab module to the Data Pipeline.

  • External Library: If any external libraries are used in the script, the user can mention them here. Multiple libraries can be listed by separating their names with commas (,).

  • Start Function: Select the name of the function in which the script has been written; this function serves as the entry point when the Job runs.

  • Script: The exported script appears in this space.

  • Input Data: If the start function takes any parameters, provide each parameter's name as the Key and its value as the Value in this field.
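
To tie these fields together, here is a hypothetical exported script along with the way its Meta Information could be filled in. The external library, function name, parameter names, and values below are illustrative assumptions, not platform defaults.

```python
# Exported script (shown read-only in the Script space).
import requests  # would be listed under External Library as: requests

def fetch_status(url, timeout=10):
    """Selected as the Start Function.

    Input Data entries for this function:
        Key: url      Value: https://example.com/health
        Key: timeout  Value: 10
    """
    response = requests.get(url, timeout=int(timeout))
    return {"url": url, "status_code": response.status_code}
```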
