
Component Base Configuration

This page describes the Basic Info tab provided for pipeline components. This tab must be configured for every component.

Invocation Type

The Invocation Type configuration decides how the component is deployed. There are the following two types of invocation:

  1. Real-Time

  2. Batch

Real-Time

When a component has real-time invocation, it never goes down while the pipeline is active. This is for situations where you want to keep the component ready to consume data at all times.

Intelligent Scaling

When Real-Time is selected as the invocation type, an additional option to scale up the component, called Intelligent Scaling, becomes available.

Please Note: The first component of the pipeline must use real-time invocation.
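
As a rough sketch, the Basic Info choices for a real-time component could be represented as below. The field names are illustrative assumptions only, not the platform's actual configuration schema.

```python
# Illustrative sketch only: these keys are assumed names, not the
# platform's actual Basic Info schema.
realtime_base_config = {
    "invocation_type": "real-time",  # component stays up while the pipeline is active
    "intelligent_scaling": True,     # optional scale-up behaviour for real-time components
}
```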

Batch

When a component has the batch invocation type, it needs a trigger from the previous event to initiate processing. Once the component finishes processing and there are no new events to process, it goes down.

Batch invocation is helpful for batch or scheduled operations where the data is not streaming or real-time.

Please Note: When users select the Batch invocation type, they get an additional option, Grace Period (in sec). The grace period is the time the component takes to go down gracefully. The default value for the Grace Period is 60 seconds, and it can be configured by the user.
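
To make the grace period concrete, the sketch below shows one way a batch component could honour it before shutting down. The `component` object and its methods are hypothetical; the platform's real shutdown mechanics are not exposed in this documentation.

```python
import time

GRACE_PERIOD_SEC = 60  # default Grace Period; configurable per component

def shut_down_gracefully(component, grace_period_sec: int = GRACE_PERIOD_SEC) -> None:
    """Give a batch component up to grace_period_sec to finish in-flight work."""
    deadline = time.time() + grace_period_sec
    component.stop_accepting_events()      # hypothetical API call
    while component.has_inflight_work() and time.time() < deadline:
        time.sleep(1)                      # wait for current records to drain
    component.terminate()                  # shut down once drained or the grace period expires
```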

Batch Size

Pipeline components process data in micro-batches. The batch size defines the maximum number of records to process in a single cycle of operation. This is helpful when you want to control the number of records a component processes at a time, for example when individual records are large. You can configure it in the base configuration of each component.

The illustration given below displays how to update the Batch Size configuration.
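
Conceptually, the batch size simply caps how many records are handled per cycle, roughly as in the sketch below (plain Python, not platform code).

```python
from typing import Iterable, Iterator, List

def micro_batches(records: Iterable[dict], batch_size: int = 10) -> Iterator[List[dict]]:
    """Yield records in chunks of at most batch_size, one chunk per processing cycle."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch

# Example: with batch_size=10, 95 records are processed in 10 cycles
# (nine batches of 10 records and one batch of 5).
```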

Failover Events

You can create a failover event and map it in the component's base configuration, so that if the component fails, it audits all the failure messages along with the data (if available) and the timestamp of the error.

Go through the illustration given below to understand Failover Events.
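
The sketch below shows the kind of audit record a failover event might carry when a component fails. The exact payload the platform emits is not documented here, so treat the field names as assumptions.

```python
from datetime import datetime, timezone
from typing import Optional

def build_failure_audit(error: Exception, failed_record: Optional[dict] = None) -> dict:
    """Assemble an illustrative failure audit record (assumed field names)."""
    return {
        "error_message": str(error),                          # failure message
        "data": failed_record,                                 # offending data, if available
        "timestamp": datetime.now(timezone.utc).isoformat(),  # time of the error
    }
```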

Please refer to the Intelligent Scaling page to learn more.
