Events [Kafka and Data Sync]


Last updated 2 years ago

Creating a Kafka Event

  • Navigate to the Pipeline Editor page.

  • Click the Toggle Event Panel icon in the header.

  • Click the Add New Event icon from the Event Panel.

  • The New Event window opens.

  • Provide a name for the new Event.

  • Select Event Duration: Choose one of the following options.

    • Short (4 hours)

    • Medium (8 hours)

    • Full Day (24 hours)

    • Long (48 hours)

    • Week (168 hours)

Please Note: If no duration option is selected, the event data is erased after 7 days. The offsets expire as well.

  • Provide the No. of Partitions (1-50). The default is 3.

  • No. of Outputs: Define the number of outputs for the Event.

  • Select the 'Is Failover?' checkbox to enable the Failover Event.

  • Select a Pipeline using the drop-down menu if you have selected the ‘Is Shared?’ option.

  • Click the Add Event option.

  • A notification message appears.

  • The newly created Event gets added to the Private tab in the Events panel.
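The constraints in the steps above can be summarized in a small validation sketch. This is not the product's API: the class `KafkaEventSpec`, its field names, and the default of one output are hypothetical and chosen only for illustration. The duration options, the partition range (1-50), the default of 3 partitions, and the 7-day fallback come from the steps above.

```python
from dataclasses import dataclass
from typing import Optional

# Duration options offered in the New Event window, in hours.
DURATION_HOURS = {
    "Short": 4,
    "Medium": 8,
    "Full Day": 24,
    "Long": 48,
    "Week": 168,
}

# Event data is erased after 7 days when no duration option is selected.
DEFAULT_RETENTION_HOURS = 7 * 24


@dataclass
class KafkaEventSpec:
    """Hypothetical container for the New Event form fields."""

    name: str
    duration: Optional[str] = None  # None means no option was selected
    partitions: int = 3             # documented default
    outputs: int = 1                # assumed default, not stated in the docs
    is_failover: bool = False

    def __post_init__(self):
        if not self.name.strip():
            raise ValueError("event name must not be empty")
        if self.duration is not None and self.duration not in DURATION_HOURS:
            raise ValueError(f"unknown duration option: {self.duration!r}")
        if not 1 <= self.partitions <= 50:
            raise ValueError("partitions must be between 1 and 50")

    def retention_hours(self) -> int:
        """Hours the event data is kept before it is erased."""
        if self.duration is None:
            return DEFAULT_RETENTION_HOURS
        return DURATION_HOURS[self.duration]
```

For example, `KafkaEventSpec(name="orders", duration="Short").retention_hours()` returns 4, while omitting `duration` falls back to the 7-day (168-hour) erasure window.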

Updating a Kafka Event

Use the following steps to update a Kafka Event.

  • Drag the created Event component to the Pipeline Editor canvas.

  • Click the dragged Event component to open the Basic Info configuration fields.

  • Edit the following information as needed (No. of Partitions and Event Duration cannot be changed):

    • Event Name: Modify the event name.

    • No. of Outputs: Set the number of outputs (the maximum allowed is 3).

    • 'Is Failover?': Enable this option to create a failover event.

    • Select Pipeline: Select one or more pipelines using the drop-down list.

  • Click the Save Event icon to save the changes.

  • A notification message confirms that the pipeline was updated successfully.

  • The targeted Event component gets updated.
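The editing rules above (partitions and duration are fixed after creation, and at most 3 outputs are allowed) can be expressed as a short check. The function `apply_event_update` and the dictionary keys are hypothetical names used for illustration only; they are not part of the product.

```python
# Fields that cannot be edited once the Event exists.
IMMUTABLE_FIELDS = {"partitions", "duration"}
MAX_OUTPUTS = 3


def apply_event_update(event: dict, changes: dict) -> dict:
    """Return an updated copy of `event`, rejecting disallowed edits."""
    blocked = IMMUTABLE_FIELDS.intersection(changes)
    if blocked:
        raise ValueError(f"cannot edit after creation: {sorted(blocked)}")
    if changes.get("outputs", 0) > MAX_OUTPUTS:
        raise ValueError(f"outputs must not exceed {MAX_OUTPUTS}")
    # The original event is left untouched; a merged copy is returned.
    return {**event, **changes}
```

A call such as `apply_event_update(event, {"outputs": 3})` succeeds, whereas `{"partitions": 5}` or `{"outputs": 4}` raises a `ValueError`.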

Flushing Events

Flushing a Kafka Event deletes all the records it currently holds. The topic's offsets are retained: the flush moves the start offset up to the end offset, so new records continue from the existing offset position. To flush a single Event, click the "Flush Event" button beside it in the Event Panel. To flush all Events at once, click the "Flush All" button at the top of the Event Panel.
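The offset behavior described above can be illustrated with a toy in-memory model. `TopicPartition` here is a hypothetical stand-in written for this sketch, not a Kafka client class; it only shows that a flush discards records while the offset counter keeps counting.

```python
class TopicPartition:
    """Toy in-memory stand-in for one partition of an Event's topic."""

    def __init__(self):
        self.start_offset = 0  # offset of the oldest retained record
        self.end_offset = 0    # offset the next record will receive
        self._records = {}

    def append(self, value):
        """Store a record and return the offset assigned to it."""
        offset = self.end_offset
        self._records[offset] = value
        self.end_offset += 1
        return offset

    def flush(self):
        """Delete all records; offsets are retained, not reset to zero."""
        self._records.clear()
        self.start_offset = self.end_offset

    def readable(self):
        """Records still available between start and end offset."""
        return [self._records[o] for o in range(self.start_offset, self.end_offset)]
```

After appending three records and calling `flush()`, `readable()` is empty and both offsets equal 3, so the next appended record receives offset 3 rather than 0.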

Connecting Event to a Component

  • Drag a (reader) component to the canvas.

  • Configure the parameters of the dragged reader.

  • Drag the Event from the Events Panel.

  • Connect the dragged reader component as the input connection to the dragged Event.

  • Click the Update Pipeline icon to save the pipeline workflow.

Data Sync Event

The Data Sync Event in the Data Pipeline module writes the required data directly to any of the supported databases without using a Kafka Event and writer components in the pipeline. Refer to the image below:

In the image above, the Data Sync Event writes the data read by the MongoDB Reader component directly to a table of the database configured in Data Sync, with no Kafka Event in between.

Benefits of using the Data Sync Event

  • It does not need a Kafka Event to read the data. It can be connected to any component to read the data, and it writes that data to the tables of the respective databases.

  • Pipeline complexity is reduced because a Kafka Event and a writer are not needed in the pipeline.

  • Since writers are not used, resource consumption is low.

  • Once Data Sync is configured, multiple Data Sync Events can be created for the same configuration, and the data can be written to multiple tables.

Creating a Data Sync Event

The DB Sync Event enables direct writes to the database, which reduces the use of additional compute resources such as Writers in the Pipeline Workflow.

Please Note: The Data Sync component supports the following drivers:

  • ClickHouse

  • MongoDB

  • MSSQL

  • MySQL

  • Oracle

  • PostgreSQL

  • Snowflake

  • Navigate to the Pipeline Editor page.

  • Click on the DB Sync tab.

  • Click on the Add New Data Sync (+) icon from the Toggle Event Panel.

  • The Create Data Sync window opens.

    • Provide a display name for the new Data Sync.

    • Select the Driver.

  • Click the Save option.

  • Drag and drop the Data Sync Event onto the workflow editor.

  • Click on the dragged Data Sync component.

  • The Basic Information tab appears with the following fields:

    • Display Name: The display name of the Data Sync.

    • Event Name: The event name of the Data Sync.

    • Table Name: Specify the table name.

    • Driver: This field is pre-selected.

    • Save Mode: Select the save mode from the drop-down: Append or Upsert.

    • Composite Key: This optional field appears only when Upsert is selected as the Save Mode.

    • Click the Save Data Sync icon to save the Data Sync information.

  • Connect the dragged Data Sync Event to the reader component as displayed below:

  • Update and activate the pipeline.

  • Open the Logs tab to verify that the data is written to the specified table.

Please Note:

  • In the Save mode, there are two available options.

    • Append

    • Upsert: An additional field, Composite Key, is displayed for this save mode.

  • When the SQL component is set to Aggregate Query mode and connected to a Data Sync Event, the data resulting from the query is not written to the Data Sync Event. Refer to the following image for the flow, and avoid this scenario.
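The difference between the two save modes can be sketched with an in-memory table. This illustrates the general append and upsert semantics only, not the Data Sync implementation. The function `write_rows` and its parameters are hypothetical. Because the behavior of Upsert without a Composite Key is not described here, the sketch assumes a key is supplied and raises an error otherwise.

```python
def write_rows(table, rows, save_mode="append", composite_key=None):
    """Write `rows` (dicts) into `table` (a list of dicts) in place."""
    if save_mode == "append":
        # Append: every incoming row becomes a new row, duplicates included.
        table.extend(dict(row) for row in rows)
        return table
    if save_mode != "upsert":
        raise ValueError(f"unsupported save mode: {save_mode!r}")
    if not composite_key:
        # Assumption of this sketch: upsert needs a key to match rows on.
        raise ValueError("this sketch requires a composite key for upsert")

    # Index existing rows by the tuple of their composite-key values.
    index = {tuple(row[k] for k in composite_key): i for i, row in enumerate(table)}
    for row in rows:
        key = tuple(row[k] for k in composite_key)
        if key in index:
            table[index[key]].update(row)   # key exists: update in place
        else:
            index[key] = len(table)
            table.append(dict(row))         # key is new: insert
    return table
```

With `composite_key=["id", "region"]`, writing a row whose `id` and `region` match an existing row updates that row, while Append mode would add a second copy.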

​

The Events Panel appears, and the Toggle Event Panel icon changes to indicate that the Event Panel is displayed.

Prerequisite: Before creating a Data Sync Event, configure the Data Sync section on the Settings page.


Please Note: Only the drivers configured on the Settings page are listed in the Create Data Sync wizard.
