Connecting Components

An event-driven architecture uses events to trigger communication between decoupled services, and is common in modern applications built with microservices.

What is a Connecting Component?

Each component within the pipeline is fully decoupled, functioning independently as both a producer and consumer of data. The architecture follows an event-driven orchestration model, where the interaction between components is mediated through events rather than direct calls.

To facilitate the transfer of output from one component to another, an intermediary event is required. This ensures loose coupling and scalability across the system.

The connector elements enable the integration of individual components to build a complete pipeline workflow. Simply click and drag the desired components onto the editor canvas. To establish data flow between components, link the output of each component to an event, which serves as the medium for transferring data to the next stage in the pipeline.

This design promotes asynchronous processing, fault isolation, and horizontal scalability, making it highly suitable for complex and distributed pipeline workflows.
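To make the event-mediated hand-off concrete, the minimal Python sketch below links two hypothetical stages through a named event implemented as an in-memory queue. The stage and event names are invented for illustration and do not correspond to the platform's own components.

    import queue

    # A named event acting as the hand-off point between two otherwise independent
    # stages. "cleaned_records" is an illustrative event name, not a platform-defined one.
    cleaned_records = queue.Queue()

    def extract_stage():
        """Producer side: writes its output to the event instead of calling the next stage."""
        for record in ({"id": 1}, {"id": 2}):
            cleaned_records.put(record)
        cleaned_records.put(None)  # sentinel marking the end of this batch

    def load_stage():
        """Consumer side: reads from the same event, unaware of which stage produced the data."""
        while (record := cleaned_records.get()) is not None:
            print("loading", record)

    extract_stage()
    load_stage()

Because each stage only knows about the event, either side can be replaced or scaled out without touching the other.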

Pipeline Assembling Process

The process of assembling a pipeline can be divided into two primary stages:

  1. Adding Components to the Canvas

    • These components represent the functional units of the pipeline.

    • They can be either system-defined pipeline components or custom-developed components tailored to specific requirements.

    • Simply drag and drop the required components onto the editor canvas to begin constructing the workflow.

  2. Adding Connecting Components (Events)

    • To establish the data flow and define the execution sequence within the pipeline, connecting components (events) must be added.

    • These events act as data transfer mechanisms between pipeline stages.

    • Supported event types include Kafka topics for real-time streaming and Data Sync modules for batch or scheduled data exchange.

Event-Driven Architecture

This two-step process ensures a modular, event-driven, and scalable pipeline architecture. An event-driven architecture typically comprises the following three core elements:

  1. Event Producers – Components or services that generate events in response to changes in state or specific operations.

  2. Event Routers or Brokers – Middleware systems (e.g., Kafka, Data Sync) that route events from producers to consumers.

  3. Event Consumers – Components that listen for, process, and act upon incoming events to perform their designated tasks (e.g., a RabbitMQ consumer or a Kafka consumer).
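The toy Python sketch below maps these three roles onto a single process: an in-memory Broker class stands in for the event router, and plain functions play the producer and consumers. It is a conceptual sketch only; in a real pipeline the routing is performed by Kafka or Data Sync, and the topic and handler names here are assumptions.

    from collections import defaultdict

    class Broker:
        """Toy event router: delivers each published event to every subscriber of its topic."""
        def __init__(self):
            self._subscribers = defaultdict(list)

        def subscribe(self, topic, handler):
            self._subscribers[topic].append(handler)

        def publish(self, topic, event):
            for handler in self._subscribers[topic]:
                handler(event)

    broker = Broker()

    # Event consumers: listen for events and act on them.
    broker.subscribe("order.created", lambda e: print("fulfilment consumer saw", e))
    broker.subscribe("order.created", lambda e: print("billing consumer saw", e))

    # Event producer: emits an event in response to a state change;
    # it never calls the consumers directly.
    broker.publish("order.created", {"order_id": 42})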

Event Types in Data Pipelines

Kafka Events

A Kafka Event enables real-time data ingestion and streaming within the pipeline by integrating with Apache Kafka topics. It acts as a connector that consumes messages from or publishes messages to Kafka, allowing seamless data exchange between distributed systems and applications.
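As a rough sketch of what a Kafka Event does behind the scenes, the following Python snippet publishes to and consumes from a topic using the kafka-python client. The broker address, topic name, and consumer group are placeholders; inside the pipeline editor this wiring is handled by the Kafka Event connector itself.

    import json
    from kafka import KafkaConsumer, KafkaProducer

    # Publish a message to an illustrative topic (assumes a broker at localhost:9092).
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("pipeline-events", {"order_id": 42, "status": "created"})
    producer.flush()

    # Consume messages from the same topic, as the next pipeline stage would.
    consumer = KafkaConsumer(
        "pipeline-events",
        bootstrap_servers="localhost:9092",
        group_id="next-stage",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)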

Benefits

  • Real-time processing: Facilitates near-instantaneous data flow across components.

  • High throughput: Efficiently handles large volumes of streaming data.

  • Scalability: Integrates easily with multiple producers and consumers.

  • Fault tolerance: Ensures reliability with Kafka’s distributed architecture and message persistence.

  • Decoupling: Promotes loose coupling between data producers and consumers, simplifying integration.

Data Sync Events

A Data Sync Event ensures data consistency between a Source component and a Target component in the pipeline. It synchronizes updates so that downstream systems always receive the most recent data.
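The sketch below shows the idea behind an incremental Data Sync in plain Python: only records modified after the last synchronization watermark are copied from a hypothetical source to a hypothetical target. The in-memory stores and field names are illustrative assumptions, not the component's actual configuration.

    from datetime import datetime, timezone

    # Illustrative in-memory source and target stores.
    source_rows = [
        {"id": 1, "value": "a", "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"id": 2, "value": "b", "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    ]
    target_rows = {1: dict(source_rows[0])}

    def incremental_sync(last_sync):
        """Copy only records modified since the last sync and return the new watermark."""
        for row in source_rows:
            if row["updated_at"] > last_sync:
                target_rows[row["id"]] = dict(row)  # upsert into the target
        return max(row["updated_at"] for row in source_rows)

    watermark = incremental_sync(datetime(2024, 1, 2, tzinfo=timezone.utc))
    print(sorted(target_rows), watermark)

A full synchronization would simply copy every source record regardless of the watermark.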

Benefits

  • Data consistency: Keeps source and target data aligned in near real-time or scheduled intervals.

  • Flexibility: Supports both incremental updates (only new/modified records) and full synchronization.

  • Automation: Eliminates the need for manual refreshes or data pulls.

  • Seamless integration: Ideal for updating reporting systems, data warehouses, or APIs with current data.

  • Improved reliability: Reduces risks of working with stale or outdated datasets.

Comparison: Kafka Events vs Data Sync Events

Primary Purpose
  • Kafka Events: Real-time ingestion and streaming of event data.
  • Data Sync Events: Synchronization of data between source and target components.

Data Flow
  • Kafka Events: Continuous stream of messages from producers to consumers.
  • Data Sync Events: Periodic or triggered synchronization to ensure consistency.

Use Cases
  • Kafka Events: Log aggregation, sensor/IoT data streaming, event-driven architectures.
  • Data Sync Events: Updating reporting databases, refreshing data warehouses, syncing APIs.

Processing Mode
  • Kafka Events: Asynchronous and real-time.
  • Data Sync Events: Batch-oriented or incremental updates.

Integration
  • Kafka Events: Connects with Apache Kafka topics for producing/consuming events.
  • Data Sync Events: Connects pipeline source and target components directly.

Scalability
  • Kafka Events: Highly scalable; supports distributed, high-volume event streams.
  • Data Sync Events: Scales with pipeline configuration but oriented toward consistency rather than volume.

Reliability
  • Kafka Events: Fault-tolerant with message persistence and replay.
  • Data Sync Events: Ensures data accuracy by aligning source and target states.

Best For
  • Kafka Events: High-velocity event data requiring immediate processing.
  • Data Sync Events: Data sets where freshness and alignment across systems are critical.

Together, Kafka Events and Data Sync Events provide complementary capabilities:

  • Kafka Events specialize in real-time streaming and high-volume ingestion.

  • Data Sync Events focus on data consistency and synchronization across systems.