
Kafka Consumer



The Kafka Consumer component consumes data from a given Kafka topic. It can consume data from both the same environment and an external environment, in CSV, JSON, XML, and Avro formats. It belongs to the Consumer component group.

All component configurations are classified broadly into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

Follow the steps below to configure the Kafka Consumer component.

Please Note: The component currently supports SSL and Plain Text as security types.

The component can also read data from external brokers by using SSL as the security type together with Host Aliases.

Steps to Configure

  • Drag and drop the Kafka Consumer component to the Workflow Editor, then click the dragged component to open the component properties tabs.

  • Configure the Basic Information tab.

  • Invocation Type: Select an invocation type from the drop-down menu to confirm the running mode of the component. For the Kafka Consumer, select the Real-Time option.

  • Deployment Type: It displays the deployment type for the component. This field comes pre-selected.

  • Container Image Version: It displays the image version for the Docker container. This field comes pre-selected.

  • Failover Event: Select a failover Event from the drop-down menu.

  • Batch Size (min 10): Provide the maximum number of records to be processed in one execution cycle (the minimum value for this field is 10). A conceptual sketch of batch consumption is given after this list.

  • Enable Auto-Scaling: The component pod scales up automatically, up to the given maximum number of instances, when the component lag exceeds 60%.
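
The Basic Information fields above are applied by the platform itself. The sketch below is only a conceptual illustration, using the open-source kafka-python client, of how a real-time consumer processes records in bounded batches in the way the Batch Size field describes; the topic name, broker address, consumer group, and batch size are placeholder assumptions, not the component's actual implementation.

from kafka import KafkaConsumer

TOPIC = "sensor-events"          # placeholder topic
BOOTSTRAP = "broker-1:9092"      # placeholder broker
BATCH_SIZE = 10                  # mirrors the Batch Size (min 10) field

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    group_id="pipeline-demo",
    enable_auto_commit=False,    # commit only after a batch is processed
    value_deserializer=lambda v: v.decode("utf-8"),
)

while True:
    # poll() returns at most BATCH_SIZE records per execution cycle
    batch = consumer.poll(timeout_ms=1000, max_records=BATCH_SIZE)
    for _partition, records in batch.items():
        for record in records:
            print(record.offset, record.value)   # stand-in for downstream processing
    if batch:
        consumer.commit()        # mark the batch as successfully processed

Auto-scaling, by contrast, happens outside this loop: the platform adds pods, up to the configured maximum number of instances, when consumer lag crosses the 60% threshold.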

Configuring Meta Information of Kafka Consumer

  • Topic Name: Specify the name of the Kafka topic from which the data is to be consumed.

  • Start From: Four options are available here. Refer to the detailed explanation and example at the bottom of the page.

    • Processed:

      • It represents the offset that has been successfully processed by the consumer, i.e., the offset of the last record the consumer has successfully read and processed.

      • By selecting this option, the consumer resumes consumption from the point it last successfully processed, ensuring continuity in the consumption process.

    • Beginning:

      • It indicates the earliest available offset in a Kafka topic.

      • When a consumer starts reading from the beginning, it reads from the first offset available in the topic, effectively consuming all messages from the start.

    • Latest:

      • It represents the offset at the end of the topic, indicating the latest available message.

      • When a consumer starts reading from the latest offset, it means it will only read new messages that are produced after the consumer starts.

    • Timestamp:

      • It refers to the timestamp associated with a message. The consumer seeks to a specific timestamp and reads the messages produced from that point onward.

      • To use this option, the user must specify both a Start Time and an End Time, defining the time range for which data is to be consumed. The consumer then retrieves only the messages produced within that range.

  • Is External: The user can consume data for a topic hosted on an external bootstrap server by enabling the Is External option. The Bootstrap Server and Config fields appear once this option is enabled.

  • Bootstrap Server: Enter the external bootstrap server details.

  • Config: Enter the configuration details for the external bootstrap server.

  • Input Record Type: Select one of the following input record types:

  • CSV: The user can consume CSV data using this option. The Header and Separator fields display when the CSV input record type is selected.

    • Header: Enter the column names of the CSV data consumed from the Kafka topic.

    • Separator: Enter the separator used in the CSV data, such as a comma (,).

  • JSON: The user can consume JSON data using this option.

  • XML: The user can consume XML data using this option.

  • AVRO: The user can consume Avro data using this option.

  • Security Type: It contains the following security types:

    • Plain Text: Choose the Plain Text option if the environment is not SSL-enabled.

      • Host Aliases: This option contains the following fields:

        • IP: Provide the IP address.

        • Host Names: Provide the host names.

    • SSL: Choose the SSL option if the environment is SSL-enabled. The following fields will display (a conceptual connection sketch follows this list):

      • Trust Store Location: Provide the trust store path.

      • Trust Store Password: Provide the trust store password.

      • Key Store Location: Provide the key store path.

      • Key Store Password: Provide the key store password.

      • SSL Key Password: Provide the SSL key password.

      • Host Aliases: This option contains the following fields:

        • IP: Provide the IP address.

        • Host Names: Provide the host names.

Please Note: The Host Aliases option can be used with both the SSL and Plain Text security types.
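
The sketch below is not the component's implementation; it is a minimal, hedged illustration of how the SSL fields above map onto a standalone consumer connecting to an external broker. It uses the open-source kafka-python client, which expects PEM files rather than the JKS trust store/key store paths the component collects, so the broker address, topic name, and certificate paths are placeholder assumptions only.

from kafka import KafkaConsumer

# Illustrative values only; the pipeline component gathers the equivalent
# details through its Bootstrap Server, Config, and SSL fields.
consumer = KafkaConsumer(
    "external-topic",
    bootstrap_servers="external-broker.example.com:9093",
    security_protocol="SSL",            # Security Type: SSL
    ssl_cafile="/certs/ca.pem",         # plays the role of the trust store
    ssl_certfile="/certs/client.pem",   # plays the role of the key store certificate
    ssl_keyfile="/certs/client.key",    # plays the role of the key store private key
    ssl_password="key-password",        # SSL Key Password
    value_deserializer=lambda v: v.decode("utf-8"),
)

for record in consumer:
    print(record.topic, record.offset, record.value)

The Host Aliases fields (IP and Host Names) typically act like /etc/hosts entries for the consumer container, letting it resolve broker host names advertised by an external cluster.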

// Assume the following offsets and timestamps for messages in the topic:

Offset | Timestamp
-------|-------------------
0      | 2024-02-27 10:00 AM
1      | 2024-02-27 11:30 AM
2      | 2024-02-27 01:00 PM
3      | 2024-02-27 02:30 PM
  1. Processed:

    • If a consumer has successfully processed up to offset 2, it means it has processed all messages up to and including the one at offset 2 (timestamp 2024-02-27 01:00 PM). Now, the consumer will resume processing from offset 3 onwards.

  2. Beginning:

    • If a consumer starts reading from the beginning, it will read messages starting from offset 0. It will process messages with timestamps from 2024-02-27 10:00 AM onward.

  3. Latest:

    • If a consumer starts reading from the latest offset, it will only read new messages produced after the consumer starts. Let's say the consumer starts at timestamp 2024-02-27 02:00 PM; it will read only the message at offset 3.

  4. Timestamp:

    • If a consumer seeks to a specific timestamp, for example, 2024-02-27 11:45 AM, it will read messages with offsets 2 and 3, effectively including the messages with timestamps 2024-02-27 01:00 PM and 2024-02-27 02:30 PM, while excluding the messages with timestamps 2024-02-27 10:00 AM and 2024-02-27 11:30 AM.
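
For readers who want to relate the four Start From options to the underlying consumer API, the sketch below expresses each option with the open-source kafka-python client. Only one of the four blocks would be applied in practice; the topic, partition, broker address, group id, and timestamp are placeholder assumptions, and the pipeline component performs the equivalent positioning internally.

from datetime import datetime, timezone
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="broker-1:9092",
    group_id="pipeline-demo",
    enable_auto_commit=False,
)
tp = TopicPartition("sensor-events", 0)   # single partition, for illustration
consumer.assign([tp])

# Processed: resume from the last committed (successfully processed) offset.
committed = consumer.committed(tp)
if committed is not None:
    consumer.seek(tp, committed)

# Beginning: start from the earliest available offset (offset 0 above).
consumer.seek_to_beginning(tp)

# Latest: jump to the end of the topic and read only newly produced messages.
consumer.seek_to_end(tp)

# Timestamp: find the first offset at or after 2024-02-27 11:45 AM
# and start reading there (offsets 2 and 3 in the example above).
start = datetime(2024, 2, 27, 11, 45, tzinfo=timezone.utc)
offsets = consumer.offsets_for_times({tp: int(start.timestamp() * 1000)})
if offsets[tp] is not None:
    consumer.seek(tp, offsets[tp].offset)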
