Kafka Events

This page offers a detailed overview of the connecting components within the BDB Data Pipeline, specifically focusing on Kafka Events.

What is a Kafka Event?

A Kafka event refers to a single piece of data or message that is exchanged between producers and consumers in a Kafka messaging system. Kafka events are also known as records or messages. They typically consist of a key, a value, and metadata such as the topic and partition. Producers publish events to Kafka topics, and consumers subscribe to these topics to consume events. Kafka events are often used for real-time data streaming, messaging, and event-driven architectures.
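
For readers who want to see what such an event looks like outside the BDB UI, below is a minimal sketch using the kafka-python library (chosen here purely for illustration; BDB does not prescribe a client). It publishes and then consumes one event with a key, a value, and the topic/partition metadata described above; the broker address and the topic name "orders" are placeholders.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Produce one event: a key and a JSON value; Kafka adds the metadata.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # placeholder broker address
        key_serializer=str.encode,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("orders", key="order-1", value={"amount": 42})
    producer.flush()

    # Consume it back; each record carries the metadata described above.
    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop iterating once the topic is idle
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for record in consumer:
        print(record.key, record.value, record.topic, record.partition, record.offset)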

Creating a Kafka Event from the Event Panel

  • Navigate to the Pipeline Workflow Editor page.

  • Click the Add Component/Event icon.

  • Open the Events tab from the panel that opens below.

  • Click the Add New Event icon from the Events tab.

  • Provide a display name for the new Event.

  • Event Duration: Select one of the following options.

    • Short (4 hours)

    • Medium (8 hours)

    • Half Day (12 hours)

    • Full Day (24 hours)

    • Long (48 hours)

    • Week (168 hours)

Please note: If no duration option is selected, the Event data is erased after 7 days and the offsets expire as well. (A sketch of how duration and partitions translate to plain Kafka topic settings follows this list.)

  • No. of Partitions: Enter a value between 1 and 100. The default number of partitions is 3.

  • No. of Outputs: Define the number of outputs using this field.

  • Is Failover: Enable this option to create the event as a Failover Event.

    • If a Failover Event is created, it must be mapped to a component to retrieve the failed data from that component. In the present example, the Event is not created as a Failover Event.

  • Click the Add Kafka Event option.

  • A new Kafka Event will be created and added to the Events section under the Events tab.

  • Once the Kafka Event is created, the user can drag it to the pipeline workspace and connect it to any component. With auto-connect enabled (see Auto-connecting a Kafka Event to a Component below), the Event is also connected automatically to the nearest pipeline component.
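
Under the hood, a Kafka Event corresponds to a Kafka topic, so the duration and partition settings above resemble a topic's retention period and partition count. As an illustration only (BDB manages this internally, and the mapping of Event Duration to Kafka's retention.ms setting is an assumption), an equivalent topic could be created with kafka-python's admin client:

    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # placeholder

    # "Medium (8 hours)" duration and the default 3 partitions, expressed
    # as plain Kafka topic settings (an assumed mapping, for illustration).
    eight_hours_ms = 8 * 60 * 60 * 1000
    topic = NewTopic(
        name="my-kafka-event",  # placeholder event name
        num_partitions=3,
        replication_factor=1,
        topic_configs={"retention.ms": str(eight_hours_ms)},
    )
    admin.create_topics([topic])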

Please note:

  • When the user hovers over an event in the pipeline workspace, the following details about the event are displayed:

    • Event Name

    • Duration

    • Number of Partitions

  • The user can edit the following information of the Kafka Event after dragging it to the pipeline workspace:

    • Display Name

    • No. of Outputs

    • Is Failover

Mapping a Failover Event

A Failover Event is designed to capture data that a component in the pipeline fails to process. In cases where a connected event's data cannot be processed successfully by a component, the failed data is sent to the Failover Event.

Please note: To create a Failover Event, follow the same steps outlined for creating a regular Event. Additionally, enable the Is Failover option to designate it as a failover event.

Follow these steps to map a Failover Event to a component within a Pipeline workflow:

  • Navigate to the Events tab.

  • Drag the Failover Event to the pipeline workspace.

  • Navigate to the Basic Information tab of the desired component where the Failover Event should be mapped.

  • Select the Failover Event from the Failover Event drop-down.

  • Save the component configuration.

  • A success notification appears, confirming that the component is updated and the Failover Event is successfully mapped.

  • Activate the pipeline workflow and review the logs using the Log panel.

  • If the mapped pipeline component encounters processing failures with data from its preceding event, a notification is displayed; open the Logs tab to confirm the failure.

  • The failed data will be directed to the mapped Failover Event, and this is reported to users through the latest logs.

  • The Failover Event holds the following keys, along with the failed data, in the data preview (a consumer-side sketch follows this section):

    • Cause: The cause of the failure.

    • eventTime: The date and time when the data failed.

  • When you hover over the Failover Event, the corresponding component in the pipeline to which it is mapped will be highlighted. Refer to the image below for a visual representation.

    [Image: Highlighting the Mapped Pipeline Component to a Failover Event]
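
To show what the failed records might look like to a downstream reader, here is a hedged sketch that consumes a failover event's underlying topic and inspects the Cause and eventTime keys listed above. The topic name and the assumption that values are JSON-encoded are placeholders for illustration; check the actual data preview for the exact structure.

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "failover-event",  # placeholder topic name
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    for record in consumer:
        failed = record.value
        # The failover record is assumed to carry the failed payload plus
        # the two keys shown in the data preview.
        print("Cause:", failed.get("Cause"))
        print("eventTime:", failed.get("eventTime"))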

Auto-connecting a Kafka Event to a Component

This feature automatically connects the Kafka Event to a component when the Event is dragged from the Events panel. To use this feature, ensure that the "Auto connect components on drag" option is enabled in the Events panel (it is enabled by default).

[Image: Auto-connecting a Kafka Event to a Component in a Pipeline Workflow]

Adding a Kafka Event to a Pipeline Workflow

This feature allows users to directly connect or add a Kafka/Data Sync event to a component by right-clicking the dragged component on the pipeline workspace canvas.

Follow these steps to directly connect a Kafka/Data Sync event to a component in the pipeline:

  • Right-click on the dragged pipeline component.

  • Select the Add Kafka Event option from the context menu.

  • The Create Kafka Event dialog box will open.

  • Enter the required details.

  • Click the Add Kafka Event option.

  • The newly created Kafka Event will be directly connected to the component.

Map an Event

This feature allows users to map an existing Kafka event to a new Kafka event, ensuring that the mapped event carries identical data from its source (an analogy sketch appears at the end of this section).

Follow these steps to create a mapped event in the pipeline:

  • Choose the Kafka event from the pipeline as the source event for which the mapped events need to be created.

  • Copy the Event Name of the source event to be used in the mapped event.

  • Open the events panel from the pipeline toolbar and select the "Add New Event" option.

  • In the Create Kafka Event dialog box, turn on the option at the top to enable mapping for this event.

  • Enter the source event's name in the Event Name field and click the search icon to select the name of the source event from the suggestions.

  • The Event Duration and No. of Partitions will be filled in automatically to match the source event. Users can modify the No. of Outputs between 1 and 10 for the mapped event.

  • Click on the Map Kafka Event option to create the Mapped Event.

  • Update the Pipeline and run it to see the data flow in the source event.

    • Check out the Meta Info and Preview tabs.

  • Open the content configuration tabs for the mapped event.

    • The same information is passed to the Meta Info and Preview tabs.

Please note:

  • The data of a Mapped Event cannot be cleared directly.

  • To remove data from a Mapped Event, the Source Event to which it is mapped must be flushed.

  • Flushing the Source Event automatically clears the data of the associated Mapped Event.
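
BDB performs the mirroring itself, but the observable behaviour (two events exposing identical data, cleared together when the source is flushed) is loosely comparable to two independent consumer groups reading one topic. The sketch below is only that analogy, not the product's implementation; all names are placeholders.

    from kafka import KafkaConsumer

    # Analogy only: a "source event" and its "mapped event" both expose the
    # same records, much like two consumer groups reading one topic.
    def make_reader(group_id):
        return KafkaConsumer(
            "source-event",  # placeholder topic name
            bootstrap_servers="localhost:9092",
            group_id=group_id,
            auto_offset_reset="earliest",
            consumer_timeout_ms=5000,
        )

    source_view = make_reader("source-view")
    mapped_view = make_reader("mapped-view")

    # Both iterators yield the identical stream of records.
    print(sum(1 for _ in source_view) == sum(1 for _ in mapped_view))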

Kafka Event – Post-Execution Tabs Overview

Upon the successful execution of the pipeline, users gain access to several informational tabs associated with the Kafka Event. These tabs provide comprehensive insights into the event's configuration, metadata, data content, and schema structure.

Users can access these tabs (such as Meta Info, Preview, and Preview Schema, described below) for a Kafka Event after a successful run of the pipeline.

Downloading Data from the Kafka Topic

The user can preview the data and download it once it has been sent from the producer component. The data can be downloaded in CSV, Excel, and JSON formats.

Follow the steps given below to download the data from a Kafka Topic:

  • Navigate to the Preview tab of a Kafka Event. It is recommended to select a pipeline workflow that has been activated or executed successfully to ensure that relevant data is available for preview and analysis within the Kafka Event tabs.

  • The Data Preview will be displayed below.

  • Click the Download option from the top right side of the tab.

  • A context menu appears with the available download options. The supported formats are CSV, Excel, and JSON.

  • Select an option from the context menu to download the data.

  • The Download data dialog box opens.

  • Users can either select all columns at once using the Select All checkbox or choose specific columns individually using the checkboxes next to each column name.

  • Click the Download option provided at the end of the dialog box.

  • The data will be downloaded to your system in the selected format. (A programmatic sketch of a similar export follows the notes below.)

Please note:

  • Users can preview, download, and copy up to 100 data entries.

  • Click the Download icon to download the data in CSV, JSON, or Excel format.

  • Click on the Copy option to copy the data as a list of dictionaries.
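
A similar export can be approximated outside the UI. The sketch below assumes a kafka-python client, a placeholder topic name, and JSON-encoded values; it reads up to 100 records, mirroring the preview limit noted above, and writes them out as CSV and as a JSON list of dictionaries.

    import csv
    import json
    from itertools import islice
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "my-kafka-event",  # placeholder topic name
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    # Mirror the UI's 100-entry preview limit.
    rows = [record.value for record in islice(consumer, 100)]

    if rows:
        # CSV export: one column per key of the first record.
        with open("event_data.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)

        # JSON export: the same rows as a list of dictionaries,
        # matching the Copy option described above.
        with open("event_data.json", "w") as f:
            json.dump(rows, f, indent=2)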

Data Type Icons in Column Headers

Each column header includes a data type icon to visually represent the type of data contained within that column. These icons help users quickly identify and interpret the structure of the data.

The following column data types are represented:

  • String: Denotes text or character-based data.

  • Integer: Represents whole numbers.

  • Date: Denotes dates.

  • DateTime: Used for date, time, or timestamp fields.

  • Float: Indicates numeric values with decimal precision.

  • Boolean: Represents true or false values.

  • Object: Represents data where each entry is treated as a generic Python object.

These icons improve data readability and aid in schema understanding during preview, mapping, or transformation tasks.

Movable Column Separators

  • Users can drag the column separators to adjust column widths, allowing for a personalized view of the data.

  • This feature helps accommodate various data lengths and user preferences.

Table View for the Data Preview

  • The event data is displayed in a structured table format.

  • The table supports sorting and filtering to enhance data usability. The previewed data can be filtered based on the Latest, Beginning, and Timestamp options.

Accessing the Time Range Dialog Box

The Timestamp filter option opens the Time Range window, where the user can either select a start and end date or choose from the available time ranges to apply and preview the data.
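
In plain Kafka terms, previewing from a timestamp corresponds to looking up the first offset at or after that time and seeking to it. A sketch with kafka-python follows (the topic, partition, and start time are placeholders; this is not BDB's internal code):

    from datetime import datetime, timezone
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(
        bootstrap_servers="localhost:9092",
        consumer_timeout_ms=5000,
    )
    tp = TopicPartition("my-kafka-event", 0)  # placeholder topic, partition 0
    consumer.assign([tp])

    # Start of the chosen time range, as epoch milliseconds.
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    start_ms = int(start.timestamp() * 1000)

    # Find the earliest offset whose timestamp is >= start_ms, then seek to it.
    offsets = consumer.offsets_for_times({tp: start_ms})
    if offsets[tp] is not None:
        consumer.seek(tp, offsets[tp].offset)
        for record in consumer:
            print(record.timestamp, record.value)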

Preview Schema

Users can view the Spark Schema Preview under the Preview Schema tab for Events that have been successfully mapped and executed.
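
For reference, when a Kafka topic is read with Spark, the raw source schema (before any value parsing) looks like the sketch below. This assumes a Spark session with the spark-sql-kafka connector on the classpath, and it illustrates a Spark schema preview in general rather than BDB's own rendering.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("schema-preview").getOrCreate()

    # Read a topic in batch mode; requires the spark-sql-kafka package.
    df = (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
        .option("subscribe", "my-kafka-event")  # placeholder topic
        .load()
    )

    # Prints Kafka's fixed source schema: key, value, topic, partition,
    # offset, timestamp, timestampType.
    df.printSchema()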

Flushing Events

Kafka Events can be flushed to remove all existing records while retaining the topic offsets by setting the start-offset to match the end-offset. To flush an individual Event, use the Flush Event button next to it in the Event panel. To flush all Events at once, click the Flush All button located at the top of the Event panel.

[Image: Flush All Events at the top; individual events' flush buttons]
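
Outside the UI, the same "advance the start offset to the end offset" operation exists in Kafka as a delete-records request. Below is a sketch with confluent-kafka-python, assuming a version that exposes AdminClient.delete_records (roughly 2.2 and later); the topic name and partition count are placeholders.

    from confluent_kafka import TopicPartition, OFFSET_END
    from confluent_kafka.admin import AdminClient

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder

    # Deleting records up to OFFSET_END moves each partition's start offset
    # to its current end offset, i.e. the "flush" described above.
    partitions = [TopicPartition("my-kafka-event", p, OFFSET_END) for p in range(3)]
    futures = admin.delete_records(partitions)
    for tp, fut in futures.items():
        fut.result()  # raises if the deletion failed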