Real-Time Data Pipeline: API Ingestion to WebSocket Delivery

Purpose

To ingest, prepare, and deliver real-time hiring data simultaneously to a database for storage and a WebSocket for live consumption.

This guide explains how to ingest, prepare, and deliver real-time hiring data simultaneously to two destinations:

  • ClickHouse Database (for storage and analysis)

  • WebSocket Producer (for live dashboards and real-time applications)

The workflow demonstrates the BDB Platform’s ability to support event-driven, multi-channel data delivery through the following stages:

  1. Ingesting hiring data from an API source

  2. Cleaning and transforming the data using Data Preparation

  3. Delivering the processed data simultaneously to both the database and WebSocket consumers

Architecture Overview

Data Flow

Stage | Component          | Mode      | Purpose
------|--------------------|-----------|-------------------------------------------
1     | API Ingestion      | Real-Time | Ingest live hiring data
2     | Data Preparation   | Batch     | Clean and standardize data
3     | DB Writer          | Batch     | Store results in ClickHouse
4     | WebSocket Producer | Real-Time | Broadcast to live dashboards and services

Pipeline Sequence

API Ingestion → Kafka Event → Data Preparation → Multi-Output Event → DB Writer + WebSocket Producer

Prerequisites

Before starting, ensure that:

  • You have valid access to the BDB Platform modules: Data Center, Data Pipeline, and Data Preparation.

  • You have valid API credentials for ingestion.

  • You have access to a ClickHouse database.

  • The WebSocket consumer (e.g., dashboard or monitoring service) is ready to receive messages.

  • A CSV sample file is available for testing in the Sandbox.

Step 1: Create a Sandbox and Upload a CSV File

Purpose

To set up a testing environment with sample hiring data for pipeline validation.

Procedure

  1. From the BDB Platform Homepage, click the Apps icon → Select Data Center.

  2. Inside the Data Center, click Sandbox → Select Create.

  3. Upload your CSV file by:

    • Dragging and dropping the file, or

    • Clicking Browse to choose the file from your system.

  4. Once the file is added, click Upload. The Sandbox will be created and will appear in the Sandbox list.
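
If you need a quick sample file, the sketch below generates one with pandas. The column names are assumptions based on the transformations in Step 2 and the sample payload in Step 10; adjust them to match your actual hiring schema.

  # Generate a small hiring CSV for Sandbox testing (illustrative only).
  import pandas as pd

  sample = pd.DataFrame([
      {"Candidate_ID": "C101", "Gender": "F", "Experience": 3,
       "Previous_CTC": 600000, "Offered_CTC": 750000, "Location": "Mumbai"},
      {"Candidate_ID": "C102", "Gender": "M", "Experience": 4,
       "Previous_CTC": 700000, "Offered_CTC": 850000, "Location": "Pune"},
  ])
  sample.to_csv("hiring_sample.csv", index=False)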

Step 2: Create and Apply Data Preparation

Purpose

To clean, standardize, and structure the hiring data for downstream analysis.

Procedure

  1. In the Sandbox List, click the three dots (⋮) next to your Sandbox.

  2. Select Create Data Preparation.

  3. Apply transformations as follows:

Transformation    | Action                                                                | Result
------------------|-----------------------------------------------------------------------|---------------------------
Delete a Column   | Select Gender column → Transforms → Delete Column                     | Removes unwanted field
Remove Empty Rows | Select Previous CTC and Offered CTC → Transforms → Delete Empty Rows  | Removes incomplete entries

  4. Rename your Data Preparation for easy reference.

  5. Verify all applied transformations in the Steps Tab.

  6. Click Save to finalize the Data Preparation.
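
For a quick local sanity check, the same two transformations can be reproduced with pandas. This is a minimal sketch, assuming the underscore column names used in the sample payload (Previous_CTC, Offered_CTC); the Data Preparation module itself remains the source of truth.

  # Reproduce the two Data Preparation steps locally to preview the result.
  import pandas as pd

  df = pd.read_csv("hiring_sample.csv")
  df = df.drop(columns=["Gender"])                        # Delete a Column
  df = df.dropna(subset=["Previous_CTC", "Offered_CTC"])  # Remove Empty Rows
  print(df.head())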

Step 3: Create a New Pipeline

Purpose

To automate the ingestion, transformation, and dual-output delivery processes.

Procedure

  1. From the Apps Menu, open the Data Pipeline module.

  2. Click Pipeline → Create.

  3. Enter:

    • Pipeline Name

    • Description

    • Resource Allocation: Low / Medium / High

  4. Click Save to create the pipeline.

Step 4: Add the API Ingestion Component

Purpose

To enable real-time ingestion of hiring data through a REST API.

Procedure

  1. Click the ‘+’ icon on the right to open the Components Palette (if not already visible).

  2. In the search bar, type API Ingestion.

  3. Drag and drop the API Ingestion component onto the canvas.

  4. Configure the component:

    • Basic Information

      • Invocation Type: Real-Time

    • Meta Information

      • Ingestion Type: API Ingestion

  5. Click Save.

A unique Ingestion URL will be generated automatically after saving and updating the pipeline.

  6. From the Event Panel, click + to create a Kafka Event and connect it to the API Ingestion component.

Step 5: Add the Data Preparation Component

Purpose

To process and clean the API data before sending it to output components.

Procedure

  1. In the Components Palette, navigate to Transformation → Data Preparation.

  2. Drag and drop the Data Preparation component onto the canvas.

  3. Configure the component:

    • Basic Information: Invocation Type → Batch

    • Meta Information:

      • Data Center Type: Data Sandbox

      • Sandbox Name: Select your created Sandbox

      • Preparation: Select the Data Preparation you saved earlier

  4. Save the component.

  5. From the Event Panel, click + → Create a Kafka Event.

  6. Set Number of Outputs = 2 to send processed data to two destinations simultaneously.

  7. Drag and drop the event onto the canvas and connect it after the Data Preparation component.
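
Conceptually, setting Number of Outputs = 2 makes the event behave like a Kafka topic read by two independent consumer groups, so the DB Writer and the WebSocket Producer each receive every processed record. The sketch below illustrates that fan-out pattern with kafka-python; the topic name and broker address are illustrative assumptions, not BDB internals.

  # Fan-out illustration: two consumer groups on one topic each receive all messages.
  from kafka import KafkaConsumer

  def make_consumer(group_id: str) -> KafkaConsumer:
      return KafkaConsumer(
          "hiring-prepared-data",              # hypothetical topic name
          bootstrap_servers="localhost:9092",  # illustrative broker address
          group_id=group_id,                   # separate groups -> independent copies
          auto_offset_reset="earliest",
      )

  db_writer_stream = make_consumer("db-writer-group")
  websocket_stream = make_consumer("websocket-producer-group")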

Step 6: Add the DB Writer Component

Purpose

To persist processed hiring data into ClickHouse for analysis, reporting, and archival.

Procedure

  1. In the Components Palette, search for DB Writer (under the Writer category).

  2. Drag and drop it onto the canvas.

  3. Configure the component:

    • Basic Information: Invocation Type → Batch

    • Meta Information:

      • Host: <database_host>

      • Port: <port_number>

      • Database Name: <database_name>

      • Table Name: <target_table>

      • Username: <db_username>

      • Password: <db_password>

      • Driver: ClickHouse

      • Save Mode: Append

  4. Validate the connection and click Save.
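
If the target table does not exist yet, you can create it up front so the DB Writer (Save Mode: Append) has a matching schema to write into. The sketch below uses the clickhouse-connect Python client; the column types are assumptions based on the sample payload, and the placeholders mirror the Meta Information fields above.

  # Create a ClickHouse table matching the prepared hiring schema (illustrative).
  import clickhouse_connect

  client = clickhouse_connect.get_client(
      host="<database_host>", port=8123,          # default ClickHouse HTTP port
      username="<db_username>", password="<db_password>",
      database="<database_name>",
  )
  client.command("""
      CREATE TABLE IF NOT EXISTS <target_table> (
          Candidate_ID String,
          Experience   UInt8,
          Previous_CTC UInt32,
          Offered_CTC  UInt32,
          Location     String
      ) ENGINE = MergeTree ORDER BY Candidate_ID
  """)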

Step 7: Add the WebSocket Producer Component

Purpose

To stream processed hiring data in real time to connected dashboards or live analytics services.

Procedure

  1. In the Components Palette, navigate to Producer → WebSocket Producer.

  2. Drag and drop it onto the canvas.

  3. Connect the second output of the multi-output Kafka Event to this component.

  4. Configure:

    • Invocation Type: Real-Time

  5. Click Save.
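
On the consuming side, any WebSocket client can subscribe to the stream this component publishes. A minimal sketch using the Python websockets library is shown below; the endpoint URL is a placeholder for whatever address your dashboard or service connects to.

  # Minimal WebSocket consumer: print each record broadcast by the pipeline.
  import asyncio
  import json
  import websockets

  async def consume(url: str) -> None:
      async with websockets.connect(url) as ws:
          async for message in ws:          # one message per broadcast record
              print("Received:", json.loads(message))

  asyncio.run(consume("wss://<websocket_endpoint>"))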

Step 8: Verify Component Connections

Ensure that all components are properly connected in the following sequence:

API Ingestion → Event → Data Preparation → Multi-Output Event → DB Writer & WebSocket Producer

Step 9: Activate the Pipeline

Purpose

To start data ingestion, transformation, and dual-output delivery in real time.

Procedure

  1. Click the Activate icon in the top toolbar.

  2. Wait until all pods are deployed and the pipeline becomes active.

  3. The Logs Panel will automatically open, showing live execution.

  4. Use the Component Status Panel to track each component’s state in real time.

Step 10: Send an API Request via Postman

Purpose

To test real-time ingestion and ensure successful data delivery to both destinations.

Procedure

  1. Open Postman and create a new POST request.

  2. In the URL field, paste the Ingestion URL generated in Step 4.

  3. Under the Headers section, add:

    • Ingestion ID

    • Ingestion Secret (found in component Meta Info)

  4. In the Body tab, select:

    • raw → JSON, and enter sample data matching your schema:

      {
        "Candidate_ID": "C102",
        "Experience": 4,
        "Previous_CTC": 700000,
        "Offered_CTC": 850000,
        "Location": "Pune"
      }
  5. Click Send.

Expected Response: API Ingestion successful (HTTP 200 OK)
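
The same request can be scripted instead of sent from Postman. The sketch below uses the Python requests library; the exact header names should be taken from the component's Meta Information, so treat the ones shown here as assumptions.

  # Post one sample record to the Ingestion URL generated in Step 4.
  import requests

  url = "<ingestion_url_from_step_4>"
  headers = {
      "Ingestion-Id": "<ingestion_id>",          # assumed header name
      "Ingestion-Secret": "<ingestion_secret>",  # assumed header name
      "Content-Type": "application/json",
  }
  payload = {
      "Candidate_ID": "C102",
      "Experience": 4,
      "Previous_CTC": 700000,
      "Offered_CTC": 850000,
      "Location": "Pune",
  }
  response = requests.post(url, headers=headers, json=payload)
  print(response.status_code, response.text)     # expect 200 on success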

Step 11: Monitor Pipeline Execution

Purpose

To validate that the pipeline processes data correctly and delivers it to both outputs.

Actions

  • Use the Preview Panel of each Event to verify intermediate data.

  • Use the Logs Panel to monitor:

    • API ingestion success

    • Data preparation completion

    • DB Writer and WebSocket Producer status

Expected Log Entries

Component          | Expected Message
-------------------|---------------------------------------------
API Ingestion      | “Ingestion successful.”
Data Preparation   | “Transformation applied successfully.”
DB Writer          | “DB Writer started successfully.”
WebSocket Producer | “Published data to WebSocket successfully.”

Step 12: Verify Output

Destination | Verification Method                   | Expected Output
------------|---------------------------------------|----------------------------------
ClickHouse  | Query target table                    | Cleaned and ingested hiring data
WebSocket   | Check connected dashboard or consumer | Real-time display of hiring data
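
As a quick check on the ClickHouse side, the sketch below queries the target table for the record posted in Step 10, again using clickhouse-connect with the same placeholder connection details as in Step 6.

  # Confirm the ingested record landed in ClickHouse.
  import clickhouse_connect

  client = clickhouse_connect.get_client(
      host="<database_host>", port=8123,
      username="<db_username>", password="<db_password>",
      database="<database_name>",
  )
  result = client.query("SELECT * FROM <target_table> WHERE Candidate_ID = 'C102'")
  for row in result.result_rows:
      print(row)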

Step 13: Deactivate the Pipeline

Once verification is complete:

  • Click the Deactivate button in the toolbar.

  • This will gracefully stop all components and release compute resources.

Troubleshooting Guide

Issue                        | Possible Cause              | Resolution
-----------------------------|-----------------------------|--------------------------------------
401 Unauthorized in Postman  | Invalid credentials         | Check Ingestion ID and Secret
DB Writer not inserting data | Connection misconfiguration | Validate host, port, and credentials
WebSocket not receiving data | Consumer not subscribed     | Ensure WebSocket client is connected
Empty preview data           | Schema mismatch             | Match field names with API payload

Outcome

After successful implementation:

  • Real-time hiring data is ingested, cleaned, and transformed automatically.

  • The data is stored in ClickHouse for analytics and simultaneously broadcast via WebSocket for live monitoring.

Key Benefits

Capability           | Business Value
---------------------|---------------------------------------------
Real-Time Ingestion  | Instant visibility into recruitment metrics
Data Preparation     | Ensures clean, consistent data
Dual Output Delivery | Supports both analytics and live monitoring
Scalable Design      | Handles concurrent API streams efficiently

Summary

This workflow demonstrates how to build a dual-output, real-time data pipeline on the BDB Platform. It integrates API ingestion, transformation, and simultaneous delivery to a database and WebSocket, enabling:

  • Continuous hiring data monitoring

  • Instant recruitment insights

  • Streamlined integration with dashboards and reporting systems

The architecture exemplifies BDB’s real-time, event-driven pipeline capability, ensuring data readiness for both analytics and live decision support.