Real-Time Data Pipeline: API Ingestion to WebSocket Delivery
The objective is to ingest, prepare, and deliver real-time hiring data simultaneously to a database for storage and a WebSocket for live consumption.
This guide explains how to deliver the processed data to two destinations:
ClickHouse Database (for storage and analysis)
WebSocket Producer (for live dashboards and real-time applications)
The workflow demonstrates the BDB Platform’s ability to support event-driven, multi-channel data delivery through the following stages:
Ingesting hiring data from an API source
Cleaning and transforming the data using Data Preparation
Delivering the processed data simultaneously to both the database and WebSocket consumers
Architecture Overview
Data Flow
Stage | Component | Mode | Purpose
1 | API Ingestion | Real-Time | Ingest live hiring data
2 | Data Preparation | Batch | Clean and standardize data
3 | DB Writer | Batch | Store results in ClickHouse
4 | WebSocket Producer | Real-Time | Broadcast to live dashboards and services
Pipeline Sequence
API Ingestion → Kafka Event → Data Preparation → Multi-Output Event → DB Writer + WebSocket Producer
Step 1: Create a Sandbox and Upload a CSV File
Purpose
To set up a testing environment with sample hiring data for pipeline validation.
Procedure
From the BDB Platform Homepage, click the Apps icon → Select Data Center.
Inside the Data Center, click Sandbox → Select Create.
Upload your CSV file by either:
Dragging and dropping the file, or
Clicking Browse to choose the file from your system.
Click Upload. Once the upload completes successfully, the Sandbox is created and appears in the Sandbox list.
Step 2: Create and Apply Data Preparation
Purpose
To clean, standardize, and structure the hiring data for downstream analysis.
Procedure
In the Sandbox List, click the three dots (⋮) next to your Sandbox.
Select Create Data Preparation.
Apply transformations as follows:
Transformation | Action | Result
Delete a Column | Select the Gender column → Transforms → Delete Column | Removes the unwanted field
Remove Empty Rows | Select Previous CTC and Offered CTC → Transforms → Delete Empty Rows | Removes incomplete entries
Rename your Data Preparation for easy reference.
Verify all applied transformations in the Steps Tab.
Click Save to finalize the Data Preparation.
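For reference, the same cleanup can be reproduced outside the platform with a few lines of pandas. This is only an illustrative sketch of the logic the Data Preparation applies, not the platform's implementation; the column names follow the sample hiring dataset and the file name is a placeholder.

import pandas as pd

# Load the sample hiring data (placeholder file name).
df = pd.read_csv("hiring_data.csv")

# Delete a Column: drop the Gender field.
df = df.drop(columns=["Gender"])

# Remove Empty Rows: drop records missing Previous CTC or Offered CTC.
df = df.dropna(subset=["Previous CTC", "Offered CTC"])

df.to_csv("hiring_data_clean.csv", index=False)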
Step 3: Create a New Pipeline
Purpose
To automate the ingestion, transformation, and dual-output delivery processes.
Procedure
From the Apps Menu, open the Data Pipeline module.
Click Pipeline → Create.
Enter:
Pipeline Name
Description
Resource Allocation: Low / Medium / High
Click Save to create the pipeline.
Step 4: Add the API Ingestion Component
Purpose
To enable real-time ingestion of hiring data through a REST API.
Procedure
Click the ‘+’ icon on the right to open the Components Palette (if not already visible).
In the search bar, type API Ingestion.
Drag and drop the API Ingestion component onto the canvas.
Configure the component:
Basic Information
Invocation Type: Real-Time
Meta Information
Ingestion Type: API Ingestion
Click Save.
From the Event Panel, click + to create a Kafka Event and connect it to the API Ingestion component.
Step 5: Add the Data Preparation Component
Purpose
To process and clean the API data before sending it to output components.
Procedure
In the Components Palette, navigate to Transformation → Data Preparation.
Drag and drop the Data Preparation component onto the canvas.
Configure the component:
Basic Information: Invocation Type → Batch
Meta Information:
Data Center Type: Data Sandbox
Sandbox Name: Select your created Sandbox
Preparation: Select the Data Preparation you saved earlier
Save the component.
From the Event Panel, click + → Create a Kafka Event.
Set Number of Outputs = 2 to send processed data to two destinations simultaneously.
Drag and drop the event onto the canvas and connect it after the Data Preparation component.
Step 6: Add DB Writer Component
Purpose
To persist processed hiring data into ClickHouse for analysis, reporting, and archival.
Procedure
In the Components Palette, search for DB Writer (under the Writer category).
Drag and drop it onto the canvas.
Configure the component:
Basic Information: Invocation Type → Batch
Meta Information:
Host: <database_host>
Port: <port_number>
Database Name: <database_name>
Table Name: <target_table>
Username: <db_username>
Password: <db_password>
Driver: ClickHouse
Save Mode: Append
Validate the connection and click Save.
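Before activating the pipeline, you may also want to confirm that the target table exists and that the connection details work. The sketch below uses the clickhouse-connect Python client as one possible way to do this; the connection values mirror the placeholders above, and the column types are assumptions based on the sample payload used later in Step 10, not values prescribed by the platform.

import clickhouse_connect

# Placeholder connection details; replace with the values configured in the DB Writer.
client = clickhouse_connect.get_client(
    host="<database_host>", port=8123,
    username="<db_username>", password="<db_password>",
    database="<database_name>",
)

# Hypothetical target table matching the sample hiring payload from Step 10.
client.command("""
    CREATE TABLE IF NOT EXISTS <target_table> (
        Candidate_ID String,
        Experience   UInt8,
        Previous_CTC UInt32,
        Offered_CTC  UInt32,
        Location     String
    ) ENGINE = MergeTree ORDER BY Candidate_ID
""")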
Step 7: Add the WebSocket Producer Component
Purpose
To stream processed hiring data in real-time to connected dashboards or live analytics services.
Procedure
In the Components Palette, navigate to Producer → WebSocket Producer.
Drag and drop it onto the canvas.
Connect the second output of the multi-output Kafka Event to this component.
Configure:
Invocation Type: Real-Time
Click Save.
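To see the broadcast on the consumer side, any standard WebSocket client can subscribe to the endpoint exposed by the WebSocket Producer. The snippet below is a minimal sketch using the Python websockets library; the endpoint URL is a placeholder to be taken from the component's configuration, not a fixed platform address.

import asyncio
import websockets

async def listen():
    # Placeholder endpoint; use the URL exposed by the WebSocket Producer component.
    async with websockets.connect("wss://<websocket_host>/<channel>") as ws:
        async for message in ws:
            print("Received:", message)

asyncio.run(listen())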

Step 8: Link and Verify Pipeline Components
Ensure that all components are properly connected in the following sequence:
API Ingestion → Event → Data Preparation → Multi-Output Event → DB Writer & WebSocket Producer
Step 9: Activate the Pipeline
Purpose
To start data ingestion, transformation, and dual-output delivery in real time.
Procedure
Click the Activate icon in the top toolbar.
Wait until all pods are deployed and the pipeline becomes active.
The Logs Panel will automatically open, showing live execution.
Use the Component Status Panel to track each component’s state in real time.

Step 10: Send an API Request via Postman
Purpose
To test real-time ingestion and ensure successful data delivery to both destinations.
Procedure
Open Postman and create a new POST request.
In the URL field, paste the Ingestion URL generated in Step 4.
Under the Headers section, add:
Ingestion ID
Ingestion Secret
(both values are found in the component's Meta Information)
In the Body tab, select:
raw → JSON, and enter sample data matching your schema:
{ "Candidate_ID": "C102", "Experience": 4, "Previous_CTC": 700000, "Offered_CTC": 850000, "Location": "Pune" }
Click Send.
Expected Response:
API Ingestion successful (HTTP 200 OK)
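The same request can also be scripted instead of sent through Postman, which is convenient for repeated testing. Below is a minimal sketch using the Python requests library; the URL and the exact header names for the Ingestion ID and Secret should be copied from the API Ingestion component's Meta Information, so treat the ones shown here as placeholders.

import requests

# Placeholder ingestion URL and credential headers; copy the real values
# from the API Ingestion component created in Step 4.
url = "<ingestion_url>"
headers = {
    "Ingestion-Id": "<ingestion_id>",
    "Ingestion-Secret": "<ingestion_secret>",
}
payload = {
    "Candidate_ID": "C102",
    "Experience": 4,
    "Previous_CTC": 700000,
    "Offered_CTC": 850000,
    "Location": "Pune",
}

response = requests.post(url, json=payload, headers=headers)
print(response.status_code, response.text)  # Expect HTTP 200 OK on success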
Step 11: Monitor Pipeline Execution
Purpose
To validate that the pipeline processes data correctly and delivers it to both outputs.
Actions
Use the Preview Panel of each Event to verify intermediate data.
Use the Logs Panel to monitor:
API ingestion success
Data preparation completion
DB Writer and WebSocket Producer status
Expected Log Entries
Component | Expected Message
API Ingestion | “Ingestion successful.”
Data Preparation | “Transformation applied successfully.”
DB Writer | “DB Writer started successfully.”
WebSocket Producer | “Published data to WebSocket successfully.”
Step 12: Verify Output
Destination | Verification Method | Expected Output
ClickHouse | Query the target table | Cleaned and ingested hiring data
WebSocket | Check the connected dashboard or consumer | Real-time display of hiring data
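For the ClickHouse check, a quick query of the target table confirms that rows are arriving. The sketch below reuses the clickhouse-connect client from the Step 6 sketch, with the same placeholder connection details.

import clickhouse_connect

client = clickhouse_connect.get_client(
    host="<database_host>", port=8123,
    username="<db_username>", password="<db_password>",
    database="<database_name>",
)

# Row count plus a small sample of records from the target table.
count = client.query("SELECT count() FROM <target_table>").result_rows[0][0]
sample = client.query("SELECT * FROM <target_table> LIMIT 5").result_rows
print(f"Rows in target table: {count}")
for row in sample:
    print(row)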
Step 13: Deactivate the Pipeline
Once verification is complete:
Click the Deactivate button in the toolbar.
This will gracefully stop all components and release compute resources.
Troubleshooting Guide
Issue | Possible Cause | Resolution
401 Unauthorized in Postman | Invalid credentials | Check the Ingestion ID and Secret
DB Writer not inserting data | Connection misconfiguration | Validate the host, port, and credentials
WebSocket not receiving data | Consumer not subscribed | Ensure the WebSocket client is connected
Empty preview data | Schema mismatch | Match field names with the API payload
Outcome
After successful implementation:
Real-time hiring data is ingested, cleaned, and transformed automatically.
The data is stored in ClickHouse for analytics and simultaneously broadcast via WebSocket for live monitoring.
Key Benefits
Capability | Business Value
Real-Time Ingestion | Instant visibility into recruitment metrics
Data Preparation | Ensures clean, consistent data
Dual Output Delivery | Supports both analytics and live monitoring
Scalable Design | Handles concurrent API streams efficiently
Summary
This workflow demonstrates how to build a dual-output, real-time data pipeline on the BDB Platform. It integrates API ingestion, transformation, and simultaneous delivery to a database and WebSocket, enabling:
Continuous hiring data monitoring
Instant recruitment insights
Streamlined integration with dashboards and reporting systems
The architecture exemplifies BDB’s real-time, event-driven pipeline capability, ensuring data readiness for both analytics and live decision support.