Workflow 3

The objective of this workflow is to ingest, prepare, and deliver real-time hiring data simultaneously to a database for storage and to a WebSocket for live consumption.

This workflow focuses on creating a pipeline using the API Ingestion, Data Preparation, DB Writer, and WebSocket Producer components.

The workflow is designed to ingest hiring data from an API source, retrieving real-time information such as job listings, candidate profiles, and recruitment metrics. The API integration enables seamless extraction of data, ensuring that recruitment insights are always up to date.

Once ingested, the raw data is processed through the Data Preparation Plugin, where it is cleaned, standardized, and transformed for downstream analysis. The processed data is then sent to two destinations simultaneously:

1. ClickHouse Database – for storage, trend analysis, and integration with reporting systems.

2. WebSocket Producer – for real-time consumption by live dashboards, monitoring applications, or downstream services.

This dual-output architecture highlights BDB’s ability to power production-grade, API-driven ML workflows with:

  • Real-time monitoring

  • Event-driven data handling

  • Multi-channel output delivery

Create a Sandbox and Upload the CSV File

  • From the BDB Platform homepage, click on the Apps icon and navigate to the Data Center.

  • Inside the Data Center, click the “Sandbox” button and then click “Create”.

  • Upload your CSV file by dragging and dropping or browsing your system.

  • After the file loads successfully, click Upload, and the sandbox will be created and available for use.

  • In the sandbox list, click the three dots next to your created Sandbox.

  • Choose “Create Data Preparation” to begin cleaning and processing your data.

  • This simplifies the cleaning process and sets a solid foundation for machine learning.

o Delete the Gender column: Select the column → click Transforms → search for the Delete Column transform and click it. The Gender column is now deleted.

o Remove empty rows from Previous CTC and Offered CTC: Click Transforms again → search for the Delete Empty Rows transform and apply it to each respective column (a pandas sketch of these transforms follows this list).

  • Once the transformation steps are complete:

o Rename the data preparation for easy reference.

o All transformations are tracked and can be removed via the step option.

o Click Save to finalize.
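For reference, the two transforms above correspond to the following pandas operations. This is a minimal sketch for illustration only; the file name is hypothetical, the column names come from the example dataset, and the platform's built-in transforms may handle edge cases differently:

```python
import pandas as pd

# Load the raw hiring CSV (hypothetical file name).
df = pd.read_csv("hiring_data.csv")

# Equivalent of the Delete Column transform: drop the Gender column.
df = df.drop(columns=["Gender"])

# Equivalent of the Delete Empty Rows transform: drop rows where
# Previous CTC or Offered CTC is empty.
df = df.dropna(subset=["Previous CTC", "Offered CTC"])

df.to_csv("hiring_data_prepared.csv", index=False)
```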

Create a New Pipeline

  • Go to the Apps menu and open the Data Pipeline module.

  • Create a new pipeline by clicking the Pipeline option and then clicking “Create”; name it appropriately and allocate the required resources.

  • Click Save.

Add API Ingestion Component

  • If the Components Palette is not already open on the canvas, click the ‘+’ icon on the right side of the screen to open it.

  • Search for the API Ingestion component, then drag and drop it onto the workspace.

  • Set:

o Invocation Type: Real-Time (in the Basic Information tab).

o Ingestion Type: API Ingestion (in the Meta Information tab).

  • Save the component.

  • A unique Ingestion URL will be generated automatically once the pipeline is updated and saved.

  • From the Event panel (next to the Components Palette), click the ‘+’ icon to add a Kafka Event. After adding the event, drag and drop it onto the pipeline canvas.

Add a Data Preparation Component

  • In the Components Palette, navigate to the Transformation section and drag the Data Preparation component onto the pipeline canvas.

  • Set the Invocation Type to Batch.

  • In the Meta Information tab, select Data Sandbox as the data center type, then select the sandbox name and the preparation created earlier.

  • Once configured, save the component.

  • From the Event panel (next to the Components Palette), click the ‘+’ icon to create a Kafka Event.

  • Drag and drop the event onto the pipeline canvas.

  • In the Basic Information panel of the event, set the Number of Outputs to 2 (you can adjust this value as needed).

Add DB Writer Component to Store Final Results

  • Drag and drop the DB Writer component from the Writer section of the Components Palette.

  • Set the Invocation Type to Batch.

  • Enter the required DB credentials in the Meta Information tab: Host, Port, Database Name, Table Name, Username, and Password.

  • Select ClickHouse as the driver and Append as the save mode.

  • Validate the connection and save the component.
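Once the pipeline has run, the writes can be verified independently of the platform. Below is a minimal sketch using the clickhouse-connect Python client; the host, credentials, and table name (hiring_data) are placeholders for the values configured in the DB Writer above:

```python
import clickhouse_connect

# Connection details must match the DB Writer configuration (placeholders here).
client = clickhouse_connect.get_client(
    host="your-clickhouse-host",
    port=8123,
    username="your-username",
    password="your-password",
    database="your-database",
)

# Count the rows written by the pipeline (hypothetical table name).
result = client.query("SELECT count() FROM hiring_data")
print("Rows in table:", result.result_rows[0][0])
```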

Add WebSocket Producer

  • Drag and drop the WebSocket component from the Producer section of the component palette.

  • Connect the second output of the multi-output Event to this component.

  • Set the Invocation Type to Real-Time.

  • Save the component.

Activate the Pipeline

  • Ensure that all components are properly linked in this sequence:

API Ingestion → Event → Data Preparation → Event (2 outputs) → DB Writer & WebSocket Producer

  • Click the Activate icon in the toolbar.

  • Once you click the Activate icon, the pipeline starts executing and the Logs section opens automatically.

  • To test ingestion, send a POST request to the generated Ingestion URL from an API client such as Postman. In the Headers section, include:

o Ingestion ID

o Ingestion Secret (both are found in the component meta info).

  • In the Body, choose raw → JSON and enter a sample JSON record matching your data schema.

  • Click Send. A successful request returns a 200 OK response and a message like: “API Ingestion successful.”
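The same test can be scripted. Here is a minimal sketch using Python's requests library; the ingestion URL, the exact header key names, and the record fields are illustrative assumptions, so substitute the values from your component's meta information and your actual schema:

```python
import requests

# Placeholder ingestion URL generated by the API Ingestion component.
INGESTION_URL = "https://<bdb-host>/ingestion/<your-ingestion-id>"

# Header key names are assumptions; use the exact names your component expects.
headers = {
    "Content-Type": "application/json",
    "IngestionId": "<your-ingestion-id>",
    "IngestionSecret": "<your-ingestion-secret>",
}

# Illustrative record matching the hiring-data schema used in this workflow.
record = {
    "candidate_name": "Jane Doe",
    "job_title": "Data Engineer",
    "previous_ctc": 850000,
    "offered_ctc": 1100000,
}

response = requests.post(INGESTION_URL, json=record, headers=headers)
print(response.status_code, response.text)  # Expect 200 OK on success.
```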

  • Use the Preview panel of the respective Event to inspect the data after each component.

  • As the pipeline executes, the pods start deploying.

  • After all pods are up and running, move to the Logs section to track pipeline execution in detail.

  • Use the Component Status Panel (next to the logs) to view the real-time status of each component.

  • Confirm that data ingestion and transformation were successful.

  • In the Logs Panel, track:

o API Ingestion success

o Data being routed through the event and preparation stages

o Any errors or status messages from DB Writer or WebSocket Producer

  • After the data is cleaned:

o The DB Writer inserts the cleaned data into the configured ClickHouse table.

o If successful, you’ll see confirmation messages such as “DB Writer started successfully.”

  • The WebSocket Producer publishes the same data for real-time consumers (e.g., dashboards or downstream services).
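To observe the live stream, a consumer can subscribe to the WebSocket endpoint exposed by the producer. The sketch below uses the Python websockets library; the endpoint URL is a placeholder, as the actual address depends on your WebSocket Producer configuration:

```python
import asyncio
import websockets

# Placeholder endpoint; use the URL exposed by your WebSocket Producer.
WS_URL = "ws://<bdb-host>/<websocket-endpoint>"

async def consume():
    # Connect and print each message published by the pipeline.
    async with websockets.connect(WS_URL) as websocket:
        async for message in websocket:
            print("Received:", message)

asyncio.run(consume())
```

Each message received here should mirror a record written to the ClickHouse table, confirming the dual-output delivery end to end.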
