Automate Data Analysis using API, AutoML, & Database Integration

This guide explains how to ingest hiring data from an API, prepare it for analysis, run an AutoML model to generate predictions, and store the processed results in a database for actionable insights and reporting.

The workflow integrates four key BDB Platform modules:

  • Data Center for sandbox creation

  • Data Preparation for transformation

  • AutoML for model training and inference

  • Data Pipeline for automation and integration

By the end of this guide, you’ll have a fully automated machine learning workflow that performs real-time data ingestion, transformation, analysis, and output storage for actionable business insights.

Architecture Overview

Workflow Stages

Stage | Process | Module Used
1 | Data ingestion from API | Data Pipeline
2 | Data cleaning and transformation | Data Preparation
3 | Model training and inference | AutoML (DS Lab)
4 | Output writing into database | Data Pipeline (DB Writer)

High-Level Flow Diagram

API Source → Data Preparation → AutoML → Database (ClickHouse)

Prerequisites

Before you begin, ensure that:

  • You have access to the BDB Platform (Data Center, DS Lab, and Data Pipeline modules).

  • You have valid API credentials and access to a hiring data endpoint.

  • You have access to a ClickHouse database for writing processed results.

  • A CSV file (optional) is available for testing transformations in the Sandbox; an illustrative sample appears below.
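
The exact columns depend on your hiring dataset. Purely as an illustration (the column names and values below are hypothetical and should be adjusted to match your own data), a minimal test CSV might look like this:

      Candidate_ID,Gender,Experience,Previous_CTC,Offered_CTC,Location
      C1001,Female,4,650000,780000,Pune
      C1002,Male,6,900000,1050000,Bangalore
      C1003,Female,2,400000,520000,Hyderabad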

Step 1: API Integration

Purpose

To retrieve real-time hiring data, including job listings, candidate profiles, and recruitment metrics from an API source.

Procedure

  1. Navigate to the BDB Platform Homepage.

  2. Click the Apps icon → Select the Data Pipeline module.

  3. Click Create Pipeline under the Pipeline tab.

  4. Enter:

    • Pipeline Name

    • Description

    • Resource Allocation (Low/Medium/High)

  5. Click Save.

Add API Ingestion Component

  1. If the Components Palette is not visible, open it by clicking the “+” icon.

  2. In the search bar, type API Ingestion.

  3. Drag and drop the component onto the canvas.

  4. Configure the component as follows:

    • Invocation Type: Real-Time

    • Ingestion Type: API Ingestion

  5. Click Save to finalize.

Once saved, a unique Ingestion URL will be automatically generated.

Add Kafka Event

  1. From the Event Panel (right side), click the “+” icon to create a Kafka event.

  2. Drag and drop it onto the pipeline canvas.

  3. The Kafka Event automatically connects to the API Ingestion component.

Step 2: Create a Sandbox and Upload the CSV File

Purpose

To create a local workspace for sample hiring data (used for AutoML training and validation).

Procedure

  1. From the BDB Homepage, open the Data Center.

  2. Click the Sandbox tab → Select Create.

  3. Upload your CSV file by:

    • Dragging and dropping it, or

    • Clicking Browse to locate the file.

  4. Once the file is attached, click Upload.

Step 3: Data Preparation

Purpose

To clean, transform, and structure raw hiring data for accurate model training and prediction.

Procedure

  1. In the Sandbox List, click the three dots (⋮) next to your created sandbox.

  2. Select Create Data Preparation.

Apply Transformations

Perform the following cleaning actions:

Transformation | Action | Result
Delete Column | Select Gender column → Click Transforms → Delete Column | Removes redundant field
Remove Empty Rows | Select Previous CTC and Offered CTC → Click Transforms → Delete Empty Rows | Removes incomplete entries
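
These transformations run inside the Data Preparation module. If you want to sanity-check them offline before uploading the CSV, the sketch below shows an equivalent check using pandas; it assumes the column names referenced above and a hypothetical file name, and is not part of the BDB workflow itself.

      # Offline sanity check that mirrors the two cleaning actions above.
      # Assumes pandas is installed; "hiring_data.csv" and the column names
      # are placeholders - match them to your actual dataset.
      import pandas as pd

      df = pd.read_csv("hiring_data.csv")

      # Equivalent of "Delete Column" on the Gender field
      df = df.drop(columns=["Gender"])

      # Equivalent of "Delete Empty Rows" on Previous CTC and Offered CTC
      df = df.dropna(subset=["Previous CTC", "Offered CTC"])

      print(df.shape)  # rows remaining after cleaning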

Finalize Preparation

  • Rename the preparation for easy reference.

  • Review transformation steps under the Steps tab.

  • Click Save to complete.

Step 4: Create and Run an AutoML Experiment

Purpose

To train and evaluate machine learning models automatically for predictive analytics on hiring data.

Procedure

  1. Open the DS Lab module from the Apps Menu.

  2. Go to the AutoML section and click Create Experiment.

  3. Configure:

    • Experiment Name: Hiring Data

    • Experiment Type: Classification

  4. Under Configure Dataset:

    • Dataset Source: Sandbox

    • File Type: CSV

    • Select Sandbox: Choose your sandbox dataset

  5. Under Advanced Information:

    • Data Preparation: Select the preparation created in Step 3

    • Target Column: Gender

  6. Click Save to start the experiment.

Monitor AutoML Execution

  • AutoML will train and test multiple models.

  • Once complete, click View Report to review:

    • Model performance metrics

    • Accuracy comparison

    • Recommended best-fit model

Step 5: Register the Best Model

Purpose

To register the trained AutoML model so it can be reused in pipelines for real-time or batch predictions.

Procedure

  1. Navigate to the Model section under DS Lab.

  2. Select the desired model from your AutoML results.

  3. Click the Register (arrow) icon.

  4. Confirm registration.

Step 6: Add Data Preparation Component to Pipeline

Purpose

To integrate the previously created data preparation logic into the pipeline for consistent transformation.

Procedure

  1. From the Components Palette, search Data Preparation.

  2. Drag and drop onto the pipeline canvas.

  3. Configure:

    • Invocation Type: Batch

    • Data Center Type: Data Sandbox

    • Sandbox Name: Select the sandbox

    • Preparation: Choose the saved data preparation

  4. Save configuration.

  5. From the Event Panel, click + → Add a Kafka Event, then connect it.

Step 7: Add AutoML Component

Purpose

To execute the registered AutoML model for predictive analysis on live or processed hiring data.

Procedure

  1. From the Components Palette, search for AutoML Component.

  2. Drag and drop onto the canvas.

  3. Configure:

    • Invocation Type: Batch

    • Model Name: Select your registered AutoML model

  4. Save the component.

  5. Add and connect a Kafka Event.

Step 8: Add DB Writer Component

Purpose

To store processed predictions and enriched data into the target database for dashboards and reporting.

Procedure

  1. From the Writer Section, drag and drop the DB Writer component onto the canvas.

  2. Configure:

    • Invocation Type: Batch

    • Database Driver: ClickHouse

    • Save Mode: Append

  3. Fill in Meta Information:

    • Host

    • Port

    • Database Name

    • Table Name

    • Username

    • Password

  4. Validate the connection.

  5. Click Save.

Step 9: Activate and Execute the Pipeline

Purpose

To run the end-to-end workflow and verify data ingestion, transformation, model execution, and output storage.

Procedure

  1. Click the Activate icon on the pipeline toolbar.

  2. Wait until all pods are deployed and running.

  3. Monitor the Logs panel to view real-time execution details.

Component Flow:

API Ingestion → Kafka → Data Preparation → Kafka → AutoML → Kafka → DB Writer

Validate Execution

  1. Open the Preview Tab for each Kafka event to inspect intermediate data.

  2. Confirm:

    • API ingestion messages are received successfully.

    • Transformations from Data Preparation are applied.

    • AutoML component returns predictions.

    • DB Writer inserts records into the ClickHouse table.

Step 10: Test the API Ingestion via Postman

Purpose

To simulate incoming hiring data using the generated API Ingestion endpoint.

Procedure

  1. Open Postman.

  2. Create a New POST Request using the generated Ingestion URL.

  3. Add the following headers:

    • Ingestion ID

    • Ingestion Secret

  4. In the Body Tab, choose:

    • Format: raw → JSON

    • Add sample JSON matching your model schema:

      {
        "Candidate_ID": "C1234",
        "Experience": 5,
        "Previous_CTC": 800000,
        "Offered_CTC": 950000,
        "Location": "Bangalore"
      }
  5. Click Send.

Expected response on success: a confirmation message such as “API Ingestion successful”.
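
If you prefer to script this test rather than use Postman, the same request can be sent from Python. The snippet below is a minimal sketch assuming the requests library; the URL, credential values, and exact header keys are placeholders and must be copied from the Ingestion URL, Ingestion ID, and Ingestion Secret generated for your pipeline.

      # Scripted equivalent of the Postman test above, assuming the "requests"
      # library. URL, credentials, and header keys are placeholders - copy the
      # exact values shown for your API Ingestion component.
      import requests

      ingestion_url = "https://<your-bdb-host>/<generated-ingestion-path>"
      headers = {
          "Ingestion ID": "<your-ingestion-id>",          # use the exact header
          "Ingestion Secret": "<your-ingestion-secret>",  # keys shown in the UI
          "Content-Type": "application/json",
      }
      payload = {
          "Candidate_ID": "C1234",
          "Experience": 5,
          "Previous_CTC": 800000,
          "Offered_CTC": 950000,
          "Location": "Bangalore",
      }

      response = requests.post(ingestion_url, headers=headers, json=payload)
      print(response.status_code, response.text)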

Step 11: Verify Output and Deactivate Pipeline

  1. Go to the ClickHouse database and confirm that the table contains prediction results (a query sketch follows this list).

  2. Validate the schema and records against the ingested dataset.

  3. Once confirmed, Deactivate the Pipeline to stop execution and release resources.
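
One quick way to perform the check in step 1 is to query the target table directly. The snippet below is a minimal sketch assuming the clickhouse-connect Python client; the host, port, credentials, database, and table name are placeholders that must match the DB Writer configuration from Step 8.

      # Minimal verification query, assuming the clickhouse-connect client.
      # Connection details and the table name are placeholders - use the values
      # configured in the DB Writer component (Step 8).
      import clickhouse_connect

      client = clickhouse_connect.get_client(
          host="<clickhouse-host>",
          port=8123,                      # default ClickHouse HTTP port
          username="<username>",
          password="<password>",
          database="<database-name>",
      )

      result = client.query("SELECT count() FROM <table_name>")
      print("Rows written:", result.result_rows[0][0])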

Monitoring and Troubleshooting

Issue | Possible Cause | Resolution
API Ingestion not receiving data | Invalid credentials or endpoint | Recheck Ingestion ID/Secret
Data Prep error | Mismatch between schema and source | Validate preparation mapping
AutoML failure | Model not registered | Register model before adding it to pipeline
DB Writer error | Database connection issue | Verify host, port, and authentication

Outcome

By following this guide, you have successfully:

  • Ingested real-time hiring data from an API

  • Transformed and cleaned the dataset using Data Preparation

  • Applied an AutoML model for predictions

  • Stored results into a ClickHouse database

These outputs can now be used for reporting, dashboards, and recruitment performance analytics within the BDB Platform.

Key Benefits

Capability | Advantage
API Integration | Real-time hiring data ingestion
Data Preparation | Ensures clean, consistent, and accurate data
AutoML | Automatic model selection and insights generation
Database Integration | Centralized access for analytics and reporting

Summary

This workflow delivers a complete, production-ready machine learning pipeline — from API ingestion to predictive insights storage. By automating ingestion, transformation, and analysis, organizations can monitor hiring patterns, forecast recruitment metrics, and drive data-backed talent decisions seamlessly through the BDB Platform.