Automate Data Analysis using API, AutoML, & Database Integration

This guide explains how to ingest hiring data from an API, prepare it for analysis, run an AutoML model to generate predictions, and store the processed results in a database for actionable insights and reporting.

The workflow integrates four key BDB Platform modules:

  • Data Center for sandbox creation

  • Data Preparation for transformation

  • AutoML for model training and inference

  • Data Pipeline for automation and integration

By the end of this guide, you’ll have a fully automated machine learning workflow that performs real-time data ingestion, transformation, analysis, and output storage for actionable business insights.

Architecture Overview

Workflow Stages

Stage | Process | Module Used
1 | Data ingestion from API | Data Pipeline
2 | Data cleaning and transformation | Data Preparation
3 | Model training and inference | AutoML (DS Lab)
4 | Output writing into database | Data Pipeline (DB Writer)

High-Level Flow Diagram

API Source → Data Preparation → AutoML → Database (ClickHouse)

Prerequisites

Before you begin, ensure that:

  • You have access to the BDB Platform (Data Center, DS Lab, and Data Pipeline modules).

  • You have valid API credentials and access to a hiring data endpoint.

  • You have access to a ClickHouse database for writing processed results.

  • A CSV file (optional) is available for testing transformations in the Sandbox; an illustrative sample appears below.
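
The exact columns depend on your hiring dataset. Purely as an illustration (the column names and values below are hypothetical and should be adjusted to match your own data), a minimal test CSV might look like this:

      Candidate_ID,Gender,Experience,Previous_CTC,Offered_CTC,Location
      C1001,Female,4,650000,780000,Pune
      C1002,Male,6,900000,1050000,Bangalore
      C1003,Female,2,400000,520000,Hyderabad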

Step 1: API Integration

Purpose

To retrieve real-time hiring data, including job listings, candidate profiles, and recruitment metrics from an API source.

Procedure

  1. Navigate to the BDB Platform Homepage.

  2. Click the Apps icon → Select the Data Pipeline module.

  3. Click Create Pipeline under the Pipeline tab.

  4. Enter:

    • Pipeline Name

    • Description

    • Resource Allocation (Low/Medium/High)

  5. Click Save.

Add API Ingestion Component

  1. If the Components Palette is not visible, open it by clicking the “+” icon.

  2. In the search bar, type API Ingestion.

  3. Drag and drop the component onto the canvas.

  4. Configure the component as follows:

    • Invocation Type: Real-Time

    • Ingestion Type: API Ingestion

  5. Click Save to finalize.

Once saved, a unique Ingestion URL will be automatically generated.

Add Kafka Event

  1. From the Event Panel (right side), click the “+” icon to create a Kafka event.

  2. Drag and drop it onto the pipeline canvas.

  3. The Kafka Event automatically connects to the API Ingestion component.

Step 2: Create a Sandbox and Upload the CSV File

Purpose

To create a local workspace for sample hiring data (used for AutoML training and validation).

Procedure

  1. From the BDB Homepage, open the Data Center.

  2. Click the Sandbox tab → Select Create.

  3. Upload your CSV file by:

    • Dragging and dropping it, or

    • Clicking Browse to locate the file.

  4. Once the file is attached, click Upload.

Step 3: Data Preparation

Purpose

To clean, transform, and structure raw hiring data for accurate model training and prediction.

Procedure

  1. In the Sandbox List, click the three dots (⋮) next to your created sandbox.

  2. Select Create Data Preparation.

Apply Transformations

Perform the following cleaning actions:

Transformation | Action | Result
Delete Column | Select Gender column → Click Transforms → Delete Column | Removes redundant field
Remove Empty Rows | Select Previous CTC and Offered CTC → Click Transforms → Delete Empty Rows | Removes incomplete entries
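
These transformations run inside the Data Preparation module. If you want to sanity-check them offline before uploading the CSV, the sketch below shows an equivalent check using pandas; it assumes the column names referenced above and a hypothetical file name, and is not part of the BDB workflow itself.

      # Offline sanity check that mirrors the two cleaning actions above.
      # Assumes pandas is installed; "hiring_data.csv" and the column names
      # are placeholders - match them to your actual dataset.
      import pandas as pd

      df = pd.read_csv("hiring_data.csv")

      # Equivalent of "Delete Column" on the Gender field
      df = df.drop(columns=["Gender"])

      # Equivalent of "Delete Empty Rows" on Previous CTC and Offered CTC
      df = df.dropna(subset=["Previous CTC", "Offered CTC"])

      print(df.shape)  # rows remaining after cleaning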

Finalize Preparation

  • Rename the preparation for easy reference.

  • Review transformation steps under the Steps tab.

  • Click Save to complete.

Step 4: Create and Run an AutoML Experiment

Purpose

To train and evaluate machine learning models automatically for predictive analytics on hiring data.

Procedure

  1. Open the DS Lab module from the Apps Menu.

  2. Go to the AutoML section and click Create Experiment.

  3. Configure:

    • Experiment Name: Hiring Data

    • Experiment Type: Classification

  4. Under Configure Dataset:

    • Dataset Source: Sandbox

    • File Type: CSV

    • Select Sandbox: Choose your sandbox dataset

  5. Under Advanced Information:

    • Data Preparation: Select the preparation created in Step 3

    • Target Column: Gender

  6. Click Save to start the experiment.

Monitor AutoML Execution

  • AutoML will train and test multiple models.

  • Once complete, click View Report to review:

    • Model performance metrics

    • Accuracy comparison

    • Recommended best-fit model

Step 5: Register the Best Model

Purpose

To register the trained AutoML model so it can be reused in pipelines for real-time or batch predictions.

Procedure

  1. Navigate to the Model section under DS Lab.

  2. Select the desired model from your AutoML results.

  3. Click the Register (arrow) icon.

  4. Confirm registration.

Step 6: Add Data Preparation Component to Pipeline

Purpose

To integrate the previously created data preparation logic into the pipeline for consistent transformation.

Procedure

  1. From the Components Palette, search Data Preparation.

  2. Drag and drop onto the pipeline canvas.

  3. Configure:

    • Invocation Type: Batch

    • Data Center Type: Data Sandbox

    • Sandbox Name: Select the sandbox

    • Preparation: Choose the saved data preparation

  4. Save configuration.

  5. From the Event Panel, click + → Add a Kafka Event, then connect it.

Step 7: Add AutoML Component

Purpose

To execute the registered AutoML model for predictive analysis on live or processed hiring data.

Procedure

  1. From the Components Palette, search for AutoML Component.

  2. Drag and drop onto the canvas.

  3. Configure:

    • Invocation Type: Batch

    • Model Name: Select your registered AutoML model

  4. Save the component.

  5. Add and connect a Kafka Event.

Step 8: Add DB Writer Component

Purpose

To store processed predictions and enriched data into the target database for dashboards and reporting.

Procedure

  1. From the Writer Section, drag and drop the DB Writer component onto the canvas.

  2. Configure:

    • Invocation Type: Batch

    • Database Driver: ClickHouse

    • Save Mode: Append

  3. Fill in Meta Information:

    • Host

    • Port

    • Database Name

    • Table Name

    • Username

    • Password

  4. Validate the connection.

  5. Click Save.

Step 9: Activate and Execute the Pipeline

Purpose

To run the end-to-end workflow and verify data ingestion, transformation, model execution, and output storage.

Procedure

  1. Click the Activate icon on the pipeline toolbar.

  2. Wait until all pods are deployed and running.

  3. Monitor the Logs panel to view real-time execution details.

Component Flow:

API Ingestion → Kafka → Data Preparation → Kafka → AutoML → Kafka → DB Writer

Validate Execution

  1. Open the Preview Tab for each Kafka event to inspect intermediate data.

  2. Confirm:

    • API ingestion messages are received successfully.

    • Transformations from Data Preparation are applied.

    • AutoML component returns predictions.

    • DB Writer inserts records into the ClickHouse table.

Step 10: Test the API Ingestion via Postman

Purpose

To simulate incoming hiring data using the generated API Ingestion endpoint.

Procedure

  1. Open Postman.

  2. Create a New POST Request using the generated Ingestion URL.

  3. Add the following headers:

    • Ingestion ID

    • Ingestion Secret

  4. In the Body Tab, choose:

    • Format: raw → JSON

    • Add sample JSON matching your model schema:

      {
        "Candidate_ID": "C1234",
        "Experience": 5,
        "Previous_CTC": 800000,
        "Offered_CTC": 950000,
        "Location": "Bangalore"
      }
  5. Click Send.

Expected response on success: a confirmation message such as “API Ingestion successful”.
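
If you prefer to script this test rather than use Postman, the same request can be sent from Python. The snippet below is a minimal sketch assuming the requests library; the URL, credential values, and exact header keys are placeholders and must be copied from the Ingestion URL, Ingestion ID, and Ingestion Secret generated for your pipeline.

      # Scripted equivalent of the Postman test above, assuming the "requests"
      # library. URL, credentials, and header keys are placeholders - copy the
      # exact values shown for your API Ingestion component.
      import requests

      ingestion_url = "https://<your-bdb-host>/<generated-ingestion-path>"
      headers = {
          "Ingestion ID": "<your-ingestion-id>",          # use the exact header
          "Ingestion Secret": "<your-ingestion-secret>",  # keys shown in the UI
          "Content-Type": "application/json",
      }
      payload = {
          "Candidate_ID": "C1234",
          "Experience": 5,
          "Previous_CTC": 800000,
          "Offered_CTC": 950000,
          "Location": "Bangalore",
      }

      response = requests.post(ingestion_url, headers=headers, json=payload)
      print(response.status_code, response.text)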

Step 11: Verify Output and Deactivate Pipeline

  1. Go to the ClickHouse database and confirm that the table contains prediction results (a query sketch follows this list).

  2. Validate the schema and records against the ingested dataset.

  3. Once confirmed, Deactivate the Pipeline to stop execution and release resources.
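
One quick way to perform the check in step 1 is to query the target table directly. The snippet below is a minimal sketch assuming the clickhouse-connect Python client; the host, port, credentials, database, and table name are placeholders that must match the DB Writer configuration from Step 8.

      # Minimal verification query, assuming the clickhouse-connect client.
      # Connection details and the table name are placeholders - use the values
      # configured in the DB Writer component (Step 8).
      import clickhouse_connect

      client = clickhouse_connect.get_client(
          host="<clickhouse-host>",
          port=8123,                      # default ClickHouse HTTP port
          username="<username>",
          password="<password>",
          database="<database-name>",
      )

      result = client.query("SELECT count() FROM <table_name>")
      print("Rows written:", result.result_rows[0][0])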

Monitoring and Troubleshooting

Issue | Possible Cause | Resolution
API Ingestion not receiving data | Invalid credentials or endpoint | Recheck Ingestion ID/Secret
Data Prep error | Mismatch between schema and source | Validate preparation mapping
AutoML failure | Model not registered | Register model before adding it to pipeline
DB Writer error | Database connection issue | Verify host, port, and authentication

Outcome

By following this guide, you have successfully:

  • Ingested real-time hiring data from an API

  • Transformed and cleaned the dataset using Data Preparation

  • Applied an AutoML model for predictions

  • Stored results into a ClickHouse database

These outputs can now be used for reporting, dashboards, and recruitment performance analytics within the BDB Platform.

Key Benefits

Capability | Advantage
API Integration | Real-time hiring data ingestion
Data Preparation | Ensures clean, consistent, and accurate data
AutoML | Automatic model selection and insights generation
Database Integration | Centralized access for analytics and reporting

Summary

This workflow delivers a complete, production-ready machine learning pipeline — from API ingestion to predictive insights storage. By automating ingestion, transformation, and analysis, organizations can monitor hiring patterns, forecast recruitment metrics, and drive data-backed talent decisions seamlessly through the BDB Platform.