Workflow 2
The objective of this workflow is to ingest hiring data from an API, prepare and analyze it using AutoML, and store the results in a database for actionable insights and reporting.
This workflow is designed to seamlessly ingest hiring data from an API source, ensure data quality through preparation, leverage AutoML techniques to extract valuable insights, and finally store the processed data into a database. It integrates the Data Center, Data Preparation, Data Pipeline, and AutoML Plugins within the BDB Platform.
1. API Integration
a. The workflow begins by connecting to the API source and retrieving real-time hiring data.
b. The API integration enables efficient extraction of information such as job listings, candidate profiles, and recruitment metrics, ensuring the data reflects the latest updates in the recruitment process.
2. Data Preparation
a. Raw hiring data is cleaned, standardized, and structured for analysis.
b. Key tasks include handling missing values, removing inconsistencies, and applying feature engineering techniques.
c. This step improves the accuracy, reliability, and consistency of the dataset.
3. AutoML for Insights
a. The prepared dataset is fed into the AutoML Plugin, which automates the training and evaluation of machine learning models.
b. Multiple algorithms, model architectures, and hyperparameter configurations are tested to identify the best-performing model.
c. This allows the system to uncover patterns, predict outcomes, and generate data-driven insights.
4. Storing Processed Data
a. Once processed, the enriched dataset and AutoML outputs are written into a database.
b. Storing results in the database ensures easy retrieval and integration with dashboards, analytics, and reporting systems.
c. Users can access these insights to support decision-making, recruitment optimization, and performance tracking.
Create a Sandbox and Upload the CSV File
· From the BDB Platform Homepage, click the Apps icon and navigate to the Data Center.
· In the Data Center, click the Sandbox tab and then click Create.
· Upload your CSV file by either dragging and dropping it or browsing your system.
· Once the file loads successfully, click Upload. The sandbox will be created and made available for use.

Creating and Applying Data Preparation
· In the Sandbox List, click the three dots next to your created sandbox.
· Select Create Data Preparation to begin cleaning and processing your data.
· This streamlines the data cleaning process and provides a solid foundation for machine learning.
Transformations:
· Delete a Column: Select the Gender column → click Transforms → search for Delete Column → apply the transform. The Gender column will be removed.
· Remove Empty Rows: For Previous CTC and Offered CTC columns, click Transforms → search for Delete Empty Rows → apply the transform for each column.
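For reference, the two transformations above are roughly equivalent to the pandas operations sketched below. This is only a conceptual illustration of what the preparation does to the data, assuming the CSV uses the column names shown; it is not how the Data Preparation plugin is implemented.

```python
import pandas as pd

# Load the same CSV that was uploaded to the sandbox (path is illustrative).
df = pd.read_csv("hiring_data.csv")

# Delete a Column: drop the Gender column.
df = df.drop(columns=["Gender"])

# Remove Empty Rows: drop rows where Previous CTC or Offered CTC is missing.
df = df.dropna(subset=["Previous CTC", "Offered CTC"])

print(df.shape)  # rows and columns remaining after the preparation
```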
Final Steps:
· Rename the Preparation for easy reference.
· All applied transformations are tracked and can be undone or removed using the Step option.
· Click Save to finalize the preparation.

Create and Run AutoML Experiment
· Go to the AutoML section in the DS Lab Module and click “Create Experiment”.
· Configure:
o Experiment Name: Hiring data
o Experiment Type: Classification
o Under the Configure Dataset option, select Sandbox as the dataset source.
o Set the File Type to CSV.
o Select the sandbox from the dropdown menu.
o Under Advanced Information, select the Data Preparation you created.
o Set the Target Column to ‘Gender’.

· Click Save; the experiment will then start running.
· Once training is done, click View Report to review model accuracy and performance.
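Conceptually, the experiment automates a comparison like the one sketched below: several candidate classifiers are trained on the prepared data, scored, and the best performer is kept. This is an illustrative scikit-learn sketch with an assumed file path and column names, not the platform's AutoML implementation, which also searches model architectures and hyperparameter configurations.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Prepared hiring data; 'Gender' is the target column configured above.
df = pd.read_csv("hiring_data_prepared.csv")             # illustrative path
X = df.drop(columns=["Gender"]).select_dtypes("number")  # numeric features only, for simplicity
y = df["Gender"]

# Train a few candidate models and keep the one with the best cross-validated accuracy.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
best_model = max(scores, key=scores.get)
print(scores, "-> best model:", best_model)
```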

Register the Best Model
· From the Model section, select the model you created using the AutoML process.
· Click the Register button (arrow icon) to register the model.
· Once registered, the model can be used in both real-time and batch pipelines for prediction.

Create a New Pipeline
· Click the Apps menu and navigate to the Data Pipeline Plugin.
· Create a new pipeline by clicking the Pipeline option, then click Create.
· Provide an appropriate name for the pipeline and allocate the required resources.
· Click Save to store the pipeline configuration.
Add API Ingestion Component
· Click the ‘+’ icon on the right side of the screen to open the Components Palette if it is not already visible on the canvas.
· Search for API Ingestion and drag the API Ingestion component onto the workspace.
· Configure the component:
· In the Basic Information panel, set Invocation Type to Real-Time.
· In the Meta Information panel, set Ingestion Type to API Ingestion.
· Save the component.
· Once the pipeline is updated and saved, a unique Ingestion URL will be generated automatically.
· From the Event Panel (located next to the Components Palette), click the ‘+’ icon to add a Kafka Event.
· Once created, drag and drop the Kafka Event onto the pipeline canvas.
Add Data Preparation Component
· From the Components Palette, navigate to the Transformation section and drag the Data Preparation component onto the pipeline canvas.
· In the Basic Information panel, set the Invocation Type to Batch.
· In the Meta Information panel:
· Select Data Center Type as Data Sandbox.
· Choose the appropriate Sandbox Name.
· Select the desired Preparation.
· Save the component after configuration.
· From the Event Panel (next to the Components Palette), click the ‘+’ icon to add a Kafka Event.
· Drag and drop the event onto the pipeline canvas.
Add AutoML Component
· Go to the Components Palette and use the search bar to find AutoML.
· Drag and drop the AutoML Component onto the pipeline canvas.
· In the Basic Information panel, set the Invocation Type to Batch.
· In the Meta Information panel, select the required model from the Model Name dropdown list.
· Save the component once configuration is complete.
· From the Event Panel (next to the Components Palette), click the ‘+’ icon to create a Kafka Event.
· Drag and drop the Kafka Event onto the pipeline canvas.
Add DB Writer Component to Store Final Results
· Go to the Components Palette.
· From the Writer section, drag and drop the DB Writer component onto the pipeline canvas.
· In the Basic Information panel, set the Invocation Type to Batch.
· In the Meta Information tab, enter the required database credentials:
o Host
o Port
o Database Name
o Table Name
o Username
o Password
· Select ClickHouse as the driver and choose Append as the save mode.
· Validate the connection and Save the component.
Activate the Pipeline
· Click the Activate icon located in the toolbar.
· Once activated, the pipeline will begin executing, and the Logs section will open automatically.
· Monitor the logs as the data flows through the components in sequence: API Ingestion → Kafka Event → Data Preparation → Kafka Event → AutoML → Kafka Event → DB Writer.

Send API Request via Postman
· Open Postman and create a new POST request.
· Use the API Ingestion URL generated earlier.
· In the Headers, enter:
o Ingestion ID
o Ingestion Secret
· In the Body, choose raw > JSON, and send sample JSON data matching your model's schema.
· On successful send, Postman will return “API Ingestion successful”.
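If you prefer testing from a script rather than Postman, the sketch below sends an equivalent request with Python's requests library. The URL, header names, and payload fields are placeholders; substitute the Ingestion URL, Ingestion ID, and Ingestion Secret generated for your pipeline, and a record that matches your model's schema.

```python
import requests

# Placeholders: use the Ingestion URL, ID, and Secret generated for your pipeline.
INGESTION_URL = "https://<your-bdb-host>/ingestion/<generated-id>"
headers = {
    "Ingestion-Id": "<your-ingestion-id>",          # header names assumed; match what the UI shows
    "Ingestion-Secret": "<your-ingestion-secret>",
    "Content-Type": "application/json",
}

# Sample record; the fields must match the schema your model was trained on.
payload = [{"Previous CTC": 650000, "Offered CTC": 800000}]

response = requests.post(INGESTION_URL, json=payload, headers=headers, timeout=30)
print(response.status_code, response.text)  # expect a success message such as "API Ingestion successful"
```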

· Use the Preview tab of each event to inspect data after it passes through a component.
· Check the Logs Panel on the right to troubleshoot issues or verify progress.
· Use the Component Status next to the logs to view the real-time status of components.
· Confirm that data ingestion and transformation have completed successfully.
· Verify that the AutoML component outputs results.
· Ensure the DB Writer loads the processed data into the database.
· Monitor logs for details such as data read from the API, Kafka event status, Data Preparation completion, and AutoML execution.
· Once the data reaches the DB Writer, confirm a message like “DB Writer written data successfully”.
· Check the ClickHouse table to ensure predictions and processed data are written correctly.
· After confirmation, deactivate the pipeline to avoid unnecessary resource usage.
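To verify the output from outside the platform, you can query the ClickHouse table directly. The sketch below uses the clickhouse-connect Python client with placeholder connection details; fill in the same credentials and table name that you configured in the DB Writer component.

```python
import clickhouse_connect

# Use the same host, port, database, username, and password configured in the DB Writer.
client = clickhouse_connect.get_client(
    host="<clickhouse-host>",
    port=8123,                      # default HTTP port; adjust to your deployment
    username="<username>",
    password="<password>",
    database="<database-name>",
)

# Inspect the most recently written rows (use the table name set in the DB Writer).
table_name = "<table-name>"
result = client.query(f"SELECT * FROM {table_name} LIMIT 10")
for row in result.result_rows:
    print(row)
```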