Leverage AutoML for Supermarket Data

Use AutoML on supermarket data to prepare the dataset and run classification and regression experiments that uncover insights and support data-driven decisions.

Purpose

This guide explains how to use BDB Data Science Lab (DS Lab) to automate machine learning (ML) workflows using AutoML for a supermarket dataset. It walks you through data ingestion, preparation, and AutoML experiment creation—covering both classification and regression use cases. The outcome is a fully automated process that extracts actionable insights from retail data, accelerates model creation, and enhances data-driven decision-making.

Business Context

Supermarket data contains valuable insights about customer behavior, sales performance, and operational trends. By automating the ML process with AutoML, organizations can rapidly explore multiple algorithms, identify high-performing models, and visualize key business indicators such as:

Product sales performance
Customer segmentation
Pricing impact on revenue
Demand forecasting

AutoML in DS Lab simplifies complex ML processes by automating data preparation, feature selection, model training, and performance evaluation—without requiring extensive manual coding.

Workflow Overview

This workflow consists of the following stages:

Upload Data: Import the supermarket data into the Sandbox using the Data Center or DS Lab plugin.
Data Preparation: Clean and transform the data using the Data Preparation tool.
AutoML Experiment Creation: Configure and run experiments using both Classification and Regression algorithms.
Analyze Results: Review performance metrics, interpret explainability dashboards, and conduct “What-If” analysis to support business strategy.

Step 1 – Create a DS Lab Project

From the BDB Homepage, click the Apps icon and select the DS Lab module.
On the DS Lab Homepage, click Create + to create a new project.
Fill in the required project configuration fields:
| Field | Example | Description |
| --- | --- | --- |
| Project Name | Workflow 5 - Super Market AutoML | Unique and descriptive name |
| Description | “AutoML workflow for classification and regression” | Optional |
| Environment | Python TensorFlow | Recommended for AutoML |
| Resource Allocation | Medium | Based on dataset size |
| Idle Time | 1 hour | Auto-shutdown after inactivity |
| Libraries | pandas, numpy, sklearn | Preinstall dependencies |
Click Save to finalize the configuration.
Click Activate to start the project.
Once activated, click View to open the DS Lab workspace.

Step 2 – Upload Dataset to the Sandbox

In the Data tab (left panel), click the + Add Data icon.
In the Add Data window:
- Select Data Sandbox as the data source.
- Click Upload to open the Upload Data Sandbox page.
- Choose your dataset file (supermarket_data.csv).
Enter:
- Name: Supermarket_Data
- Description: “Supermarket sales and customer records.”
Click Save → Wait for the message “File is uploaded successfully.”
Check the box beside the uploaded dataset → Click Add to link it to your project.

Step 3 – Data Preparation

3.1 Access Data Preparation

In the Data tab, click the three dots (⋮) next to your uploaded dataset.
Select Create Data Preparation.
The dataset opens in a grid format within the Data Preparation workspace.

The Data Preparation interface provides a profiling summary of your dataset, highlighting missing values, outliers, and patterns for each column.

3.2 Apply Data Cleaning Transforms

Perform the following transformations sequentially:

| Task | Action | Expected Result |
| --- | --- | --- |
| Remove Empty Cells (Gender) | Select Gender column → Open Transforms tab → Search Delete Rows with Empty Cells → Click Submit | Removes all rows with blank Gender values. |
| Delete Invalid Entries (Unit Price) | Select Unit Price column → Apply Delete Rows with Invalid Cells transform | Removes invalid numerical entries. |
| Rename Column (Customer Type) | Select Customer Type → Apply Rename Column → Enter customer_type → Submit | Updates column name for consistency. |

All applied transformations are logged in the Steps section for auditing and version control.
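
For readers who want to see what these three transforms do to the rows, here is a minimal standard-library sketch of the same steps. It is not how DS Lab implements them: the sample rows are made up, and treating "invalid" as "does not parse as a number" is an assumption about the Delete Rows with Invalid Cells transform.

```python
import csv
import io

# Made-up sample rows; only the column names come from this guide.
SAMPLE_CSV = """Gender,Unit Price,Customer Type
Female,74.69,Member
,15.28,Normal
Male,abc,Normal
Male,46.33,Member
"""


def is_number(text):
    """Return True if the text parses as a float (assumed meaning of 'valid')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def clean_rows(rows):
    """Apply the three transforms from the table, in the same order."""
    cleaned = []
    for row in rows:
        if not row["Gender"].strip():
            continue  # Delete Rows with Empty Cells (Gender)
        if not is_number(row["Unit Price"]):
            continue  # Delete Rows with Invalid Cells (Unit Price)
        row = dict(row)
        row["customer_type"] = row.pop("Customer Type")  # Rename Column
        cleaned.append(row)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean_rows(rows)
print(len(rows), "rows in,", len(cleaned), "rows out")
```

Because each transform depends on the output of the previous one, reordering them can change which rows survive, which is why the steps are applied sequentially.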

Once complete, rename the data preparation to SuperMarketPreparation for easy reference.
Click the Back arrow.

The preparation is automatically saved and made available across DS Lab modules.

Step 4 – Create AutoML Experiment (Classification)

4.1 Configure Classification Experiment

In DS Lab, navigate to the AutoML tab → Click Create Experiment.
Enter configuration details:
| Field | Example | Description |
| --- | --- | --- |
| Experiment Name | Supermarket Classification | Name for identification |
| Experiment Type | Classification | Choose for categorical prediction |
| Dataset Source | Sandbox | Select previously uploaded data |
| File Type | CSV | Dataset format |
| Data Preparation | SuperMarketPreparation | Cleaned dataset |
| Target Column | Production Line | Column to predict |
Click Save to initialize the experiment.

Note: AutoML automatically tests multiple algorithms (e.g., Decision Tree, Random Forest, XGBoost), evaluates metrics, and selects the best-performing model.
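
The select-the-best loop described in the note can be sketched in a few lines. This is an illustration of the idea only: the two candidates (a majority-class baseline and a 1-nearest-neighbour rule) are simple stand-ins chosen so the example runs without extra libraries, not the algorithms DS Lab trains, and the data, feature names, and holdout split are invented.

```python
from collections import Counter


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbour(train):
    """1-nearest-neighbour on squared Euclidean distance."""
    def predict(x):
        nearest = min(
            train,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )
        return nearest[1]
    return predict


def accuracy(model, data):
    """Fraction of rows whose predicted label matches the actual label."""
    return sum(model(x) == y for x, y in data) / len(data)


def select_best(candidates, train, holdout):
    """Fit every candidate, score it on held-out rows, return winner and scores."""
    scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up rows: (unit price, quantity) -> hypothetical category label.
train = [((10.0, 1), "Food"), ((12.0, 2), "Food"), ((11.0, 1), "Food"),
         ((80.0, 5), "Electronics"), ((95.0, 7), "Electronics")]
holdout = [((9.0, 1), "Food"), ((90.0, 6), "Electronics"),
           ((13.0, 2), "Food"), ((85.0, 6), "Electronics")]

best, scores = select_best(
    {"majority": fit_majority, "nearest_neighbour": fit_nearest_neighbour},
    train,
    holdout,
)
print(best, scores)
```

Scoring on rows the models did not see during fitting is what makes the comparison meaningful; a model picked on its training accuracy alone would favour whichever candidate memorizes best.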

4.2 Monitor Experiment Progress

Started: Experiment initialization phase.
Running: Models under training; notifications will appear in real time.
Completed (Green): Successful completion with a trained model.
Failed (Red): Error in training process.

Once complete, click View Report to open the experiment summary.

Step 5 – Analyze Model Insights

Navigate to the Model Explainer Dashboard to understand the model’s predictions and performance behavior.

Navigate to the Model list.
Select a model and click on it to open the right-side panel.
Click the Model Explainer icon for the selected model.

5.1 Classification Stats

Provides a detailed breakdown of metrics, including:

Accuracy
Precision
Recall
F1-score
Global Cutoff threshold control

5.2 Confusion Matrix

Visual representation of:

True Positives (TP)
True Negatives (TN)
False Positives (FP)
False Negatives (FN)

Helps measure misclassification costs and adjust thresholds.
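
To make the link between the cutoff, the confusion matrix, and the headline metrics concrete, the sketch below counts TP, TN, FP, and FN at a given cutoff and derives the four metrics from them. It assumes a binary view of the problem (one class treated as positive, everything else as negative), and the labels and probabilities are made up.

```python
def confusion_counts(y_true, y_prob, cutoff=0.5):
    """Count TP, TN, FP, FN for binary labels (1 = positive) at a cutoff."""
    tp = tn = fp = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_stats(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels and predicted probabilities for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

for cutoff in (0.5, 0.35):
    counts = confusion_counts(y_true, y_prob, cutoff)
    print(cutoff, counts, classification_stats(*counts))
```

In this toy data, lowering the cutoff from 0.5 to 0.35 turns the single false negative into a true positive without adding a false positive. On real data the trade usually goes both ways, which is what the cutoff control lets you explore.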

5.3 ROC-AUC and PR-AUC Plots

ROC-AUC plots show trade-offs between sensitivity and specificity.
PR-AUC plots highlight precision–recall relationships for imbalanced data.
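
Both areas can be computed directly from labels and scores. The sketch below uses the pairwise-ranking definition of ROC-AUC and average precision as a step-wise estimate of the area under the precision-recall curve. Whether DS Lab uses these exact estimators is not stated in this guide, so treat the choice as an assumption; the data is made up and has no tied scores.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(y_true, y_score):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank
    return ap / total_pos


# Made-up labels and scores for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

print("ROC-AUC:", roc_auc(y_true, y_score))
print("Average precision:", average_precision(y_true, y_score))
```

The pairwise loop is quadratic in the number of rows, which is fine for a demonstration but too slow for large datasets; a rank-based formulation gives the same ROC-AUC in sorted order time.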

5.4 Model Calibration and Lift Curves

Lift Curves: Show improvement over random selection.
Precision Plots: Compare predicted probabilities with the observed rate of positive outcomes to assess how well calibrated the model is.
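
As a rough illustration of what one point on a lift curve measures, the sketch below divides the positive rate among the top-scored rows by the overall positive rate. The function name and the made-up data are illustrative, and the rounding rule for the top fraction is an assumption.

```python
def lift_at(y_true, y_score, fraction):
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    ranked = [
        label
        for _, label in sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    ]
    base_rate = sum(ranked) / len(ranked)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:top_n]) / top_n
    return top_rate / base_rate


# Made-up labels (1 = positive) and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

for fraction in (0.25, 0.5, 1.0):
    print(fraction, lift_at(y_true, y_score, fraction))
```

A lift of 1.0 means the model does no better than picking rows at random; selecting every row always gives exactly 1.0, so the curve converges to that value at the right edge.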

5.5 Individual Prediction Analysis

Displays predicted probability per observation.
Use the Contribution Plot to visualize each feature’s influence on the prediction.
The Partial Dependence Plot (PDP) shows how changing a specific feature affects model output, keeping others constant.
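
The partial dependence calculation itself is short: force one feature to each value on a grid for every row, leave the other features untouched, and average the predictions. The sketch below assumes a hypothetical `predict` function standing in for a trained model; the feature names and coefficients are invented.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        mean_prediction = sum(predict(r) for r in modified) / len(modified)
        curve.append((value, mean_prediction))
    return curve


def predict(row):
    """Hypothetical scoring function, not a real DS Lab model."""
    return 0.2 + 0.05 * row["quantity"] - 0.001 * row["unit_price"]


# Made-up rows for illustration.
rows = [
    {"quantity": 1, "unit_price": 20.0},
    {"quantity": 3, "unit_price": 50.0},
    {"quantity": 5, "unit_price": 80.0},
]

curve = partial_dependence(predict, rows, "quantity", grid=[1, 2, 3])
for value, mean_prediction in curve:
    print(value, round(mean_prediction, 3))
```

Averaging over all rows is what separates a PDP from a single what-if scenario: the curve describes the model's typical response to the feature, not its response for one particular customer or transaction.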

5.6 What-If Analysis

Perform sensitivity testing by manually changing input variables to observe how predictions vary—helping decision-makers simulate outcomes.

5.7 Feature Dependence

SHAP Summary aggregates SHAP values, displaying the mean absolute SHAP value per feature to summarize each feature's overall impact on predictions.

SHAP Dependence plots illustrate the relationship between feature values and their corresponding SHAP values, revealing how the model utilizes a feature and its influence on the predicted outcome.
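
The aggregation behind a SHAP Summary is simple once per-prediction SHAP values exist. The sketch below takes a small matrix of made-up SHAP values (one row per prediction, one column per feature) and ranks features by mean absolute value. It does not compute SHAP values, and the feature names are illustrative.

```python
def shap_summary(shap_values, feature_names):
    """Mean absolute SHAP value per feature, most influential first."""
    n_rows = len(shap_values)
    means = {
        name: sum(abs(row[i]) for row in shap_values) / n_rows
        for i, name in enumerate(feature_names)
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


feature_names = ["unit_price", "quantity", "customer_type"]

# Made-up SHAP values: one row per prediction, one column per feature.
shap_values = [
    [0.30, -0.10, 0.02],
    [-0.20, 0.25, -0.01],
    [0.10, -0.10, 0.03],
]

summary = shap_summary(shap_values, feature_names)
for name, mean_abs in summary:
    print(name, round(mean_abs, 3))
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.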

Step 6 – Create AutoML Experiment (Regression)

Now let’s extend the analysis to forecast total sales using a regression experiment.

6.1 Configure Regression Experiment

From the AutoML tab, click Create Experiment again.
Provide:
| Field | Example | Description |
| --- | --- | --- |
| Experiment Name | Supermarket Regression | Predict continuous values |
| Experiment Type | Regression | For numerical predictions |
| Dataset Source | Sandbox | Same dataset |
| File Type | CSV | File format |
| Data Preparation | SuperMarketPreparation | Reuse existing preparation |
| Target Column | Total Sales | Output variable |
Click Save → The AutoML system will:
- Train multiple regression models (e.g., Linear, Ridge, Lasso, Random Forest Regressor).
- Automatically evaluate metrics such as RMSE, R², and MAE.
- Select the optimal model.
Upon completion, the experiment’s status will turn Completed (green).
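
For reference, the three regression metrics named above follow directly from the prediction errors. The sketch below computes them with the standard library; the actual and predicted sales figures are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up actual and predicted sales totals.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE and MAE are in the same units as the target (here, sales), so they read as typical error sizes, while R² is unitless and reports the share of variance in the target that the predictions account for.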

6.2 Review Model Reports

Use the following tools to analyze results:

| Tool | Description |
| --- | --- |
| Model Detail View | Displays key parameters, training duration, and performance summary. |
| Model Explainer | Provides visual feature importance and prediction explanation. |
| Dataset Explainer | Reveals feature correlations and hidden patterns in the supermarket data. |

Note: These insights allow business teams to understand which factors drive sales, which customer segments are most profitable, and how to optimize pricing or marketing.

Outcome

You have successfully:

Uploaded and cleaned the supermarket data using Data Preparation
Configured and executed AutoML Classification and Regression experiments
Explored Model Explainability and What-If Analysis
Derived actionable insights on customer and product behavior

This AutoML-driven workflow eliminates manual tuning, accelerates time-to-insight, and empowers non-technical users to apply machine learning at scale.

Business Impact

By leveraging AutoML, supermarket businesses can:

Automatically detect demand trends and inventory risks.
Identify customer segments driving the most revenue.
Forecast future sales with higher accuracy.
Enhance marketing personalization and optimize operational efficiency.

Next Steps

Export trained models to the Data Pipeline for real-time prediction.
Integrate model outputs with the Business Story Module for interactive dashboards.
Automate retraining workflows using Job Scheduler for periodic updates.
