Leverage AutoML for Super Market Data
Use AutoML on supermarket data to automate data preparation, classification, and regression, uncovering insights that support data-driven decisions.
Purpose
This guide explains how to use BDB Data Science Lab (DS Lab) to automate machine learning (ML) workflows using AutoML for a supermarket dataset. It walks you through data ingestion, preparation, and AutoML experiment creation—covering both classification and regression use cases. The outcome is a fully automated process that extracts actionable insights from retail data, accelerates model creation, and enhances data-driven decision-making.
Business Context
Supermarket data contains valuable insights about customer behavior, sales performance, and operational trends. By automating the ML process with AutoML, organizations can rapidly explore multiple algorithms, identify high-performing models, and visualize key business indicators such as:
Product sales performance
Customer segmentation
Pricing impact on revenue
Demand forecasting
AutoML in DS Lab simplifies complex ML processes by automating data preparation, feature selection, model training, and performance evaluation—without requiring extensive manual coding.
Workflow Overview
This workflow consists of the following stages:
Upload Data: Import Super Market data into the Sandbox using the Data Center or DS Lab plugin.
Data Preparation: Clean and transform the data using the Data Preparation tool.
AutoML Experiment Creation: Configure and run experiments using both Classification and Regression algorithms.
Analyze Results: Review performance metrics, interpret explainability dashboards, and conduct “What-If” analysis to support business strategy.
Step 1 – Create a DS Lab Project
From the BDB Homepage, click the Apps icon and select the DS Lab module.
On the DS Lab Homepage, click Create + to create a new project.
Fill in the required project configuration fields:
Field | Example | Description
Project Name | Workflow 5 - Super Market AutoML | Unique and descriptive name
Description | “AutoML workflow for classification and regression” | Optional
Environment | Python TensorFlow | Recommended for AutoML
Resource Allocation | Medium | Based on dataset size
Idle Time | 1 hour | Auto-shutdown after inactivity
Libraries | pandas, numpy, sklearn | Preinstall dependencies
Click Save to finalize the configuration.
Click Activate to start the project.
Once activated, click View to open the DS Lab workspace.
Step 2 – Upload Dataset to the Sandbox
In the Data tab (left panel), click the + Add Data icon.
In the Add Data window:
Select Data Sandbox as the data source.
Click Upload to open the Upload Data Sandbox page.

Choose your dataset file (supermarket_data.csv).
Enter:
Name: Supermarket_Data
Description: “Supermarket sales and customer records.”
Click Save → Wait for the message “File is uploaded successfully.”
Check the box beside the uploaded dataset → Click Add to link it to your project.
Step 3 – Data Preparation
3.1 Access Data Preparation
In the Data tab, click the three dots (⋮) next to your uploaded dataset.
Select Create Data Preparation.
The dataset opens in a grid format within the Data Preparation workspace.
The Data Preparation interface provides a profiling summary of your dataset, highlighting missing values, outliers, and patterns for each column.
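A similar profile can be approximated outside DS Lab with pandas. The sketch below is illustrative only: the column names come from this guide, the values are invented, and the 1.5 × IQR outlier rule is a common convention rather than a documented DS Lab behavior.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Female", "Male", "Female"],
    "Unit Price": [10.0, 12.0, 11.0, 9.0, 250.0, 10.5],
})

# Count missing values per column.
missing = df.isna().sum()

# Flag outliers in a numeric column with the 1.5 * IQR rule.
q1, q3 = df["Unit Price"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["Unit Price"] < q1 - 1.5 * iqr) | (df["Unit Price"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(missing)
print(outliers)
```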
3.2 Apply Data Cleaning Transforms
Perform the following transformations sequentially:
Remove Empty Cells (Gender)
Select Gender column → Open Transforms tab → Search Delete Rows with Empty Cells → Click Submit
Removes all rows with blank Gender values.
Delete Invalid Entries (Unit Price)
Select Unit Price column → Apply Delete Rows with Invalid Cells transform
Removes invalid numerical entries.
Rename Column (Customer Type)
Select Customer Type → Apply Rename Column → Enter customer_type → Submit
Updates column name for consistency.
All applied transformations are logged in the Steps section for auditing and version control.
Once complete, rename the data preparation to SuperMarketPreparation for easy reference.
Click the Back arrow.
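For readers who want to see what the three transforms in 3.2 do to the data, here is a rough pandas equivalent. It is a sketch under assumptions: the sample rows are invented, and the exact rules DS Lab applies for "empty" and "invalid" cells may differ from the ones used here.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "", "Female", "Female"],
    "Unit Price": ["74.69", "15.28", "abc", "46.33"],
    "Customer Type": ["Member", "Normal", "Normal", "Member"],
})

# 1. Delete rows with empty Gender cells (missing or blank strings).
has_gender = df["Gender"].fillna("").str.strip() != ""
df = df[has_gender].copy()

# 2. Delete rows whose Unit Price is not a valid number.
df["Unit Price"] = pd.to_numeric(df["Unit Price"], errors="coerce")
df = df.dropna(subset=["Unit Price"])

# 3. Rename the column for consistency.
df = df.rename(columns={"Customer Type": "customer_type"})

print(df)
```

Applying the steps in this order mirrors the sequence logged in the Steps section.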

Step 4 – Create AutoML Experiment (Classification)
4.1 Configure Classification Experiment
In DS Lab, navigate to the AutoML tab → Click Create Experiment.
Enter configuration details:
Field | Example | Description
Experiment Name | Supermarket Classification | Name for identification
Experiment Type | Classification | Choose for categorical prediction
Dataset Source | Sandbox | Select previously uploaded data
File Type | CSV | Dataset format
Data Preparation | SuperMarketPreparation | Cleaned dataset
Target Column | Production Line | Column to predict
Click Save to initialize the experiment.

4.2 Monitor Experiment Progress
Started: Experiment initialization phase.
Running: Models under training; notifications will appear in real time.
Completed (Green): Successful completion with a trained model.
Failed (Red): Error in training process.
Once complete, click View Report to open the experiment summary.


Step 5 – Analyze Model Insights
Use the Model Explainer dashboard to understand the model’s predictions and performance behavior. To open it:
Navigate to the Model list.
Select a model and click on it to open the right-side panel.
Click the Model Explainer icon for the selected model.

5.1 Classification Stats
Provides a detailed breakdown of metrics, including:
Accuracy
Precision
Recall
F1-score
Global Cutoff threshold control
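To make the effect of the cutoff concrete, the sketch below recomputes the four metrics at several thresholds with scikit-learn. The labels and probabilities are invented, and a binary positive class is assumed; the dashboard's own computation may differ.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1])

def metrics_at_cutoff(cutoff):
    """Turn probabilities into labels at the given cutoff and score them."""
    y_pred = (y_prob >= cutoff).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }

for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, metrics_at_cutoff(cutoff))
```

Lowering the cutoff generally raises recall at the expense of precision, which is the trade-off the Global Cutoff control exposes.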

5.2 Confusion Matrix
Visual representation of:
True Positives (TP)
True Negatives (TN)
False Positives (FP)
False Negatives (FN)
Helps measure misclassification costs and adjust thresholds.
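As an illustration of how the four counts feed a cost estimate, the following sketch uses scikit-learn's `confusion_matrix`. The labels and the per-error costs are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (1 = positive class).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Hypothetical business costs: a missed positive is assumed to cost more
# than a false alarm.
cost_fp, cost_fn = 1.0, 5.0
total_cost = fp * cost_fp + fn * cost_fn

print(f"TN={tn} FP={fp} FN={fn} TP={tp} cost={total_cost}")
```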
5.3 ROC-AUC and PR-AUC Plots
ROC-AUC plots show trade-offs between sensitivity and specificity.
PR-AUC plots highlight precision–recall relationships for imbalanced data.
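Both summary scores can be reproduced with scikit-learn. This is a sketch on invented data; note that `average_precision_score` is a step-wise summary of the precision-recall curve, so it may differ slightly from an area computed by interpolation.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1])

roc_auc = roc_auc_score(y_true, y_prob)
pr_auc = average_precision_score(y_true, y_prob)

# Points of the ROC curve: false positive rate vs. true positive rate.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

print(f"ROC-AUC={roc_auc:.3f} PR-AUC={pr_auc:.3f}")
```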
5.4 Model Calibration and Lift Curves
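Lift Curves: Show how much better the model is at finding positives than random selection.
Precision Plots: Compare predicted probabilities with observed outcome rates to assess how well the model is calibrated.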
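The two ideas can be sketched numerically. Lift is computed here by hand as the positive rate in the top-scored fraction divided by the overall positive rate, and calibration uses scikit-learn's `calibration_curve`. The data is invented, and the dashboard's binning may differ.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1])

def lift_at(fraction):
    """Positive rate among the top-scored fraction, relative to the base rate."""
    n_top = max(1, int(round(fraction * len(y_true))))
    top = np.argsort(-y_prob)[:n_top]
    return y_true[top].mean() / y_true.mean()

# Observed positive rate vs. mean predicted probability in two equal-width bins.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)

print("lift@25%:", lift_at(0.25))
print("observed:", prob_true, "predicted:", prob_pred)
```

A well-calibrated model has observed rates close to the mean predicted probability in each bin.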
5.5 Individual Prediction Analysis
Displays predicted probability per observation.
Use the Contribution Plot to visualize each feature’s influence on the prediction.
The Partial Dependence Plot (PDP) shows how changing a specific feature affects model output, keeping others constant.
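Partial dependence can be computed by hand: fix one feature at a grid value for every row, average the predictions, and repeat across the grid. The sketch below does this for a logistic regression on synthetic data; the features and model are hypothetical stand-ins, not the model AutoML trains. scikit-learn also provides `sklearn.inspection.partial_dependence` for the same purpose.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data with two hypothetical numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def partial_dependence_1d(model, X, feature, grid):
    """Average predicted probability when `feature` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # other features keep their observed values
        averages.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(averages)

grid = np.linspace(-2, 2, 5)
pdp = partial_dependence_1d(model, X, feature=0, grid=grid)
print(dict(zip(grid.round(1), pdp.round(3))))
```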

5.6 What-If Analysis
Perform sensitivity testing by manually changing input variables to observe how predictions vary—helping decision-makers simulate outcomes.
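In code, a what-if test amounts to scoring a baseline record and a modified copy of it. The sketch below assumes a hypothetical "high-value basket" label and two synthetic features; neither is part of the guide's dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data: unit price and quantity (hypothetical features).
rng = np.random.default_rng(1)
unit_price = rng.uniform(10, 100, size=300)
quantity = rng.integers(1, 11, size=300)
X = np.column_stack([unit_price, quantity]).astype(float)
y = (unit_price * quantity > 250).astype(int)  # hypothetical label

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Baseline record, then the same record with quantity raised from 3 to 8.
baseline = np.array([[40.0, 3.0]])
scenario = baseline.copy()
scenario[0, 1] = 8.0

p_base = model.predict_proba(baseline)[0, 1]
p_scenario = model.predict_proba(scenario)[0, 1]
print(f"baseline={p_base:.3f} scenario={p_scenario:.3f}")
```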

5.7 Feature Dependence
What-If Analysis allows users to adjust input values to immediately view how different scenarios affect the model's predictions.
SHAP Summary aggregates SHAP values, displaying the mean absolute SHAP value per feature to summarize each feature's overall impact on predictions.
SHAP Dependence plots illustrate the relationship between feature values and their corresponding SHAP values, revealing how the model utilizes a feature and its influence on the predicted outcome.
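For intuition about these aggregates, consider the special case of a linear model with features treated as independent, where the SHAP value of feature i for a row is the coefficient times the feature's deviation from its mean. The sketch below uses that closed form with NumPy only; the coefficients and data are hypothetical, and the dashboard computes SHAP values for the actual trained model rather than by this shortcut.

```python
import numpy as np

# Synthetic data and a hypothetical linear model f(x) = x @ weights + bias.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
weights = np.array([2.0, -1.0, 0.0])
bias = 0.5

def predict(X):
    return X @ weights + bias

# Linear-model SHAP values: coefficient * (feature value - feature mean).
shap_values = weights * (X - X.mean(axis=0))

# SHAP Summary: mean absolute SHAP value per feature.
mean_abs_shap = np.abs(shap_values).mean(axis=0)

# Additivity: SHAP values plus the average prediction recover each prediction.
reconstructed = shap_values.sum(axis=1) + predict(X).mean()

print("mean |SHAP| per feature:", mean_abs_shap.round(3))
```

Plotting `X[:, i]` against `shap_values[:, i]` gives the dependence view for feature i; for a linear model it is a straight line with slope equal to the coefficient.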

Step 6 – Create AutoML Experiment (Regression)
Now let’s extend the analysis to forecast total sales using a regression experiment.
6.1 Configure Regression Experiment
From the AutoML tab, click Create Experiment again.
Provide:
Field | Example | Description
Experiment Name | Supermarket Regression | Predict continuous values
Experiment Type | Regression | For numerical predictions
Dataset Source | Sandbox | Same dataset
File Type | CSV | File format
Data Preparation | SuperMarketPreparation | Reuse existing preparation
Target Column | Total Sales | Output variable
Click Save → The AutoML system will:
Train multiple regression models (e.g., Linear, Ridge, Lasso, Random Forest Regressor).
Automatically evaluate metrics such as RMSE, R², and MAE.
Select the optimal model.
Upon completion, the experiment’s status will turn Completed (green).
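The train, evaluate, and select loop described above can be sketched with scikit-learn. This is an illustration, not DS Lab's implementation: the data is synthetic, the hyperparameters are arbitrary, and selecting by lowest RMSE on a held-out split is an assumption about the selection rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the supermarket data (hypothetical relationship).
rng = np.random.default_rng(42)
unit_price = rng.uniform(10, 100, 400)
quantity = rng.integers(1, 11, 400)
X = np.column_stack([unit_price, quantity])
y = unit_price * quantity * 1.05 + rng.normal(0, 5, 400)  # "Total Sales"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

candidates = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each candidate and score it on the held-out split.
results = {}
for name, model in candidates.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }

# Pick the candidate with the lowest RMSE.
best = min(results, key=lambda name: results[name]["rmse"])
print(best, results[best])
```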
6.2 Review Model Reports
Use the following tools to analyze results:
Model Detail View
Displays key parameters, training duration, and performance summary.
Model Explainer
Provides visual feature importance and prediction explanation.
Dataset Explainer
Reveals feature correlations and hidden patterns in the Super Market data.
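A basic version of the correlation view can be reproduced with pandas. The columns and values below are hypothetical and chosen only to show the call.

```python
import pandas as pd

# Hypothetical numeric columns; the values are invented for illustration.
df = pd.DataFrame({
    "unit_price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "quantity": [5, 3, 4, 1, 2],
})
df["total"] = df["unit_price"] * df["quantity"]

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Features ranked by absolute correlation with the target column.
ranked = corr["total"].drop("total").abs().sort_values(ascending=False)

print(corr.round(2))
print(ranked.round(2))
```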
Outcome
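You have successfully:
Uploaded the supermarket dataset to the Sandbox and linked it to a DS Lab project.
Cleaned the data with the Data Preparation tool.
Created and run AutoML experiments for classification and regression.
Reviewed the model reports and explainer dashboards.
This AutoML-driven workflow reduces manual tuning, shortens time-to-insight, and lets non-technical users apply machine learning at scale.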
Business Impact
By leveraging AutoML, supermarket businesses can:
Automatically detect demand trends and inventory risks.
Identify customer segments driving the most revenue.
Forecast future sales with higher accuracy.
Enhance marketing personalization and optimize operational efficiency.
Next Steps
Export trained models to the Data Pipeline for real-time prediction.
Integrate model outputs with the Business Story Module for interactive dashboards.
Automate retraining workflows using Job Scheduler for periodic updates.