Leverage AutoML for Super Market Data

Use AutoML on supermarket data to run data preparation, classification, and regression, uncover insights, and support data-driven decisions.

Purpose

This guide explains how to use BDB Data Science Lab (DS Lab) to automate machine learning (ML) workflows using AutoML for a supermarket dataset. It walks you through data ingestion, preparation, and AutoML experiment creation—covering both classification and regression use cases. The outcome is a fully automated process that extracts actionable insights from retail data, accelerates model creation, and enhances data-driven decision-making.

Business Context

Supermarket data contains valuable insights about customer behavior, sales performance, and operational trends. By automating the ML process with AutoML, organizations can rapidly explore multiple algorithms, identify high-performing models, and visualize key business indicators such as:

  • Product sales performance

  • Customer segmentation

  • Pricing impact on revenue

  • Demand forecasting

AutoML in DS Lab simplifies complex ML processes by automating data preparation, feature selection, model training, and performance evaluation—without requiring extensive manual coding.

Workflow Overview

This workflow consists of the following stages:

  1. Upload Data: Import Super Market data into the Sandbox using the Data Center or DS Lab plugin.

  2. Data Preparation: Clean and transform the data using the Data Preparation tool.

  3. AutoML Experiment Creation: Configure and run experiments using both Classification and Regression algorithms.

  4. Analyze Results: Review performance metrics, interpret explainability dashboards, and conduct “What-If” analysis to support business strategy.

Step 1 – Create a DS Lab Project

  1. From the BDB Homepage, click the Apps icon and select the DS Lab module.

  2. On the DS Lab Homepage, click Create + to create a new project.

  3. Fill in the required project configuration fields:

    Field               | Example                                             | Description
    Project Name        | Workflow 5 - Super Market AutoML                    | Unique and descriptive name
    Description         | “AutoML workflow for classification and regression” | Optional
    Environment         | Python TensorFlow                                   | Recommended for AutoML
    Resource Allocation | Medium                                              | Based on dataset size
    Idle Time           | 1 hour                                              | Auto-shutdown after inactivity
    Libraries           | pandas, numpy, sklearn                              | Preinstall dependencies

  4. Click Save to finalize the configuration.

  5. Click Activate to start the project.

  6. Once activated, click View to open the DS Lab workspace.

Step 2 – Upload Dataset to the Sandbox

  1. In the Data tab (left panel), click the + Add Data icon.

  2. In the Add Data window:

    • Select Data Sandbox as the data source.

    • Click Upload to open the Upload Data Sandbox page.

    • Choose your dataset file (supermarket_data.csv).

  3. Enter:

    • Name: Supermarket_Data

    • Description: “Supermarket sales and customer records.”

  4. Click Save → Wait for the message “File is uploaded successfully.”

  5. Check the box beside the uploaded dataset → Click Add to link it to your project.

Step 3 – Data Preparation

3.1 Access Data Preparation

  1. In the Data tab, click the three dots (⋮) next to your uploaded dataset.

  2. Select Create Data Preparation.

  3. The dataset opens in a grid format within the Data Preparation workspace.

The Data Preparation interface provides a profiling summary of your dataset, highlighting missing values, outliers, and patterns for each column.


3.2 Apply Data Cleaning Transforms

Perform the following transformations sequentially:

Task | Action | Expected Result
Remove Empty Cells (Gender) | Select Gender column → Open Transforms tab → Search Delete Rows with Empty Cells → Click Submit | Removes all rows with blank Gender values.
Delete Invalid Entries (Unit Price) | Select Unit Price column → Apply Delete Rows with Invalid Cells transform | Removes invalid numerical entries.
Rename Column (Customer Type) | Select Customer Type → Apply Rename Column → Enter customer_type → Submit | Updates column name for consistency.

All applied transformations are logged in the Steps section for auditing and version control.

  1. Once complete, rename the data preparation to SuperMarketPreparation for easy reference.

  2. Click the Back arrow.
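
For readers who want to see what the three transforms above amount to, here is a rough pandas sketch. It is not DS Lab's implementation. The sample rows are made up, the column names are taken from the table above, and it assumes that "invalid" in Unit Price means "cannot be parsed as a number", which may differ from the tool's own rule.

```python
import pandas as pd

# Tiny made-up sample standing in for supermarket_data.csv (illustrative values only).
raw = pd.DataFrame(
    {
        "Gender": ["Female", None, "Male", "  ", "Male"],
        "Unit Price": ["74.69", "15.28", "abc", "46.33", "58.22"],
        "Customer Type": ["Member", "Normal", "Normal", "Member", "Normal"],
    }
)


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Approximate the three Data Preparation transforms with pandas."""
    # 1. Delete rows with empty cells in Gender (missing or whitespace-only).
    gender = df["Gender"].fillna("").astype(str).str.strip()
    out = df[gender != ""].copy()

    # 2. Delete rows with invalid cells in Unit Price.
    #    Assumption: "invalid" means the value cannot be parsed as a number.
    out["Unit Price"] = pd.to_numeric(out["Unit Price"], errors="coerce")
    out = out.dropna(subset=["Unit Price"])

    # 3. Rename Customer Type to customer_type.
    out = out.rename(columns={"Customer Type": "customer_type"})
    return out.reset_index(drop=True)


clean = prepare(raw)
print(clean)
```

Running the same checks locally before uploading can help confirm how many rows the transforms are expected to remove.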

Step 4 – Create AutoML Experiment (Classification)

4.1 Configure Classification Experiment

  1. In DS Lab, navigate to the AutoML tab → Click Create Experiment.

  2. Enter configuration details:

    Field            | Example                    | Description
    Experiment Name  | Supermarket Classification | Name for identification
    Experiment Type  | Classification             | Choose for categorical prediction
    Dataset Source   | Sandbox                    | Select previously uploaded data
    File Type        | CSV                        | Dataset format
    Data Preparation | SuperMarketPreparation     | Cleaned dataset
    Target Column    | Production Line            | Column to predict

  3. Click Save to initialize the experiment.

Note: AutoML automatically tests multiple algorithms (e.g., Decision Tree, Random Forest, XGBoost), evaluates metrics, and selects the best-performing model.
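
The general idea in the note above (train several candidates, score them the same way, keep the best) can be sketched with scikit-learn. This is an illustration only, not how DS Lab works internally. The data is synthetic, and GradientBoostingClassifier is used as a stand-in for XGBoost so that the example needs nothing beyond scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared dataset (not real supermarket data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Candidate algorithms; gradient boosting stands in for XGBoost here.
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Keep the candidate with the highest mean score.
best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```

A real AutoML run typically also tunes hyperparameters and may optimize a metric other than accuracy; this sketch skips both.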

4.2 Monitor Experiment Progress

  • Started: Experiment initialization phase.

  • Running: Models are being trained; notifications appear in real time.

  • Completed (Green): Successful completion with a trained model.

  • Failed (Red): Error in training process.

Once complete, click View Report to open the experiment summary.

Step 5 – Analyze Model Insights

Open the Model Explainer dashboard to understand the model’s predictions and performance behavior:

  • Navigate to the Model list.

  • Select a model and click on it to open the right-side panel.

  • Click the Model Explainer icon for the selected model.

5.1 Classification Stats

Provides a detailed breakdown of metrics, including:

  • Accuracy

  • Precision

  • Recall

  • F1-score

  • Global Cutoff threshold control
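
To make the metrics and the cutoff control concrete, the following sketch computes them by hand for a binary problem. The labels and probabilities are made up, and the function name is illustrative. For a multi-class target, the same calculation would apply per class in a one-vs-rest fashion.

```python
def classification_stats(y_true, y_prob, cutoff=0.5):
    """Return accuracy, precision, recall and F1 for binary labels at a cutoff."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels and predicted probabilities for the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]

# Lowering the cutoff trades precision for recall.
for cutoff in (0.5, 0.25):
    print(cutoff, classification_stats(y_true, y_prob, cutoff))
```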

5.2 Confusion Matrix

Visual representation of:

  • True Positives (TP)

  • True Negatives (TN)

  • False Positives (FP)

  • False Negatives (FN)

Helps measure misclassification costs and adjust thresholds.
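
One way to connect the confusion matrix to misclassification cost is to price each error type and compare cutoffs. The sketch below does this for a binary problem. The data and the cost values are hypothetical, chosen only to show the mechanics.

```python
def confusion_counts(y_true, y_prob, cutoff):
    """Count TP, TN, FP and FN for binary labels at a probability cutoff."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1:
            counts["TP" if truth == 1 else "FP"] += 1
        else:
            counts["TN" if truth == 0 else "FN"] += 1
    return counts


def total_cost(counts, fp_cost, fn_cost):
    """Total cost of the errors; correct predictions are assumed to cost nothing."""
    return counts["FP"] * fp_cost + counts["FN"] * fn_cost


# Made-up labels and predicted probabilities for the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]

# Hypothetical costs: a missed positive (FN) is five times as costly as a false alarm (FP).
FP_COST, FN_COST = 1.0, 5.0

cutoffs = [0.25, 0.5, 0.75]
costs = {
    c: total_cost(confusion_counts(y_true, y_prob, c), FP_COST, FN_COST)
    for c in cutoffs
}
best_cutoff = min(costs, key=costs.get)
print(costs, "-> best cutoff:", best_cutoff)
```

With false negatives priced higher, the lowest cutoff wins on this sample; reversing the costs would favor a higher cutoff.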

5.3 ROC-AUC and PR-AUC Plots

  • ROC-AUC plots show trade-offs between sensitivity and specificity.

  • PR-AUC plots highlight precision–recall relationships for imbalanced data.
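
Both summary numbers can be computed without plotting. The sketch below uses two standard definitions: ROC-AUC as the share of positive/negative pairs ranked in the right order, and average precision as one common summary of the precision-recall curve (a dashboard may instead integrate the curve with a different rule). The data is made up.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of precision@k, taken at every rank k that holds a positive."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, truth) in enumerate(ranked, start=1):
        if truth == 1:
            hits += 1
            total += hits / k
    return total / hits


# Made-up labels and predicted scores for the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]

print("ROC-AUC:", roc_auc(y_true, y_score))
print("Average precision:", round(average_precision(y_true, y_score), 4))
```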

5.4 Model Calibration and Lift Curves

  • Lift Curves: Show improvement over random selection.

  • Precision Plots: Assess how closely predicted probabilities match observed outcomes (calibration).
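
As a minimal illustration of "improvement over random selection", lift at a given fraction can be computed as the positive rate among the top-scored records divided by the overall positive rate. The function name and the data below are illustrative.

```python
def lift_at(y_true, y_score, fraction):
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    ranked = [
        truth
        for _, truth in sorted(
            zip(y_score, y_true), key=lambda pair: pair[0], reverse=True
        )
    ]
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:top_n]) / top_n
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


# Made-up labels and predicted scores for the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]

for fraction in (0.25, 0.5, 1.0):
    print(f"top {int(fraction * 100)}%: lift = {lift_at(y_true, y_score, fraction)}")
```

A lift of 1.0 means the model does no better than picking records at random; values above 1.0 at small fractions indicate that the highest scores concentrate the positives.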

5.5 Individual Prediction Analysis

  • Displays predicted probability per observation.

  • Use the Contribution Plot to visualize each feature’s influence on the prediction.

  • The Partial Dependence Plot (PDP) shows how changing a specific feature affects model output, keeping others constant.
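
The partial dependence idea can be sketched in a few lines: force one feature to each value on a grid for every row, then average the model's output. The model below is a hypothetical stand-in with made-up coefficients and features, not a model produced by DS Lab.

```python
import math


def toy_model(row):
    """Hypothetical stand-in for a trained model: probability from two features."""
    unit_price, quantity = row
    return 1 / (1 + math.exp(-(0.05 * unit_price + 0.4 * quantity - 5)))


def partial_dependence(model, rows, feature_index, grid):
    """Average model output over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        modified = [
            row[:feature_index] + (value,) + row[feature_index + 1:]
            for row in rows
        ]
        curve.append(sum(model(r) for r in modified) / len(modified))
    return curve


# Made-up observations: (unit_price, quantity).
rows = [(20.0, 2), (45.0, 5), (60.0, 1), (80.0, 7)]
grid = [10.0, 40.0, 70.0, 100.0]

pdp = partial_dependence(toy_model, rows, feature_index=0, grid=grid)
for value, avg in zip(grid, pdp):
    print(f"unit_price={value}: average prediction={avg:.3f}")
```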

5.6 What-If Analysis

Perform sensitivity testing by manually changing input values and immediately observing how predictions vary across scenarios, which helps decision-makers simulate outcomes.

5.7 Feature Dependence

SHAP Summary aggregates SHAP values, displaying the mean absolute SHAP value per feature to summarize each feature's overall impact on predictions.

SHAP Dependence plots illustrate the relationship between feature values and their corresponding SHAP values, revealing how the model utilizes a feature and its influence on the predicted outcome.
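
To show what the SHAP Summary aggregates, the sketch below uses the one case where SHAP values have a simple closed form: a linear model with features treated as independent, where each value is the coefficient times the feature's deviation from its mean. The coefficients and data are hypothetical, and tree-based models selected by AutoML would need a dedicated explainer instead.

```python
def linear_shap_values(weights, rows):
    """SHAP values for a linear model with independent features: w_j * (x_j - mean_j)."""
    n = len(rows)
    means = [sum(row[j] for row in rows) / n for j in range(len(weights))]
    return [
        [w * (x - m) for w, x, m in zip(weights, row, means)] for row in rows
    ]


def mean_abs_shap(shap_rows):
    """Mean absolute SHAP value per feature, as shown in a SHAP summary."""
    n = len(shap_rows)
    return [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(shap_rows[0]))
    ]


feature_names = ["unit_price", "quantity"]
weights = [0.5, 10.0]  # hypothetical coefficients of a linear sales model
rows = [(20.0, 2), (40.0, 4), (60.0, 6), (80.0, 8)]  # made-up observations

shap_rows = linear_shap_values(weights, rows)
summary = dict(zip(feature_names, mean_abs_shap(shap_rows)))
print(summary)
```

Plotting each row's SHAP value for a feature against that feature's value gives the corresponding dependence plot; for this linear example it is a straight line.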

Step 6 – Create AutoML Experiment (Regression)

Now let’s extend the analysis to forecast total sales using a regression experiment.

6.1 Configure Regression Experiment

  1. From the AutoML tab, click Create Experiment again.

  2. Provide:

    Field            | Example                | Description
    Experiment Name  | Supermarket Regression | Predict continuous values
    Experiment Type  | Regression             | For numerical predictions
    Dataset Source   | Sandbox                | Same dataset
    File Type        | CSV                    | File format
    Data Preparation | SuperMarketPreparation | Reuse existing preparation
    Target Column    | Total Sales            | Output variable

  3. Click Save → The AutoML system will:

    • Train multiple regression models (e.g., Linear, Ridge, Lasso, Random Forest Regressor).

    • Automatically evaluate metrics such as RMSE, R², and MAE.

    • Select the optimal model.

  4. Upon completion, the experiment’s status will turn Completed (green).
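
The three regression metrics mentioned above can be computed directly from actual and predicted values. The sketch below does so for two hypothetical sets of predictions and picks the one with the lower RMSE; the numbers are made up, and the choice of RMSE as the selection criterion is an assumption for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up actual sales totals and predictions from two hypothetical models.
y_true = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "model_a": [110.0, 190.0, 310.0, 390.0],
    "model_b": [150.0, 150.0, 350.0, 350.0],
}

results = {name: regression_metrics(y_true, pred) for name, pred in predictions.items()}
best_model = min(results, key=lambda name: results[name]["rmse"])

for name, metrics in results.items():
    print(name, metrics)
print("best by RMSE:", best_model)
```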

6.2 Review Model Reports

Use the following tools to analyze results:

Tool              | Description
Model Detail View | Displays key parameters, training duration, and performance summary.
Model Explainer   | Provides visual feature importance and prediction explanation.
Dataset Explainer | Reveals feature correlations and hidden patterns in the Super Market data.

Note: These insights allow business teams to understand which factors drive sales, which customer segments are most profitable, and how to optimize pricing or marketing.

Outcome

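You have successfully:

  • Created a DS Lab project and uploaded the supermarket dataset to the Sandbox.

  • Cleaned the data with the Data Preparation tool.

  • Created and run AutoML experiments for classification and regression.

  • Reviewed the model reports and explainability dashboards.

This AutoML-driven workflow reduces manual tuning, shortens time-to-insight, and lets non-technical users apply machine learning at scale.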

Business Impact

By leveraging AutoML, supermarket businesses can:

  • Automatically detect demand trends and inventory risks.

  • Identify customer segments driving the most revenue.

  • Forecast future sales with higher accuracy.

  • Enhance marketing personalization and optimize operational efficiency.

Next Steps

  • Export trained models to the Data Pipeline for real-time prediction.

  • Integrate model outputs with the Business Story Module for interactive dashboards.

  • Automate retraining workflows using Job Scheduler for periodic updates.