Workflow 5

To leverage AutoML for Supermarket data by applying data preparation, classification, and regression techniques to uncover insights and drive data-driven decisions.

Workflow 5 empowers users to automate the machine learning process, integrate data preparation techniques, and extract valuable insights from Supermarket data. By stepping into the world of AutoML, you can simplify and accelerate the application of machine learning algorithms to real-world business problems.

This workflow allows you to effortlessly create AutoML experiments specifically tailored for Supermarket datasets, applying both classification and regression algorithms. The dataset may include details such as product sales, customer demographics, and market trends, forming a rich foundation for analysis and decision-making.

Key Steps in This Workflow

1. Upload Data

o Import the Supermarket dataset into the sandbox environment, either through the Data Center Plugin or the DS Lab Plugin, offering flexibility and convenience.

2. Data Preparation

o Apply cleaning and transformation techniques to improve data quality and ensure consistency for modeling.

3. Create AutoML Experiment

o Configure and run an AutoML experiment using both classification and regression algorithms on the prepared dataset.

4. Analyze Results

o Review the automatically generated comprehensive report.

o Explore the Model Explainer Dashboard to understand model predictions and behavior.

o Use the Dataset Explainer to identify hidden patterns and relationships within the data.
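For orientation, the same four steps can be sketched outside the platform in plain Python. This is a minimal, hypothetical example using pandas and scikit-learn; the file name and column names (Gender, Unit Price, Quantity, Production Line, Total Sales) are assumptions taken from this walkthrough, not guaranteed by your dataset:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# 1. Upload data: read the supermarket file from the sandbox (assumed file name)
df = pd.read_csv("supermarket_data.csv")

# 2. Data preparation: drop rows with an empty Gender, keep only numeric Unit Price
df = df.dropna(subset=["Gender"])
df["Unit Price"] = pd.to_numeric(df["Unit Price"], errors="coerce")
df = df.dropna(subset=["Unit Price"])

X = df[["Unit Price", "Quantity"]]  # illustrative feature set

# 3a. Classification experiment: predict the "Production Line" target
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(X, df["Production Line"], random_state=42)
clf = RandomForestClassifier(random_state=42).fit(Xc_tr, yc_tr)

# 3b. Regression experiment: predict the "Total Sales" target
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(X, df["Total Sales"], random_state=42)
reg = RandomForestRegressor(random_state=42).fit(Xr_tr, yr_tr)

# 4. Analyze results: simple hold-out scores
print("classification accuracy:", clf.score(Xc_te, yc_te))
print("regression R^2:", reg.score(Xr_te, yr_te))
```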

To create an AutoML workflow, follow these steps:

· From the Home Screen of the platform, select the DS Lab module from the Apps menu. This will take you to the Data Science Lab Homepage.

· You will see a list of projects, each containing details such as:

  • Name: Workflow 5

  • Description

  • Environment: Python TensorFlow

  • Resource Allocation: Medium

  • Idle Time: 1 hour

  • Libraries

  • Actions: Push/Pull from version control, Share, Edit, Delete, or Activate

· Once the project is configured, click Save.

· Activate the project by clicking the Activate button, then select it from the list.

· You will be redirected to the Project Main Page, which contains multiple tabs such as Repo, Utils, and Files.

· To work on the AutoML workflow, navigate to the Data tab located on the left of the search bar in the workspace. This tab allows you to manage and explore datasets associated with your project.

· In the Data Tab, you will see a list of uploaded datasets, each showing details such as type and available actions (Preview, Generate Data Profile, Create Experiment, Delete, and Data Preparation).

· Select the appropriate dataset for your Supermarket project.

· If the dataset is not available, follow the upload steps to add it. On the right-hand side, there is a "+" icon. Click it to navigate to the "Add Data" page.

· In the "Add Data" window, select the "Data Sandbox" option as the data source. This will list all the existing files in the sandbox. If you want to add a new dataset to the sandbox, click the "Upload" button to open the "Upload Data Sandbox" page and upload the file, e.g. “supermarket data”.

Create Data Preparation

Data Preparation is used to clean the data. Let's get started with the data preparation process:

· Click the three-dot menu of the dataset you just added in the Data tab and select the Data Preparation icon. This will take you to the Data Preparation page.

· Here on the Data Preparation page, you can see your complete dataset displayed in a grid form. The Data Preparation Plugin automatically profiles the data, providing valuable insights into its characteristics and detecting any anomalies. You can also view the data profiling details.

· On the right-hand side, you'll find the selected column's profile. Here, you can explore various details such as charts, information, and patterns associated with the selected column.

· All available transformations appear inside the Transforms tab.

· Now, let's start preparing and cleaning the data.

o To remove the empty cells in the 'Gender' column, select the Gender column, navigate to the 'Transforms' tab, and search for the 'Delete Rows with Empty Cells' transform. Click it to remove all rows with an empty Gender value. You can see that the empty rows in the 'Gender' column have been removed.

o Next, let's perform the 'Delete Rows with Invalid Cells' transform on the 'Unit Price' column.

· Select the column and search for the 'Delete Rows with Invalid Cells' transform. Click it to delete the invalid cells from the 'Unit Price' column.

· Great! The invalid cells in the 'Unit Price' column have been successfully removed.

· Now, let's rename the 'Customer type' column. Simply select the column and search for the 'Rename Column' transform.

· Click it, specify the new name (for example, customer_type), and submit. Perfect! The 'Customer type' column has been renamed.

· You can see that all the performed transforms are recorded in the 'Steps' section. This helps you keep track of the changes made to the dataset.

· Now, let's rename the preparation for identification purposes. Simply click on the edit icon and give it a new name, such as 'SuperMarketPreparation'.

Great! The preparation has been renamed to 'SuperMarketPreparation'. Click on the back icon, and the preparation will be automatically saved and exported to other plugins, such as the Data Pipeline or AutoML/DS Lab.
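For reference, the three transforms above correspond roughly to the following pandas operations (a sketch only; in the platform they are applied through the UI, and the column names are the ones used in this walkthrough):

```python
import pandas as pd

df = pd.read_csv("supermarket_data.csv")  # assumed sandbox file

# 'Delete Rows with Empty Cells' on the Gender column
df = df.dropna(subset=["Gender"])

# 'Delete Rows with Invalid Cells' on the Unit Price column:
# coerce non-numeric values to NaN, then drop those rows
df["Unit Price"] = pd.to_numeric(df["Unit Price"], errors="coerce")
df = df.dropna(subset=["Unit Price"])

# 'Rename Column' on Customer type
df = df.rename(columns={"Customer type": "customer_type"})
```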

AutoML Experiment Creation

· Navigate to the AutoML section in the DS Lab Module and click the Create Experiment button.

· Configure the experiment with the following details:

o Experiment Name: Supermarket Classification

o Experiment Type: Classification

· Under the Configure Dataset option:

o Select Sandbox as the dataset source.

o Select the File Type as CSV from the dropdown.

o Choose the required sandbox from the dropdown menu.

· In the Advanced Information section:

o Select the Data Preparation process you created earlier.

o Set the Target Column to "Production Line".

· Click Save to initialize and start running the experiment.

· The AutoML will automatically test multiple models in the background, evaluate their performance, and select the most optimal approach.
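Conceptually, this background search is similar to cross-validating a set of candidate models and keeping the best-scoring one. The following scikit-learn sketch only illustrates the idea and is not the platform's actual implementation; the candidate models and column names are assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Prepared features and the "Production Line" target (column names as in this walkthrough)
df = pd.read_csv("supermarket_data.csv")
df["Unit Price"] = pd.to_numeric(df["Unit Price"], errors="coerce")
df = df.dropna(subset=["Unit Price", "Gender", "Production Line"])
X = pd.get_dummies(df[["Unit Price", "Quantity", "Gender"]])
y = df["Production Line"]

# Evaluate each candidate with 5-fold cross-validation and keep the best one
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> recommended model:", best)
```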

When you navigate to the AutoML tab, you will see a list of all your experiments. Your newly created experiment will appear at the top.

· Each experiment has a status associated with it, indicating its progress and outcome. Initially, a new experiment has the status of "Started."

· As the model training progresses, the status changes to "Running." You will receive a notification to keep you informed.

· Once the model training is successfully completed, the status changes to "Completed," indicated by a green color.

· However, if any issues arise during training, the status will change to "Failed," and it will be indicated in red.

· Remember, you have the option to delete an experiment or view its detailed report.

· Use the "Delete" button to remove an experiment from the list, and click "View Report" to access a comprehensive analysis of the experiment's results.

· Once your experiment is completed, locate the experiment you want to analyze and click "View Report" to open the experiment details.

After clicking, you will find all the models that were created.

· Now go to the Models section in the DS Lab module. Once the models are created, select the recommended model and click Model Explainer to see the explanation of the model.

The 'Classification Stats' tab provides various statistics regarding the classification model. Here, you can access a range of performance metrics that evaluate the model's accuracy and effectiveness. These metrics offer insights into how well the model performs in classifying positive and negative instances.

Global Cutoff

Within the 'Classification Stats' tab, you can set a global cutoff. This cutoff is a threshold that determines the classification of instances into positive or negative classes. By setting the cutoff, you can control the balance between false positives and false negatives, optimizing the model's performance.
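In code terms, the cutoff is simply the probability threshold applied before probabilities are turned into class labels. Below is a minimal, self-contained sketch on synthetic binary data (the supermarket columns are not needed to illustrate the idea); the later snippets in this section reuse these variables:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a prepared binary classification dataset
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

cutoff = 0.35                            # the global cutoff chosen in the dashboard
y_pred = (proba >= cutoff).astype(int)   # lower cutoff -> fewer false negatives,
                                         # but more false positives
```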

Model Performance Metrics

The 'Model Performance Metrics' section displays a list of various performance metrics. These metrics provide valuable insights into the model's performance, including accuracy, precision, recall, F1 score, and more. Reviewing these metrics helps you assess the model's strengths and weaknesses in classification tasks.
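With scikit-learn, the same metrics can be reproduced from the true and predicted labels (reusing `y_test` and `y_pred` from the cutoff sketch above):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1 score :", f1_score(y_test, y_pred))
```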

Confusion Matrix

The confusion matrix presents a visual representation of the model's performance in classifying instances. It shows the number of true negatives, true positives, false negatives, and false positives. By examining the confusion matrix, you can understand the costs associated with misclassifications and select an optimal cutoff.
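The same counts can be computed and drawn directly (again reusing `y_test` and `y_pred`):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

cm = confusion_matrix(y_test, y_pred)  # rows = actual classes, columns = predicted classes
ConfusionMatrixDisplay(cm).plot()
plt.show()
```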

Precision Plot

The precision plot illustrates the relationship between the predicted probability of a record belonging to the positive class and the percentage of observed records in the positive class. This plot helps assess the model's calibration and its ability to accurately predict the positive class.
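A comparable view can be built with scikit-learn's `calibration_curve`, which bins the predicted probabilities and reports the observed positive rate per bin (reusing `y_test` and `proba` from the cutoff sketch):

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

observed, predicted = calibration_curve(y_test, proba, n_bins=10)
plt.plot(predicted, observed, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Observed fraction of positives")
plt.legend()
plt.show()
```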

Classification Plot

The classification plot displays the fraction of each class above and below the selected cutoff. It provides insights into how the model's predictions are distributed between the positive and negative classes based on the chosen threshold.

ROC AUC Plot

The ROC AUC plot is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds. It helps assess the model's performance in distinguishing between positive and negative instances across different thresholds.
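Reusing `y_test` and `proba`, the curve and its area under the curve (AUC) can be reproduced as follows:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

fpr, tpr, _ = roc_curve(y_test, proba)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, proba):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")  # random-guess baseline
plt.xlabel("False positive rate (FPR)")
plt.ylabel("True positive rate (TPR)")
plt.legend()
plt.show()
```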

PR AUC Plot

The PR AUC plot shows the trade-off between precision and recall in a single plot. It provides insights into how the model's precision and recall change with varying classification thresholds.
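The corresponding precision-recall curve and its average precision score can be drawn the same way:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score, precision_recall_curve

precision, recall, _ = precision_recall_curve(y_test, proba)
plt.plot(recall, precision,
         label=f"average precision = {average_precision_score(y_test, proba):.3f}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```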

Lift Curve and Cumulative Precision

The lift curve chart displays the percentage of positive classes when selecting observations with scores above the cutoff compared to random selection. It helps evaluate the model's performance compared to random selection.

Individual Prediction: The individual prediction section displays the predicted probability for each target label. It provides insights into how the model assigns probabilities to different classes for a specific observation.

Contributions Plot

The contributions plot shows the contribution that each feature has provided to the prediction for a specific observation. These contributions, starting from the population average, add up to the final prediction. This plot helps explain how each prediction is built up from the individual features in the model.
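Outside the dashboard, a comparable per-prediction breakdown can be produced with the open-source `shap` library. This is only a sketch, reusing `clf`, `X_train`, and `X_test` from the cutoff sketch; the dashboard's own implementation may differ:

```python
import shap

explainer = shap.Explainer(clf, X_train)  # background data sets the population average
shap_values = explainer(X_test)           # SHAP values per observation and feature

sv = shap_values[0]                       # contributions for the first observation
if sv.values.ndim > 1:                    # multi-output models: keep the positive class
    sv = shap_values[0, :, 1]
shap.plots.waterfall(sv)                  # contributions add up to the final prediction
```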

Partial Dependence Plot

The partial dependence plot (PDP) shows how the model prediction would change if you change a particular feature while keeping other features constant. It provides insights into the relationship between a specific feature and the model's predictions. The average effect is shown in grey, and the effect of changing the feature for a single record is shown in blue.
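scikit-learn offers the same kind of view through `PartialDependenceDisplay` (a sketch reusing `clf` and `X_train`; the feature index is illustrative, and column names can be used instead when X is a DataFrame):

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Average effect of feature 0 on the prediction, with the other features held fixed
PartialDependenceDisplay.from_estimator(clf, X_train, features=[0])
plt.show()
```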

Contributions Table

The contributions table shows the contribution each individual feature has had on the prediction for a specific observation. These contributions, starting from the population average, add up to the final prediction. This table allows you to explain how each individual prediction is built up from the features in the model.

What If Analysis: The What If Analysis allows stakeholders to understand the potential consequences of different scenarios or decisions. In this analysis, you can change the values of selected variables to see how the outcome would change. It helps identify the sensitivity of the outcome to different inputs and which variables are most important.

Feature Input

Within the What If Analysis, you can adjust the input values to see predictions for different scenarios. This feature allows you to explore how changing input values affects the model's predictions.
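In code, a what-if check is simply re-scoring a single record after editing one of its values (a sketch reusing `clf` and `X_test`; the edited feature and the 10% change are illustrative):

```python
row = X_test[[0]].copy()                 # baseline scenario: the first observation
baseline = clf.predict_proba(row)[0, 1]

row[0, 0] = row[0, 0] * 1.10             # what if this feature increased by 10%?
scenario = clf.predict_proba(row)[0, 1]

print(f"baseline probability {baseline:.3f} -> scenario probability {scenario:.3f}")
```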

Contribution & Partial Dependence Plots

In the What If Analysis, analysts typically start with a baseline scenario and identify variables that may impact the outcome. Contribution and partial dependence plots help visualize the effects of changing these variables on the model's predictions.

Feature Dependence

The feature dependence analysis explores the relationship between feature values and their impact on predictions. This analysis allows you to investigate how the model uses features in line with intuitions or learn about the relationships the model has learned between input features and predicted outcomes.

Shap Summary

The Shap summary summarizes the Shap values per feature. It provides an aggregate display, showing the mean absolute Shap value per feature. This summary helps understand the overall impact of each feature on the model's predictions.

Shap Dependence

The Shap dependence plot displays the relationship between feature values and Shap values. It allows you to investigate the general relationship between feature values and their impact on predictions. This plot helps you understand how the model uses features and their influence on the predicted outcome.
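With the `shap` library, the Shap summary and Shap dependence views map to the bar/beeswarm and scatter plots (a sketch reusing the `shap_values` computed in the contributions example above):

```python
import shap

vals = shap_values
if vals.values.ndim > 2:        # multi-output models: keep the positive class only
    vals = shap_values[:, :, 1]

shap.plots.bar(vals)            # Shap summary: mean |SHAP value| per feature
shap.plots.beeswarm(vals)       # distribution of SHAP values per feature
shap.plots.scatter(vals[:, 0])  # Shap dependence: feature value vs. its SHAP value
```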

Creating an AutoML Experiment Using Regression

Let’s Create an AutoML Regression Experiment for Supermarket Sales Prediction. Data scientists can easily create and manage their experiments with AutoML. Let's see how to create an AutoML Regression experiment.

· To begin, click the AutoML option in the DS Lab tab, then click Create Experiment to create a new AutoML experiment.

· These datasets will serve as the foundation for your experiment.

· Now, let's configure the experiment-specific details. Start by providing a name for your experiment, such as 'Supermarket Regression'.

o Experiment Type: Regression

· Under the Configure Dataset option:

o Select Sandbox as the dataset source.

o Select the File Type as CSV from the dropdown.

o Choose the required sandbox from the dropdown menu.

· In the Advanced Information section:

o Select the Data Preparation process you created earlier.

o Set the Target Column to ‘Total Sales’.

· Click Save to initialize and start running the experiment.

· Congratulations! You have successfully created a new AutoML Regression experiment. You will receive a notification confirming its creation.

· Now in the AutoML tab, you will see a list of all your experiments. Your newly created experiment will appear at the top.

· Each experiment has a status associated with it, indicating its progress and outcome. Initially, a new experiment has the status of 'Started'.

As the model training progresses, the status changes to 'Running,' and you will receive notifications to keep you informed.

· Once the model training is successfully completed, the status changes to 'Completed,' indicated by a green color. Once your AutoML experiment is completed, you have a range of options at your disposal to analyse the data and gain valuable insights, such as Model Detail, Model Explainer, and Dataset Explainer. By leveraging these options, you can gain deep insights into your data, interpret the models' behavior, and make informed decisions based on the analysis of your AutoML experiment.
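For the regression experiment, the report's headline numbers correspond to standard error metrics. A sketch of how they are computed, assuming arrays of actual and predicted 'Total Sales' values (here called `y_true` and `y_hat`):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_true, y_hat)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_hat))  # penalises large errors more
r2 = r2_score(y_true, y_hat)                       # variance explained by the model
print(f"MAE = {mae:.2f}   RMSE = {rmse:.2f}   R^2 = {r2:.3f}")
```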

By following this workflow, you can unlock valuable insights from your data, interpret the behavior of the models, and make informed decisions.

We hope this document has provided you with a clear understanding of how to leverage Automated Machine Learning for regression and classification tasks.
