The View Explanation option redirects the user to the options described below. Each option is explained as a separate topic.
The Model Summary (also called Run Summary) displays the basic information about the trained top model. It opens by default when the user clicks the View Explanation option for the selected Auto ML model.
The Model Summary page displays the details based on the selected Algorithm types:
Algorithm Name
Model Status
Created Date
Started Date
Duration
The Performance Metrics section displays the following metrics:
Root Mean Squared Error (RMSE): RMSE is the square root of the mean squared error. It is expressed in the same units as the target variable, which makes it more interpretable than MSE.
Median Absolute Error (MedAE): MedAE is a performance metric for regression models that measures the median of the absolute differences between the predicted values and the actual values.
R-squared (R2): R-squared measures the proportion of the variance in the dependent variable that is explained by the independent variables in the model. It is a popular metric for linear regression problems.
Pearsonr: Pearsonr is a function in the scipy.stats module that calculates the Pearson correlation coefficient and its p-value between two arrays of data. The Pearson correlation coefficient is a measure of the linear relationship between two variables.
Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted values and the actual values in the dataset. It is less sensitive to outliers than MSE and is a popular metric for regression problems.
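The metric definitions above can be illustrated with a minimal Python sketch that uses only the standard library. The function name regression_metrics and the sample values are illustrative assumptions; this is not the platform's own implementation, and the p-value that scipy.stats.pearsonr also returns is left out.

```python
import math
from statistics import mean, median

def regression_metrics(actual, predicted):
    """Compute the regression metrics described above for paired values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    mean_a, mean_p = mean(actual), mean(predicted)
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)     # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    return {
        "rmse": math.sqrt(ss_res / len(errors)),
        "mean_absolute_error": mean(abs_errors),
        "median_absolute_error": median(abs_errors),
        "r2": 1 - ss_res / ss_tot,
        "pearson_r": cov / math.sqrt(ss_tot * ss_pred),
    }

# Illustrative values only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE and both absolute-error metrics are in the units of the target column, while R-squared and the Pearson coefficient are unitless.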
Algorithm Name
Model Status
Created Date
Started Date
Duration
The Performance Metrics section displays the following metrics:
Root Mean Squared Error (RMSE): RMSE is the square root of the mean squared error. It is expressed in the same units as the target variable, which makes it more interpretable than MSE.
Mean Squared Error (MSE): MSE measures the average squared difference between the predicted values and the actual values in the dataset. It is a popular metric for regression problems and is sensitive to outliers.
Percentage Error (PE): PE can provide insight into the relative accuracy of the predictions. It tells the user how much, on average, the predictions deviate from the actual values in percentage terms.
Root Mean Absolute Error: As the name indicates, it is the square root of the mean absolute error.
Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted values and the actual values in the dataset. It is less sensitive to outliers than MSE and is a popular metric for regression problems.
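As a rough illustration of the squared-error and percentage-based metrics above, here is a small Python sketch. It assumes that the percentage error is averaged as a mean absolute percentage error, which the text does not state explicitly. The function names and sample values are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, in percent.

    Assumption: no actual value is zero, otherwise the ratio is undefined.
    """
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Illustrative values only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
mse = mean_squared_error(actual, forecast)
mape = mean_absolute_percentage_error(actual, forecast)
print(f"MSE: {mse:.2f}, percentage error: {mape:.2f}%")
```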
Algorithm Name
Model Status
Created Date
Started Date
Duration
The Performance Metrics section displays the following metrics:
Precision: Precision is the percentage of correctly classified positive instances out of all the instances that were predicted as positive by the model. In other words, it measures how often the model correctly predicts the positive class.
Recall: Recall is the percentage of correctly classified positive instances out of all the actual positive instances in the dataset. In other words, it measures how well the model finds the actual positive instances.
F1-score: The F1-score is the harmonic mean of precision and recall. It is a balance between precision and recall and is a better metric than accuracy when the dataset is imbalanced.
Support: Support is the number of instances in each class in the dataset. It can be used to identify imbalanced datasets where one class has significantly fewer instances than the others.
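The four classification metrics above can be computed per class as in the following Python sketch. The helper name per_class_report and the labels are illustrative assumptions, not part of the product.

```python
def per_class_report(actual, predicted):
    """Return precision, recall, F1-score and support for every class."""
    report = {}
    pairs = list(zip(actual, predicted))
    for label in sorted(set(actual)):
        tp = sum(1 for a, p in pairs if a == label and p == label)
        fp = sum(1 for a, p in pairs if a != label and p == label)
        fn = sum(1 for a, p in pairs if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
        else:
            f1 = 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of actual instances of this class
        }
    return report

# Illustrative labels only.
actual = ["spam", "spam", "ham", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam", "ham"]
report = per_class_report(actual, predicted)
for label, row in report.items():
    print(label, row)
```

A large gap between the support values of the classes is a quick sign of an imbalanced dataset.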
The Auto ML tab allows the users to create various experiments on top of their datasets and list all the created experiments.
Automated Machine Learning (AutoML) is a process that automates the selection of machine learning models and the tuning of hyperparameters. It aims to reduce the time and resources required to develop and train accurate models by automating some of the time-consuming and complex tasks.
The Auto ML feature provided under the Data Science Lab is capable of covering all the steps, from starting with a raw data set to creating a ready-to-go machine learning model.
An Auto ML experiment is the application of machine learning algorithms to a dataset.
Please Note:
AutoML functionality is a tool to help speed up the process of developing and training machine learning models. It’s always important to carefully evaluate the performance of a model generated by the AutoML tool.
The Create Experiment option is provided on the Data List page.
The user is taken to a dashboard upon clicking Model Explainer to gather insights and explanations about predictions made by the selected AutoML model.
Model interpretation techniques like SHAP values, permutation importance, and partial dependence plots are essential for understanding how a model arrives at its predictions. They shed light on which features are most influential and how they contribute to each prediction, offering transparency and insights into model behavior. These methods also help detect biases and errors, making machine learning models more trustworthy and interpretable to stakeholders. By leveraging model explainers, organizations can ensure that their AI systems are accountable and aligned with their goals and values.
Please Note: The user can access the Model Explainer Dashboard under the Model Interpretation page only.
The Dataset Explainer tab provides a high-level preview of the dataset that has been used for the experiment. It redirects the user to the Data Profile page.
The Data Profile is displayed using various sections such as:
Data Set Info
Variable Types
Warnings
Variables
Correlations
Missing Values
Sample
Let us see each of them one by one.
The Data Profile displayed under the Dataset Explainer section displays the following information for the Dataset.
Number of variables
Number of observations
Missing cells
Duplicate rows
Total size in memory
Average record size in memory
This section mentions variable types for the data set variables. The selected Data set contains the following variable types:
Numeric
Categorical
Boolean
Date
URL
Text (Unique)
Rejected
Unsupported
This section informs the user about the warnings for the selected dataset.
It lists all the variables from the selected Data Set with the following details:
Distinct count
Unique
Missing (in percentage)
Missing (in number)
Infinite (in percentage)
Infinite (in number)
Mean
Minimum
Maximum
Zeros (in percentage)
It displays the variables in the correlation chart by using various popular methods.
This section provides information on the missing values through Count, Matrix, and Heatmap visualization.
Count: The count of missing values is shown through a column chart.
Matrix
Heatmap
This section describes the first 10 and last 10 rows of the selected dataset as a sample.
A Data Scientist can create various Experiments based on specified algorithms.
There can be different types of Experiments based on the algorithm type specified. In the DS Lab module, we currently support Classification, Regression, and Forecasting.
A Classification experiment can be created for discrete data when the user wants to predict one of the several categories.
A Regression experiment can be created for continuous numeric values.
A Forecasting experiment can be created to predict future values based on historical data.
Please Note:
AutoML experiments run as Jobs, and a new Job is allocated for each experiment created in the AutoML tab.
The Job spins up once the Experiment is created and is terminated automatically after the models are trained and ready.
Creating an Experiment is a two-step process: configuring the experiment and selecting the algorithm type.
A user can create a supervised learning (data science) experiment by choosing the Create Experiment option.
Please Note: The Create Experiment icon is provided on the Dataset List page under the Dataset tab of a Repo Sync Data Science Project.
Navigate to the Data List page.
Select a Dataset from the list.
Click the Create Experiment icon.
The Configure tab opens (by default) when the Create Experiment option is selected.
Provide the following information:
Provide a name for the experiment.
Provide Description (optional).
Select a Target Column.
Select a Data Preparation from the drop-down menu.
Use the checkbox to select a Data Preparation from the displayed drop-down.
Select columns that need to be excluded from the experiment.
Use the checkbox to select a field to be excluded from the experiment.
Please Note: The selected fields will not be considered while training the Auto ML model experiment.
Click the Next option.
The user gets redirected to the Select Experiment Type tab.
Select a prediction model using the checkbox.
Based on the selected experiment type a validation notification message appears.
Click the Done option.
A notification message appears.
The user is redirected to the AutoML list page.
The newly created experiment gets added to the list with Status mentioned as Started.
The Status tab indicates various phases of the experiments/model training. The different phases for an experiment are as given below:
The newly created experiment gets Started status. It is the first status when a new experiment is created.
Another notification message appears to inform the user that the model training has started. The same is indicated through the Status column of the model. The Status for such models will be Running.
After the experiment is completed, a notification message appears stating that the model has been trained. The Status for a trained model will be indicated as Completed.
Please Note: The unsuccessful experiments are indicated as Failed under the status. The View Report is mentioned in red color for the Failed experiments.
This page provides model explainer dashboards for Classification Models.
Check out the given walk-through to understand the Model Explainer dashboard for the Classification models.
This table shows the contribution each feature has had on prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each prediction has been built up from all the individual ingredients in the model.
This tab provides various stats regarding the Classification model.
It includes the following information:
Select a model cutoff such that all predicted probabilities higher than the cutoff will be labeled positive and all predicted probabilities lower than the cutoff will be labeled negative. The user can also set the cutoff as a percentile of all observations. Setting the cutoff here automatically sets the same cutoff in the other connected components.
It displays a list of various performance metrics.
The Confusion Matrix shows the number of true negatives (predicted negative, observed negative), true positives (predicted positive, observed positive), false negatives (predicted negative but observed positive), and false positives (predicted positive but observed negative). The number of false negatives and false positives determines the cost of deploying an imperfect model. For different cut-offs, the user will get a different number of false positives and false negatives. This plot can help you select the optimal cutoff.
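To see how the cutoff changes the four cells of the confusion matrix, consider this minimal Python sketch. The function name confusion_counts and the data are hypothetical. It also assumes that a probability exactly equal to the cutoff is labeled negative, since the text does not say how ties are handled.

```python
def confusion_counts(observed, probabilities, cutoff=0.5):
    """Count TP, FP, TN and FN for binary labels (1 = positive, 0 = negative)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(observed, probabilities):
        predicted_positive = p > cutoff  # assumption: ties count as negative
        if predicted_positive and y == 1:
            counts["tp"] += 1
        elif predicted_positive and y == 0:
            counts["fp"] += 1
        elif not predicted_positive and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

# Illustrative values only.
observed = [1, 0, 1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.7, 0.2, 0.55]
at_half = confusion_counts(observed, probabilities, cutoff=0.5)
at_strict = confusion_counts(observed, probabilities, cutoff=0.65)
print(at_half)
print(at_strict)
```

In this sample, raising the cutoff from 0.5 to 0.65 removes both false positives while the number of false negatives stays the same. With other data a higher cutoff usually trades false positives for false negatives.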
The user can see the relation between the predicted probability that a record belongs to the positive class and the percentage of observed records in the positive class on this plot. The observations get binned together in groups of roughly equal predicted probabilities and the percentage of positives is calculated for each bin. A perfectly calibrated model would show a straight line from the bottom left corner to the top right corner. A strong model would classify most observations correctly and close to 0% or 100% probability.
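The binning behind such a calibration plot can be sketched in Python as follows. Equal-width probability bins are an assumption made here for simplicity; the dashboard may group observations differently. The function name and data are illustrative.

```python
def calibration_bins(observed, probabilities, n_bins=5):
    """Group records by predicted probability and compare with observed positives.

    Returns one (mean predicted probability, fraction of positives, count)
    tuple per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(observed, probabilities):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_probability = sum(p for p, _ in members) / len(members)
        fraction_positive = sum(y for _, y in members) / len(members)
        rows.append((mean_probability, fraction_positive, len(members)))
    return rows

# Illustrative values only.
observed = [0, 0, 1, 1, 1, 0]
probabilities = [0.1, 0.15, 0.5, 0.9, 0.95, 0.55]
rows = calibration_bins(observed, probabilities, n_bins=2)
for mean_probability, fraction_positive, count in rows:
    print(f"{mean_probability:.3f} predicted, {fraction_positive:.3f} observed, n={count}")
```

For a well-calibrated model the first two numbers of each row are close to each other, which corresponds to the diagonal line in the plot.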
This plot displays the fraction of each class above and below the cut-off.
The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds.
The true positive rate is the proportion of actual positive samples that are correctly identified as positive by the model, i.e., TP / (TP + FN). The false positive rate is the proportion of actual negative samples that are incorrectly identified as positive by the model, i.e., FP / (FP + TN).
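The TPR and FPR formulas above can be turned into ROC points with a short Python sketch. The function names and scores are illustrative assumptions. Here a record counts as predicted positive when its score is greater than or equal to the threshold, and the area under the curve is approximated with the trapezoidal rule.

```python
def roc_points(observed, scores):
    """Return (FPR, TPR) pairs for every distinct score used as a threshold."""
    positives = sum(observed)
    negatives = len(observed) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(observed, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(observed, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))  # FP/(FP+TN), TP/(TP+FN)
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Illustrative values only.
observed = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(observed, scores)
auc = area_under_curve(points)
print(points)
print(f"AUC: {auc:.4f}")
```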
It shows the trade-off between Precision and Recall in one plot.
The Lift Curve chart shows the percentage of positive classes when only observations with a score above the cut-off are selected, compared with selecting observations randomly. It shows the user how much better the model is than random selection (the lift).
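As a small worked example of the lift, the Python sketch below divides the share of positives among records scoring above the cutoff by the share of positives in the whole dataset. The function name lift_at_cutoff and the data are hypothetical.

```python
def lift_at_cutoff(observed, probabilities, cutoff):
    """Share of positives above the cutoff divided by the overall share."""
    selected = [y for y, p in zip(observed, probabilities) if p > cutoff]
    if not selected:
        return None  # nothing scores above the cutoff, so the lift is undefined
    base_rate = sum(observed) / len(observed)
    rate_above_cutoff = sum(selected) / len(selected)
    return rate_above_cutoff / base_rate

# Illustrative values only.
observed = [1, 0, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
lift = lift_at_cutoff(observed, probabilities, cutoff=0.5)
print(lift)
```

A lift of 2.0 means that selecting by score finds twice as many positives as selecting the same number of records at random.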
This plot shows the percentage of each label that you can expect when you only sample the top x% with the highest scores.
The user can select a record directly by choosing it from the dropdown or hit the Random Index option to randomly select a record that fits the constraints. For example, the user can select a record where the observed target value is negative but the predicted probability of the target being positive is very high. This allows the user to sample only false positives or only false negatives.
It displays the predicted probability for each target label.
This plot shows the contribution that each feature has provided to the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This helps to explain exactly how each prediction has been built up from all the individual ingredients in the model.
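For intuition on how contributions add up to the final prediction, the following minimal Python sketch uses a hypothetical linear model, where each feature's contribution can be computed in closed form as its weight times the deviation from the feature average. The weights, feature names, and data are illustrative assumptions. The models and the contribution calculation used by the dashboard may differ.

```python
def linear_contributions(weights, intercept, background, record):
    """Split a linear model's prediction into a base value and per-feature parts."""
    n = len(background)
    means = {f: sum(row[f] for row in background) / n for f in weights}
    # Prediction for the "average" record: the starting point of the plot.
    base_value = intercept + sum(weights[f] * means[f] for f in weights)
    # Each feature moves the prediction away from the base value.
    contributions = {f: weights[f] * (record[f] - means[f]) for f in weights}
    return base_value, contributions

# Hypothetical model and data.
weights = {"age": 0.5, "income": 2.0}
intercept = 1.0
background = [{"age": 20, "income": 1.0}, {"age": 40, "income": 3.0}]
record = {"age": 40, "income": 1.0}

base_value, contributions = linear_contributions(weights, intercept, background, record)
prediction = intercept + sum(weights[f] * record[f] for f in weights)
print(base_value, contributions, prediction)
```

The base value plus the sum of the contributions equals the model prediction for the record, which is the additive property the plot relies on.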
The PDP (partial dependence plot) shows how the model prediction would change if you change one particular feature. The plot shows you a sample of observations and how these observations would change with this feature (gridlines). The average effect is shown in grey. The effect of changing the feature for a single record is shown in blue. The user can adjust how many observations to sample for the average, how many gridlines to show, and how many points along the x-axis to calculate model predictions for (grid points).
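The average effect in a partial dependence plot can be sketched in Python as follows: for each grid point, the chosen feature is overwritten in every sampled record and the predictions are averaged. The toy_model function, the feature names, and the records are hypothetical stand-ins for a trained model and its data.

```python
def partial_dependence(predict, records, feature, grid):
    """Average prediction over the records for each grid value of one feature."""
    curve = []
    for value in grid:
        # Copy each record with the feature replaced by the grid value.
        predictions = [predict({**record, feature: value}) for record in records]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

def toy_model(record):
    """Hypothetical stand-in for a trained model."""
    return 2.0 * record["rooms"] + 0.5 * record["age"]

# Illustrative values only.
records = [{"rooms": 2, "age": 10}, {"rooms": 4, "age": 30}]
curve = partial_dependence(toy_model, records, "rooms", grid=[1, 2, 3])
print(curve)
```

Keeping the per-record predictions instead of averaging them would give the individual lines that the plot draws for each sampled observation.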
This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.
The What If Analysis is often used to help stakeholders understand the potential consequences of different scenarios or decisions. This tab displays how the outcome would change when the values of the selected variables get changed. This allows stakeholders to see how sensitive the outcome is to different inputs and can help them identify which variables are most important to focus on.
What-if analysis charts can be used in a variety of contexts, from financial modeling to marketing analysis to supply chain optimization. They are particularly useful when dealing with complex systems where it is difficult to predict the exact impact of different variables. By exploring a range of scenarios, analysts can gain a better understanding of the potential outcomes and make more informed decisions.
The user can adjust the input values to see predictions for what-if scenarios.
In a What-if analysis chart, analysts typically start by specifying a baseline scenario, which represents the current state of affairs. They then identify one or more variables that are likely to have a significant impact on the outcome of interest, and specify a range of possible values for each of these variables.
This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.
The SHAP Summary summarizes the SHAP values per feature. The user can either select an aggregate display that shows the mean absolute SHAP value per feature or get a more detailed look at the spread of SHAP values per feature and how they correlate with the feature value (red is high).
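The aggregate display can be illustrated with a small Python sketch that averages the absolute contribution of each feature over several records and ranks the features. The contribution values below are made up for illustration and are not produced by a real SHAP computation.

```python
def mean_absolute_contributions(rows):
    """Rank features by their mean absolute contribution across records."""
    features = list(rows[0].keys())
    summary = {f: sum(abs(row[f]) for row in rows) / len(rows) for f in features}
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-record contributions (one dictionary per record).
rows = [
    {"age": 5.0, "income": -2.0},
    {"age": -1.0, "income": 4.0},
    {"age": 3.0, "income": -6.0},
]
ranking = mean_absolute_contributions(rows)
print(ranking)
```

Taking the absolute value first prevents positive and negative contributions of the same feature from cancelling each other out.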
This plot displays the relation between feature values and Shap values. This allows you to investigate the general relationship between feature value and impact on the prediction. The users can check whether the model uses features in line with their intuitions, or use the plots to learn about the relationships that the model has learned between the input features and the predicted outcome.
This page provides model explainer dashboards for Regression Models.
Check out the given walk-through to understand the Model Explainer dashboard for the Regression models.
This table shows the contribution each feature has had on prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each prediction has been built up from all the individual ingredients in the model.
The user can find a number of regression performance metrics in this table that describe how well the model can predict the target column.
This plot shows the observed value of the target column and the predicted value of the target column. A perfect model would have all the points on the diagonal (predicted matches observed). The further away points are from the diagonal the worse the model is in predicting the target column.
Residuals: The residuals are the difference between the observed target column value and the predicted target column value. In this plot, one can check if the residuals are higher or lower for higher or lower actual or predicted outcomes. So, one can check if the model works better or worse for different target value levels.
Plot vs Features: This plot displays either the residuals (difference between observed target value and predicted target value), the observed target value, or the predicted target value plotted against the values of different features. This allows one to inspect whether the model performs worse for a particular range of feature values than for others.
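The residual checks described above can be sketched in a few lines of Python. The sample values, the feature, and the split at 100 are illustrative assumptions. The residual is taken as observed minus predicted, as defined in the text.

```python
def residuals(observed, predicted):
    """Residual = observed target value minus predicted target value."""
    return [o - p for o, p in zip(observed, predicted)]

def mean_absolute_residual(values):
    """Average size of the residuals, ignoring their sign."""
    return sum(abs(v) for v in values) / len(values)

# Illustrative values only: one feature and the target for six records.
feature = [50, 60, 70, 120, 150, 200]
observed = [10.0, 12.0, 14.0, 30.0, 36.0, 50.0]
predicted = [11.0, 12.0, 13.0, 24.0, 42.0, 40.0]

res = residuals(observed, predicted)
# Compare the residual size for low and high feature values.
low = [r for f, r in zip(feature, res) if f < 100]
high = [r for f, r in zip(feature, res) if f >= 100]
print(res)
print(mean_absolute_residual(low), mean_absolute_residual(high))
```

A clearly larger mean absolute residual in one group suggests that the model predicts less accurately for that range of feature values.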
The user can select a record directly by choosing it from the dropdown or hit the Random Index option to randomly select a record that fits the constraints. For example, the user can select a record with a very high predicted target value, a very low observed target value, or a large residual (a predicted value far from the observed value).
It displays the predicted target value for the selected record.
This plot shows the contribution that each feature has provided to the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This helps to explain exactly how each prediction has been built up from all the individual ingredients in the model.
The PDP (partial dependence plot) shows how the model prediction would change if you change one particular feature. The plot shows you a sample of observations and how these observations would change with this feature (gridlines). The average effect is shown in grey. The effect of changing the feature for a single record is shown in blue. The user can adjust how many observations to sample for the average, how many gridlines to show, and how many points along the x-axis to calculate model predictions for (grid points).
This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.
The user can select a record directly by choosing it from the dropdown or hit the Random Index option to randomly select a record that fits the constraints. For example, the user can select a record with a very high predicted target value, a very low observed target value, or a large residual (a predicted value far from the observed value).
It displays the predicted target value for the selected record.
The user can adjust the input values to see predictions for what-if scenarios.
This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.
The SHAP Summary summarizes the SHAP values per feature. The user can either select an aggregate display that shows the mean absolute SHAP value per feature or get a more detailed look at the spread of SHAP values per feature and how they correlate with the feature value (red is high).
This plot displays the relation between feature values and Shap values. This allows you to investigate the general relationship between feature value and impact on the prediction. The users can check whether the model uses features in line with their intuitions, or use the plots to learn about the relationships that the model has learned between the input features and the predicted outcome.
Please Note: Refer to the Data Science Lab Quick Start Flow page to get an overview of the Data Science Lab module in a nutshell.
This section describes the Actions provided for the created AutoML experiments on the AutoML List page.
Once the initiated AutoML experiment is completed, it gets two Actions. The allotted Actions for an AutoML Experiment are:
Delete
View Report
It is indicated in Green color for the Completed Experiments (the successful experiments).
It is indicated in Red color for the Failed Experiments.
This option provides the summary of the experiment (completed or failed) along with the details of the recommended model (in case of a completed experiment).
Navigate to the Auto ML tab.
All the created Experiments will be listed.
Select a Completed experiment.
Click the View Report option from the Actions column.
The Details tab opens for the selected completed experiment.
The Details tab opens when the user clicks the View Report icon for an experiment with Completed status.
Click the View Report option for a completed experiment.
The Details tab opens by default displaying the following details for the model:
Recommended Model: This will be the most suitable model determined based on the metric score of the model.
Model Name: Name of the model
Model Score: Score of the model
Metric Value: On which basis the model was considered
Created On: Date of model creation
Run Summary: This portion will have the basic information about the experiment and trained model.
Task Type: It displays the algorithm type selected for the experiment.
Experiment Status: This indicates the status of the AutoML model.
Created By: Name of the creator.
Dataset: It mentions the dataset used for the experiment.
Target Column: It indicates the target column.
The Models tab lists the top three models based on their metrics score. The user gets the View Explanation option for each of the selected top three models to explain the details of that model.
Navigate to the Models tab of a completed Auto ML experiment.
Select a Model from the displayed list and click the View Explanation option. The View Explanation option allows the users to check details about each of the top 3 models.
A new page opens displaying the various information for the selected Model.
The following options are displayed for a selected model:
Model Summary: This tab displays the model summary for the selected model. It opens by default.
Model Interpretation: This tab contains the Model Explainer dashboard displaying the various details for the model.
Dataset Explainer: This tab displays the Data Profile of the dataset for the selected model.
Please Note: Refer to this document's View Explanation section for more details.
If the user opens the View Report option for a failed Experiment, it will display the Model Logs and mention the reason for the model's failure.
Navigate to the Auto ML tab.
Select a Failed experiment.
Click the View Report option from the Actions column.
The Logs tab opens for the selected failed experiment.
The Model Logs are displayed with the reason for failure.
The Delete option helps the user to remove the selected AutoML experiment from the list.
Check out the walk-through to understand the steps to delete an AutoML experiment.
Navigate to the Auto ML list page.
Select a model/experiment from the list. (It can be any experiment irrespective of the Status).
Click the Delete icon for the model.
A dialog box opens to confirm the deletion.
Click the Yes option.
The selected experiment gets removed from the list.
Please Note: The user can remove any Auto ML experiment irrespective of its status.
This page provides model explainer dashboards for Forecasting Models.
Check out the given walk-through to understand the Model Explainer dashboard for the Forecasting models.
The forecasting model stats are displayed through Timeseries visualizations that present the values generated over the selected time period.
This chart will display predicted values generated by the timeseries model over a specific time period.
This chart displays a comparison of the predicted values with the actual observed values over a specific period of time.
It depicts the difference between the predicted and actual values (the residuals) over a period of time.
A Scatter Plot chart is displayed depicting how well the predicted values align with the actual values.
Please Note: Refer to the page to get an overview of the Data Science Lab module in a nutshell.