Classification Model Explainer

The Classification Model Explainer provides a comprehensive suite of tools for interpreting how a classification model makes decisions. Dedicated tabs let users explore feature importance, individual prediction behavior, scenario testing, and feature dependencies through intuitive visualizations and interactive controls.

Feature Importance Tab

Purpose: The Feature Importance tab identifies which input features have the greatest impact on the model’s predictions, enabling users to interpret global model behavior.

Key interface elements:

  1. Importance Type Selector:

    • Two methods are available (both can be reproduced outside the dashboard; see the sketch after this list):

      • SHAP Values – a model-agnostic explainability technique that quantifies how much each feature contributes to the model’s output.

      • Permutation Importances – measures how much model performance degrades when a feature’s values are randomly shuffled.

  2. No. of Features Dropdown:

    • Controls how many top-ranked features are displayed (e.g., the Top 10 by importance score).

  3. Visualization (Horizontal Bar Chart):

    • Displays the ranked list of features in descending order of impact, based on SHAP values.

    • Each bar’s length represents the magnitude of feature contribution to the model’s predictions.
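
For reference, both importance types can be computed outside the dashboard. The following is a minimal sketch, assuming scikit-learn and the shap package; the model and data are illustrative, not part of the product.

```python
# Minimal sketch: SHAP-based vs. permutation-based global feature importance.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=12, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# SHAP importance: mean absolute SHAP value per feature (positive class).
sv = shap.TreeExplainer(model).shap_values(X_test)
sv = sv[1] if isinstance(sv, list) else sv[..., 1]   # handles both shap output layouts
shap_importance = np.abs(sv).mean(axis=0)

# Permutation importance: score drop when each feature's values are shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

top10 = np.argsort(shap_importance)[::-1][:10]       # the "Top 10" dropdown choice
for i in top10:
    print(f"feature {i}: SHAP {shap_importance[i]:.4f}, permutation {perm.importances_mean[i]:.4f}")
```

Later sketches on this page reuse `model`, `X_test`, `y_test`, and `sv` from this block.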

Use Cases:

  • Identify the most influential drivers of model performance.

  • Detect irrelevant or redundant features that may be excluded during model retraining.

  • Validate whether influential features align with domain expertise and business intuition.

Classification Stats

The Classification Stats section provides analytical tools and visualizations that help users assess the predictive accuracy, discrimination power, and calibration of a classification model. These components collectively enable model validation, threshold optimization, and decision boundary analysis to ensure reliable deployment in real-world scenarios.

Global Cutoff

Purpose: The Global Cutoff defines the probability threshold that separates predicted positive outcomes from predicted negative outcomes. Adjusting this cutoff helps optimize model performance for specific business or operational objectives.

Functionality:

  • Predictions with probabilities above the cutoff are classified as positive, while those below the cutoff are classified as negative.

  • Users can alternatively define the cutoff as a percentile of all predicted probabilities.

  • Once set, the cutoff automatically propagates to connected visual and analytical components (e.g., Confusion Matrix, Classification Plot, Lift Curve). Both cutoff styles are sketched below.
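
A minimal sketch of both cutoff styles, continuing the fitted `model` and `X_test` from the Feature Importance sketch:

```python
# Minimal sketch: applying a global cutoff as a raw probability or as a percentile.
import numpy as np

proba = model.predict_proba(X_test)[:, 1]      # positive-class probabilities

cutoff = 0.5                                   # absolute probability cutoff
y_pred = (proba >= cutoff).astype(int)

pct_cutoff = np.percentile(proba, 75)          # percentile style: flag the top 25% of scores
y_pred_pct = (proba >= pct_cutoff).astype(int)
```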

Use Cases:

  • Optimize the trade-off between sensitivity (recall) and specificity.

  • Align prediction thresholds with business priorities (e.g., fraud detection sensitivity vs. false alarm rate).

  • Ensure consistency across all evaluation visualizations and reports.

Model Performance Metrics

Purpose: Displays a comprehensive summary of standard classification performance metrics to quantify the model’s predictive power and calibration.

Common Metrics Include:

  • Accuracy: Ratio of correctly predicted instances to total predictions.

  • Precision: Fraction of predicted positives that are actually positive.

  • Recall (Sensitivity): Fraction of actual positives correctly identified.

  • F1 Score: Harmonic mean of precision and recall.

  • AUC (Area Under Curve): Measures the model’s ability to distinguish between classes.

  • Specificity, Log Loss, and Matthews Correlation Coefficient (MCC), where applicable (see the sketch after this list).
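
These metrics can be reproduced with scikit-learn; a minimal sketch, continuing `y_test`, `y_pred`, and `proba` from the sketches above:

```python
# Minimal sketch: the standard classification metrics listed above.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss, matthews_corrcoef)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, proba))   # uses probabilities, not hard labels
print("Log loss :", log_loss(y_test, proba))
print("MCC      :", matthews_corrcoef(y_test, y_pred))
# Specificity has no dedicated sklearn function: TN / (TN + FP) from the confusion matrix.
```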

Use Cases:

  • Compare models quantitatively during evaluation or retraining cycles.

  • Identify strengths and weaknesses (e.g., precision-oriented vs. recall-oriented models).

  • Establish standardized metrics for governance and performance reporting.

Confusion Matrix

Purpose: The Confusion Matrix provides a visual summary of prediction outcomes by comparing predicted classes against actual observations.

Structure:

                  | Predicted Negative   | Predicted Positive
Actual Negative   | True Negative (TN)   | False Positive (FP)
Actual Positive   | False Negative (FN)  | True Positive (TP)

Interpretation:

  • True Positives (TP): Correctly predicted positive cases.

  • True Negatives (TN): Correctly predicted negative cases.

  • False Positives (FP): Incorrectly predicted positives (Type I error).

  • False Negatives (FN): Missed positives (Type II error).

The matrix helps visualize error trade-offs as the cutoff varies, enabling users to select an optimal threshold balancing business costs and risks.
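
A minimal sketch of the four cells at the current cutoff, continuing `y_test` and `y_pred` from the sketches above:

```python
# Minimal sketch: confusion matrix cells and specificity at the chosen cutoff.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()  # sklearn orders cells [[TN, FP], [FN, TP]]
print(f"TN={tn}  FP={fp}")
print(f"FN={fn}  TP={tp}")
specificity = tn / (tn + fp)
```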

Precision Plot

Purpose: The Precision Plot assesses the calibration of predicted probabilities by showing how well they align with observed positive outcomes.

Functionality:

  • Observations are grouped (binned) by similar predicted probabilities.

  • For each bin, the percentage of actual positives is plotted against the average predicted probability.

  • A perfectly calibrated model follows a diagonal line from (0,0) to (1,1).

  • Well-performing models additionally show clear separation, with most observations concentrated near 0% or 100% (see the sketch below).
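
The binned data behind such a plot can be computed with scikit-learn's calibration utilities; a minimal sketch, continuing `y_test` and `proba` from the sketches above:

```python
# Minimal sketch: observed positive rate vs. average predicted probability per bin.
from sklearn.calibration import calibration_curve

frac_positive, mean_predicted = calibration_curve(y_test, proba, n_bins=10)
for p, f in zip(mean_predicted, frac_positive):
    print(f"avg predicted {p:.2f} -> observed positive rate {f:.2f}")
# A well-calibrated model keeps these pairs close to the diagonal f == p.
```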

Use Cases:

  • Evaluate probability calibration for risk scoring models.

  • Identify probability ranges where model confidence may be overstated or understated.

  • Support post-model calibration methods such as Platt scaling or isotonic regression.

Classification Plot

Purpose: The Classification Plot displays the fraction of each predicted class above and below the chosen cutoff, providing a visual summary of the model’s classification distribution.

Interpretation: This helps users quickly gauge the class balance, identify any bias toward one class, and visualize how the selected threshold affects predicted class proportions.

Use Cases:

  • Validate model output distribution across thresholds.

  • Detect class imbalance or prediction skew.

  • Optimize cutoff settings to achieve desired positive-class proportions.

ROC AUC Plot

Purpose: The Receiver Operating Characteristic (ROC) Curve illustrates the model’s capability to distinguish between classes at various thresholds.

Functionality:

  • X-axis: False Positive Rate (FPR) = FP / (FP + TN).

  • Y-axis: True Positive Rate (TPR) = TP / (TP + FN).

  • The AUC (Area Under Curve) quantifies overall discriminative power: values closer to 1.0 indicate stronger class separation, while 0.5 corresponds to random guessing (see the sketch below).
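
A minimal sketch, continuing `y_test` and `proba` from the sketches above:

```python
# Minimal sketch: ROC curve points and the area under the curve.
from sklearn.metrics import roc_curve, auc

fpr, tpr, thresholds = roc_curve(y_test, proba)
print("AUC:", auc(fpr, tpr))
# Each (fpr[i], tpr[i]) pair is the model evaluated at cutoff thresholds[i].
```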

Use Cases:

  • Compare competing models’ discriminative abilities.

  • Select thresholds that maximize TPR while minimizing FPR.

  • Provide standardized visual evidence for model evaluation reports.

PR AUC Plot (Precision-Recall Curve)

Purpose: The Precision-Recall (PR) Curve shows the trade-off between precision and recall at different thresholds, especially informative for imbalanced datasets.

Interpretation: A model with high area under the PR curve (AUC-PR) maintains strong precision and recall across thresholds, indicating robust performance on minority classes.
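
A minimal sketch, continuing `y_test` and `proba` from the sketches above:

```python
# Minimal sketch: precision-recall pairs and AUC-PR (average precision).
from sklearn.metrics import precision_recall_curve, average_precision_score

precision, recall, thresholds = precision_recall_curve(y_test, proba)
print("AUC-PR:", average_precision_score(y_test, proba))
```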

Use Cases:

  • Evaluate classifier performance on imbalanced data (e.g., fraud, rare diseases).

  • Optimize decision thresholds for minority event detection.

Lift Curve

Purpose: The Lift Curve compares the effectiveness of the model’s predictions against random selection.

Functionality:

  • The curve plots the percentage of positive observations captured when selecting the top X% of records ranked by prediction score.

  • Lift is calculated as:

    Lift = (Positive rate with model) / (Positive rate with random selection)

  • A lift greater than 1 indicates the model performs better than random selection (see the sketch below).
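
A minimal sketch of lift at the top X%, continuing `y_test` and `proba` from the sketches above:

```python
# Minimal sketch: lift when targeting the top fraction of records by score.
import numpy as np

def lift_at(proba, y, top_fraction):
    k = max(1, int(len(proba) * top_fraction))
    top = np.argsort(proba)[::-1][:k]   # indices of the k highest-scored records
    return y[top].mean() / y.mean()     # positive rate in selection vs. overall

print("Lift at top 10%:", lift_at(proba, y_test, 0.10))
```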

Use Cases:

  • Quantify the improvement over baseline for marketing, credit scoring, or risk models.

  • Evaluate return on targeting strategies using model-driven segmentation.

Cumulative Precision Plot

Purpose: The Cumulative Precision Plot shows the percentage of positive labels expected when sampling the top X% of records with the highest predicted scores.

Interpretation: This cumulative view helps decision-makers determine how many top-ranked records should be targeted to achieve a desired precision level.
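
A minimal sketch, continuing `y_test` and `proba` from the sketches above:

```python
# Minimal sketch: cumulative precision over records ranked by predicted score.
import numpy as np

order = np.argsort(proba)[::-1]                      # best-scored records first
cum_precision = np.cumsum(y_test[order]) / np.arange(1, len(y_test) + 1)
k = int(len(y_test) * 0.10)
print("Precision within the top 10%:", cum_precision[k - 1])
```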

Use Cases:

  • Identify optimal targeting segments (e.g., top 10% of customers likely to convert).

  • Support operational planning for limited resource allocation (e.g., audit prioritization, campaign selection).

Summary

Visualization             | Purpose                            | Key Insights                               | Typical Use
--------------------------|------------------------------------|--------------------------------------------|---------------------------------------
Global Cutoff             | Define probability threshold       | Optimize classification decision boundary  | Control model sensitivity/specificity
Model Performance Metrics | Summarize core KPIs                | Quantify accuracy, precision, recall, F1   | Benchmark and monitor model quality
Confusion Matrix          | Visualize classification outcomes  | Evaluate TP/FP/FN/TN trade-offs            | Select optimal threshold
Precision Plot            | Check calibration                  | Validate probability accuracy              | Model reliability analysis
Classification Plot       | Show class fractions vs. cutoff    | Identify bias and distribution             | Threshold tuning
ROC AUC Plot              | Compare TPR vs. FPR                | Evaluate separability                      | Model discrimination assessment
PR AUC Plot               | Show precision-recall trade-off    | Evaluate minority-class performance        | Imbalanced dataset analysis
Lift Curve                | Compare to random selection        | Quantify the model’s predictive gain       | Marketing and risk scoring
Cumulative Precision      | Show top X% sampling precision     | Optimize targeting or resource allocation  | Prioritization strategies

Individual Predictions Tab

The Individual Predictions tab provides local explanations for single prediction instances, allowing users to trace specific outcomes back to their feature-level contributions. It breaks down how input features interact within the model to produce the predicted probability for each target label, enhancing transparency, interpretability, and trust in AI-driven decision-making.

Select Index

The user can select a record directly from the dropdown, or click the Random Index button to pick a random record that satisfies the current constraints. For example, the user can restrict sampling to records where the observed target value is negative but the predicted probability of a positive outcome is very high, effectively sampling only false positives (or, symmetrically, only false negatives). A sketch of this filtering follows.
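
A minimal sketch, continuing `y_test`, `proba`, and `cutoff` from the sketches above (the random-choice step stands in for the Random Index button):

```python
# Minimal sketch: sample a random false positive (observed negative, high score).
import numpy as np

false_positives = np.where((y_test == 0) & (proba >= cutoff))[0]
if len(false_positives):
    idx = int(np.random.choice(false_positives))
else:
    idx = 0   # fallback if the model makes no false positives at this cutoff
print("record", idx, "observed:", y_test[idx], "predicted proba:", round(proba[idx], 3))
```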

Prediction

Purpose: Displays the predicted probability associated with each target label for the selected observation.

Functionality:

  • Shows the model’s confidence level in assigning an observation to a specific class or category.

  • Allows comparison between multiple class probabilities to identify the most likely predicted outcome.

  • Helps users validate whether the model’s probability distribution aligns with domain expectations (a minimal sketch follows).
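
A minimal sketch, continuing `model`, `X_test`, and `idx` from the sketches above:

```python
# Minimal sketch: per-class predicted probabilities for the selected record.
row = X_test[[idx]]                      # the selected observation, shape (1, n_features)
for label, p in zip(model.classes_, model.predict_proba(row)[0]):
    print(f"P(label={label}) = {p:.3f}")
```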

Use Cases:

  • Interpret classification model output beyond binary “positive” or “negative” labels.

  • Assess model confidence in multi-class scenarios (e.g., customer churn risk, product recommendation likelihood).

  • Support probability calibration and threshold tuning during model evaluation.

Contributions Plot

Purpose: The Contributions Plot provides a visual explanation of how each feature contributes to the prediction for an individual observation.

Functionality:

  • Starts from the population average prediction (baseline value).

  • Adds or subtracts feature-level contributions until reaching the final predicted probability for the selected record.

  • Positive contributions indicate features that increase the prediction value, while negative contributions indicate features that decrease it.

Interpretation: This visualization helps users understand precisely how the model builds each prediction, highlighting which variables were most influential for a particular instance.
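
A minimal sketch, continuing `model`, `X_test`, and `idx` from the sketches above; for scikit-learn forests, SHAP's tree explainer typically works directly in probability space, so baseline plus contributions recovers the predicted probability:

```python
# Minimal sketch: baseline + per-feature contributions = final predicted probability.
import numpy as np
import shap

expl = shap.TreeExplainer(model)
base = expl.expected_value
base = base[1] if np.ndim(base) else base             # positive-class baseline
contrib = expl.shap_values(X_test[[idx]])
contrib = contrib[1][0] if isinstance(contrib, list) else contrib[0, :, 1]

print("baseline (population average):", base)
print("final probability:", base + contrib.sum())     # matches predict_proba for this record
```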

Use Cases:

  • Provide transparency into model decision-making at the individual level.

  • Debug unexpected predictions by inspecting the contribution of key features.

  • Identify outliers or anomalous prediction behavior.

Partial Dependence Plot (PDP)

Purpose: The Partial Dependence Plot (PDP) illustrates how the model’s prediction changes as the value of one feature varies, holding other features constant.

Functionality:

  • X-axis: Represents the selected feature.

  • Y-axis: Represents the model’s predicted outcome or probability.

  • Grey Line: Shows the average effect of changing the feature across sampled observations.

  • Blue Line(s): Represent individual record trajectories, depicting how predictions evolve for specific observations.

  • User Controls: Allow adjustments for:

    • Number of observations sampled for averaging.

    • Number of gridlines (individual observations) shown.

    • Number of grid points (intervals) used to compute predictions along the x-axis.

Interpretation: The PDP provides insights into feature sensitivity and non-linear relationships—showing how model output reacts to variations in a single input variable.
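
A minimal sketch using scikit-learn's built-in PDP/ICE display (scikit-learn ≥ 1.0 assumed), continuing `model` and `X_test` from the sketches above; feature index 0 is illustrative:

```python
# Minimal sketch: average dependence curve plus individual ICE trajectories.
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay.from_estimator(
    model, X_test, features=[0],
    kind="both",          # draw the average curve and per-record ICE lines
    subsample=50,         # number of individual trajectories sampled
    grid_resolution=20,   # grid points along the x-axis
)
plt.show()
```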

Use Cases:

  • Detect critical thresholds or breakpoints (e.g., income, age, credit score).

  • Analyze monotonic or non-monotonic feature effects.

  • Guide feature engineering and inform business rule definitions.

Contributions Table

Purpose: The Contributions Table complements the visual Contributions Plot by providing a tabular summary of how each feature contributes numerically to the prediction for a specific observation.

Functionality:

  • Lists all input features with their corresponding contribution value and direction of influence.

  • The cumulative sum of all contributions (starting from the population average) equals the final predicted probability.

  • Allows precise quantification of feature effects for validation and reporting.

Interpretation: This tabular representation provides granular transparency into the prediction process, helping analysts and auditors trace the decision pathway behind any individual prediction.
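
A minimal sketch turning the same numbers into a table, continuing `base` and `contrib` from the Contributions Plot sketch (feature names are illustrative):

```python
# Minimal sketch: per-feature contributions with a running cumulative sum.
import pandas as pd

table = pd.DataFrame({
    "feature": [f"feature_{i}" for i in range(len(contrib))],
    "contribution": contrib,
}).sort_values("contribution", key=abs, ascending=False)
table["cumulative"] = base + table["contribution"].cumsum()  # ends at the final probability
print(table)
```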

Use Cases:

  • Support model explainability in audit and compliance scenarios.

  • Facilitate communication between data scientists and business users.

  • Enable comparison of feature contributions across multiple records.

Summary

Component               | Purpose                               | Key Insights                                      | Typical Use
------------------------|---------------------------------------|---------------------------------------------------|--------------------------------
Predicted Probability   | Displays confidence per target label  | Understand likelihood distribution                | Validate prediction confidence
Contributions Plot      | Visual breakdown of feature effects   | See how individual features build the prediction  | Explain model decision paths
Partial Dependence Plot | Shows sensitivity to one feature      | Identify thresholds and non-linear relationships  | Guide feature engineering
Contributions Table     | Numeric breakdown of contributions    | Quantify and audit prediction influence           | Governance and validation

What-if Analysis Tab

Purpose: The What-if Analysis tab enables users to test hypothetical input scenarios and observe how real-time changes in features influence the model’s output.

Key Components:

  • Manual Input Modification: Allows direct adjustment of feature values (e.g., income, age, transaction amount) for the selected record.

  • Real-time Prediction Updates: The dashboard dynamically recalculates predictions with modified inputs.

  • Sensitivity Testing: Supports experimentation to evaluate the stability of the model under changing conditions.

Use Cases:

  • Evaluate fairness by simulating demographic or socioeconomic variations.

  • Test operational thresholds (e.g., determine at what credit score a loan is approved).

  • Assess the robustness and resilience of predictions to small perturbations in feature values.

Feature Input

The user can adjust the input values to see predictions for what-if scenarios.
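
A minimal sketch of a single what-if edit, continuing `model`, `X_test`, and `idx` from the sketches above (the edited feature and scaling factor are arbitrary):

```python
# Minimal sketch: re-score a record after manually changing one feature value.
row = X_test[[idx]].copy()
print("before:", model.predict_proba(row)[0, 1])
row[0, 0] *= 1.5                      # hypothetical edit to the first feature
print("after :", model.predict_proba(row)[0, 1])
```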

Feature Dependence Tab

Purpose: The Feature Dependence tab visualizes the relationship between individual features and the model’s predicted outcome using SHAP summary and SHAP dependence plots.

SHAP Summary

The SHAP Summary aggregates SHAP values per feature. The user can either select an aggregate display showing the mean absolute SHAP value per feature, or a detailed view of the spread of SHAP values per feature and how they correlate with the feature value (red is high).

SHAP Dependence

This plot displays the relationship between feature values and SHAP values, allowing users to investigate how a feature’s value affects its impact on the prediction. Users can check whether the model uses features in line with their intuition, or use the plots to learn the relationships the model has learned between input features and the predicted outcome. Both views can be reproduced with the shap plotting API, as sketched below.
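
A minimal sketch, continuing `sv` and `X_test` from the Feature Importance sketch:

```python
# Minimal sketch: SHAP summary (aggregate and detailed) and SHAP dependence plots.
import shap

shap.summary_plot(sv, X_test, plot_type="bar")  # aggregate: mean |SHAP| per feature
shap.summary_plot(sv, X_test)                   # beeswarm: spread, colored by feature value (red = high)
shap.dependence_plot(0, sv, X_test)             # SHAP value vs. value of feature 0
```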

Use Cases:

  • Identify key inflection points (e.g., optimal credit score or income level).

  • Understand monotonic or non-monotonic feature effects on prediction outcomes.

  • Inform feature engineering strategies and guide business rule formulation.

Summary

Each tab in the Model Explainer serves a specific interpretability objective:

Tab Name               | Interpretability Focus                    | Primary Value
-----------------------|-------------------------------------------|-----------------------------------------------------
Feature Importance     | Global influence across all predictions   | Understand dominant drivers
Individual Predictions | Local explanation for a single instance   | Explain individual outcomes
What-if Analysis       | Scenario-based input testing              | Evaluate model sensitivity and fairness
Feature Dependence     | Relationship visualization                | Reveal trends and thresholds for business decisions
