Perform Churn Analysis Using DS Lab and Explainable AI

Perform churn analysis using DS Lab Notebooks, apply Explainable AI for model insights, and integrate the results into a pipeline and Data Sandbox to drive customer retention strategies.

Purpose

This guide explains how to perform Churn Analysis using the BDB Data Science Lab (DS Lab), leveraging Explainable AI (XAI) and data pipelines for operational integration. The workflow demonstrates how to train and interpret a churn prediction model, export it to the Churn Prediction Pipeline, and write processed results into the Data Sandbox—enabling business users to drive customer retention strategies through data-driven decisions.

Business Context

Customer churn poses a critical challenge for telecom, banking, and subscription-based businesses. Predicting and understanding churn helps organizations proactively design retention campaigns, improve customer engagement, and optimize marketing spend.

This workflow integrates AI modeling, Explainable AI, and Pipeline Automation within BDB Platform’s unified architecture, enabling real-time model execution and storytelling through interactive dashboards.

Workflow Overview

The process includes the following high-level steps:

  1. Create and Configure a DS Lab Project

  2. Import and Execute the Churn Prediction Notebook

  3. Train and Register the Model

  4. Build a Churn Pipeline for Model Execution and Data Transformation

  5. Write Processed Data into a Sandbox for Reporting and Visualization

  6. Develop a Business Story for Explainable Insights

Step 1 – Create a New DS Lab Project

  1. From the Apps Menu, open the DS Lab Plugin.

  2. Click Create +.

  3. Enter the following details:

    Field               | Example                                    | Description
    --------------------|--------------------------------------------|----------------------------------------------
    Project Name        | DS LAB WORKFLOW 3                          | Unique name identifying this workflow
    Description         | “End-to-End Churn Analysis Project”        | Optional
    Algorithm Type      | Regression / Forecasting / Classification  | Select based on use case
    Environment         | Python (TensorFlow)                        | Environment chosen for this workflow
    Resource Allocation | Medium                                     | Adjust as per dataset size
    Idle Shutdown       | 1 Hour                                     | Releases idle compute resources automatically

  4. Click Save to create the project.

  5. Once created, click Activate → then click View to open it.

  6. Wait until the kernel initialization completes.

Step 2 – Import Notebook and Add Sandbox Data

Activate the Project

  • On the project card, click Activate.

  • Once activated, click View to enter the workspace.

Import the Churn Notebook

  1. In the Repo section, click the three-dot (⋮) menu → select Import.

  2. Enter a name (e.g., Churn Prediction Notebook) and a short description.

  3. Browse and upload your .ipynb notebook file.

Step 3 – Explore and Preprocess Churn Data

3.1 Retrieve Data from Sandbox

from Notebook.DSNotebook.NotebookExecutor import NotebookExecutor

nb = NotebookExecutor()
# Auto-generated call that loads the selected sandbox dataset into a DataFrame
data = nb.get_data('59801689926743119', '@SYS.USERID', 'True', {}, [])
# Encode the churn target as 0/1 for modeling
data['Churn'] = data['Churn'].map({'No': 0, 'Yes': 1})
data.head(3)

Note: This code is auto-generated when you select a data source.

3.2 Inspect Dataset Structure

data.dtypes                             # column data types
data.select_dtypes('object').columns    # categorical (string) columns
data.select_dtypes('number').columns    # numerical columns

These commands identify categorical and numerical columns for downstream processing.

3.3 Preprocess Data

Define preprocessing functions for categorical and numerical variables:

import numpy as np
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

def preprocess_categorical(df_in, categorical_columns):
    # One-hot encode categorical features; toarray() returns a dense ndarray
    ohe = OneHotEncoder()
    df_cat = ohe.fit_transform(df_in[categorical_columns]).toarray()
    return df_cat, ohe

def preprocess_numerical(df_in, numerical_columns):
    # Scale numerical features to the [0, 1] range
    scaler = MinMaxScaler()
    df_num = scaler.fit_transform(df_in[numerical_columns])
    return df_num, scaler

def preprocess_data(df_in, categorical_columns, numerical_columns):
    # Combine encoded categorical and scaled numerical features into one matrix
    df_cat, ohe = preprocess_categorical(df_in, categorical_columns)
    df_num, scaler = preprocess_numerical(df_in, numerical_columns)
    X = np.concatenate((df_cat, df_num), axis=1)
    return X, ohe, scaler, df_cat.shape[1], df_num.shape[1]

3.4 Define Feature Lists

categorical_columns = ['gender', 'Partner', 'Dependents', 'PhoneService',
 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
 'Contract', 'PaperlessBilling', 'PaymentMethod']

numerical_columns = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges']

X, ohe, scaler, n_cat_out, n_num_out = preprocess_data(data, categorical_columns, numerical_columns)
y = data['Churn']
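
As an optional sanity check before training, you can confirm the shape of the preprocessed feature matrix and the class balance of the target. The snippet below assumes the variables defined in the cells above:

# Optional sanity check on the preprocessed features and target
print(X.shape)                # rows = customers, columns = n_cat_out + n_num_out
print(n_cat_out, n_num_out)   # one-hot encoded vs. scaled numerical column counts
print(y.value_counts())       # class balance of the churn target (0 = No, 1 = Yes)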

Step 4 – Train the Churn Prediction Model

4.1 Train/Test Split & Random Forest Model

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Hold out 30% of the data; stratify to preserve the churn ratio and fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

preds_train = model.predict(X_train)
preds_test = model.predict(X_test)

print(classification_report(y_train, preds_train))
print(classification_report(y_test, preds_test))

The output provides accuracy, precision, recall, and F1-scores to evaluate churn prediction performance.
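
For the Explainable AI part of the workflow, a lightweight option is to inspect the Random Forest's feature importances. The following is a minimal sketch that assumes the variables from the previous cells and a scikit-learn version that provides get_feature_names_out; a dedicated XAI library such as SHAP could be substituted for richer, per-customer explanations.

import pandas as pd

# Recover readable names for the one-hot encoded and numerical features
feature_names = list(ohe.get_feature_names_out(categorical_columns)) + numerical_columns

# Rank features by their contribution to the Random Forest's splits
importances = pd.Series(model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(10))

The highest-ranked features give business users a first explanation of which customer attributes drive predicted churn.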

Step 5 – Save and Register the Model

  1. In a new notebook cell, click the three dots (⋮) → select Save Model. A pre-formatted code snippet appears automatically.

  2. Run the generated cell to save the trained model.

  3. Open the Models tab (next to Data).

  4. Click All, locate your model, then click the three dots (⋮) → select Register. The model is now available for reuse in pipelines.

Step 6 – Create a Sandbox for the Churn Data

  1. From the Apps Menu, open the Data Center module.

  2. Navigate to the Sandbox tab and click Create.

  3. Provide:

    • Name: Churn_Data_Sandbox

    • Description: “Sandbox for Churn Forecasting Data”

  4. Browse and attach the churn dataset CSV file.

  5. Click Upload → A success message confirms creation.

Step 7 – Build the Churn Prediction Pipeline

7.1 Create a Pipeline

  1. From the Apps Menu, open the Data Pipeline Plugin.

  2. Click Create → provide:

    • Pipeline Name: Churn_Prediction_DataPipeline

    • Description: “End-to-End DS Lab Workflow for Churn Analysis”

    • Resource Allocation: Medium

  3. Click Save. The pipeline appears in the list and can now be configured.

Step 8 – Add Components to the Pipeline

8.1 Sandbox Reader

  • Drag the Sandbox Reader from the Reader section.

  • In Basic Information, set Invocation Type: Realtime.

  • In Meta Information, configure:

    • Storage Type: Network

    • File Type: CSV

    • Sandbox Name: Select the previously created sandbox (Churn_Data_Sandbox)

    • Check Header and Infer Schema.

  • Click Save.

8.2 Add Kafka Event

  • Open Event Panel → + Add Event.

  • Set Partition: 1 → Add Kafka Event.

  • Drag the event to the canvas and connect it to the Sandbox Reader.

Step 9 – Add and Configure DS Lab Components

9.1 First DS Lab Component (Model Runner)

  • Drag DS Lab Component (Machine Learning section) → drop onto the canvas.

  • Set Invocation Type: Batch.

  • In Meta Information:

    • Execution Type: Model Runner

    • Project Name: Churn Prediction Project

    • Model Name: churn_preprocess_model

  • Save and connect it to the previous event.

9.2 Python Script Component

  • Drag Python Script Component → set Invocation Type: Batch.

  • In Meta Information:

    • Component Name: DropColumn

    • Script:

      def func(df):
          # Drop the helper 'index' column before passing data downstream
          df_out = df.drop(columns=['index'])
          return df_out
    • Start Function: func

    • In Event Data Type: DataFrame

  • Save the component and connect it via a new Kafka Event.

9.3 Second DS Lab Component (Script Runner)

  • Drag another DS Lab Component.

  • Invocation Type: Batch.

  • Execution Type: Script Runner.

  • Configure:

    • Project Name: Churn Prediction Project

    • Script Name: churn_pred_util

    • Function Type: DataFrame

    • Start Function: func

  • Add Input Arguments (Secrets):

    host = @ENV.DS_CH_HOST
    port = @ENV.DS_CH_TCP_PORT
    database = @ENV.DS_CH_DB_DEVELOPMENT
    user = @ENV.DS_CH_USER_DEVELOPMENT
  • Save configuration.
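
The churn_pred_util script lives inside the DS Lab project, so its contents are not reproduced here. As a purely hypothetical sketch of its shape (not the actual script), the entry function would receive the incoming DataFrame plus the four input arguments configured above and return a DataFrame for the next component:

import pandas as pd

# Hypothetical outline only - the real churn_pred_util script may differ
def func(df, host, port, database, user):
    # Example post-processing: label model predictions for reporting
    # ('prediction' and 'churn_label' are illustrative column names)
    df = df.copy()
    if 'prediction' in df.columns:
        df['churn_label'] = df['prediction'].map({0: 'Retained', 1: 'Churn'})
    # host, port, database, and user (resolved from the @ENV secrets) would
    # typically be used here to read from or write to the development database
    return df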

Step 10 – Add Sandbox Writer

  • Drag the Sandbox Writer from the Writer section.

  • Set Invocation Type: Realtime.

  • Configure:

    • Storage Type: Network

    • File Type: CSV

    • Save Mode: Overwrite

    • Target Sandbox File: Churn_Output.csv

  • Connect the output of the last DS Lab component to this writer.

  • Click Save.

Step 11 – Activate and Monitor the Pipeline

  1. Click Update Pipeline → then click Activate.

  2. Navigate to the Logs / Advanced Logs panel.

  3. Ensure all pods are running successfully.

  4. Click individual pods to view execution logs for each component.

  5. Verify the message: "Sandbox Writer successfully written data" — confirming pipeline completion.

Step 12 – Business Story Creation and Visualization

  1. From the Apps Menu, open Data Center → Sandbox.

  2. Create a Data Store from the sandbox output.

  3. Open the Business Story Plugin.

  4. Select appropriate visualizations:

    • Bar charts for churn distribution

    • Line graphs for trend analysis

    • Pie charts for retention segmentation

  5. Use Explainable AI metrics to narrate why specific customers are predicted to churn.

  6. Share insights with business teams to drive retention actions.

Step 13 – Deactivate Resources

After validation:

  • Deactivate the Pipeline.

  • Deactivate the DS Lab Project. This ensures optimal resource utilization.

Outcome

By completing this workflow, you have successfully:

  • Built a churn prediction model in DS Lab.

  • Applied Explainable AI for interpretability.

  • Deployed the model in a real-time pipeline.

  • Wrote transformed data into the Data Sandbox.

  • Created a business story dashboard for strategic decision-making.

This integrated approach empowers organizations to predict, explain, and prevent churn—transforming data into actionable retention strategies.