Perform Churn Analysis Using DS Lab and Explainable AI

Perform churn analysis using DS Lab Notebooks, apply Explainable AI for model insights, and integrate the results into a pipeline and Data Sandbox to drive customer retention strategies.

Purpose

This guide explains how to perform Churn Analysis using the BDB Data Science Lab (DS Lab), leveraging Explainable AI (XAI) and data pipelines for operational integration. The workflow demonstrates how to train and interpret a churn prediction model, export it to the Churn Prediction Pipeline, and write processed results into the Data Sandbox—enabling business users to drive customer retention strategies through data-driven decisions.

Business Context

Customer churn poses a critical challenge for telecom, banking, and subscription-based businesses. Predicting and understanding churn helps organizations proactively design retention campaigns, improve customer engagement, and optimize marketing spend.

This workflow integrates AI modeling, Explainable AI, and Pipeline Automation within BDB Platform’s unified architecture, enabling real-time model execution and storytelling through interactive dashboards.

Workflow Overview

The process includes the following high-level steps:

  1. Create and Configure a DS Lab Project

  2. Import and Execute the Churn Prediction Notebook

  3. Train and Register the Model

  4. Build a Churn Pipeline for Model Execution and Data Transformation

  5. Write Processed Data into a Sandbox for Reporting and Visualization

  6. Develop a Business Story for Explainable Insights

Step 1 – Create a New DS Lab Project

  1. From the Apps Menu, open the DS Lab Plugin.

  2. Click Create +.

  3. Enter the following details:

    Field               | Example                                    | Description
    --------------------|--------------------------------------------|----------------------------------------------
    Project Name        | DS LAB WORKFLOW 3                          | Unique name identifying this workflow
    Description         | “End-to-End Churn Analysis Project”        | Optional
    Algorithm Type      | Regression / Forecasting / Classification  | Select based on use case
    Environment         | Python (TensorFlow)                        | Environment chosen for this workflow
    Resource Allocation | Medium                                     | Adjust as per dataset size
    Idle Shutdown       | 1 Hour                                     | Releases idle compute resources automatically

  4. Click Save to create the project.

  5. Once created, click Activate → then click View to open it.

  6. Wait until the kernel initialization completes.

Step 2 – Import Notebook and Add Sandbox Data

Activate the Project

  • On the project card, click Activate.

  • Once activated, click View to enter the workspace.

Import the Churn Notebook

  1. In the Repo section, click the three-dot (⋮) menu → select Import.

  2. Enter a name (e.g., Churn Prediction Notebook) and a short description.

  3. Browse and upload your .ipynb notebook file.

Step 3 – Explore and Preprocess Churn Data

3.1 Retrieve Data from Sandbox

from Notebook.DSNotebook.NotebookExecutor import NotebookExecutor

nb = NotebookExecutor()
# Auto-generated call that loads the selected sandbox dataset into a DataFrame
data = nb.get_data('59801689926743119', '@SYS.USERID', 'True', {}, [])
# Encode the churn target as 0/1 for modeling
data['Churn'] = data['Churn'].map({'No': 0, 'Yes': 1})
data.head(3)

Note: This code is auto-generated when you select a data source.

3.2 Inspect Dataset Structure

data.dtypes                             # column data types
data.select_dtypes('object').columns    # categorical (string) columns
data.select_dtypes('number').columns    # numerical columns

These commands identify categorical and numerical columns for downstream processing.

3.3 Preprocess Data

Define preprocessing functions for categorical and numerical variables:

import numpy as np
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

def preprocess_categorical(df_in, categorical_columns):
    # One-hot encode categorical features; toarray() returns a dense ndarray
    ohe = OneHotEncoder()
    df_cat = ohe.fit_transform(df_in[categorical_columns]).toarray()
    return df_cat, ohe

def preprocess_numerical(df_in, numerical_columns):
    # Scale numerical features to the [0, 1] range
    scaler = MinMaxScaler()
    df_num = scaler.fit_transform(df_in[numerical_columns])
    return df_num, scaler

def preprocess_data(df_in, categorical_columns, numerical_columns):
    # Combine encoded categorical and scaled numerical features into one matrix
    df_cat, ohe = preprocess_categorical(df_in, categorical_columns)
    df_num, scaler = preprocess_numerical(df_in, numerical_columns)
    X = np.concatenate((df_cat, df_num), axis=1)
    return X, ohe, scaler, df_cat.shape[1], df_num.shape[1]

3.4 Define Feature Lists

categorical_columns = ['gender', 'Partner', 'Dependents', 'PhoneService',
 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
 'Contract', 'PaperlessBilling', 'PaymentMethod']

numerical_columns = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges']

X, ohe, scaler, n_cat_out, n_num_out = preprocess_data(data, categorical_columns, numerical_columns)
y = data['Churn']
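
As an optional sanity check before training, you can confirm the shape of the preprocessed feature matrix and the class balance of the target. The snippet below assumes the variables defined in the cells above:

# Optional sanity check on the preprocessed features and target
print(X.shape)                # rows = customers, columns = n_cat_out + n_num_out
print(n_cat_out, n_num_out)   # one-hot encoded vs. scaled numerical column counts
print(y.value_counts())       # class balance of the churn target (0 = No, 1 = Yes)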

Step 4 – Train the Churn Prediction Model

4.1 Train/Test Split & Random Forest Model

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Hold out 30% of the data; stratify to preserve the churn ratio and fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

preds_train = model.predict(X_train)
preds_test = model.predict(X_test)

print(classification_report(y_train, preds_train))
print(classification_report(y_test, preds_test))

The output provides accuracy, precision, recall, and F1-scores to evaluate churn prediction performance.
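
For the Explainable AI part of the workflow, a lightweight option is to inspect the Random Forest's feature importances. The following is a minimal sketch that assumes the variables from the previous cells and a scikit-learn version that provides get_feature_names_out; a dedicated XAI library such as SHAP could be substituted for richer, per-customer explanations.

import pandas as pd

# Recover readable names for the one-hot encoded and numerical features
feature_names = list(ohe.get_feature_names_out(categorical_columns)) + numerical_columns

# Rank features by their contribution to the Random Forest's splits
importances = pd.Series(model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(10))

The highest-ranked features give business users a first explanation of which customer attributes drive predicted churn.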

Step 5 – Save and Register the Model

  1. In a new notebook cell, click the three dots (⋮) → select Save Model. A pre-formatted code snippet appears automatically.

  2. Run the generated cell to save the trained model.

  3. Open the Models tab (next to Data).

  4. Click All, locate your model, then click the three dots (⋮) → select Register. The model is now available for reuse in pipelines.

Step 6 – Create a Sandbox for the Churn Data

  1. From the Apps Menu, open the Data Center module.

  2. Navigate to the Sandbox tab and click Create.

  3. Provide:

    • Name: Churn_Data_Sandbox

    • Description: “Sandbox for Churn Forecasting Data”

  4. Browse and attach the churn dataset CSV file.

  5. Click Upload → A success message confirms creation.

Step 7 – Build the Churn Prediction Pipeline

7.1 Create a Pipeline

  1. From the Apps Menu, open the Data Pipeline Plugin.

  2. Click Create → provide:

    • Pipeline Name: Churn_Prediction_DataPipeline

    • Description: “End-to-End DS Lab Workflow for Churn Analysis”

    • Resource Allocation: Medium

  3. Click Save. The pipeline appears in the list and can now be configured.

Step 8 – Add Components to the Pipeline

8.1 Sandbox Reader

  • Drag the Sandbox Reader from the Reader section.

  • In Basic Information, set Invocation Type: Realtime.

  • In Meta Information, configure:

    • Storage Type: Network

    • File Type: CSV

    • Sandbox Name: Select the previously created sandbox (Churn_Data_Sandbox)

    • Check Header and Infer Schema.

  • Click Save.

8.2 Add Kafka Event

  • Open Event Panel → + Add Event.

  • Set Partition: 1 → Add Kafka Event.

  • Drag the event to the canvas and connect it to the Sandbox Reader.

Step 9 – Add and Configure DS Lab Components

9.1 First DS Lab Component (Model Runner)

  • Drag DS Lab Component (Machine Learning section) → drop onto the canvas.

  • Set Invocation Type: Batch.

  • In Meta Information:

    • Execution Type: Model Runner

    • Project Name: Churn Prediction Project

    • Model Name: churn_preprocess_model

  • Save and connect it to the previous event.

9.2 Python Script Component

  • Drag Python Script Component → set Invocation Type: Batch.

  • In Meta Information:

    • Component Name: DropColumn

    • Script:

      def func(df):
          # Drop the helper 'index' column before passing data downstream
          df_out = df.drop(columns=['index'])
          return df_out
    • Start Function: func

    • In Event Data Type: DataFrame

  • Save the component and connect it via a new Kafka Event.

9.3 Second DS Lab Component (Script Runner)

  • Drag another DS Lab Component.

  • Invocation Type: Batch.

  • Execution Type: Script Runner.

  • Configure:

    • Project Name: Churn Prediction Project

    • Script Name: churn_pred_util

    • Function Type: DataFrame

    • Start Function: func

  • Add Input Arguments (Secrets):

    host = @ENV.DS_CH_HOST
    port = @ENV.DS_CH_TCP_PORT
    database = @ENV.DS_CH_DB_DEVELOPMENT
    user = @ENV.DS_CH_USER_DEVELOPMENT
  • Save configuration.
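
The churn_pred_util script lives inside the DS Lab project, so its contents are not reproduced here. As a purely hypothetical sketch of its shape (not the actual script), the entry function would receive the incoming DataFrame plus the four input arguments configured above and return a DataFrame for the next component:

import pandas as pd

# Hypothetical outline only - the real churn_pred_util script may differ
def func(df, host, port, database, user):
    # Example post-processing: label model predictions for reporting
    # ('prediction' and 'churn_label' are illustrative column names)
    df = df.copy()
    if 'prediction' in df.columns:
        df['churn_label'] = df['prediction'].map({0: 'Retained', 1: 'Churn'})
    # host, port, database, and user (resolved from the @ENV secrets) would
    # typically be used here to read from or write to the development database
    return df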

Step 10 – Add Sandbox Writer

  • Drag the Sandbox Writer from the Writer section.

  • Set Invocation Type: Realtime.

  • Configure:

    • Storage Type: Network

    • File Type: CSV

    • Save Mode: Overwrite

    • Target Sandbox File: Churn_Output.csv

  • Connect the output of the last DS Lab component to this writer.

  • Click Save.

Step 11 – Activate and Monitor the Pipeline

  1. Click Update Pipeline → then click Activate.

  2. Navigate to the Logs / Advanced Logs panel.

  3. Ensure all pods are running successfully.

  4. Click individual pods to view execution logs for each component.

  5. Verify the message: "Sandbox Writer successfully written data" — confirming pipeline completion.

Step 12 – Business Story Creation and Visualization

  1. From the Apps Menu, open Data Center → Sandbox.

  2. Create a Data Store from the sandbox output.

  3. Open the Business Story Plugin.

  4. Select appropriate visualizations:

    • Bar charts for churn distribution

    • Line graphs for trend analysis

    • Pie charts for retention segmentation

  5. Use Explainable AI metrics to narrate why specific customers are predicted to churn.

  6. Share insights with business teams to drive retention actions.

Step 13 – Deactivate Resources

After validation:

  • Deactivate the Pipeline.

  • Deactivate the DS Lab Project. This ensures optimal resource utilization.

Outcome

By completing this workflow, you have successfully:

  • Built a churn prediction model in DS Lab.

  • Applied Explainable AI for interpretability.

  • Deployed the model in a real-time pipeline.

  • Wrote transformed data into the Data Sandbox.

  • Created a business story dashboard for strategic decision-making.

This integrated approach empowers organizations to predict, explain, and prevent churn—transforming data into actionable retention strategies.