Perform Churn Analysis Using DS Lab and Explainable AI
Perform churn analysis using DS Lab Notebooks, apply Explainable AI for model insights, and integrate the results into a pipeline and Data Sandbox to drive customer retention strategies.
Purpose
This guide explains how to perform Churn Analysis using the BDB Data Science Lab (DS Lab), leveraging Explainable AI (XAI) and data pipelines for operational integration. The workflow demonstrates how to train and interpret a churn prediction model, export it to the Churn Prediction Pipeline, and write processed results into the Data Sandbox—enabling business users to drive customer retention strategies through data-driven decisions.
Business Context
Customer churn poses a critical challenge for telecom, banking, and subscription-based businesses. Predicting and understanding churn helps organizations proactively design retention campaigns, improve customer engagement, and optimize marketing spend.
This workflow integrates AI modeling, Explainable AI, and Pipeline Automation within BDB Platform’s unified architecture, enabling real-time model execution and storytelling through interactive dashboards.
Workflow Overview
The process includes the following high-level steps:
Create and Configure a DS Lab Project
Import and Execute the Churn Prediction Notebook
Train and Register the Model
Build a Churn Pipeline for Model Execution and Data Transformation
Write Processed Data into a Sandbox for Reporting and Visualization
Develop a Business Story for Explainable Insights
Step 1 – Create a New DS Lab Project
From the Apps Menu, open the DS Lab Plugin.
Click Create +.
Enter the following details:
Field | Example | Description
Project Name | DS LAB WORKFLOW 3 | Unique name identifying this workflow
Description | "End-to-End Churn Analysis Project" | Optional field
Algorithm Type | Regression / Forecasting / Classification | Select based on the use case
Environment | Python (TensorFlow) | Chosen for this workflow
Resource Allocation | Medium | Adjust as per dataset size
Idle Shutdown | 1 Hour | Releases idle compute resources automatically
Click Save to create the project.
Once created, click Activate → then click View to open it.
Wait until the kernel initialization completes.
Step 2 – Import Notebook and Add Sandbox Data
Activate the Project
On the project card, click Activate.
Once activated, click View to enter the workspace.
Import the Churn Notebook
In the Repo section, click the three-dot (⋮) menu → select Import.
Enter a name (e.g., Churn Prediction Notebook) and a short description.
Browse and upload your .ipynb notebook file.
Step 3 – Explore and Preprocess Churn Data
3.1 Retrieve Data from Sandbox
from Notebook.DSNotebook.NotebookExecutor import NotebookExecutor

# Fetch the churn dataset from the Data Sandbox (the ID below identifies this workflow's sandbox)
nb = NotebookExecutor()
data = nb.get_data('59801689926743119', '@SYS.USERID', 'True', {}, [])

# Encode the target label: 'No' -> 0, 'Yes' -> 1
data['Churn'] = data['Churn'].map({'No': 0, 'Yes': 1})
data.head(3)

3.2 Inspect Dataset Structure
data.dtypes
data.select_dtypes('object').columns
data.select_dtypes('number').columns

These commands identify the categorical and numerical columns for downstream processing.
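Note: in the widely used Telco churn dataset, TotalCharges is often read in as a string column containing blanks, which would make the MinMaxScaler in the next step fail. If data.dtypes shows TotalCharges as object, coerce it first; a minimal fix, assuming blank values should be treated as zero:

import pandas as pd

# Coerce TotalCharges to numeric; blanks become NaN, which we fill with 0
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce').fillna(0)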
3.3 Preprocess Data
Define preprocessing functions for categorical and numerical variables:
import numpy as np
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

def preprocess_categorical(df_in, categorical_columns):
    # One-hot encode categorical features; toarray() returns a dense ndarray
    ohe = OneHotEncoder()
    df_cat = ohe.fit_transform(df_in[categorical_columns]).toarray()
    return df_cat, ohe

def preprocess_numerical(df_in, numerical_columns):
    # Scale numerical features to the [0, 1] range
    scaler = MinMaxScaler()
    df_num = scaler.fit_transform(df_in[numerical_columns])
    return df_num, scaler

def preprocess_data(df_in, categorical_columns, numerical_columns):
    # Combine encoded categorical and scaled numerical features into one matrix
    df_cat, ohe = preprocess_categorical(df_in, categorical_columns)
    df_num, scaler = preprocess_numerical(df_in, numerical_columns)
    X = np.concatenate((df_cat, df_num), axis=1)
    return X, ohe, scaler, df_cat.shape[1], df_num.shape[1]

3.4 Define Feature Lists
categorical_columns = ['gender', 'Partner', 'Dependents', 'PhoneService',
'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
'Contract', 'PaperlessBilling', 'PaymentMethod']
numerical_columns = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges']
X, ohe, scaler, n_cat_out, n_num_out = preprocess_data(data, categorical_columns, numerical_columns)
y = data['Churn']

Step 4 – Train the Churn Prediction Model
4.1 Train/Test Split & Random Forest Model
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# Hold out 30% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit a Random Forest classifier on the training split
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Evaluate on both splits to check for overfitting
preds_train = model.predict(X_train)
preds_test = model.predict(X_test)
print(classification_report(y_train, preds_train))
print(classification_report(y_test, preds_test))

The output reports accuracy, precision, recall, and F1-scores for evaluating churn prediction performance; comparing the train and test reports helps flag overfitting.
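The steps above train the model but do not yet explain it. A common way to add Explainable AI on top of a tree ensemble is SHAP; the sketch below assumes the shap package is installed in the notebook environment and that scikit-learn ≥ 1.0 is available for get_feature_names_out. It is one possible approach, not a platform-mandated one:

import shap

# Build human-readable feature names: one-hot names followed by numerical columns
feature_names = list(ohe.get_feature_names_out(categorical_columns)) + numerical_columns

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Depending on the shap version, shap_values is either a list with one array
# per class or a single 3-D array; select the positive ('Yes' = churn) class
sv_pos = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

# Global view: which features drive churn predictions overall
shap.summary_plot(sv_pos, X_test, feature_names=feature_names)

The summary plot ranks features by their overall impact on the churn prediction, which feeds directly into the explainable storytelling in Step 12.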
Step 5 – Save and Register the Model
In a new notebook cell, click the three dots (⋮) → select Save Model. A pre-formatted code snippet appears automatically.
Run the generated cell to save the trained model.
Open the Models tab (next to Data).
Click All, locate your model, then click the three dots (⋮) → select Register. The model is now available for reuse in pipelines.
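The platform-generated snippet handles serialization for you. For reference only, the plain scikit-learn equivalent outside DS Lab would use joblib; this sketch is illustrative and is not the code the platform generates:

import joblib

# Persist the trained model together with the fitted preprocessing artifacts,
# since the same encoders must be reused when scoring new data
joblib.dump({'model': model, 'ohe': ohe, 'scaler': scaler}, 'churn_model.joblib')

# Later, reload everything needed to score new data
artifacts = joblib.load('churn_model.joblib')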
Step 6 – Create a Sandbox Using Forecasting Data
From the Apps Menu, open the Data Center module.
Navigate to the Sandbox tab and click Create.
Provide:
Name: Churn_Data_Sandbox
Description: "Sandbox for Churn Forecasting Data"
Upload the CSV file.
Click Upload → A success message confirms creation.
Step 7 – Build the Churn Prediction Pipeline
7.1 Create a Pipeline
From the Apps Menu, open the Data Pipeline Plugin.
Click Create → provide:
Pipeline Name: Churn_Prediction_DataPipeline
Description: "End-to-End DS Lab Workflow for Churn Analysis"
Resource Allocation: Medium
Click Save. The pipeline appears in the list and can now be configured.
Step 8 – Add Components to the Pipeline
8.1 Sandbox Reader
Drag the Sandbox Reader from the Reader section.
In Basic Information, set Invocation Type: Realtime.
In Meta Information, configure:
Storage Type: Network
File Type: CSV
Sandbox Name: Select previously created sandbox
Check Header and Infer Schema.
Click Save.
8.2 Add Kafka Event
Open Event Panel → + Add Event.
Set Partition: 1 → Add Kafka Event.
Drag the event to the canvas and connect it to the Sandbox Reader.
Step 9 – Add and Configure DS Lab Components
9.1 First DS Lab Component (Model Runner)
Drag DS Lab Component (Machine Learning section) → drop onto the canvas.
Set Invocation Type: Batch.
In Meta Information:
Execution Type: Model Runner
Project Name: Churn Prediction Project
Model Name: churn_preprocess_model
Save and connect it to the previous event.
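The Model Runner applies the registered churn_preprocess_model to each incoming batch. How the platform packages that model is internal to DS Lab, but conceptually it replays the encoders fitted in Step 3 on new rows. A minimal sketch of that transform, where the wrapper function name is a hypothetical illustration:

import numpy as np

def preprocess_new_rows(df_new, ohe, scaler, categorical_columns, numerical_columns):
    # Reuse the encoders fitted during training; transform only, never refit,
    # so new data is mapped into the same feature space as the training set
    df_cat = ohe.transform(df_new[categorical_columns]).toarray()
    df_num = scaler.transform(df_new[numerical_columns])
    return np.concatenate((df_cat, df_num), axis=1)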
9.2 Python Script Component
Drag the Python Script Component → set Invocation Type: Batch.
In Meta Information:
Component Name: DropColumn
Script:

def func(df):
    df_out = df.drop(columns=['index'])
    return df_out

Start Function: func
In Event Data Type: DataFrame
Save the component and connect it via a new Kafka Event.
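Before wiring the script into the pipeline, it can help to verify it locally on a small frame. This quick check is illustrative only and is not part of the platform configuration:

import pandas as pd

# A toy frame with the 'index' column the script is expected to drop
sample = pd.DataFrame({'index': [0, 1], 'tenure': [12, 24]})

def func(df):
    df_out = df.drop(columns=['index'])
    return df_out

print(func(sample).columns.tolist())  # -> ['tenure']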
9.3 Second DS Lab Component (Script Runner)
Drag another DS Lab Component.
Invocation Type: Batch.
Execution Type: Script Runner.
Configure:
Project Name: Churn Prediction Project
Script Name: churn_pred_util
Function Type: DataFrame
Start Function: func
Add Input Arguments (Secrets):

host = @ENV.DS_CH_HOST
port = @ENV.DS_CH_TCP_PORT
database = @ENV.DS_CH_DB_DEVELOPMENT
user = @ENV.DS_CH_USER_DEVELOPMENT

Save the configuration.
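The churn_pred_util script itself is not reproduced in this guide. Judging by the DS_CH_* secret names, the arguments appear to describe a ClickHouse connection; purely as an assumption, a script-runner entry point receiving the frame plus those arguments might look like the sketch below (using the clickhouse-driver package; every name and the calling convention here are illustrative, not the platform's contract):

from clickhouse_driver import Client

def func(df, host=None, port=None, database=None, user=None):
    # Hypothetical example: connect to ClickHouse using the details injected
    # from the pipeline secrets, then enrich or persist the scored frame
    client = Client(host=host, port=int(port), database=database, user=user)
    client.execute('SELECT 1')  # placeholder connectivity check
    return df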
Step 10 – Add Sandbox Writer
Drag the Sandbox Writer from the Writer section.
Set Invocation Type: Realtime.
Configure:
Storage Type: Network
File Type: CSV
Save Mode: Overwrite
Target Sandbox File: Churn_Output.csv
Connect the output of the last DS Lab component to this writer.
Click Save.
Step 11 – Activate and Monitor the Pipeline
Click Update Pipeline → then click Activate.
Navigate to the Logs / Advanced Logs panel.
Ensure all pods are running successfully.
Click individual pods to view execution logs for each component.
Verify the message "Sandbox Writer successfully written data", which confirms that the pipeline completed.
Step 12 – Business Story Creation and Visualization
From the Apps Menu, open Data Center → Sandbox.
Create a Data Store from the sandbox output.
Open the Business Story Plugin.
Select appropriate visualizations:
Bar charts for churn distribution
Line graphs for trend analysis
Pie charts for retention segmentation
Use Explainable AI metrics to narrate why specific customers are predicted to churn.
Share insights with business teams to drive retention actions.
Step 13 – Deactivate Resources
After validation:
Deactivate the Pipeline.
Deactivate the DS Lab Project. This ensures optimal resource utilization.
Outcome
By completing this workflow, you have successfully:
Built a churn prediction model in DS Lab.
Applied Explainable AI for interpretability.
Deployed the model in a real-time pipeline.
Written transformed data into the Data Sandbox.
Created a business story dashboard for strategic decision-making.
This integrated approach empowers organizations to predict, explain, and prevent churn—transforming data into actionable retention strategies.