# Perform Churn Analysis Using DS Lab and Explainable AI

### **Purpose**

This guide explains how to perform **Churn Analysis** using the BDB **Data Science Lab (DS Lab)**, leveraging **Explainable AI (XAI)** and **data pipelines** for operational integration.\
The workflow demonstrates how to train and interpret a churn prediction model, export it to the **Churn Prediction Pipeline**, and write processed results into the **Data Sandbox**—enabling business users to drive **customer retention strategies** through data-driven decisions.

### **Business Context**

Customer churn poses a critical challenge for telecom, banking, and subscription-based businesses. Predicting and understanding churn helps organizations proactively design retention campaigns, improve customer engagement, and optimize marketing spend.

This workflow integrates **AI modeling**, **Explainable AI**, and **Pipeline Automation** within the BDB Platform’s unified architecture, enabling real-time model execution and storytelling through interactive dashboards.

### **Workflow Overview**

The process includes the following high-level steps:

1. **Create and Configure a DS Lab Project**
2. **Import and Execute the Churn Prediction Notebook**
3. **Train and Register the Model**
4. **Build a Churn Pipeline for Model Execution and Data Transformation**
5. **Write Processed Data into a Sandbox for Reporting and Visualization**
6. **Develop a Business Story for Explainable Insights**

### **Step 1 – Create a New DS Lab Project**

1. From the **Apps Menu**, open the **DS Lab Plugin**.
2. Click **Create +**.
3. Enter the following details:

   <table><thead><tr><th width="182.22222900390625">Field</th><th width="231.5555419921875">Example</th><th>Description</th></tr></thead><tbody><tr><td><strong>Project Name</strong></td><td>DS LAB WORKFLOW 3</td><td>Unique name identifying this workflow</td></tr><tr><td><strong>Description</strong></td><td>“End-to-End Churn Analysis Project”</td><td>Optional</td></tr><tr><td><strong>Algorithm Type</strong></td><td>Classification</td><td>Choose from Regression, Forecasting, or Classification; churn prediction is a classification task</td></tr><tr><td><strong>Environment</strong></td><td>Python (TensorFlow)</td><td>Chosen for this workflow</td></tr><tr><td><strong>Resource Allocation</strong></td><td>Medium</td><td>Adjust to the dataset size</td></tr><tr><td><strong>Idle Shutdown</strong></td><td>1 Hour</td><td>Releases idle compute resources automatically</td></tr></tbody></table>
4. Click **Save** to create the project.
5. Once created, click **Activate** → then click **View** to open it.
6. Wait until the kernel initialization completes.

### **Step 2 – Import Notebook and Add Sandbox Data**

#### **Activate the Project**

* On the project card, click **Activate**.
* Once activated, click **View** to enter the workspace.

#### **Import the Churn Notebook**

1. In the **Repo** section, click the **three-dot (⋮)** menu → select **Import**.
2. Enter a name (e.g., *Churn Prediction Notebook*) and a short description.
3. Browse and upload your `.ipynb` notebook file.

### **Step 3 – Explore and Preprocess Churn Data**

#### **3.1 Retrieve Data from Sandbox**

```python
from Notebook.DSNotebook.NotebookExecutor import NotebookExecutor
nb = NotebookExecutor()
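# Auto-generated call; the dataset ID will differ for your sandbox.
data = nb.get_data('59801689926743119', '@SYS.USERID', 'True', {}, [])
# Encode the target as 0 (stays) / 1 (churns); any other value becomes NaN.
data['Churn'] = data['Churn'].map({'No': 0, 'Yes': 1})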
data.head(3)
```

{% hint style="info" %}
**Note:** This code is auto-generated when you select a data source.
{% endhint %}
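
The `map` call turns any `Churn` value other than `'No'` or `'Yes'` into `NaN`. Also, in the public Telco customer churn dataset (whose column names this guide uses), `TotalCharges` is stored as text with a few blank values, which would break the numeric scaling step. A minimal cleaning sketch, assuming your data shares those column names; `clean_churn_frame` is a hypothetical helper, not a DS Lab function:

```python
import pandas as pd

def clean_churn_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: coerce TotalCharges to numbers and drop unusable rows."""
    out = df.copy()
    # Blank or non-numeric strings become NaN instead of raising an error.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Rows whose Churn label was not 'Yes'/'No' were mapped to NaN by .map().
    before = len(out)
    out = out.dropna(subset=["TotalCharges", "Churn"]).reset_index(drop=True)
    print(f"Dropped {before - len(out)} row(s) with missing TotalCharges or Churn")
    # With NaN gone, the label can be stored as an integer again.
    out["Churn"] = out["Churn"].astype(int)
    return out
```

In the notebook, you would call `data = clean_churn_frame(data)` directly after the data-loading cell, before any preprocessing.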

#### **3.2 Inspect Dataset Structure**

```python
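# A notebook cell only displays its last expression, so print each result explicitly.
print(data.dtypes)
print(data.select_dtypes('object').columns.tolist())   # categorical (text) columns
print(data.select_dtypes('number').columns.tolist())   # numerical columns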
```

These commands identify **categorical** and **numerical** columns for downstream processing.
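
Rather than typing the feature lists by hand, you can derive them from the dtypes. A sketch, assuming the identifier column is named `customerID` (as in the public Telco dataset) and the target is `Churn`; `split_columns` is a hypothetical helper, and `exclude` should be adjusted to your data:

```python
import pandas as pd

def split_columns(df: pd.DataFrame, exclude=("customerID", "Churn")):
    """Return (categorical, numerical) feature names, skipping ID and target columns."""
    excluded = set(exclude)
    # Text columns are treated as categorical, mirroring select_dtypes('object').
    categorical = [c for c in df.select_dtypes(include="object").columns if c not in excluded]
    # Integer and float columns are treated as numerical features.
    numerical = [c for c in df.select_dtypes(include="number").columns if c not in excluded]
    return categorical, numerical
```

In the notebook: `categorical_columns, numerical_columns = split_columns(data)`. If `TotalCharges` is still stored as text, it will land in the categorical list, which signals that it needs converting to numbers first.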

#### **3.3 Preprocess Data**

Define preprocessing functions for categorical and numerical variables:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

def preprocess_categorical(df_in, categorical_columns):
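    # handle_unknown='ignore' encodes categories unseen during fitting as all zeros
    ohe = OneHotEncoder(handle_unknown='ignore')
    # toarray() returns a dense ndarray; todense() returns np.matrix, which recent scikit-learn versions reject
    df_cat = ohe.fit_transform(df_in[categorical_columns]).toarray()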
    return df_cat, ohe

def preprocess_numerical(df_in, numerical_columns):
    scaler = MinMaxScaler()
    df_num = scaler.fit_transform(df_in[numerical_columns])
    return df_num, scaler

def preprocess_data(df_in, categorical_columns, numerical_columns):
    df_cat, ohe = preprocess_categorical(df_in, categorical_columns)
    df_num, scaler = preprocess_numerical(df_in, numerical_columns)
    X = np.concatenate((df_cat, df_num), axis=1)
    return X, ohe, scaler, df_cat.shape[1], df_num.shape[1]
```
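
The functions above call `fit_transform`, which learns the categories and min/max values from whatever frame they receive. When scoring new customers, the already-fitted encoder and scaler should be reused with `transform` so the feature columns line up with what the model was trained on. A self-contained sketch with made-up values; `transform_new` is a hypothetical helper, and this sketch sets `handle_unknown="ignore"` so an unseen category becomes all-zero columns instead of raising an error:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

def transform_new(df_new, ohe, scaler, categorical_columns, numerical_columns):
    """Apply already-fitted transformers to new rows (no refitting)."""
    cat = ohe.transform(df_new[categorical_columns]).toarray()
    num = scaler.transform(df_new[numerical_columns])
    # Same column order as preprocess_data: one-hot block first, then numerical.
    return np.concatenate((cat, num), axis=1)

# Toy training data (illustrative values only).
train = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year"],
                      "tenure": [1, 12, 24]})
ohe = OneHotEncoder(handle_unknown="ignore").fit(train[["Contract"]])
scaler = MinMaxScaler().fit(train[["tenure"]])

# A new customer with a contract type never seen during fitting.
new = pd.DataFrame({"Contract": ["Weekly"], "tenure": [6]})
X_new = transform_new(new, ohe, scaler, ["Contract"], ["tenure"])
print(X_new)  # three zero one-hot columns, then the scaled tenure
```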

#### **3.4 Define Feature Lists**

```python
categorical_columns = ['gender', 'Partner', 'Dependents', 'PhoneService',
 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
 'Contract', 'PaperlessBilling', 'PaymentMethod']

numerical_columns = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges']

X, ohe, scaler, n_cat_out, n_num_out = preprocess_data(data, categorical_columns, numerical_columns)
y = data['Churn']
```

### **Step 4 – Train the Churn Prediction Model**

#### **4.1 Train/Test Split & Random Forest Model**

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

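# stratify keeps the churn ratio equal in both splits; random_state makes runs repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
model = RandomForestClassifier(random_state=42)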
model.fit(X_train, y_train)

preds_train = model.predict(X_train)
preds_test = model.predict(X_test)

print(classification_report(y_train, preds_train))
print(classification_report(y_test, preds_test))
```

Each report lists per-class **precision**, **recall**, and **F1-score**, plus overall **accuracy**. A large gap between the training and test reports suggests that the model is overfitting.

### **Step 5 – Save and Register the Model**

1. In a new notebook cell, click the **three dots (⋮)** → select **Save Model**.\
   A pre-formatted code snippet appears automatically.
2. Run the generated cell to save the trained model.
3. Open the **Models tab** (next to Data).
4. Click **All**, locate your model, then click the **three dots (⋮)** → select **Register**.\
   The model is now available for reuse in pipelines.
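
Because the pipeline's Model Runner receives raw CSV rows, it can help for the saved model to carry its own preprocessing. One way to do that in scikit-learn is to wrap the encoder, scaler, and classifier in a single `Pipeline` with a `ColumnTransformer`. This is a sketch under assumptions: whether DS Lab's **Save Model** snippet accepts such an object should be verified in your environment, and `build_churn_pipeline` is a hypothetical helper.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

def build_churn_pipeline(categorical_columns, numerical_columns):
    """Encoder, scaler, and classifier combined into one estimator (hypothetical helper)."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
        ("num", MinMaxScaler(), numerical_columns),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(random_state=42)),
    ])

# Tiny illustrative frame, not real churn data.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "Month-to-month", "One year"],
    "tenure": [1, 60, 3, 24],
    "Churn": [1, 0, 1, 0],
})
pipe = build_churn_pipeline(["Contract"], ["tenure"])
pipe.fit(toy[["Contract", "tenure"]], toy["Churn"])
# The fitted pipeline accepts raw columns directly at prediction time.
print(pipe.predict(pd.DataFrame({"Contract": ["Month-to-month"], "tenure": [2]})))
```

The design trade-off: a single estimator cannot drift out of sync with its preprocessing, at the cost of refitting everything together whenever any step changes.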

### **Step 6 – Create a Sandbox Using Forecasting Data**

1. From the **Apps Menu**, open the **Data Center** module.
2. Navigate to the **Sandbox** tab and click **Create**.
3. Provide:
   * **Name:** `Churn_Data_Sandbox`
   * **Description:** “Sandbox for Churn Forecasting Data”
4. Browse to and select the CSV file.
5. Click **Upload**. A success message confirms that the sandbox was created.
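
Because the pipeline applies a model trained on specific columns, it may be worth confirming that the CSV contains them before uploading it. A sketch using pandas; `REQUIRED_COLUMNS` repeats the feature lists from Step 3.4, `missing_columns` is a hypothetical helper, and `"churn.csv"` below stands in for your own file path:

```python
import io
import pandas as pd

# Feature columns used to train the model in Step 3.4.
REQUIRED_COLUMNS = [
    "gender", "Partner", "Dependents", "PhoneService", "MultipleLines",
    "InternetService", "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies", "Contract",
    "PaperlessBilling", "PaymentMethod",
    "SeniorCitizen", "tenure", "MonthlyCharges", "TotalCharges",
]

def missing_columns(csv_source, required=REQUIRED_COLUMNS):
    """Return required columns absent from the CSV header (reads the header row only)."""
    header = pd.read_csv(csv_source, nrows=0).columns
    return [c for c in required if c not in header]

# Example with an in-memory CSV; in practice pass a path such as "churn.csv".
sample = io.StringIO("gender,tenure,Contract\nFemale,5,One year\n")
print(missing_columns(sample))
```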

### **Step 7 – Build the Churn Prediction Pipeline**

#### **7.1 Create a Pipeline**

1. From the **Apps Menu**, open the **Data Pipeline Plugin**.
2. Click **Create** → provide:
   * **Pipeline Name:** `Churn_Prediction_DataPipeline`
   * **Description:** “End-to-End DS Lab Workflow for Churn Analysis”
   * **Resource Allocation:** Medium
3. Click **Save**.\
   The pipeline appears in the list and can now be configured.

### **Step 8 – Add Components to the Pipeline**

#### **8.1 Sandbox Reader**

* Drag the **Sandbox Reader** from the **Reader** section.
* In **Basic Information**, set Invocation Type: `Realtime`.
* In **Meta Information**, configure:
  * **Storage Type:** Network
  * **File Type:** CSV
  * **Sandbox Name:** Select previously created sandbox
  * Check **Header** and **Infer Schema**.
* Click **Save**.

#### **8.2 Add Kafka Event**

* Open **Event Panel → + Add Event**.
* Set **Partition:** 1 → **Add Kafka Event**.
* Drag the event to the canvas and connect it to the Sandbox Reader.

### **Step 9 – Add and Configure DS Lab Components**

#### **9.1 First DS Lab Component (Model Runner)**

* Drag **DS Lab Component** (Machine Learning section) → drop onto the canvas.
* Set Invocation Type: `Batch`.
* In **Meta Information:**
  * Execution Type: `Model Runner`
  * Project Name: the DS Lab project that contains the registered model (e.g., *Churn Prediction Project*)
  * Model Name: *churn\_preprocess\_model*
* Save and connect it to the previous event.

#### **9.2 Python Script Component**

* Drag **Python Script Component** → set Invocation Type: `Batch`.
* In **Meta Information:**
  * Component Name: `DropColumn`
  * Script:

    ```python
    def func(df):
        df_out = df.drop(columns=['index'])
        return df_out
    ```
  * Start Function: `func`
  * In Event Data Type: `DataFrame`
* Save the component and connect it via a new Kafka Event.
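
Before pasting the script into the component, you can exercise it locally with pandas. The sample frame is made up, and the `prediction` column name is hypothetical; the only assumption carried over from the script is that upstream output contains an `index` column. The `errors="ignore"` argument is an added safeguard, not part of the original script, that keeps the function from raising a `KeyError` when the column is absent.

```python
import pandas as pd

def func(df):
    # Same logic as the DropColumn script; errors="ignore" is an added safeguard.
    return df.drop(columns=["index"], errors="ignore")

sample = pd.DataFrame({"index": [0, 1], "tenure": [5, 40], "prediction": [1, 0]})
print(func(sample).columns.tolist())  # → ['tenure', 'prediction']
# No error when the 'index' column is already missing.
print(func(sample.drop(columns=["index"])).columns.tolist())
```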

#### **9.3 Second DS Lab Component (Script Runner)**

* Drag another **DS Lab Component**.
* Invocation Type: `Batch`.
* Execution Type: `Script Runner`.
* Configure:
  * Project Name: the same DS Lab project used by the Model Runner (e.g., *Churn Prediction Project*)
  * Script Name: *churn\_pred\_util*
  * Function Type: `DataFrame`
  * Start Function: `func`
* Add **Input Arguments (Secrets):**

  ```
  host = @ENV.DS_CH_HOST
  port = @ENV.DS_CH_TCP_PORT
  database = @ENV.DS_CH_DB_DEVELOPMENT
  user = @ENV.DS_CH_USER_DEVELOPMENT
  ```
* Save configuration.

### **Step 10 – Add Sandbox Writer**

* Drag the **Sandbox Writer** from the Writer section.
* Set Invocation Type: `Realtime`.
* Configure:
  * Storage Type: Network
  * File Type: CSV
  * Save Mode: Overwrite
  * Target Sandbox File: *Churn\_Output.csv*
* Connect the output of the last DS Lab component to this writer.
* Click **Save**.

### **Step 11 – Activate and Monitor the Pipeline**

1. Click **Update Pipeline** → then click **Activate**.
2. Navigate to the **Logs / Advanced Logs** panel.
3. Ensure all pods are running successfully.
4. Click individual pods to view execution logs for each component.
5. Verify that the logs show the message\
   `"Sandbox Writer successfully written data"`, which confirms that the pipeline completed.

   <figure><img src="/files/lDIfamikbiuPUH8Muwbp" alt=""><figcaption></figcaption></figure>

### **Step 12 – Business Story Creation and Visualization**

1. From the **Apps Menu**, open **Data Center → Sandbox**.
2. Create a **Data Store** from the sandbox output.
3. Open the **Business Story Plugin**.
4. Select appropriate visualizations:
   * Bar charts for churn distribution
   * Line graphs for trend analysis
   * Pie charts for retention segmentation
5. Use Explainable AI metrics to narrate why specific customers are predicted to churn.
6. Share insights with business teams to drive retention actions.
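
Before building charts, a quick aggregation of the output file can confirm the numbers the dashboard should show, such as churn distribution by segment. A sketch, assuming the output contains a `Contract` column and a 0/1 prediction column; the name `predicted_churn` and the helper `churn_rate_by` are hypothetical, so substitute the column your Script Runner actually writes:

```python
import pandas as pd

def churn_rate_by(df, segment, target="predicted_churn"):
    """Customer count and share predicted to churn within each segment, highest rate first."""
    return (df.groupby(segment)[target]
              .agg(customers="count", churn_rate="mean")
              .sort_values("churn_rate", ascending=False))

# Illustrative rows; in practice, read Churn_Output.csv from the sandbox.
out = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "predicted_churn": [1, 0, 0, 0],
})
print(churn_rate_by(out, "Contract"))
```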

### **Step 13 – Deactivate Resources**

After validation:

* Deactivate the Pipeline.
* Deactivate the DS Lab Project.

Deactivating both releases the compute resources they hold.

### **Outcome**

By completing this workflow, you have successfully:

* Built a **churn prediction model** in DS Lab.
* Applied **Explainable AI** for interpretability.
* Deployed the model in a **real-time pipeline**.
* Written transformed data into the **Data Sandbox**.
* Created a **business story dashboard** for strategic decision-making.

This integrated approach empowers organizations to predict, explain, and prevent churn—transforming data into actionable retention strategies.

