
Data Science Lab


What is Data Science Lab?

The BDB Data Science Lab serves as a collaborative hub for data scientists to work together. Within this module, they can collectively conduct experiments, and exchange Notebooks, models, and other important elements with their team. This collaborative environment allows for validation and seamless deployment of these resources to the Production environment.

What is a Data Science Project?

A Data Science Project created inside the Data Science Lab is like a Workspace inside which the user can create and store multiple data science experiments and their associated artifacts.

Please Note: The user can create a new Notebook for coding or upload an existing Notebook only after Activating a Data Science Project.

What is a Feature Store?

A Feature Store is a centralized repository for storing, managing, and serving features used in machine learning models. It plays a crucial role in the machine learning lifecycle by providing a consistent and efficient way to manage features. It is a scalable solution for organizing and cataloging features, making them easily accessible to data scientists and ML engineers across an organization.

Feature Stores facilitate collaboration, version control, and reusability of features, streamlining the ML development process and improving model quality and efficiency.

What is a Workspace?

A Workspace in a Data Science module provides a cohesive and integrated environment that supports the end-to-end data science workflow, from data ingestion and processing to analysis, model building, and deployment.

  • The Workspace is a placeholder to create and save various data science experiments inside the Data Science Lab module.

  • The Workspace is the default tab to open for each Data Science Lab project.

What is a Notebook/ Data Science Notebook?

A Data Science Notebook is an interactive and collaborative digital platform used by data scientists and analysts for data exploration, analysis, modeling, and visualization. It combines executable code, visualizations, and explanatory text in a flexible and shareable format, making it a versatile tool for data science projects. Key features include code execution, rich text and visualizations, interactive data exploration, collaboration and sharing, reproducibility and documentation, and integration with data science libraries and tools.

In the current Data Science Lab module, a .ipynb file that is created or imported inside a project works like a Data Science Notebook for the users. The Workspace tab of a Data Science Project contains such Data Science Notebooks in a Repo folder.

What is a Dataset in the context of the Data Science Lab module?

A dataset in data science is a structured collection of data used for analysis and modeling. It represents a specific domain or problem and can include various data types. Datasets are essential for tasks like data analysis, modeling, and extracting insights in both supervised and unsupervised learning. They can be sourced from different domains and collected from surveys, experiments, or existing databases. Datasets contain features and labels for supervised learning, while they are unlabeled for unsupervised learning. They are typically split into training and test sets. Publicly available datasets are widely used for research and benchmarking. Datasets form the foundation for various data science tasks and enable solving complex problems.
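The training/test split mentioned above can be sketched with scikit-learn, assuming it is available in the Notebook environment:

```python
# Illustrative sketch: splitting a labeled dataset for supervised learning.
# scikit-learn is assumed to be available in the Notebook environment.
from sklearn.model_selection import train_test_split

# A toy dataset: features (X) and labels (y).
X = [[i] for i in range(10)]
y = [0, 1] * 5

# Hold out 30% of the rows as a test set; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))  # 7 train rows, 3 test rows
```

The held-out test set is only used for evaluation, never for training, so the measured performance reflects how the model behaves on unseen data.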

The Data Set tab provided under the Data Science Lab module supports the following types of datasets:

  1. Dataset - Here, Dataset stands for a table or filtered data from a database.

  2. Data Sandbox - Files (Excel, CSV, text, etc.) uploaded or appended to the Data Sandbox folder from a local directory.
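As a hypothetical illustration of how a sandbox-style CSV might be consumed in a Notebook (the file name and columns are invented, and the file is created locally so the example is self-contained):

```python
# Hypothetical example: reading a CSV file of the kind uploaded to the
# Data Sandbox. The file name and its columns are invented for illustration.
import pandas as pd

# Create a small CSV locally so the example is self-contained.
with open("sales_sample.csv", "w") as f:
    f.write("region,amount\nnorth,100\nsouth,250\nnorth,75\n")

df = pd.read_csv("sales_sample.csv")
print(df.shape)            # (3, 2)
print(df["amount"].sum())  # 425
```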

What is a Data Science Model?

A data science model refers to a mathematical or computational representation of a real-world phenomenon or problem that data scientists use to make predictions, gain insights, or automate decision-making processes. It is a key component of the data science workflow and is built using data, statistical techniques, and algorithms.

The Model tab under a Data Science Project includes:

  1. Imported Models: Models trained using external tools and libraries, which are brought into the data science workflow for analysis or prediction tasks.

  2. Models created in Data Science Lab Notebook: Models built and trained within the Data Science Lab Notebook environment, utilizing its features and capabilities.

  3. AutoML Models: Models generated through automated machine learning (AutoML) techniques, which automatically search and select the best model based on the given data and desired outcome.

What is Utility script?

The Utility tab allows users to create and list Python scripts (.py files) that can be imported into your notebooks.
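As a hypothetical example of the utility-script idea, a small .py file can be written and then imported the way a Utility script would be used in a notebook (the module and function names are invented):

```python
# Hypothetical utility-script workflow: write a .py file, then import it.
# In the DS Lab, the file would live in the Utils folder instead.
import importlib
import pathlib
import sys

# A minimal utility module with one helper function.
pathlib.Path("my_utils.py").write_text(
    "def normalize(values):\n"
    "    total = sum(values)\n"
    "    return [v / total for v in values]\n"
)

sys.path.insert(0, ".")          # make the script importable
my_utils = importlib.import_module("my_utils")

print(my_utils.normalize([1, 1, 2]))  # [0.25, 0.25, 0.5]
```

Keeping shared helpers in a utility script like this avoids copy-pasting the same code across notebooks.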

What is AutoML?

AutoML (Automated Machine Learning) refers to the automated process of building and optimizing machine learning models without extensive manual intervention. It leverages intelligent algorithms and techniques to automate tasks such as data preprocessing, feature selection, model selection, hyperparameter tuning, and model evaluation. AutoML aims to simplify and accelerate the model development process, enabling users with limited machine learning expertise to create effective models efficiently.
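To make one of these automated tasks concrete, here is what hyperparameter tuning looks like when done manually with scikit-learn (assumed to be available); an AutoML tool runs searches like this across many model families and preprocessing choices automatically:

```python
# Manual hyperparameter search with scikit-learn; AutoML automates this
# step (plus preprocessing, feature and model selection) behind the scenes.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try several tree depths and pick the best by cross-validated accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```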

The Auto ML tab allows the users to create data science experiments and lists them.

Accessing the Data Science Lab Module

This page displays the steps to access the DS Lab module under the platform.

  • Navigate to the Platform Homepage.

  • Click the Apps menu icon on the Platform homepage.

  • Click the DS Lab module.

Accessing the DS Lab module
  • The user gets redirected to the Homepage of the Data Science Lab module.

Homepage for the Data Science Lab module

Please Note: To access the DS Lab module available inside the Apps menu, the logged-in user must have the App Permission to access it from the security level settings.

Homepage

The homepage is a centralized hub where users can access, interact with, and manage the various features, functionalities, and resources provided by the Data Science Lab module.

The users can access the various sections of the Data Science Lab module using the menu on the left side of the homepage.

The following options are provided on the left side menu of the Homepage:

Create

This section displays steps on how to create a Project or Feature Store.

Working with the Workspace tab

This section explains how to begin working with the Workspace tab. The Create and Import options are provided for Repo folders.

View Explanation

The View Explanation option redirects the user to the options given below, each of which is explained as a separate topic.

Import

This section describes steps to import a Notebook to a DSL project.

Model

The Model tab includes various models created, saved, or imported using the Data Science Lab module. It broadly lists Data Science Models, Imported Models, and AutoML models.

Data Science Notebook

Explore the page where all the Data Science activities take place. The listed topics will be supported only for .ipynb files.

Model Interpretation

The user is taken to a dashboard upon clicking Model Explainer to gather insights and explanations about predictions made by the selected AutoML model.

Model interpretation techniques like SHAP values, permutation importance, and partial dependence plots are essential for understanding how a model arrives at its predictions. They shed light on which features are most influential and how they contribute to each prediction, offering transparency and insights into model behavior. These methods also help detect biases and errors, making machine learning models more trustworthy and interpretable to stakeholders. By leveraging model explainers, organizations can ensure that their AI systems are accountable and aligned with their goals and values.
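Of the techniques mentioned, permutation importance is the easiest to sketch with scikit-learn (assumed available); SHAP values and partial dependence plots follow a similar pattern with their respective libraries:

```python
# Permutation importance: shuffle one feature at a time and measure how
# much the model's score drops — larger drops mean more influential features.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)  # one score per feature
```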

Please Note: The user can access the Model Explainer Dashboard under the Model Interpretation page only.

Notebook Actions

The options provided for a Notebook are explained under this section.

Register

This page displays the steps to export a DSL script and register it as a Job.

  • Home: Opens the homepage of the Data Science Lab module.

  • List Projects: Redirects to the Project List page.

  • List Feature Stores: Redirects to the Feature Store List page.

  • Create: Redirects to the page to create a Project or Feature Store.

  • Registered Models & APIs: Redirects to the list of registered models and APIs.

  • Settings: Redirects to the default Settings page.

  • Trash: Redirects to the Trash page.

Left Menu on the DSL Homepage

Information

This option displays the last modified date for the selected notebook.

  • Navigate to the Workspace tab.

  • Open the Repo folder.

  • Select a notebook from the Repo folder and click the ellipsis icon for the selected notebook.

  • A Context Menu opens. Select the Information option from the Context Menu.

  • The last modified date for the selected notebook is displayed.

Information option for a Notebook
  • Notebooks pulled from Git display 'Pulled from git' inside the Information context menu.

Container Status Message

A DSL Project displays the various statuses of the container on the top right side of the header panel.

The user receives all updates regarding the container status through color-coded messages for a specific DSL Project. After creating a new project and opening it, the user sees various status messages on the top right side of the page.

Steps to see the container message:

  • Open an active Data Science Project.

  • The user is redirected to the page for creating or importing a Notebook.

  • The container status message gets displayed on the top right side of this screen.

  • The following status messages are displayed until the container is created and reaches the running status.

Please Note: A container status message appears when the container is not available. An error message also appears to inform the user that the Project container is not up and running.

  • Container status message while the container is being created and initialized.

  • Container status message when container is running.

Please Note: The user can click on the branch icon to get the latest branch related configuration.

Settings

This page helps the user to access and modify the default settings for the DSL Project.

Check out the given illustration on how to access and save modifications for the Project default settings.

  • Navigate to the Home page of Data Science Lab module.

  • The Settings icon appears in the left side menu panel. Click the Settings option.

  • The Default Settings page opens, displaying the default settings.

  • The user can modify the following details:

    • Algorithms: The user can select or deselect algorithms from the given drop-down menu. The provided choices are Regression, Classification, Forecasting, Unsupervised, Natural Language Processing.

    • Environment: The user can select an Environment option from the given choices. The provided choices are Python TensorFlow, Python PyTorch, PySpark.

    • Resource Allocation: The user can select a Resource Allocation option from the given choices. The provided choices are low, medium, and high.

    • Idle Shutdown: The user can select a time limit option for idle shutdown. The provided time limit options are 30m, 1h, and 2h.

  • Click the Save option.

  • A notification message appears and the modified default settings will be saved.

Adjustable Repository Panel

Users can manually adjust the width of the repository panel in the Workspace tab, allowing for better visibility and organization of multiple sub-folders and files within a project.

Check out the illustration to understand how users can adjust the repository panel inside a DS Project.

AutoML

The Auto ML tab allows the users to create various experiments on top of their datasets and list all the created experiments.

Automated Machine Learning (AutoML) is a process that involves automating the selection of machine learning models and hyperparameter tuning. It aims to reduce the time and resources required to develop and train accurate models by automating some of the time-consuming and complex tasks.

The Auto ML feature provided under the Data Science Lab is capable of covering all the steps, from starting with a raw data set to creating a ready-to-go machine learning model.

An Auto ML experiment is the application of machine learning algorithms to a dataset.

Please Note:

  • AutoML functionality is a tool to help speed up the process of developing and training machine learning models. It’s always important to carefully evaluate the performance of a model generated by the AutoML tool.

  • The Create Experiment option is provided on the Data List page.

Notebook Operations

This section aims at describing the various operations for a Data Science Notebook.

Please Note: The Notebook Operations may differ based on the selected project environment. A notebook created under the PySpark environment only supports the Data, Secrets, Variable Explorer, and Writers operations.

A Data Science Notebook created under the PyTorch or TensorFlow environment will contain the following operations:

  • Data: Add data and get a list of all the added datasets.

  • Secrets: Generate Environment Variables to keep your confidential information from being exposed.

  • Algorithms: Get steps to configure Algorithm Settings and Project-level access to use Algorithms inside a Notebook.

  • Transforms: Save and load models with a transform script, register them, or publish them as an API through the DS Lab module.

  • Models: Train, save, and load models (Sklearn, Keras/TensorFlow, PyTorch). You can also register a model using this tab. Refer to Model Creation using Data Science Notebook for more details.

  • Artifacts: Save plots and datasets as Artifacts inside a DS Notebook.

  • Variable Explorer: Get detailed information on Variables declared inside a Notebook.

  • Writers: Write the output of DSL experiments to any of the supported database writers.
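The save-and-load idea behind the Models operation can be sketched for a Sklearn model; this is a generic pickling example, not the DS Lab's own implementation:

```python
# Saving and reloading a trained Sklearn model via pickling — a sketch of
# what "save" and "load" mean for a model artifact.
import pickle
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)          # save

with open("model.pkl", "rb") as f:
    restored = pickle.load(f)      # load

# The restored model predicts identically to the original.
print(all(restored.predict(X) == model.predict(X)))
```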


Repo Folder Attributes

The Repo folder is a default folder created under the Workspace tab. It opens by default while accessing the Workspace tab.

The user can perform certain actions on the Repo folder using the ellipsis icon provided next to it. This page explains all the attributes given to the Repo folder. The folder contains only .ipynb files. The actions provided for a .ipynb file (Notebook) are described on the Notebook Actions page.

Create

This option redirects the user to the Create Notebook page to create a new Notebook.

  • Navigate to the Workspace tab.

  • Select the Repo folder.

  • Click the Ellipsis icon.

  • A Context Menu appears. Select the Create option from the Context Menu.

  • The Create Notebook drawer opens.

Please Note: Refer to the Create page to learn the steps to create a new Notebook.

Add Folder

This option allows the user to create folders under the Repo folder.

  • Navigate to the Workspace tab.

  • Select the Repo folder.

  • Click the Ellipsis icon.

  • A Context Menu appears. Select the Add Folder option from the Context Menu.

  • The Add folder dialog box opens.

  • Provide a name to the folder.

  • Click the Yes option.

  • A notification appears to confirm the folder creation.

  • The newly added folder is listed under the Repo folder. Expand the Repo folder to see the newly added folder.

Import

The Import option allows users to import a .ipynb file to the selected Data Science Lab project from their system.

  • Navigate to the Workspace tab.

  • Select the Repo folder.

  • Click the Ellipsis icon.

  • A Context Menu appears. Select the Import option from the Context Menu.

  • The Import Notebook page opens.

Please Note:

  • Refer to the Import Notebook page to learn how to import a Notebook.

  • Created or imported Notebooks get a set of attributed Actions, described under this documentation's Data Science Notebook section.

Delete a Model

This section focuses on how to delete a model using the Models tab.

Users can delete any unregistered model using the delete icon from the Actions panel of the Model list.

Check out the illustration on deleting a model.

  • Navigate to the Models tab.

  • Select the Unregistered filter option.

  • Select a model from the displayed list.

  • Click the Delete icon.

  • A confirmation message appears.

  • Click the Yes option.

  • A notification message appears.

  • The selected model gets deleted.

Please Note: The Delete icon appears only for the unregistered models. The registered models will not get the Delete icon.

Data

This section focuses on how to add or upload datasets to your DSL Projects. The Dataset tab lists all the added Data to a Project.

The Add Data option provided under the Data tab redirects the users to add various types of data to a DSL Project. The users can also upload sandbox files or create feature stores using this functionality.

Please Note: Users can add Datasets by using the Data tab or Notebook page provided under the Workspace tab.

  • Open a Data Science Lab Project.

  • Click on the Data tab from the opened Project.

  • The Data tab opens displaying the Add Data option.

  • The Add Data page opens, displaying the uploaded and added data sources for the selected DSL Project.

  • The Add Data page offers the following Data source options to add as datasets:

    1. Data Sets – These are the data sets uploaded from the Data Center module.

    2. Data Sandbox – This option lists all the available/uploaded Data Sandbox files.

    3. Feature Store – This option lists all the available Feature Stores under the selected DSL Project.

Unregister a Model

To unregister a model means to remove it from the Data Pipeline environment.

Check out the illustration on unregistering a model functionality using the Models tab.

A user can unregister a registered model by using the Models tab.

  • Navigate to the Models tab.

  • Select a registered model (use the Registered filter option to access a model).

  • Click the Unregister icon for the same model.

Accessing the Unregister option from the Model list
  • The Unregister dialog box appears to confirm the action.

  • Click the Yes option.

  • A notification message appears to inform the same.

Confirmation message after the model gets unregistered
  • The unregistered model appears under the Unregistered filter of the Models tab.

Listing unregistered Model under the Model tab

Please Note:

  • When the Unregister function is applied to a registered model, the model is removed from the Data Pipeline module. It also disappears from the Registered list of models and gets listed under the Unregistered list.

.ipynb File Cells

A Data Science Notebook or .ipynb file contains various types of cells inside it to create Data Science experiments.

These cells can contain explanatory text (Markdown), executable code, or BDB Assist prompts, along with their output.

  • Navigate to the Notebook tab for a repo sync project.

  • Open a .ipynb file from the left side menu.

  • The user can use the Add pre-cell icon to add a new code cell at the beginning of the .ipynb file.

  • You can add new cells by using the +Code, +Markdown, and +Assist options given at the bottom of the cell.
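Under the hood, a .ipynb file is plain JSON, and the cell types described above appear as entries in its cells list. A minimal sketch using only the Python standard library (real tools typically use the nbformat library):

```python
# A .ipynb file is JSON with a list of typed cells. This sketch builds a
# minimal two-cell notebook by hand to show the structure.
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# My experiment"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello')"]},
    ],
}

with open("minimal.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)

cell_types = [c["cell_type"] for c in notebook["cells"]]
print(cell_types)  # ['markdown', 'code']
```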

Workspace Folders

The Workspace tab contains default folders named Repo, Utils, and Files. All the created and saved folders and files will be listed under either of these folders.

Accessing Workspace Default Folders

  • Navigate to the Workspace tab (it is a default tab to open for a Project).

  • The left side panel displays the default Folders.

    • These folders store all the folders and files created or imported by the user.

  • The Workspace tab also contains a Search bar to search the available Assets.

Please Note: For a new Project, the Workspace will be blank apart from the default folders named Repo, Utils, and Files, until the first Notebook is created.

Collapsing the Left side Panel

  • Navigate to the Workspace Assets.

  • Click the Collapse icon.

Expanded Notebook List
  • The Workspace left-side panel will be collapsed displaying all the created or imported files and folders as icons.

Collapsed Notebook List

Expanding the Left side Panel

  • Navigate to the Workspace tab with the collapsed left-side panel.

  • Click the Expand icon.

  • The Workspace's left-side panel will be expanded. In the expanded mode of the left-side panel, the default folders of the Workspace tab will be visible in the default view.

Please Note:

  • The Workspace left side menu appears in the expanded mode by default while opening the Workspace tab.

  • The Workspace List displays the saved/ created folders and files in the collapsed mode (if any folder or file is created inside that Workspace).

  • The normal Data Science Project where Git Repository and Git Branch are selected while creating the project, displays the selected branch on the header.

  • A Repo Sync Project can display the selected branch on the Project header, and the user will be allowed to change the branch using the drop-down menu.

Taskbar

The Data Science Notebook taskbar presents different options that can be used to control how the notebook functions.

A taskbar has been provided on the top left of the Data Science Notebook screen to perform various tasks quickly.

Taskbar for a Data Science Notebook

Click on each tab of the following Taskbar to read about the specific tasks of that Notebook taskbar.

Tasks provided under the File tab
Tasks provided under the Edit tab
Tasks provided under the View Tab
Tasks provided under the Run tab

Expanding and Collapsing Markdown Cell

The user can expand and collapse the multiple Markdown cells based on their levels in a DS Notebook. The user can create a hierarchy of three levels using the Heading option in a Markdown cell.

Please Note:

  • The related code cells under one Markdown will fall into the same level as the Markdown.

  • A maximum of three levels of hierarchy can be inserted for a Markdown cell using the Heading option.

Check out the following illustration on how to set the expand and collapse functionality in Markdown cells.

  • Navigate to a Notebook.

  • Access a Markdown cell.

  • To create a hierarchy within a Markdown cell, use the Heading button.

    • Click once for the first level, twice for the second, and thrice for the third.

    • Unassigned Markdown cells default to the nearest existing hierarchy.

    • Remember to click Save to preserve changes.

  • The Markdown cell will get a collapse/expand icon added to it.

Check out the illustration to see the Markdown expand and collapse feature at work.

Resource Utilization Graph

This feature helps to identify the resource utilization of a Data Science Lab Project where the Notebook is saved and executed.

Please Note: The graph displays requests and limits of CPU and Memory. The values will be calculated and previewed in the UI after each cell execution.

  • The image displays the resource utilization graph when the utilized resources are within the set limit.

  • The resource utilization graph turns yellow if 60% of the given limit is utilized.

  • If 80% of the given limit is utilized, the resource utilization graph turns red (as shown in the image below).
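The color thresholds described above can be summarized in a small illustrative function (the function itself is hypothetical, not part of the product):

```python
# Illustrative mapping of resource utilization to the graph color,
# following the thresholds stated above (60% yellow, 80% red).
def graph_color(used: float, limit: float) -> str:
    ratio = used / limit
    if ratio >= 0.8:
        return "red"
    if ratio >= 0.6:
        return "yellow"
    return "normal"

print(graph_color(50, 100))   # normal
print(graph_color(65, 100))   # yellow
print(graph_color(85, 100))   # red
```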

Please Note:

  • The user can open a maximum of four files in the Tab format.

  • If CPU and Memory usage exceeds the threshold, the Kernel and the Data Science Notebook will be restarted.

Writers

This page explains the Writers tab available in the right-side panel of the Data Science Notebook.

The Data Science Lab module provides a Writers tab inside the Notebook to write the output of the data science experiments.

Check out the illustration on how to use the Writers operation inside a DS Notebook.

  • Navigate to a code cell with dataset details.

  • Run the cell.

  • The preview of the dataset appears below.

  • Click the Secrets tab to get the registered DB secrets.

  • Select the registered DB secret keys from the Secrets tab.

  • Add a new code cell.

  • Get the Secret keys of the DB using the checkboxes provided for the listed Secret keys.

  • Add a new code cell.

  • Open the Writers section.

  • Use the given checkbox to select a driver type for the writers.

  • The code gets added to the newly added cell.

  • Provide the Secret values for the required information of the writer such as Username, Password, Host, Port, Database name, table name, and DataFrame.

  • Run the code cell with the modified database details.

  • A message below states that the DataFrame has been written to the database. The data gets written to the specified database.
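The writer steps above follow the usual DataFrame-to-database pattern. As a self-contained sketch, SQLite (from the standard library) stands in for an actual database writer, and the table name and data are invented; in the real Notebook the connection details come from the Secrets tab:

```python
# Sketch of the writer pattern: a DataFrame written to a database table.
# SQLite stands in here for the supported writers (MySQL, PostgreSQL, ...).
import sqlite3
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.9, 0.7, 0.8]})

conn = sqlite3.connect(":memory:")
df.to_sql("results", conn, index=False)   # write the DataFrame

rows = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(rows)  # 3
```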

Please Note: The supported DB writers are MYSQL, MSSQL, Oracle, MongoDB, PostgreSQL, and ClickHouse.

Using an Assist Cell

This section focuses on the BDB Assist functionality provided inside the Data Science Notebook infrastructure.

BDB Assist is designed to be a transparent and explainable AI assistant. Our notebook system guarantees that every AI recommendation transforms into transparent and replicable outcomes, enabling data teams to place unprecedented trust in AI.

Some of the key features of the BDB Assist are as listed below:

  1. Generate Code Automatically: Starting from scratch is no longer a hurdle with BDB Assist code generation capability. Provide your prompts, questions, or instructions, and watch as an entire notebook— including code, SQL queries, and text — materializes before your eyes.

  2. Explain the code: BDB Assist doesn't let complex pieces of code baffle you anymore with concise, easy-to-understand explanations.

  3. Debug & Edit the code: BDB Assist helps you to revise or refactor your code, pinpoints the issue, and provides an immediate fix.

Steps to use an Assist cell:

  • Navigate to a Notebook.

  • Click on the Assist option.

  • The Assist cell gets inserted below.

  • Type a prompt in the Assist cell.

  • Click the Send icon.

  • The response based on your prompt is generated below.

  • Since the generated result in this case is a code, add a new code cell and copy the generated code in it.

  • Run the code cell.

  • The Bar plot gets generated below the code cell.
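A bar plot of the kind BDB Assist might generate could look like the following sketch (matplotlib is assumed to be available; the data and labels are invented):

```python
# Hypothetical example of Assist-generated plotting code: a simple bar
# chart with matplotlib. The data and labels are invented.
import matplotlib
matplotlib.use("Agg")          # render off-screen so no display is needed
import matplotlib.pyplot as plt

categories = ["north", "south", "east"]
totals = [120, 85, 140]

fig, ax = plt.subplots()
ax.bar(categories, totals)
ax.set_title("Sales by region")
fig.savefig("bar_plot.png")

print(len(ax.patches))  # 3 bars drawn
```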

Delete

This page explains steps to delete a Notebook.

  • Navigate to the Workspace tab.

  • Open the Repo folder.

  • Select a Notebook from the Repo folder.

  • Click on the ellipsis icon provided for the selected Notebook.

  • A Context menu appears. Click the Delete option from the Context menu.

  • The Delete Notebook dialog box appears for the deletion confirmation.

  • Click the Yes option.

  • A notification appears to ensure the successful removal of the selected Notebook. The concerned Notebook gets removed from the Repo folder.

Utils Folder Attributes

This section explains the attributive action provided for the Utils folder.

Accessing the Utils Folder

The Utils folder allows users to import utility files from their systems and from a Git repository.

Please Note: The Utils folder will be added by default to only normal Data Science Lab projects.

  • Navigate to the Workspace tab.

  • Select the Utils folder.

  • Click the ellipsis icon to open the context menu.

  • Click the Import option that appears in the context menu.

  • The Import Utility File window opens.

  • The user can import a utility file using either of the options: Import Utility or Pull from Git.

Importing a Utility File

Check out the walk-through video to understand the Import Utility functionality.

  • Navigate to the Import Utility File window.

  • Select the Import Utility option by using the checkbox.

  • Describe the Utility script using the Utility Description space.

  • Click the Choose File option to import a utility file.

  • Search and upload a utility file from the system.

  • The uploaded utility file title appears next to the Choose File option.

  • Click the Save option.

  • The imported utility file displays 100% completed once the import finishes.

  • A notification also ensures that the file has been imported.

  • Open the Utils folder provided under the Workspace tab.

  • The imported utility file appears under the Utils folder.

Files Attributes

This section helps the user to understand the attributes provided to the file folder created inside a normal Data Science Lab project.

Accessing the File Folder Attributes

Check out the illustration to access the attributes for a File folder.

  • Navigate to the Workspace tab of a normal Data Science project.

  • Select the File folder that is created by default.

  • Click the Ellipsis icon for the File folder.

  • The available attributes will be listed in the context menu.

File Folder Attributes

Add File

Check out the illustration on adding a file to the File folder of a normal Data Science Project.

Add Folder

Check out the illustration on adding a folder to the File folder of a normal Data Science Project.

Copy path

Check out the illustration on using the Copy path functionality inside the File folder of a normal Data Science Project.

Import

Check out the illustration on importing a file to the File folder of a normal Data Science Project.

Tabs for a DSL Project

A DSL project utilizes tabs to structure a data science experiment, enabling the outcome to be readily consumed for further data analytics.

How to access the Tabs?

Clicking the View icon available for a DSL Project redirects the user to a page displaying the various tabs for the selected Project.

  • Navigate to the Projects page.

  • Select a DSL project from the list.

  • Click the View icon.

  • The next page appears with the accessible tabs for the selected Project.

If you select a PySpark project, the following tabs will be available:

Various Tabs of a DSL Project

The tabs provided to a DSL Project, and the functions each covers, depend on the Project's environment:

Please Note: The allocation of tabs to a DSL project is environment-based.

  • If the user selects the PySpark environment, the available tabs will be the Workspace and Data tabs. The user will not have access to the Models and AutoML tabs.

  • The DSL Projects created based on Python TensorFlow and Python PyTorch environments will contain all four tabs.

Register a Model

To register a model implies pushing the model into the Pipeline environment where it can be used for inferencing when Production data is read.

Please Note: The currently supported model types are: Sklearn (ML & CV), Keras (ML & CV), and PyTorch (ML).

Check out the walk-through to Register a Data Science model to the Data Pipeline (from the Model tab).

The user can export a saved DSL model to the Data Pipeline module from the Models tab.

  • Navigate to the Models tab.

  • Select a model (unregistered model) from the list.

  • Click the Register icon for the model.

  • The Register dialog box appears to confirm the action.

  • Click the Yes option.


  • A notification message appears to inform the same.

Please Note: The registered model gets published to the Data Pipeline (it is moved to the Registered list of the models).

  • The model gets listed under the Registered model list.

Please Note:

  • The Register option is also available under the Models section inside a Data Science Notebook.​

  • The Registered Models can be accessed within the DS Lab Model Runner component of the Data Pipeline module.

Explainer Generator

This page explains how a model explainer can be generated through a job.

The user can generate an explainer dashboard for a specific model using this functionality.

Check out the illustration on Explainer as a Job.

  • Navigate to the Workspace tab.

  • Open a Data Science Notebook (.ipynb file) that contains a model.

  • Navigate to the code cell containing the model script.

  • Check out the Model name. You may modify it if needed.

  • Click the Models tab.

  • The Exit Page dialog box opens to save the notebook before redirecting the user to the Models tab.

  • Click the Yes option.

  • A notification message ensures that the concerned Notebook is saved. The user gets redirected to the Models tab.

  • Click the Refresh icon to refresh the displayed model list.

  • The model will be listed at the top of the list. Click the Explainer Creator icon.

  • A notification ensures that a job is triggered.

  • Click the Refresh icon.

  • The Explainer icon is enabled for the model. Click the Explainer icon.

  • The Explainer dashboard for the model opens.

Preview File

A Data Science Notebook (.ipynb file) can be opened, and its code & markdown cells can be previewed, without activating the respective project.

Check out the illustration to understand the preview file content inside a project.

Please Note: A Repo Sync project contains all the files under the Repo folder. A Normal project contains only Data Science Notebook(.ipynb) files under the Repo folder.

The user can preview the content saved under any file without activating the Project where it is saved.

  • Navigate to the Project List page.

  • Select a deactivated Repo Sync Project from the list.

  • Click on the View option to open the Project.

  • The Workspace tab opens under the selected Repo Sync Project.

  • Click on the Repo folder that is displayed under the Notebook tab.

  • A list of available folders and files appears under the Repo.

  • Click on a file.

  • The file content gets displayed.

  • Open a .ipynb file.

  • The content of the file is displayed.

  • Click the Add code or markdown cell.

  • The Activate Project window opens prompting the user to activate the selected Project.

  • Click the Yes option from the confirmation window to activate the project. The user can choose the No option if there is no need for the project activation.

Please Note: Only Data Science Notebooks (.ipynb files) have Code, Markdown, and BDB Assist cells. The Data Science Notebook content can be edited or modified after activating the concerned project. The content of the other files remains preview-only, even for activated projects.

Workspace

The Workspace tab inside a Repo Sync Project works like a placeholder to keep all the GitHub & GitLab repository documents (folders and files) of the logged-in user.

Data

The Data section focuses on how to add or upload data to your DSL Projects. This tab lists all the added Datasets, Data Stores, and Feature Stores for a Project.

Model

The Model tab includes various models created, saved, or imported using the Data Science Lab module. It broadly lists Data Science Models, Imported Models, and Auto ML models.

AutoML

The Auto ML tab allows the users to create various experiments on top of their datasets and list all their created experiments.

Workspace
Data
Tabs provided to a Python/TensorFlow DSL Project
Selecting a PySpark Project
Tabs provided for a PySpark Project
Register option for a model on the Model tab

List Feature Stores

This page focuses on the Feature Store List Actions.

Editing a Feature Store

Check out the illustration to understand the steps to edit a feature store.

  • Navigate to the Feature Stores page.

  • Select a Feature Store from the list.

  • Click the Edit icon for the selected Feature Store.

  • The Edit Feature Store form opens.

  • Modify the required information.

  • Click the Validate option for the Feature Store.

  • A notification message confirms that the validation has been executed.

  • The data preview is displayed below.

  • Click the Update option after getting a notification message for successful validation.

  • Another notification message appears to ensure that the updated Feature Store is saved.

  • Use the Refresh icon provided on the Feature Stores list.

  • The status of the updated Feature Store will be listed in the Feature Stores list.

  • Click the Refresh icon again until the Feature Store status turns Completed.

  • The Version column will display the version number, indicating that the Feature Store has been updated.

Updated Feature Store

Deleting a Feature Store

Check out the illustration to understand the steps to delete a feature store.

  • Navigate to the Feature Stores List page.

  • Select a Feature Store from the list. Select a Feature Store with more than one version with Status marked as Completed.

  • It will display all the available versions of the selected Feature Store.

  • Click the Delete icon for a version of the selected Feature Store you wish to delete.

Multiple Versions of the Feature Store
  • The Delete confirmation dialog box appears.

  • Click the Yes option.

  • A notification message appears to inform the user about the deletion.

  • The selected version of the Feature Store will be removed, but another version will be listed in the Feature Stores List.

Please Note: A Feature Store with only one version gets removed from the Feature Stores list entirely.

  • The deleted Feature Store version can be accessed from the Trash page. The user can restore it or delete it permanently from this page.

Deleted Feature Store version listed under the Trash page

Adding File and Folders

These options are provided under the Workspace tab of a repo sync folder.

Adding a File

Check out the illustration on how to add a file inside a Repo Sync Project.

  • Navigate to the Workspace tab of an activated Repo Sync Project.

  • Click the Add File option.

  • The Add file window opens.

  • Provide a File name.

  • Click the Yes option.

  • A notification message appears to ensure that the new file has been created.

  • The newly created file gets added to the Repo Sync Project.

Defining a File Type

The user can define the file type by including the relevant extension in the file name while adding a file.

Check out the illustration on defining a file type while adding a file to the Repo Sync project.

  • Navigate to the Workspace tab for a repo sync project.

  • Click the Add File option.

  • The Add file window opens.

  • File name: Provide the file type extension while giving it a name.

  • Click the Yes option.

  • A notification message appears.

  • The new file gets added with the provided file extension.

Adding a New Folder

Check out the illustration on how to add a folder inside a Repo Sync Project.

  • Navigate to the Notebook tab of the Repo Sync Project.

  • Click the Add Folder option.

  • The Add folder window opens.

  • Provide a Folder name.

  • Click the Yes option.

  • A notification message appears to ensure that the new folder has been created.

  • The newly created folder gets added to the Repo folder.

Registered Models and APIs

This page displays all the registered Models and APIs in a list format.

The Registered Models and APIs icon provided in the left-side menu on the homepage of the Data Science Lab module redirects the user to this page that lists all the registered models and allows them to register the available registered model as an API.

Accessing Registered Models & APIs Page

  • Navigate to the Data Science Lab homepage.

  • Click the Registered Models & APIs icon from the left-side panel.

Accessing the Registered Models & APIs
  • The user will be redirected to the Registered Models & APIs page.

  • There will be two tabs Models and APIs under the Registered Models & APIs page.

Unregistering Models

The Registered Models tab lists all the registered models with an option to Unregister them.

Check out the given illustration on unregistering a model.

  • Navigate to the Registered Models & APIs page.

  • The Models tab opens by default.

  • Select a registered model from the displayed list.

  • Click the Unregister icon for the selected model.

  • The Unregister dialog box appears.

  • Click the Yes option.

  • A notification message appears, and the model gets unregistered and removed from this list.

Please Note:

  • The user can register a model from the Model tab. Refer to the Register a Model page.

  • The user can also register a model while creating it using the DS Notebook.

Registering a Model as an API

The Models tab also provides an icon to register a selected Model as an API.

Check out the given illustration on registering a model as an API.

  • Navigate to the Registered Models & APIs page.

  • The Models tab opens by default.

  • Select a registered model from the displayed list.

  • Click the Register as API icon for the selected model.

  • The Update Model page opens.

  • Provide a Max instance for it.

  • Click the Save and Register option.

  • A notification message appears, and the selected model gets registered as an API.

  • Navigate to the APIs tab.

  • The recently registered model as API will be added to this list.

Unregistering a Registered Model as API

The APIs tab lists all the models registered as APIs. The user can unregister a model registered as an API using this tab.

Check out the illustration to unregister a registered model as an API.

  • Navigate to the Registered Models & APIs page.

  • Open the APIs tab.

  • Select a registered model as an API from the displayed list.

  • Click the Unregister as API icon for the selected model.

  • The Unregister as API dialog box opens with the selected model name.

  • Click the Yes option.

  • A notification message appears to ensure that the model is unregistered.

  • Navigate back to the Models tab.

  • The unregistered model will be listed under the Models page.

Please Note: Refer to the Register a Model as an API Service section to understand the steps required for registering an API client and passing the model values in the postman.
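The request body below is purely illustrative: the actual endpoint URL, authentication headers, and field names come from the Register a Model as an API Service section. It only sketches how feature values might be packaged as JSON before being sent from a client such as Postman.

```python
import json

# Hypothetical request body for invoking a model registered as an API.
# Field names ("inputs", "age", "income") are invented for illustration.
payload = {
    "inputs": [
        {"age": 42, "income": 58000.0},
        {"age": 23, "income": 31000.0},
    ]
}
body = json.dumps(payload)

# A client (Postman, curl, or Python) would POST `body` to the model URL.
decoded = json.loads(body)
print(len(decoded["inputs"]))
```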

Export

The Export icon provided for a Notebook allows the user to export the Notebook as a script to the Data Pipeline module and a Git repository.

Exporting a Data Science Script

A Notebook can be exported to the Data Pipeline module using this option.

  • Navigate to the Repo folder and select a Notebook from the Workspace tab.

  • Click the Ellipsis icon for the selected Notebook to open the context menu.

  • Click the Register option for the Notebook.

  • The Register window opens.

  • Select the Select All option or the required script using the checkbox(es).

  • Click the Next option.

Please Note: The user must write a function to use the Export to Pipeline functionality.

  • A notification appears stating that the selected script is valid.

  • Select the Export as Script option using the checkbox.

  • Click the Libraries icon.

  • The Libraries drawer opens.

  • Select available libraries by using checkboxes.

  • Click the Close icon to close the Libraries drawer.

  • The user gets redirected to the Register page.

  • Click the Finish option.

  • A notification message appears to ensure that the selected script is exported.

Please Note: The exported script will be available for the Data Pipeline module to be consumed inside a DS Lab Runner component.
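Since the Export to Pipeline functionality requires the script to expose a function, here is a minimal sketch of what such a function might look like. The exact signature expected by the DS Lab Runner is an assumption: the sketch assumes the start function receives the input data (a DataFrame or list, per the Function Input Type setting) and returns the processed result.

```python
# Hypothetical start function for an exported script. The name and
# signature are assumptions; the DS Lab Runner's actual contract is
# defined by the Execution Type and Function Input Type settings.
def clean_records(records):
    """Drop rows with a missing name and normalize the name field."""
    cleaned = []
    for row in records:
        if not row.get("name"):
            continue  # skip rows with an empty or missing name
        row = dict(row)
        row["name"] = row["name"].strip().title()
        cleaned.append(row)
    return cleaned

sample = [{"name": "  alice "}, {"name": ""}, {"name": "BOB"}]
print(clean_records(sample))
```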

Accessing an Exported Script in the Data Pipeline

  • Navigate to a Data Pipeline containing the DS Lab Runner component.

  • Open the Meta Information tab of the DS Lab Runner component.

  • Select the required information as given below to access the exported script:

    • Execution Type: Select the Script Runner option.

    • Function Input Type: Select one option from the given options: Data Frame or List.

    • Project Name: Select the Project name using the drop-down menu.

    • Script Name: Select the script name using the drop-down menu.

    • External Library: Mention the external library.

    • Start Function: Select a function name using the drop-down menu.

  • The exported Script is displayed under the Script section.

Accessing an Exported Script inside the DS Lab Script Runner Component

Trash

The Trash page lists all the deleted Projects and Feature Stores.

The Trash page displays the deleted Projects and Feature Stores accessible to the logged-in user. The user gets options to Restore them or Delete them permanently from this page.

Restoring a Project

Check out the given workflow to restore a project.

  • Navigate to the Data Science Lab Homepage.

  • Click the Trash icon provided in the left-side menu panel.

  • The Trash page opens displaying two tabs:

    • Deleted Projects

    • Deleted Feature Stores

  • Select a Project from the displayed list of the Deleted Projects.

  • Click the Restore icon.

  • A dialog message appears to confirm the selected action.

  • Click Yes to confirm the action.

  • A notification message appears.

  • The concerned project gets restored to the Projects list.

Deleting a Project Permanently

Check out the given workflow to delete a project permanently.

  • Navigate to the Data Science Lab Homepage.

  • Click the Trash icon provided in the left-side menu panel.

  • The Trash page opens displaying two tabs:

    • Deleted Projects

    • Deleted Feature Stores

  • Select a Project from the displayed list.

  • Click the Delete icon.

  • A dialog message appears to confirm the selected action.

  • Click Yes to confirm the action.

  • A notification message appears, and the selected Project gets removed permanently from the Data Science Lab module.

Secrets

Generate Environment Variables to prevent your confidential information from being exposed.

You can generate Environment Variables for the confidential information of your database using the Secret Management function. This prevents your secret information from being exposed to all the accessible users.

Pre-requisite:

  • The users must configure Secret Management using the Admin module of the platform before using the Secrets option inside the DS Lab module.

  • The configured Secrets must be shared with a user group to access it inside the Data Science Lab module.

  • The user account selected for this activity must belong to the same user group to which the configured secrets were shared.

Configuring the Secret Management Administration option

Once the Secret Management has been configured from the Admin module it will have the Secret Key and related fields as explained in this section.

  • Navigate to the Secret Management option from the Admin module.

  • Add a Secret Key name.

  • Insert field values for the added Secret Key.

  • Click the Save option to save the Secret Management configuration.

Please Note: The given image displays a sample Secret key name. The exact secret key name should be provided or configured by the administrator.

  • Share the configured Secret Management key to a user group.

Accessing the Secrets tab under a DS Notebook

  • Access a Data Science Notebook from a user account that is part of the User group with which the configured secret is shared.

  • Open the Secrets tab from the right side.

  • Use the Refresh icon to get the latest configured Secret Key.

  • The newly created Secret Key is listed below. Click on a Secret Key option.

  • The selected Secret Key name option is displayed with a drop-down icon. Click the drop-down icon next to the Secret Key name to get the fields.

  • Add a new Code cell.

  • Select the Secret Keys by using the given checkboxes.

  • The encrypted environment variables for the fields are generated in the code cell.

  • Add a new Code cell.

  • Open the Writers tab.

  • Select a writer type using the checkbox. E.g., In this case, MySQL has been selected.

  • Map the encrypted secret keys for the related configuration details like Username, Password, Port, Host, and Database by copying them.

  • Run the cell.

  • The data frame will be written to the selected writer's database.
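As a rough sketch of the pattern above, the generated environment variables can be read in a code cell instead of hard-coding credentials. The variable names below are hypothetical; the real names are generated by the Secrets tab when the secret-key fields are selected.

```python
import os

# Hypothetical environment variable names and stand-in values for the
# sketch; in the DS Lab Notebook these are generated by the Secrets tab.
os.environ.setdefault("DSL_DB_USERNAME", "demo_user")
os.environ.setdefault("DSL_DB_HOST", "localhost")
os.environ.setdefault("DSL_DB_PORT", "3306")
os.environ.setdefault("DSL_DB_NAME", "sales")

# Read the variables instead of embedding credentials in the notebook.
conn_info = {
    "user": os.environ["DSL_DB_USERNAME"],
    "host": os.environ["DSL_DB_HOST"],
    "port": int(os.environ["DSL_DB_PORT"]),
    "database": os.environ["DSL_DB_NAME"],
}
print(conn_info)
```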

Workspace

The Workspace is a placeholder to create and save various data science experiments inside the Data Science Lab modules.

The Workspace is the default tab to open for each Data Science Lab project. The options to begin working in the Workspace may differ based on the Project type.

  • The Repo Sync Projects offer File and Folder options on the default page of the Workspace tab.

  • The normal Data Science Projects will have Create and Import options under the Workspace landing page.

Accessing the Workspace Tab for a Repo Sync Project

  • Navigate to the Projects page.

  • Select an activated Repo Sync Project from the displayed list.

  • Click the View icon to open the project.

  • The Repo Sync project opens displaying the Workspace tab.

    • A Repo folder gets added to the selected Repo Sync project, based on the Git repository account selected in the user-level settings, under the Notebook tab with Refresh and Git Console icons.

Icons
Name of the Icons
Actions

Please Note:

  • The Repo Sync Project opens with a branch configured at the project level.

  • A Repo Sync Project can contain files other than .ipynb under the Workspace tab.

Accessing the Workspace Tab for other Data Science Projects

  • Navigate to the Projects page.

  • Select an activated Project from the displayed list.

  • Click the View icon to open the project.

  • The Project opens displaying the Workspace tab.

    • The Repo, Utils, and Files default folders appear under the Workspace tab.

Please Note: If the selected project is a Repo Sync Project, it will only contain a Repo folder under the Workspace tab. Here, the Repo folder will support all file types. Three folders (Repo, Utils, and Files) will be available under the Workspace tab for a normal Data Science Lab project.

  • A Refresh icon is provided to refresh the data.

  • The users get two options to start with their data science exploration:

    1. Create - by creating a new Notebook

    2. Import - by importing a Notebook

Libraries

The Libraries icon on the Workspace displays all the installed libraries with version and status.

  • Navigate to the Workspace tab.

  • Click the Libraries icon.

  • The Libraries window opens displaying Versions and Status for all the installed libraries.

  • Click the Failed status to expand the details of a failed library installation.

Regression Model Explainer

This page provides model explainer dashboards for Regression Models.

Check out the given walk-through to understand the Model Explainer dashboard for the Regression models.

Feature Importance

This table shows the contribution each feature has had on prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each prediction has been built up from all the individual ingredients in the model.

Regression Stats

Model Summary

The user can find a number of regression performance metrics in this table that describe how well the model can predict the target column.
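Such tables typically include measures like the mean absolute error (MAE), root mean squared error (RMSE), and R-squared; which metrics appear depends on the dashboard. As a rough illustration, these can be computed from observed and predicted values as follows:

```python
import math

# Invented observed and predicted target values for illustration.
observed  = [3.0, 5.0, 2.5, 7.0]
predicted = [2.8, 5.4, 2.9, 6.6]

n = len(observed)
errors = [o - p for o, p in zip(observed, predicted)]

mae  = sum(abs(e) for e in errors) / n            # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean squared error

mean_obs = sum(observed) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((o - mean_obs) ** 2 for o in observed)
r2 = 1 - ss_res / ss_tot                          # R-squared

print(round(mae, 3), round(rmse, 3), round(r2, 3))
```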

Predicted Vs Actual Plots

This plot shows the observed value of the target column and the predicted value of the target column. A perfect model would have all the points on the diagonal (predicted matches observed). The further away points are from the diagonal the worse the model is in predicting the target column.

Residuals & Plot Vs Features

Residuals: The residuals are the difference between the observed target column value and the predicted target column value. In this plot, one can check whether the residuals are higher or lower for higher/lower actual or predicted outcomes. So, one can check if the model works better or worse for different target value levels.

Plot vs Features: This plot displays either residuals (difference between observed target value and predicted target value) plotted against the values of different features or the observed or predicted target value. This allows one to inspect whether the model is more inappropriate for a particular range of feature values than others.

Individual Predictions

Select Index

The user can select a record directly by choosing it from the dropdown or hit the Random Index option to randomly select a record that fits the constraints. For example, the user can select a record where the observed target value is negative but the predicted probability of the target being positive is very high. This allows the user to sample only false positives or only false negatives.

Prediction

It displays the predicted probability for each target label.

Contributions Plot

This plot shows the contribution that each feature has provided to the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This helps to explain exactly how each prediction has been built up from all the individual ingredients in the model.
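The additive property described above (contributions start from the population average and sum to the final prediction) can be checked with a few invented numbers:

```python
# Illustrative numbers only: contributions start from the population
# average (base value) and add up to the model's final prediction.
base_value = 22.5                 # average prediction over the dataset
contributions = {
    "rooms":      +3.1,           # pushes the prediction up
    "crime_rate": -1.4,           # pushes the prediction down
    "distance":   +0.6,
}

prediction = base_value + sum(contributions.values())
print(prediction)
```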

Partial Dependence Plot

The PDP plot shows how the model prediction would change if you change one particular feature. The plot shows a sample of observations and how these observations would change with this feature (gridlines). The average effect is shown in grey. The effect of changing the feature for a single record is shown in blue. The user can adjust how many observations to sample for the average, how many gridlines to show, and how many points along the x-axis to calculate model predictions for (grid points).

Contributions Table

This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.

What If Analysis

Select Index

The user can select a record directly by choosing it from the dropdown or hit the Random Index option to randomly select a record that fits the constraints. For example, the user can select a record where the observed target value is negative but the predicted probability of the target being positive is very high. This allows the user to sample only false positives or only false negatives.

Prediction

It displays the predicted probability for each target label.

Feature Input

The user can adjust the input values to see predictions for what-if scenarios.
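A toy sketch of a what-if scenario, using an invented linear model: change one feature input and observe how the prediction moves.

```python
# Invented coefficients for a toy linear model, used only to
# illustrate the what-if mechanic of adjusting a feature input.
coef = {"rooms": 4.0, "crime_rate": -2.5}
intercept = 10.0

def predict(features):
    return intercept + sum(coef[name] * value for name, value in features.items())

original = {"rooms": 5.0, "crime_rate": 1.0}
what_if  = dict(original, rooms=6.0)   # what if there were one more room?

print(predict(original), predict(what_if))
```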

Contribution & Partial Dependence Plots

Contributions Table

This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.

Feature Dependence

Shap Summary

The Shap Summary summarizes the Shap values per feature. The user can either select an aggregate display that shows the mean absolute Shap value per feature or get a more detailed look at the spread of Shap values per feature and how they correlate with the feature value (red is high).
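The aggregate view corresponds to a simple mean-absolute aggregation, sketched below with invented SHAP values for three observations:

```python
# Invented SHAP values (one list per feature, one entry per observation).
shap_values = {
    "rooms":      [ 2.0, -1.0,  3.0],
    "crime_rate": [-0.5,  0.5, -1.0],
}

# Mean absolute SHAP value per feature, as shown in the aggregate display.
mean_abs = {
    feature: sum(abs(v) for v in values) / len(values)
    for feature, values in shap_values.items()
}
print(mean_abs)
```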

Shap Dependence

This plot displays the relation between feature values and Shap values. This allows you to investigate the general relationship between feature value and impact on the prediction. The users can check whether the model uses features in line with their intuitions, or use the plots to learn about the relationships that the model has learned between the input features and the predicted outcome.

Please Note: Refer to the page to get an overview of the Data Science Lab module in a nutshell.

Export to GIT/ Model Migration

This page explains the Model Migration functionality. You can find the steps to export and import a model to and from a Git repository explained on this page.

Prerequisite: The user must do the required configuration for the DS Lab Migration using the Admin module before migrating a DS Lab script or model.

Export a DSL Model to GIT

The user can use the Migrate Model icon to export the selected model to the GIT repository.

Check out the illustration on Export to Git functionality.

  • Navigate to the Models tab.

  • Select a model from the displayed list.

  • Click the Model Migration icon for a Model.

  • The Export to GIT dialog box opens.

  • Provide a Commit Message in the given space.

  • Click the Yes option.

  • A notification message appears informing that the model is migrated.

Import a DSL Model from GIT

Check out the given walk-through to understand the import of a migrated DSL Model by another user under a different space.

Choose a different user or another space for the same user to import the exported model. In this case, the selected space is different from the space from where the model was exported.

  • Select a different tenant to sign in to the Platform.

  • Choose a different space while signing into the platform.

  • Navigate to the Admin module.

  • Select the GIT Migration option from the admin menu panel.

  • Click the Import File option.​

  • The Import Document page opens, click the Import option.

  • The Migration - Document Import page opens. By default, New VCS will be selected as the Version Control Configuration.

  • Select the DSLab option from the module drop-down menu.

  • Select the Models option from the left side panel.

  • Use Search space to search for a specific model name.

  • All the migrated Models get listed based on your search.

  • Select a Model from the displayed list to get the available versions of that Model.

  • Select a Version that you wish to import.

  • Click the Migrate option.

  • A notification message appears informing that the file has been migrated.

  • The migrated model gets imported inside the Models tab of the targeted user.

Please Note: While migrating the Model the concerned Data Science Project also gets migrated to the targeted user's account.

Dataset Explainer

The Dataset Explainer tab provides a high-level preview of the dataset that has been used for the experiment. It redirects the user to the Data Profile page.

The Data Profile is displayed using various sections such as:

  • Data Set Info

  • Variable Types

  • Warnings

  • Variables

  • Correlations

  • Missing Values

  • Sample

Let us see each of them one by one.

Data Info

The Data Profile displayed under the Dataset Explainer section displays the following information for the Dataset.

  • Number of variables

  • Number of observations

  • Missing cells

  • Duplicate rows

  • Total size in memory

  • Average record size in memory
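These headline numbers can be computed directly from a table. A small pure-Python illustration with an invented dataset (counting variables, observations, missing cells, and duplicate rows):

```python
# Invented three-row dataset used only to illustrate the profile numbers.
rows = [
    {"age": 34,   "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 34,   "city": "Pune"},     # duplicate of the first row
]

n_variables    = len(rows[0])
n_observations = len(rows)
missing_cells  = sum(1 for r in rows for v in r.values() if v is None)

# A row is a duplicate if an identical row has been seen before.
seen, duplicate_rows = set(), 0
for r in rows:
    key = tuple(sorted(r.items()))
    if key in seen:
        duplicate_rows += 1
    else:
        seen.add(key)

print(n_variables, n_observations, missing_cells, duplicate_rows)
```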

Variable Types

This section mentions variable types for the data set variables. The selected Data set contains the following variable types:

  • Numeric

  • Categorical

  • Boolean

  • Date

  • URL

  • Text (Unique)

  • Rejected

  • Unsupported

Warnings

This section informs the user about the warnings for the selected dataset.

Variables

It lists all the variables from the selected Data Set with the following details:

  • Distinct count

  • Unique

  • Missing (in percentage)

  • Missing (in number)

  • Infinite (in percentage)

  • Infinite (in number)

  • Mean

  • Minimum

  • Maximum

  • Zeros (in percentage)

Correlation

It displays the variables in the correlation chart by using various popular methods.

Missing Values

This section provides information on the missing values through Count, Matrix, and Heatmap visualization.

  • Count: The count of missing values is explained through a column chart.

  • Matrix

  • Heatmap

Sample

This section describes the first 10 and last 10 rows of the selected dataset as a sample.

First rows

Last Rows

Data

The Data option enables a user to add data inside their project from the Data Science Notebook infrastructure.

Adding Data

  • Navigate to a Data Science Notebook page (.ipynb file).

  • Click the Data icon given in the right side panel.

  • The Data option opens displaying the related icons.

  • Click on the Add icon.

  • The Add Data page appears.

  • The steps to add data may vary based on the selected Data source.

Please Note: Refer to the Adding Data page for more details on how to add data.


Reading the Added Data

Please Note: Datasets and Data Sandbox files (CSV & XLSX) can be read using the get_data function.

  • Add a new Code cell to Notebook or access an empty Code cell.

  • Select a dataset from the Data tab.

  • The get_data function appears in the code cell.

  • Provide the df (DataFrame) variable to print the data from the selected Dataset. A Dataset can be an added dataset, a Data Sandbox file, or a Feature Store.

  • Run the cell.

  • The Data preview appears below after the cell run is completed.
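A rough sketch of the pattern above, using a stand-in for the platform's get_data function (its real signature and return type are platform-specific and assumed here):

```python
# Hypothetical stand-in for the DS Lab get_data function: the platform
# injects the real call when a dataset is selected from the Data tab.
def get_data(dataset_name):
    """Return a few invented records in place of the real dataset."""
    return [
        {"order_id": 1, "amount": 250.0},
        {"order_id": 2, "amount": 90.5},
    ]

df = get_data("sales_dataset")   # the dataset name is illustrative
print(df[:5])                    # preview the first few records
```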

Project Level Data Tab

The Data Sets/ Sandbox files/ Feature Stores added to a Data Science Notebook will also be listed under the Data tab provided under the same project. Hence, the added datasets will be available for all the Data Science Notebooks created or imported under the same project.

Reading Multiple Sheets inside an Excel Sheet

Check out the illustration to read multiple sheets in a Notebook cell.

  • Add an Excel file with multiple sheets to a DS Project.

  • Insert a Markdown cell with the names of the Excel sheets.

  • Insert a new code cell.

  • Use the checkbox next to the dataset to read the data.

  • The get_data function appears in the code cell.

  • Run the code cell.

  • The data preview will appear below.

  • Select another datasheet name and copy it from the markdown cell.

  • Paste the copied datasheet name in the code cell that contains the get_data function.

  • Run the code cell.

  • The data preview will be displayed below.

Create

The Create option redirects the user to create a new Notebook under the selected Project.

Check out the illustration on creating a new Notebook inside a DSL Project.

Please Note: The Create option appears for the Repo folder that opens by default under the Workspace tab.

Creating a New Notebook​

  • Navigate to the Workspace tab for a Data Science Lab project.

  • Click the Create option from the Notebook tab.

Please Note: The Create option gets enabled only if the Project status is Active as mentioned in the above-given image.

  • The Create Notebook page opens.

  • Provide the following information to create a new Notebook:

    • Notebook Name

    • Description

  • Click the Save option.

  • The Notebook gets created with the given name and the Notebook page opens. The Notebook may take a few seconds to save and start the Kernel.

  • The user will get notifications to ensure the new Notebook has been saved and started.

  • The same gets notified on the Notebook header (as highlighted in the image).

  • The newly created Notebook is ready now for the user to commence Data Science experiments. The newly created Notebook is listed on the left side of the Notebook page.

Adding a New Notebook

Check out the illustration on adding a new Notebook.

The users also get an Add option to create a new Notebook. This option becomes available only after at least one Notebook has been created using the Create option and opened.

  • Open an existing Notebook from a Project.

  • The Add icon appears on the header next to the opened Notebook name. Click the Add icon.

  • The Create Notebook window opens.

  • Provide the Notebook Name and Description.

  • Click the Save option.

  • A new Notebook gets created and the user will be redirected to the interface of the newly created Notebook.

  • Notification messages soon appear on the screen, confirming that the newly created Notebook has been saved and started.

  • The Notebook gets listed under the Notebook list provided on the left side of the screen.

  • A code cell gets added by default to the newly created Notebook for the user to begin the data science experiment.

Please Note:

  • The user can edit the Notebook name by using the Edit Notebook Name icon.

  • The accessible datasets, models, and artifacts will be listed under the Datasets, Models, and Artifacts menus.

  • The Find/Replace menu facilitates the user to find and replace a specific text in the notebook code.

  • The created Notebook (.ipynb file) gets added to the Repo folder. The Notebook Actions are provided to each created and saved Notebook. Refer to the page to get detailed information.

Transforms

Save and load models with a transform script, and register them or publish them as an API through the DS Lab module.

Check out a walk-through on how to use the Transform script inside Notebook.

You can write or upload a script containing the transform function to a Notebook and save a model based on it. You can also register the model as an API service. This entire process is completed in the below-given steps:

Saving and loading a Model with Transform script

  • Navigate to a Notebook.

  • Add a Code cell. Write or provide a transform script to the cell (In this case, it has been supplied in three cells).

  • Run the cell(s) (In this case, run all the three cells).

  • Add a new code cell and define the model.

  • Add another cell and click the Save Model option for the newly added code cell.

  • Specify the model name and type in the auto-generated script in the next code cell.

  • Run the cell.

  • Open the Transforms tab.

  • The model gets saved under the Transforms tab.

  • Add a new code cell.

  • Load the transform model by using the checkbox.

  • Run that cell.

  • Insert a new code cell.

  • Click the Transforms option for the code cell.

  • The auto-generated script appears.

  • Specify the train data.

  • Run the code cell.

  • It will display the transformed data below.
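The steps above can be sketched in plain Python. The Save Model and Transforms options auto-generate their own scripts inside the Notebook; the snippet below is only a minimal stand-alone illustration of a transform function plus a model saved and reloaded around it (all names and data are hypothetical):

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y_train = np.array([0, 0, 1, 1])

# The transform step: raw data in, model-ready features out.
scaler = StandardScaler().fit(X_train)

def transform(data):
    """Transform function of the kind the Transforms tab expects."""
    return scaler.transform(data)

# Train the model on the transformed data.
model = LogisticRegression().fit(transform(X_train), y_train)

# The Save Model option auto-generates this step; a plain-pickle
# equivalent bundles the transform and the model together.
with open("transform_model.pkl", "wb") as f:
    pickle.dump({"scaler": scaler, "model": model}, f)

# Loading the saved bundle and predicting on transformed data.
with open("transform_model.pkl", "rb") as f:
    bundle = pickle.load(f)

preds = bundle["model"].predict(bundle["scaler"].transform(X_train))
```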

Registering a Transform Model

  • Open the Transforms tab inside a Notebook.

  • Click the ellipsis icon for the saved transform.

  • Select the Register option for a listed transform.

  • The Register Model dialog box opens to confirm the action.

  • Click the Yes option.

  • A confirmation message appears to inform the completion of the action.

  • The model gets registered and listed under the Registered list of the models.​​

  • Open a pipeline workflow with a DS Lab model runner component.

  • The registered model gets listed under the Meta Information tab of the DS Lab model runner component inside the Data Pipeline module.

Publishing a Transform Model as API

The steps to publish a model that contains a transform as an API remain the same as described for a Data Science Model. Refer to the Register a Model as an API Service page.

Adjustable Repository Panel

Refresh

Refreshes the data taken from the selected Git Repository.

Git Console

Opens a console page to use Git Commands.


Create Project

This page explains project creation steps for a Data Science Lab Project.

What is a Project?

A Data Science Project created inside the Data Science Lab is like a Workspace inside which the user can create and store multiple data science experiments and their associated artifacts.

Creating a new Project

Check out the given illustration on how to create a DSL Project.

Pre-requisite: The users must have the following Admin-level settings configured to access and use the Repo Sync Project functionality inside the DS Lab module.

  • Configuring the DS Lab Settings option is mandatory before beginning with the Data Science Project creation.

  • Also, use the Algorithms field in the DS Lab Settings section to select the algorithms you wish to use for your DS Lab project.

  • The user must have the following Version Control settings done.

    • The token key has to be configured for the DS Lab module.

    • The repository and branch have to be specified to save the settings.

  • The user must complete the following Custom Field Settings:

    • Token key – bdbvcstoken

    • User id key - bdbvcsuserid

  • The user must do the following User-level configuration to create a Repo Sync DS Lab project.

    • Git Token

    • Git Username

Steps to create a new DSL Project

  • Navigate to the Home page of the Data Science Lab module.

  • Click the Create icon from the homepage.

  • The Create Project or Feature Store drawer opens.

  • Click the Create option provided for the Project.

  • The Create Project opens to provide the related information for a new Project.

  • Provide the following details for a new project:

    • Project Name: Give a name to the new project.

    • Project Description: Describe the project.

    • Select Algorithms: Select algorithms using the drop-down menu.

    • Environment: Allows users to select the environment they want to work in. Currently supported environments are Python TensorFlow, Python PyTorch, and PySpark.

      • Users who select the TensorFlow environment do not need to install packages like TensorFlow and Keras explicitly in the notebook. These packages can be imported directly inside the notebook.

      • Users who select the PyTorch environment do not need to install packages like Torch and Torchvision in the notebook. These packages can be imported inside the notebook.

    • Resource Allocation: This allows the users to allocate CPU/ GPU and memory to be used by the Notebook container inside a given project. The currently supported Resource Allocation options are Low, Medium, and High.

    • Idle Shutdown: It allows the users to specify the idle time limit after which the notebook session will get disconnected, and the project will be deactivated. To use the notebook again, the project should be activated. The supported Idle Shutdown options are 30m, 1h, and 2h.

    • External Libraries: Mention the names of external libraries (if a specific version is required, mention the library name with the version number) that must be installed in your DSL project/notebook. The names should be separated only by commas (without spaces), e.g., numpy==1.24.3,openpyxl. This is an optional field.

  • After you fill in the mandatory fields the following modifiable fields appear with pre-selected values:

    • Image Name

    • Image Version

    • Limit

    • Memory

    • Request (CPU)

    • Memory

    • Git Project: Select a project from the drop-down menu.

    • Git Branch: Select a branch option from the drop-down menu (The supported branches are main, migration, and version).

  • GPU Type: Select GPU type from the drop-down menu (Currently we support Nvidia as the GPU Type).

    • GPU Limit: Set the GPU limit using this field (This field appears only after the GPU Type option is selected).

  • Nodepool: Use this field to select a node pool option for the efficient execution of your data science project.

  • Sync git repo at project creation: Put a checkmark in the given checkbox to enable Git repo sync while creating a DS Lab project.

Please Note:

  • You can enable the Sync git repo at the project creation option to make your DSL Project a Git Repo Sync Project. The Repo Sync Projects will be displayed in the Project list with a branch icon in their title.

  • You can configure Git access for a normal Data Science Lab project by configuring the Git Repository and Git Branch fields while creating a new project. Such projects display the branch icon without the drop-down option when opened.

  • Click the Save option.

The Create Project Drawer
  • The confirmation message appears.

  • The newly created project gets saved, and it appears on the screen.

The newly created Project gets added at the top of the Projects list

Creating AutoML Experiment

A Data Scientist can create various Experiments based on specified algorithms.

There can be different types of Experiments based on the algorithm type specified. In the DS Lab module, we currently support Classification, Regression, and Forecasting.

  • A Classification experiment can be created for discrete data when the user wants to predict one of the several categories.

  • A Regression experiment can be created for continuous numeric values.

  • A Forecasting experiment can be created to predict future values based on historical data.
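As a rough illustration of how the target column drives the choice of experiment type (this heuristic is illustrative only, not the AutoML tab's actual logic; all names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "churned": ["yes", "no", "yes"],   # discrete categories
    "revenue": [120.5, 87.0, 99.9],    # continuous numeric values
})

def suggest_experiment_type(target: pd.Series) -> str:
    # Illustrative heuristic only; the AutoML tab applies its own checks.
    # A Forecasting experiment would instead target a value ordered by
    # a date/time column.
    if pd.api.types.is_numeric_dtype(target):
        return "Regression"
    return "Classification"
```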

Please Note:

  • AutoML experiments run as Jobs; a new Job will be allocated for each experiment created in the AutoML tab.

  • A Job spins up once the Experiment is created; after the models are trained and ready, the Job gets killed automatically.

Creating an AutoML Experiment

Creating an Experiment is a two-step process: configuring the experiment and selecting the algorithm type.

A user can create a supervised learning (data science) experiment by choosing the Create Experiment option.

Please Note: The Create Experiment icon is provided on the Dataset List page under the Dataset tab of a Repo Sync Data Science Project.

  • Navigate to the Data List page.

  • Select a Dataset from the list.

  • Click the Create Experiment icon.

  • The Configure tab opens (by default) while selecting the Create Experiment option.

  • Provide the following information:

    • Provide a name for the experiment.

    • Provide Description (optional).

    • Select a Target Column.

    • Select a Data Preparation from the drop-down menu.

      • Use the checkbox to select a Data Preparation from the displayed drop-down.

    • Select columns that need to be excluded from the experiment.

      • Use the checkbox to select a field to be excluded from the experiment.

      Please Note: The selected fields will not be considered while training the Auto ML model experiment.

  • Click the Next option.

  • The user gets redirected to the Select Experiment Type tab.

  • Select a prediction model using the checkbox.

  • Based on the selected experiment type a validation notification message appears.

  • Click the Done option.

  • A notification message appears.

  • The user is redirected to the AutoML list page.

  • The newly created experiment gets added to the list with Status mentioned as Started.

AutoML experiment with Started Status

Various Status of a Created Experiment

The Status tab indicates various phases of the experiments/model training. The different phases for an experiment are as given below:

  • The newly created experiment gets Started status. It is the first status when a new experiment is created.

Experiment with Started Status
  • Another notification message appears to inform the user that the model training has started. The same is indicated through the Status column of the model. The Status for such models will be Running.

Experiment with Running Status
  • After the experiment is completed, a notification message appears stating that the model has been trained. The Status for a trained model will be indicated as Completed.

Experiment with Completed Status

Please Note: The unsuccessful experiments are indicated as Failed under the status. The View Report is mentioned in red color for the Failed experiments.

Using a Markdown Cell

This page describes steps to use the text cells of the Data Science Notebook.

The Markdown cells are used to enter a description, links, images, headings, and text with Bold or Italics effect to a Data Science Notebook. They are formatted using a simple markup language called Markdown. The Markdown cell contains a toolbar to assist with editing.

Inserting a Markdown Cell

  • Navigate to a .ipynb file.

  • Use the Add pre-cell icon to insert a new code cell to the file.

OR

  • Click the +Markdown option that appears below the code cell.

  • The Markdown cell appears below to insert Markdown into the Notebook.

  • Choose an action from the toolbar.

  • The corresponding Markdown markup gets added on the left side of the Markdown cell.

  • The right-side Markdown space displays the text with the applied effect.

Markdown cell with inserted text
  • The image displays a few actions from the toolbar (such as Bold, Italic, Heading, and link) applied to the Markdown text.

  • Click the Save option.

Saving the inserted text inside a Markdown cell
  • The Markdown cell with inserted effect gets saved and the Markdown display gets changed displaying the text with saved effects on the left side (as shown in the given image).

Please Note: A Code cell gets added below the saved Markdown cell.

  • The user can click the Save option provided for the Notebook to save the update in the Notebook (after the Markdown cell has been added to it).

  • The Notebook gets updated and the same gets communicated through a notification message.

Editing a Markdown Cell

  • Double-click a saved Markdown cell.

  • The Markdown cell opens in an editable format.

the Markdown cell opens in the editable format
  • Modify the text inside the Markdown cell.

  • Click the Save option to update the edited Markdown in the Notebook.

Saving the Markdown updates
  • Click the Save option for the file.

  • A notification message appears.

  • The file gets saved with the Markdown cell.

Deleting a Markdown Cell

  • Click the Delete markdown icon for a saved Markdown cell.

Delete markdown icon for the Markdown cell
  • The Delete Cell dialog box opens.

  • Click the Yes option.


  • The selected Markdown gets removed and the same gets communicated by a notification message.

The selected Markdown cell gets deleted

Uploading an Image in the Markdown

  • Navigate to a .ipynb file inside an activated Project.

  • Access a Markdown cell.

  • Click the Upload icon.

  • Upload an image.

  • The image gets uploaded to the markdown cell.

  • Click the Save icon.

  • The markdown cell gets saved, and the uploaded image appears in the View mode of the markdown.

Please Note: Do not forget to click the Save icon for the Data Science Notebook to save the markdown updates in the .ipynb file.

Classification Model Explainer

This page provides model explainer dashboards for Classification Models.

Check out the given walk-through to understand the Model Explainer dashboard for the Classification models.

Feature Importance

This table shows the contribution each feature has had on prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each prediction has been built up from all the individual ingredients in the model.

Classification Stats

This tab provides various stats regarding the Classification model.

It includes the following information:

Global cutoff

Select a model cutoff such that all predicted probabilities higher than the cutoff will be labeled positive and all predicted probabilities lower than the cutoff will be labeled negative. The user can also set the cutoff as a percentile of all observations. Setting the cutoff here automatically applies it in the other connected components.
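A minimal NumPy sketch of the two cutoff styles described above (the dashboard applies this logic for you; the probabilities here are made up):

```python
import numpy as np

proba = np.array([0.15, 0.45, 0.62, 0.91])  # predicted positive-class probabilities

# Fixed global cutoff: everything at or above it is labeled positive.
cutoff = 0.5
labels = (proba >= cutoff).astype(int)  # -> [0, 0, 1, 1]

# Cutoff as a percentile of all observations instead, e.g. label the
# top 25% highest-scoring records positive.
pct_cutoff = np.percentile(proba, 75)
top_quarter = (proba >= pct_cutoff).astype(int)
```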

Model Performance Metrics

It displays a list of various performance metrics.

Confusion Matrix

The Confusion Matrix shows the number of true negatives (predicted negative, observed negative), true positives (predicted positive, observed positive), false negatives (predicted negative but observed positive), and false positives (predicted positive but observed negative). The number of false negatives and false positives determines the cost of deploying an imperfect model. Different cutoffs yield different numbers of false positives and false negatives. This plot can help you select the optimal cutoff.
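The four cell counts can be reproduced with scikit-learn's confusion_matrix, assuming it is available in the project environment (the labels here are made up):

```python
from sklearn.metrics import confusion_matrix

observed  = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 0]

# sklearn lays the binary matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(observed, predicted).ravel()
```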

Precision Plot

The user can see the relation between the predicted probability that a record belongs to the positive class and the percentage of observed records in the positive class on this plot. The observations get binned together in groups of roughly equal predicted probabilities, and the percentage of positives is calculated for each bin. A perfectly calibrated model would show a straight line from the bottom left corner to the top right corner. A strong model would classify most observations correctly, with probabilities close to 0% or 100%.

Classification Plot

This plot displays the fraction of each class above and below the cut-off.

ROC AUC Plot

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds.

The true positive rate is the proportion of actual positive samples that are correctly identified as positive by the model, i.e., TP / (TP + FN). The false positive rate is the proportion of actual negative samples that are incorrectly identified as positive by the model, i.e., FP / (FP + TN).
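These quantities, and the resulting ROC AUC, can be computed with scikit-learn; a small illustrative sketch with made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

observed = np.array([0, 0, 1, 1])
scores   = np.array([0.1, 0.4, 0.35, 0.8])  # predicted positive probabilities

# TPR = TP / (TP + FN) and FPR = FP / (FP + TN), one pair per threshold.
fpr, tpr, thresholds = roc_curve(observed, scores)

# Area under that curve:
auc = roc_auc_score(observed, scores)  # 0.75 for these scores
```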

PR AUC Plot

It shows the trade-off between Precision and Recall in one plot.

Lift Curve

The Lift Curve chart shows the percentage of positive classes when you only select observations with a score above the cutoff versus selecting observations randomly. This shows how much better the model performs than random selection (the lift).

Cumulative Precision

This plot shows the percentage of each label that you can expect when you only sample the top x% with the highest scores.

Individual Predictions

Select Index

The user can select a record directly by choosing it from the dropdown or hit the Random Index option to randomly select a record that fits the constraints. For example, the user can select a record where the observed target value is negative but the predicted probability of the target being positive is very high. This allows the user to sample only false positives or only false negatives.

Prediction

It displays the predicted probability for each target label.

Contributions Plot

This plot shows the contribution that each feature has provided to the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This helps to explain exactly how each prediction has been built up from all the individual ingredients in the model.

Partial Dependence Plot

The PDP plot shows how the model prediction would change if you changed one particular feature. The plot shows a sample of observations and how these observations would change with this feature (gridlines). The average effect is shown in grey. The effect of changing the feature for a single record is shown in blue. The user can adjust how many observations to sample for the average, how many gridlines to show, and how many points along the x-axis to calculate model predictions for (grid points).

Contributions Table

This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.

What If Analysis

The What If Analysis is often used to help stakeholders understand the potential consequences of different scenarios or decisions. This tab displays how the outcome would change when the values of the selected variables get changed. This allows stakeholders to see how sensitive the outcome is to different inputs and can help them identify which variables are most important to focus on.

What-if analysis charts can be used in a variety of contexts, from financial modeling to marketing analysis to supply chain optimization. They are particularly useful when dealing with complex systems where it is difficult to predict the exact impact of different variables. By exploring a range of scenarios, analysts can gain a better understanding of the potential outcomes and make more informed decisions.

Select Index & Prediction

Feature Input

The user can adjust the input values to see predictions for what-if scenarios.

Contribution & Partial Dependence Plots

In a What-if analysis chart, analysts typically start by specifying a baseline scenario, which represents the current state of affairs. They then identify one or more variables that are likely to have a significant impact on the outcome of interest, and specify a range of possible values for each of these variables.

Contributions Table

This table shows the contribution each individual feature has had on the prediction for a specific observation. The contributions (starting from the population average) add up to the final prediction. This allows you to explain exactly how each individual prediction has been built up from all the individual ingredients in the model.

Feature Dependence

Shap Summary

The Shap Summary summarizes the Shap values per feature. The user can either select an aggregate display that shows the mean absolute Shap value per feature or get a more detailed look at the spread of Shap values per feature and how they correlate with the feature value (red is high).

Shap Dependence

This plot displays the relation between feature values and Shap values. This allows you to investigate the general relationship between feature value and impact on the prediction. The users can check whether the model uses features in line with their intuitions, or use the plots to learn about the relationships that the model has learned between the input features and the predicted outcome.

Using a Code Cell

Write & Run Code to create Data Science Scripts and models using the .ipynb file.

A user can write and execute code using the Data Science Notebook interface. This section covers the steps to write and run a sample code in the Code cell of the Data Science Notebook.

Check out the given walk-through on how to use a Code Cell under a .ipynb file.

Please Note: The above-given video displays inserting a new code cell using the Add Pre-cell icon for a code cell.

Running Code inside a Code Cell

  • Create a new .ipynb file.

  • A notification message appears to ensure the creation of the new .ipynb file.

  • Open the newly created .ipynb file.

  • Insert the first Code cell by using the Add pre-cell icon.

  • Write code inside the cell.

  • Click the Run cell icon to run the code.

Please Note: The Code cells can also get auto-generated code from the Notebook operations: use the right-side panel and select a specific option. E.g., the user can use the Data tab to load an added dataset into the code cell.

  • The Run cell button changes into the Interrupt cell icon while the code is running.

  • Once the code has run successfully a checkmark appears below the Run cell icon.

  • The code result is displayed below it.

  • Another code cell gets added below (as shown in the following image).

  • Click the Save icon provided for the Notebook.

  • A notification message appears to indicate the completion of the action.

  • The Data Science Notebook's status changes to Saved, and the new updates get saved in it.

Various Options provided to a Code Cell

By clicking on an inserted Code cell, some code-related options are displayed as shown in the image:

| Sl. No. | Name | Action |
| --- | --- | --- |
| 1 | Move the cell up | Moves the cell upwards. |
| 2 | Move the cell down | Moves the cell downwards. |
| 3 | Delete Cell | Deletes the code cell. |
| 4 | More Actions | Opens four more actions: Transform, Save Model, Predict, and Save artifact. |

Please Note: The +Code, +Markdown, and +Assist options provided at the bottom of a cell insert a new cell after the given code/ Markdown cell.

The user should run the Notebook cells only after the Kernel is up and Running. If the user attempts to run a Notebook cell before the Kernel is started/ restarted, the following warning will be displayed.

Share a Model

The share option for a model facilitates the user to share it with other users and user groups. It also helps the user to exclude the privileges of a previously shared model.

Sharing a Model

Check out the following video for guidance on the Share model functionality.

  • Navigate to the Models tab where your saved models are listed.

  • Find the Model you want to share and select it.

  • Click the Share icon for that model from the Actions column.

  • The Manage Access page opens for the selected model.

  • Select permissions using the Grant Permissions checkboxes.

  • Navigate to the Users or User Groups tab to select user(s) or user group(s).

  • Use the search function to locate a specific user or user group you want to share the Model with.

  • Select a user or user group using the checkbox.

  • Click the Save option.

  • A notification message appears confirming that the model has been shared.

  • The selected user/ user group will be listed under the Granted Permissions section.

Accessing a Shared Model

  • Log in to the user account where the Model has been shared.

  • Navigate to the Projects page within the DS Lab module.

  • The Project where the source model was created will be listed.

  • Click the View icon to open the shared Project.

  • Open the Model tab for the project.

  • Locate the Shared Model, which will be marked as shared, in the Model list.

Shared Model with View Permission

When a Model is shared from User A to User B with View Permission, User B will have the following privileges:

  • View the shared model.

Shared Model with Edit Permission

When a Model is shared from User A to User B with Edit Permission, User B will have the following privileges:

  • View the model and trigger the explainer dashboard job.

  • View the model and generate the explainer dashboard.

  • View the model and migrate.

Shared Model with Execute Permission

When a Model is shared from User A to User B with Execute Permission, User B will have the following privileges:

  • View the model and Register the Model into the Data Pipeline.

  • View the model, Update and save the model information, and Register the model as API.

  • View the model and unregister the registered model as an API service.

Please Note: A targeted share user cannot re-share or delete a shared model regardless of the permission level (View/ Edit/Execute).

Excluding Users

Check out the illustration on using the Exclude Users functionality.

  • Navigate to the Models tab.

  • Select a model from the displayed list.

  • Click the Share icon.

  • The Manage Access window appears for the selected model.

  • Select permissions using the checkboxes from the Grant Permissions option.

  • Open the User Groups tab.

  • Select user group(s) using the checkbox(es).

  • Navigate to the Exclude Users tab.

  • Select users to be excluded using checkboxes.

  • Click the Save option.

  • A notification message appears.

  • The excluded users will be listed under the Excluded Users section.

Including an Excluded User

Check out the illustration for including an excluded user to access a shared model.

  • Navigate to the Manage Access window for a shared model.

  • The Excluded Users section will list the excluded users from accessing that model.

  • Select a user from the list.

  • Click the Include User icon.

  • The Include User dialog box opens.

  • Click the Yes option.

  • A notification message appears ensuring that the selected user is included.

  • The user gets removed from the Excluded Users section.

Revoking Privileges

Check out the illustration on revoking privileges for a user.

  • Navigate to the Manage Access window for a shared model.

  • The Granted Permissions section will list the shared user(s)/ user group(s).

  • Select a user/ user group from the list.

  • Click the Revoke icon.

  • The Revoke Privileges dialog box opens.

  • Click the Yes option.

  • A notification message ensures that shared model privileges are revoked for the selected user/user group. The user/ user group will be removed from the Granted Permissions section.

Please Note: The same set of steps can be followed to revoke privileges for a user group.


Import Model

External models can be imported into the Data Science Lab and experimented with inside the Notebooks using the Import Model functionality.

Please Note:

  • The External models can be registered to the Data Pipeline module and inferred using the Data Science Lab script runner.

  • Only the Native prediction functionality will work for the External models.

Importing a Model

Check out the illustration on importing a model.

  • Navigate to the Model tab for a Data Science Project.

  • Click the Import Model option.

  • The user gets redirected to upload the model file. Select and upload the file.

  • A notification message appears.

  • The imported model gets added to the model list.

Please Note: The imported models are referred to as External models in the model list and are marked with a prefix to their names (as displayed in the above-given image).

Exporting a Model to the Data Pipeline

You can integrate and export cutting-edge Data Science models into your data pipeline, ensuring optimized performance, real-time insights, and data-driven decision-making. The user needs to start a new .ipynb file with a wrapper function that includes Data, Imported Model, Predict function, and output Dataset with predictions.
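A minimal sketch of such a wrapper function, assuming a pickled scikit-learn model (all file, column, and function names are hypothetical; the snippet trains a stand-in model only so the example is self-contained):

```python
import pickle
import pandas as pd
from sklearn.linear_model import LinearRegression

# Train and pickle a stand-in model; in the Lab the .pkl would come
# from the Import Model step instead.
train = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
with open("imported_model.pkl", "wb") as f:
    pickle.dump(LinearRegression().fit(train, [2.0, 4.0, 6.0]), f)

def predict_wrapper(df: pd.DataFrame) -> pd.DataFrame:
    """Wrapper of the shape described above: load the imported model,
    run its native predict on the incoming data, and return the
    dataset with the predictions appended."""
    with open("imported_model.pkl", "rb") as f:
        model = pickle.load(f)
    out = df.copy()
    out["prediction"] = model.predict(df)
    return out

result = predict_wrapper(pd.DataFrame({"x": [4.0, 5.0]}))
```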

  • Navigate to a Data Science Notebook (.ipynb file) from an activated project. In this case, a notebook has been imported with the wrapper function.

  • Access the Imported Model inside this .ipynb file.

  • Load the imported model to the Notebook cell.

  • Mention the loaded model in the inference script.

  • Run the code cell with the inference script.

  • The Data preview is displayed below.

  • Click the Register option for the imported model from the ellipsis context menu.

  • The Register Model dialog box appears to confirm the model registration.

  • Click the Yes option.

  • A notification message appears, and the model gets registered.

  • Export the script using the Export functionality provided for the Data Science Notebook (.ipynb file).

  • Another notification appears to ensure that the Notebook is saved.

  • The Export to Pipeline window appears.

  • Select a specific script from the Notebook, or choose the Select All option to select the full script.

  • Select the Next option.

  • Click the Validate icon to validate the script.

  • A notification message appears to ensure the validity of the script.

  • Click the Export to Pipeline option.

  • A notification message appears to ensure that the selected Notebook has been exported.

Please Note: The imported model gets registered to the Data Pipeline module as a script.

Accessing the Exported Model within the Pipeline User Interface

  • Navigate to the Data Pipeline Workflow editor.

  • Drag the DS Lab Runner component and configure the Basic Information.

  • Open the Meta Information tab of the DS Lab Runner component.

  • Configure the following information for the Meta Information tab.

    • Select Script Runner as the Execution Type.

    • Select function input type.

    • Select the project name.

    • Select the Script Name from the drop-down option. The same name given to the imported model appears as the script name.

    • Provide details for the External Library (if applicable).

    • Select the Start Function from the drop-down menu.

  • The exported model can be accessed inside the Script section.

  • The user can connect the DS Lab Script Runner component to an Input Event.

  • Run the Pipeline.

  • The model predictions can be generated in the Preview tab of the connected Input Event.

Please Note:

  • The Imported Models can be accessed through the Script Runner component inside the Data Pipeline module.

  • The execution type should be Model Runner inside the Data Pipeline while accessing the other exported Data Science models.

  • The supported extensions for External models are .pkl, .h5, .pth, and .pt.
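The way each supported format is typically loaded in a Notebook cell can be sketched as follows — the helper name is hypothetical, and the Keras and PyTorch branches assume those libraries are installed:

```python
import pickle

def load_external_model(path):
    """Load an external model based on its file extension.
    The Keras and PyTorch branches assume those libraries are
    available in the Notebook environment."""
    if path.endswith(".pkl"):
        with open(path, "rb") as f:
            return pickle.load(f)          # Scikit-learn / pickled models
    if path.endswith(".h5"):
        from tensorflow.keras.models import load_model  # assumes Keras
        return load_model(path)
    if path.endswith((".pth", ".pt")):
        import torch                        # assumes PyTorch
        return torch.load(path)
    raise ValueError("Unsupported extension: " + path)
```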

Try out the Import Model Functionality yourself

Some sample models and related scripts are provided below for users to try this functionality. Download them with a click, and use them inside your Data Science Notebook by following the above-mentioned steps.

Sample files for Sklearn

923B
SklearnModel .pkl
Sample Sklearn model for import.
6KB
Importmodels_Sklearn_Inference (1).ipynb
Sample python script based on the imported Sklearn model.

Sample files for Keras

18KB
KersModel.h5
Sample Keras model for import.
6KB
Importmodels_Keras_Inference.ipynb
Sample python script based on the imported Keras model.

Sample files for PyTorch

8KB
Pytorch_Model.pth
Sample Pytorch model for import
10KB
ImportModel_Pytorch_Inference.ipynb
Sample python script based on the imported Pytorch model.

Notebook Version Control

This page explains the step-by-step process for Notebook migration and the Push to VCS functionality.

Push into Git (Migration)

A Notebook script can be migrated across spaces and servers using the Push into GIT option.

Prerequisites: It is required to set the configuration for the Data Science Lab Migration using the Version Control option from the Admin module before migrating a DS Lab script or model.

Check out the walk-through on how to migrate/ export a Notebook script to the GIT Repository.

  • Select a Notebook from the Workspace tab.

  • Click the Ellipsis icon to get the Notebook list actions.

  • Click the Push into VCS option for the selected Notebook.

  • The Push into Git drawer opens.

  • Select the Git Export (Migration) option.

  • Provide a Commit Message in the given space.

  • Click the Push option.

  • The selected Notebook script version gets migrated to the Git Repository and the user gets notified by a message.

Importing a DSL Script from GIT

After exporting a DSL script, you can sign in to another user account on a different space or server and import the DSL script.

  • Click the Admin module from the Apps menu. ​

  • Select the GIT Migration option from the admin menu panel.

  • Click the Import File option.

  • The Import Document page opens, click the Import option as suggested in the following image.

  • The Migration- Document Import page opens.

  • Select New VCS as Version Control Configuration.

  • Select the DSLab option from the module drop-down menu.

  • Select the Notebook option from the left side panel.

  • All the migrated Notebooks are listed. The user can use the Search bar to customize the displayed list of the exported Notebooks.

  • Select a Notebook from the displayed list to open the available versions of that Notebook.

  • Select a Version that you wish to import.

  • Click the Migrate option.

  • A notification message appears informing that the file has been migrated.

  • Open the Data Science Lab module and navigate to the List Project Page.

  • The imported Notebook gets listed with the concerned DSL Project.

Please Note:

  • The user can migrate only the exported scripts (i.e., the scripts exported to the Data Pipeline).

  • While migrating a DSL Notebook/Script using the Export to Git functionality, the concerned Project under which the Notebook is created also gets migrated.

  • While migrating a DSL Notebook the utility files which are part of the same Project will also get migrated.

Version Control

Check out the illustrations on the Notebook Version Control functionality.

  • Select a Notebook file from the Workspace tab.

  • Open the Notebook file.

  • Modify the Notebook script.

  • Click the Save icon.

  • A message notifies the user that the workflow changes are saved.

  • Access the Context menu for the Notebook.

  • Click the Push into VCS option for the selected Notebook.

  • The Push into Version Controlling System drawer opens.

  • Select the Version Control option.

  • Provide a Commit Message.

  • Click the Push option.

  • The selected version of the Notebook gets pushed to VCS, and the same is informed by a message.

  • Open the context menu for the Notebook of which multiple versions are pushed to the VCS.

  • Click the Pull from VCS option from the Context menu.

  • The Pull from Version Controlling System drawer opens.

  • Select a version using the checkbox.

  • Click the Pull option.

  • A message appears to notify the user that the Notebook is pulled from the VCS.

  • Select the same Notebook file from the Repo folder of the Workspace tab, and open it.

  • A message appears to notify that the selected workflow is started.

  • The user can verify that the Notebook script reflects the modifications made in the pulled version of the Notebook.

Version control for a file Pulled from Git

The Version Control feature for a Notebook file pulled from Git differs from the Notebook file created/ owned by the user.

Click the Information option provided in the Context menu for a Notebook. It will mention Pulled from git if the selected Notebook is pulled from Git.

Pulled from git

Please Note: The Notebook file pulled from Git gets overwritten with each pull; it consistently fetches the latest version and does not allow version selection.

Check out the illustration to understand the Version control steps for a Notebook file pulled from the Git Repo.


Share

This page describes the steps involved in sharing a Notebook script and accessing it as a shared Notebook.

Sharing a Notebook

The user can share a DSL Notebook across the teams using this feature.

Check out the walk-through on sharing a Notebook.

  • Navigate to the Workspace tab for a DS Lab project.

  • Select a Notebook from the list.

  • Click on the Ellipsis icon.

  • A context menu opens for the selected Notebook, click the Share option from the Context menu.

  • The Manage Access window opens for the selected Notebook.

  • Select the permissions to be granted to users/ groups using the checkboxes.

  • Three tabs appear: Users, User Groups, and Exclude Users. Select either the Users or the User Groups tab.

  • Search for a specific user or user group to share the Notebook.

  • Select a User or user group from the respective tabs (as displayed in the image for the Users tab).

  • Click the Save option.

  • A notification message appears to confirm the share action.

  • The selected user gets added to the Granted Permissions section.

Accessing a Shared Notebook

Check out the illustration to access a shared Notebook.

  • Login to the Platform using the user's credentials to whom the Notebook is shared and navigate to the Projects page for the DS Lab module.

  • The Shared Project gets indicated as shared on the Projects page.

  • Click the View icon to open the project.

  • The Workspace tab opens by default for the shared Project.

  • The shared Notebook would be listed under the Repo folder.

  • Open the Notebook Actions menu. The Share and Delete options will be disabled for a shared Notebook.

Shared DSL Notebook with View Permission

When a Notebook is shared from User A to User B with View Permission, User B will have the following privileges:

  • View Notebook Contents.

Shared DSL Notebook with Edit Permission

When a Notebook is shared from User A to User B with Edit Permission, User B will have the following privileges:

  • View the Notebook contents.

  • Edit the Notebook contents and save the changes.

Shared DSL Notebook with Execute Permission

When a Notebook is shared from User A to User B with Execute Permission, User B will have the following privileges:

  • View the Notebook contents and Start the Kernel.

  • View the Notebook contents and shut down the kernel.

  • View the Notebook contents and Execute code and markdown cells.

  • View the Notebook contents and interrupt the kernel.

  • View the Notebook contents and Execute git commands in the console.

  • View the Notebook contents and restart the kernel.

Please Note: A targeted share user cannot re-share or delete a shared DSL Notebook regardless of the permission level (View/ Edit/Execute).

Revoking the Granted Permissions

You can revoke the permissions shared with a user or user group by using the Revoke Permissions icon.

Check out the illustration on revoking the granted permissions.

  • Navigate to the Manage Access window for a shared Notebook.

  • The Granted Permissions section lists all the users or user groups to whom the Notebook has been shared.

  • Select a user or user group from the list.

  • Click the Revoke Privileges icon.

  • A confirmation dialog box appears.

  • A notification appears, and the shared privileges will be revoked for the selected user/ user group. The user/ user group gets removed from the Granted Permissions list.

Excluding User(s)

The user can exclude some users from the privileges to access a shared Notebook while allowing permissions for the other users of the same group.

Check out the illustration on excluding a user/ user group from the shared privileges of a Notebook.

  • Navigate to the Manage Access window for a shared Notebook.

  • Grant Permissions to the user(s)/ user group(s) using the checkboxes.

  • Open the User Groups tab.

  • Select a User Group from the displayed list.

  • Use the checkbox to select it for sharing the Notebook.

  • Navigate to the Exclude Users tab.

  • Select a user from the displayed list and use the checkbox to exclude that user from the shared permissions.

  • Click the Save option.

  • A notification appears to confirm the share action.

  • The selected user gets excluded from the shared Notebook permissions.

  • The Notebook gets shared with the rest of the users in that group.

Including an Excluded User

Check out the illustration on including an Excluded user for accessing a shared Notebook.

  • Navigate to the Excluded Users section.

  • Select a user from the displayed list.

  • Click the Include User icon.

  • The Include User confirmation dialog box appears.

  • Click the Yes option.

  • A notification appears to confirm the success of the action.

  • The selected user gets included in the group with the shared permissions for the Notebook. The user will get removed from the Excluded Users list.

Please Note:

  • If the project is shared with a user group, then all the users under that group appear under the Exclude User tab.

  • The Project gets shared by default with the concerned Notebook while using the Share function for a Notebook.

  • A shared Project, even when shared by default along with a Notebook, remains Active so the user can access and open the Notebook.


Adding Data

This page describes the steps to add data to your DSL project.

Adding Data Sets

Pre-requisites:

  • The users must have permission to access the Data Center module of the Platform.

  • The users must have the required data sets listed under the Data Center module.

Check out the illustration to understand the steps for adding Datasets to a DSL Project.

  • Open a Project.

  • Click the Data tab to open it.

  • Click the Add Data option from the Data tab.

  • The Add Data page opens offering three options to choose data:

    • Data service (the default selection)

    • Data Sandbox Files

    • Feature Stores

  • Go ahead with the Data Sets option from the Data Source drop-down menu.

  • Use the Search space to search through the displayed data service list.

  • Select the required data set(s) using the checkboxes provided next to it.

  • Click the Add option.

  • The selected data set(s) gets added to the concerned project.

  • A notification message appears to inform the same.

Uploading and Adding Data Sandbox Files

Pre-requisite: The user must configure the Sandbox Settings to access the Data Sandbox option under the Data Science Lab.

Check out the illustration to understand the steps for uploading and adding a Data Sandbox file to a DSL Project.

Uploading a Data Sandbox

  • Open a DSL Project.

  • Click on the Data tab.

  • Click the Add Data option.

  • The user gets redirected to the Add Data page.

  • Select the Data Sandbox option from the Data Source drop-down menu.

  • Click the Upload option to upload a Data Sandbox file.

  • The user gets redirected to the Upload Data Sandbox page.

  • Provide a Sandbox Name.

  • Provide a Description of the Data Sandbox.

  • Click the Choose File option to select a file.

  • Choose a file from your system and upload it. The supported files are

  • Click the Save option to begin the file upload.

  • Wait till the file upload reaches 100%.

  • The uploaded sandbox file gets added under the Add Datasets page.

  • A notification message appears to indicate that the file has been uploaded.

Adding Data Sandbox files

  • The user gets redirected to the Add Data page.

  • Select the Data Sandbox option from the Data Source drop-down menu.

  • Use the search space to search a specific data sandbox.

  • Select Data Sandbox files using the checkbox given next to the Sandbox entry (The uploaded Data Sandbox file appears at the top of the list).

  • Click the Add option that appears after selecting the uploaded Sandbox file.

  • The user gets redirected to the Dataset tab where the added dataset file gets listed.

  • A notification message appears to inform that the selected Dataset (in this case, the selected Data Sandbox file) has been updated.

Please Note: The users get a search bar to search across the multiple Datasets options on the Add Datasets page.

Adding Feature Stores

Check out the illustration to understand the steps for adding Feature Stores to a DSL Project.

  • Navigate to a DSL Project.

  • Click the Data tab to open it.

  • Click the Add Data option from the Data tab.

  • The Add Data page opens offering three options to choose data.

  • Select the Feature Stores option from the Data Source drop-down menu.

  • Use the Search space to search through the displayed data service list.

  • Select the required feature store(s) using the checkboxes provided next to it.

  • Click the Add option.

  • A notification message appears to inform the same.

  • The selected feature store(s) gets added to the concerned project.

Adding a Feature Store with Data Preparation

Check out the illustration to understand adding a Feature Store with Data Preparation.

  • Navigate to the Data Science Lab module.

  • Click the Create option provided for the Feature Store.

  • The Create Feature Store page opens.

  • Provide a name to the Feature Set.

  • Select a connector from the drop-down menu.

  • Select a query from the table info. / Metadata list or write an SQL Query.

  • Click the Validate option.

  • A notification message confirms that the query is validated.

  • The Preview of the data appears below.

  • Click the Data Prep option.

  • The user gets redirected to the Data Preparation page.

  • Navigate to the Transforms tab.

  • Choose a transform from the list. Here, the Label Encoding transform is selected from the ML category.

  • A warning appears to remind the users that if the SQL query is changed, the applied data preparations or transformations will be lost.

  • The Data Prep option will have a green mark suggesting that the Data Preparation is applied to the selected Feature Store.

  • Click the Create option.

  • A notification ensures that the Feature store job is initiated.

  • The user gets redirected to the Feature Stores page.

  • The newly created feature store gets added at the top of the list.

  • Open a Project.

  • The Workspace tab opens by default.

  • Open the Data tab.

  • Click the Add Data icon.

  • The Add Data page opens.

  • Select Feature Stores as an option using the Data Source filter menu.

  • The list of the available Feature Stores will be listed.

  • Select a feature store using the checkbox.

  • Click the Add option.

  • A notification appears stating that the feature store has been added.

  • The recently added feature store appears under the Data section of the selected project.

  • Add a new code cell and put a checkmark in the given checkbox next to the recently added Feature Store as data for the project.

  • The Data gets loaded in the code cell.

  • Run the code cell with the loaded feature store.

  • The data preview appears below the code cell.
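As an illustration, a loaded feature store typically arrives in the code cell as a DataFrame; the column names below are hypothetical:

```python
import pandas as pd

# Hypothetical feature-store data loaded into the code cell as a DataFrame;
# the columns are illustrative only.
df = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "avg_spend": [250.0, 310.5, 120.25],
    "churn_label": [0, 1, 0],
})

print(df.head())    # the data preview shown below the code cell
print(df.dtypes)    # quick check of the loaded feature types
```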

Data List Page

This section of the document describes the actions attributed to the added data inside a Data Science Lab project.

Preview

The Data Preview option displays a sample of the actual data for the user to understand the data values in a better way.

  • Navigate to the Dataset list inside a Project.

  • Select either a Data Sandbox or Dataset from the displayed list.

  • Click the Preview icon for the selected data entity.

  • The Preview Data Sandbox or Preview Dataset page opens based on the selected data.

Preview of a Data Sandbox
Preview of a Dataset

Data Profile

This action helps users to visualize the detailed profile of data to know about data quality, structure, and consistency. A data profile is a summary of the characteristics of a dataset. It is created as a preliminary step in data analysis to better understand the data before performing an in-depth analysis.

Check out the illustration provided at the beginning to get the full view of the Data Profile page.

  • Navigate to the Data list page.

  • Select a Dataset from the list. It can be anything from a Dataset, Data Sandbox file, or Feature Store.

  • Click the Data Profile icon.

Accessing Data Profile icon
  • The Data Profile drawer opens displaying the Data Set information, Variable Types, Warnings, Variables, Correlation chart, missing values, and sample.

Displaying the Data Profile for the selected Data
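The kind of summary the Data Profile drawer presents can be approximated in a Notebook with pandas — a sketch, not the platform's implementation; the `profile` helper and sample columns are hypothetical:

```python
import pandas as pd

def profile(df):
    """Summarize dataset characteristics, similar in spirit to the
    Data Profile drawer: variable types, missing values, correlations,
    and a sample of rows."""
    return {
        "rows": len(df),
        "variable_types": df.dtypes.astype(str).to_dict(),
        "missing_values": df.isna().sum().to_dict(),
        "correlation": df.corr(numeric_only=True).round(2).to_dict(),
        "sample": df.head(5).to_dict(orient="records"),
    }

# Hypothetical dataset with one missing value
df = pd.DataFrame({"age": [25, 32, None], "income": [40.0, 55.0, 61.0]})
report = profile(df)
print(report["variable_types"], report["missing_values"])
```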

Create Experiment

The users can create a supervised learning (Auto ML) experiment using the Create Experiment option.

Check out the illustration to create an auto ML experiment.

  • Navigate to the Dataset List page.

  • Select a Dataset from the list.

  • Click the Create Experiment icon.

Please Note: An experiment contains two steps:

  • Configure: Enter the Experiment name, Description, and Target column.

  • Select Experiment Type: Select an algorithm type from the drop-down menu.

    • A Classification experiment can be created for discrete data when the user wants to predict one of the several categories.

    • A Regression experiment can be created for continuous numeric values.

    • A Forecasting experiment can be created to predict future values based on historical data.

  • The Configure tab opens (by default) while opening the Create Experiment form.

  • Provide the following information:

    • Provide a name for the experiment.

    • Provide Description (optional).

    • Select a Target Column.

    • Select a Data Preparation from the drop-down menu.

      • Use the checkbox to select a Data Preparation from the displayed drop-down.

      Selecting Data Preparation from the dropdown menu
    • Select columns that need to be excluded from the experiment.

      • Use the checkbox to select a field to be excluded from the experiment.

Please Note: The selected fields will not be considered while training the Auto ML model experiment.

Selecting Columns to be excluded from the model training
  • Click the Next option.

Configure tab with selected Data Preparations and excluded fields
  • The user gets redirected to the Select Experiment Type tab.

  • Select a prediction model using the checkbox.

  • Based on the selected experiment type a validation notification message appears.

  • Click the Done option.

Selecting Experiment Type
  • A notification message appears.

  • The user gets redirected to the Auto ML list page.

  • The newly created experiment gets added to the list with the Status mentioned as Started.

Data Preparation

Data Preparation involves gathering, refining, and converting raw data into refined data. It is a critical step in data analysis and machine learning, as the quality and accuracy of the data used directly impact the accuracy and reliability of the results. The data preparation ensures that the data is accurate, complete, consistent, and relevant to the analysis. The data scientist can make more informed decisions, extract valuable insights, and unveil concealed trends and patterns within the raw data with the help of the Data Preparation option.

  • Navigate to the Data tab.

  • Select a Dataset from the list.

  • Click the Data Preparation icon.

  • The Preparation List window displays the preparations based on the selected Excel file. The user may use any of the displayed data preparations from the list.

  • The user can select a sheet name from the given drop-down menu.

  • Click the Data Preparation option to create a new preparation.

  • The Data Preparation page opens displaying the dataset in the grid format.

  • Click the Auto Prep option to apply the default set of transforms under the Auto Prep.

  • The Transformation List window opens.

  • Select or deselect the transforms using the given checkboxes.

  • Click the Proceed option.

  • The selected AutoPrep transforms are applied to the dataset. Provide a name for the Data Preparation.

  • Click the SAVE option.

  • A notification message informs the users that the data preparation has been saved.

  • The user gets redirected to the Preparation List window.

  • Click the Refresh icon.

  • The newly created Data Preparation gets added to the Preparation List.

Please Note: Refer to the section of the module for more details.
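As an example of an ML-category transform, Label Encoding (mentioned in the Feature Store section above) maps each distinct category to an integer code; a minimal pandas sketch with hypothetical values:

```python
import pandas as pd

# Sample categorical column; the values are illustrative only.
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune", "Mumbai"]})

# Label encoding: assign each distinct category an integer code,
# in the spirit of the ML-category Label Encoding transform.
codes, uniques = pd.factorize(df["city"])
df["city_encoded"] = codes

print(df)
print(dict(zip(uniques, range(len(uniques)))))  # category-to-code mapping
```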

Delete

  • Navigate to the Data tab.

  • Select a Dataset from the list.

  • Click the Delete icon.

Accessing Delete icon for a Dataset
  • A dialog box opens to confirm the deletion.

  • Click the Yes option.

  • A notification message appears to confirm the completion of the deletion action.

  • The concerned Data set will be removed from the list.

Please Note: The Preview, Create Experiment, and Data Preparation Actions are not supported for the Datasets based on a Feature Store.

Register a Model as an API Service

This section explains the steps involved in registering a Data Science Model as an API Service.

To publish a Model as an API Service, the user needs to follow the three steps given below:

Step-1 Publish a Model as an API

Step-2 Register an API Client

Step-3 Pass the Model values in the Postman

Check out the illustration to understand the Model as API functionality.

Publish a Model as an API

Using the Models tab, the user can publish a DSL model as an API. Only the published models get this option.

  • Navigate to the Models tab.

  • Filter the model list by using the Registered or All options.

  • Select a registered model from the list.

  • Click the Register as API option.

  • The Update Model page opens.

  • Provide Max instance limit.

  • Click the Save and Register option.

Updating a model

Please Note: Use the Save option to save the data which can be published later.

  • The model gets saved and registered as an API service. A notification message appears to inform the same.

Please Note: The Registered Model as an API can be accessed under the Registered Models & API option in the left menu panel on the Data Science Lab homepage.

Register an API Client

  • Navigate to the Admin module.

  • Click the API Client Registration option.

  • The API Client Registration page opens.

  • Click the New option.

  • Select the Client type as internal.

  • Provide the following client-specific information:

    • Client Name

    • Client Email

    • App Name

    • Request Per Hour

    • Request Per Day

    • Select API Type - Select the Model as API option.

    • Select the Services Entitled - Select the published DSL model from the drop-down menu.

  • Click the Save option.

  • A notification message appears to inform the same.

  • The client details get registered.

  • Once the client gets registered open the registered client details using the Edit option.

  • The API Client Registration page opens with the Client ID and Client Secret key.

Passing the Model Values in Postman

The user can pass the model values in Postman in the following sequence to get the results.

Sample URLs for Passing a Registered Model's Values as API in Postman
  • To check whether the service has started or not, pass:

https://app.bdb.ai/services/modelasapi/<model_name>

  • To check whether the Job is running or not

https://app.bdb.ai/services/modelasapi/<model_name>/getStatus

  • To get results from the API service

https://app.bdb.ai/services/modelasapi/<model_name>/getResults

Check out the illustration on Registering a Model as an API service.

Execution Request

  • Navigate to the Postman.

  • Go to the New Collection.

  • Add a new POST request.

  • Pass the URL with the model name for the POST request.

  • Provide required headers under the Headers tab:

    • Client Id

    • Client Secret Key

    • App Name

  • Put the test data in the JSON list using the Body tab.

  • Click the Send option to send the request.

Please Note:

  • A job will spin up at the tenant level to process the requests.

  • The input data (JSON body) will be saved in a Kafka topic as a message, which will be cleared after 4 hours.

  • The tenant will get a response as below:

    • Success: a value of 'true' here indicates that the request succeeded.

    • Request ID: A Request ID is generated.

    • Message: Confirms that the service has started running.

Please Note: The Request ID is required to get the status request in the next step.

Get Status Request

  • Pass the URL with the model name for the POST request.

  • Provide required headers under the Headers tab:

    • Client Id

    • Client Secret Key

    • App Name

  • Open the Body tab and provide the Request ID.

  • Click the Send option to send the request.

  • The response will be received as below:

    • Success: a value of 'true' here indicates that the request succeeded.

    • Request ID: The used Request ID appears.

    • Status Message: Confirms that the service has completed.

Get Results Request

  • Pass the URL with the model name for the POST request.

  • Provide required headers under the Headers tab:

    • Client Id

    • Client Secret Key

    • App Name

  • Open the Body tab and provide the Request ID.

  • Click the Send option to send the request.

  • The model prediction result will be displayed in response.

Please Note: The output data will be stored inside the Sandbox repository in the specific sub-folder of the request under the Model as API folder of the respective DSL Project.
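The three Postman calls above can equally be scripted. The sketch below only builds the URL, headers, and JSON body — the header key names and the request-ID field are placeholders; use the exact keys your deployment expects and send the result with any HTTP client:

```python
import json

BASE_URL = "https://app.bdb.ai/services/modelasapi"

def build_request(model_name, client_id, client_secret, app_name,
                  action=None, body=None):
    """Build the URL, headers, and JSON body for a Model-as-API call.
    The header key names are placeholders; send the result with any
    HTTP client, e.g. requests.post(url, headers=headers, data=payload)."""
    url = f"{BASE_URL}/{model_name}" + (f"/{action}" if action else "")
    headers = {
        "clientid": client_id,          # Client Id
        "clientsecret": client_secret,  # Client Secret Key
        "appname": app_name,            # App Name
    }
    return url, headers, json.dumps(body if body is not None else {})

# The three calls in sequence (credentials and request ID are hypothetical):
start_call  = build_request("my_model", "id", "secret", "app",
                            body=[{"feature_1": 1.0}])       # execution request
status_call = build_request("my_model", "id", "secret", "app",
                            action="getStatus",
                            body={"request_id": "abc-123"})  # get status
result_call = build_request("my_model", "id", "secret", "app",
                            action="getResults",
                            body={"request_id": "abc-123"})  # get results
print(status_call[0])
```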

Register as a Job

This page describes steps to register a Data Science Script as a Job.

Register a Data Science Script as a Job

Check out the illustration on registering a Notebook script as a Job to the Data Pipeline module.

The user can register a Notebook script as a Job using this functionality.

  • Select a Notebook from the Repo folder in the left side panel.

  • Click the ellipsis icon.

  • A context menu opens.

  • Click the Register option from the context menu.

  • The Register page opens.

  • Use the Select All option or select the specific script by using the given checkmark.

  • Click the Next option.

  • Select the Register as a Job option using the checkbox.

  • Click the Libraries icon.

  • The Libraries drawer opens.

  • Select libraries by using the checkbox.

  • Click the Close icon.

  • The user gets redirected to the Register drawer.

  • Click the Next option.

  • Provide the following information:

    • Enter scheduler name

    • Scheduler description

    • Start function

    • Job basinfo

    • Docker Config

      • Choose an option out of Low, Medium, and High

      • Limit - based on the selected docker configuration option (Low/Medium/High) the CPU and Memory limit are displayed.

      • Request - It provides predefined values for CPU, Memory, and count of instances.

  • On demand: Check this option if a Python Job (On demand) must be created. In this scenario, the Job will not be scheduled.

  • Payload: This option will appear if the On-demand option is checked in. Enter the payload in the form of a list of dictionaries. For more details about the Python Job (On demand), refer to this link:

  • Concurrency Policy: Select the desired concurrency policy. For more details about the Concurrency Policy, check this link:

Please Note: The Concurrency policy option doesn't appear for On-demand jobs; it displays only for jobs where the scheduler is configured.

  • The concurrency policy has three options: Allow, Forbid, and Replace.

    • Allow: If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the next task will run in parallel with the previous task.

    • Forbid: If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the next task will wait until all the previous tasks are completed.

    • Replace: If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the previous task will be terminated and the new task will start processing.

  • Scheduler Time: Provide scheduler time using the Cron generator.

  • Alert: This feature in the Job allows the users to send an alert message to the specified channel (Teams or Slack) in the event of either the success or failure of the configured Job. Users can also choose success and failure options to send an alert for the configured Job. Check the following link to configure the Alert:

  • Click the Finish option.

  • A notification message appears.

  • Navigate to the List Jobs page within the Data Pipeline module.

  • The recently registered DS Script gets listed with the same Scheduler name.
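For the On-demand case described above, the payload is a list of dictionaries handed to the selected Start function. A hypothetical sketch (the keys, function name, and threshold are illustrative only):

```python
# Hypothetical On-demand Job payload: a list of dictionaries, one per
# record to process. The keys depend on what your start function expects.
payload = [
    {"customer_id": 101, "avg_spend": 250.0},
    {"customer_id": 102, "avg_spend": 310.5},
]

def start(records=payload):
    """Example start function selected at registration time: it receives
    the payload and returns one result per input record."""
    return [
        {"customer_id": r["customer_id"], "high_spender": r["avg_spend"] > 300}
        for r in records
    ]

# Scheduler Time uses a cron expression, e.g. "0 2 * * *" for 02:00 daily
# (scheduled jobs only; On-demand jobs are not scheduled).
print(start())
```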

Re-Registering DS Script as a Job

Check out the illustration on re-registering a DS Script as a job.

This option appears for a .ipynb file that has been registered before.

  • Select the Register option for a .ipynb file that has been registered before.

  • The Register page opens displaying the Re-Register and Register as New options.

  • Select the Re-Register option by using the checkbox.

  • Select a version by using a checkbox.

  • Click the Next option.

  • Select the script using the checkbox (it appears pre-selected). The user can also choose the Select All option.

  • Click the Next option.

  • A notification message appears confirming that the script is valid.

  • Click the Next option.

  • Start function: Select a function from the drop-down menu.

  • Job baseinfo: Select an option from the drop-down menu.

  • Docker Config

    • Choose an option for Limit out of Low, Medium, and High

    • Request: The CPU and Memory limits are displayed.

  • On demand: Check this option if a Python Job (On demand) must be created. In this scenario, the Job will not be scheduled.

  • Payload: This option appears if the On-demand option is checked. Enter the payload in the form of a list of dictionaries. For more details about the Python Job (On demand), refer to this link:

  • Concurrency Policy: Select the desired concurrency policy. For more details about the Concurrency Policy, check this link:

Please Note: The Concurrency Policy option does not appear for On-demand Jobs; it is displayed only for Jobs with a configured scheduler.

  • The concurrency policy has three options: Allow, Forbid, and Replace.

    • Allow: If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the next task will run in parallel with the previous task.

    • Forbid: If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the next task will wait until all the previous tasks are completed.

    • Replace: If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the previous task will be terminated and the new task will start processing.

  • Alert: This feature in the Job allows the users to send an alert message to the specified channel (Teams or Slack) in the event of either the success or failure of the configured Job. Users can also choose success and failure options to send an alert for the configured Job. Check the following link to configure the Alert:

  • Click the Finish option to register the Notebook as a Job.

  • A notification message appears.

Register as a New Job

The user must follow all the steps from the Register a Data Science Script as a Job section while re-registering it with the Register as New option.

Check out the illustration on Registering a DS Script as New.


Algorithms

Get steps for the Algorithm Settings and Project-level access needed to use Algorithms inside a Notebook.

Pre-requisite:

  1. Configure the Algorithms using the Data Science Lab Settings from the Admin module to access them under the Data Science Lab Project creation.

  2. The user must select Algorithms while creating a Project to make them accessible for a Notebook within the Project.

The entire process to access the Algorithms option inside the DS Lab and create a model based on an Algorithm has three steps:

  1. Admin Settings for Algorithms

  2. Project Level Algorithm Selection

  3. Using Algorithms inside a .ipynb File

Please Note: The first two steps are prerequisites for making the desired Algorithms available inside DS Lab Projects.

Admin Settings for Algorithms

  • Navigate to the Admin module.

  • Open the Data Science Settings option from the Configuration section of the Admin panel.

  • The Data Science Settings Information page opens.

  • Select the Algorithms using the drop-down option.

  • Click the Save option.

  • A confirmation message appears informing that the details are updated.

Please Note:

  • Regression & Classification: Default Algorithm types that the Admin enables for each Data Science Lab module user.

  • Forecasting, Unsupervised, Natural Language Processing: These Algorithm types are disabled by default and are enabled by the Admin on the user's request.

Project Level Algorithm Selection

Once the Algorithm settings are configured in the Admin module, and the required Algorithms are selected while creating a Data Science Project, the user can access those Algorithms within a Notebook created under the same DSL Project.

Please Note: Once the Algorithm configuration is completed at the Admin and Project levels, the same set of Algorithms will be available for all the Notebooks that are part of that DSL Project.

  • Navigate to the Data Science Lab.

  • Click the Create option for Project.

  • The Create Project page appears.

  • Select the algorithms using the given checkboxes from the drop-down menu.

  • The selected Algorithms appear in the field, separated by commas.

  • Save the project.

Please Note: Provide all the required fields for the Project creation.

Using Algorithms inside a .ipynb File

Once the Algorithms are selected while creating a Project, those algorithms will be available for all the Notebooks created inside that project.

Prerequisite:

  • Activate the Project to access the Notebook functionality inside it.

  • Complete the required Admin-level and Project-level settings to access the Algorithms inside a Data Science Lab Notebook.

Check out the illustration on using an algorithm script inside a Data Science Notebook.

  • Navigate to the Workspace tab inside the same Project.

  • Add a dataset and run it.

  • Click the Algorithms tab.

  • Add a new code cell in the .ipynb file.

  • It will display the list of algorithms selected and added at the Project level. Select a sub-category of the Algorithm using a checkbox.

  • The pre-defined code for the selected algorithm type gets added to the code cell.

  • Define the necessary variables in the code cell. Define the Data and Target column in the auto-generated algorithm code.

  • Run the code cell.

  • After the code cell run is completed, the test data predictions based on the train data appear below.
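The auto-generated algorithm code follows a fit/predict pattern: choose the Data (feature) and Target columns, train on one split, and predict on the other. The sketch below mirrors that pattern with a pure-Python simple linear regression (the column names and dataset are made up; the actual generated code uses the selected algorithm, e.g. a scikit-learn-style regressor):

```python
# Minimal fit/predict sketch of the generated algorithm-code pattern:
# pick feature/target columns, split the data, fit, then predict.
rows = [  # toy dataset standing in for the added DataFrame
    {"sqft": 50, "price": 100}, {"sqft": 60, "price": 120},
    {"sqft": 70, "price": 140}, {"sqft": 80, "price": 160},
]
feature, target = "sqft", "price"   # the "Data" and "Target" columns
train, test = rows[:3], rows[3:]    # simple train/test split

# Fit: closed-form simple linear regression on the training rows.
xs = [r[feature] for r in train]
ys = [r[target] for r in train]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
denom = sum((x - mx) ** 2 for x in xs)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom
intercept = my - slope * mx

# Predict: apply the fitted model to the held-out test rows.
predictions = [slope * r[feature] + intercept for r in test]
print(predictions)  # [160.0]
```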

Please Note:

  • To see the output, you can run the cell containing the data frame details.

  • The model based on the Algorithm can be saved under the Models tab.

  • The algorithm-based models can be registered to be accessed inside the Data Pipeline module.

  • The model based on an Algorithm script can be registered as an API service. Refer to the Register a Model as an API Service section for more details.

List of Algorithms in Data Science Lab

The Algorithm section within the Workspace offers a wide array of powerful out-of-the-box solutions across five key categories:

Regression

Unlock predictive insights with various regression techniques tailored for accurate data modeling.

  • Linear Regression

  • SVR

  • KNN Regressor

  • Bagging Regressor

  • Decision Tree Regressor

  • Random Forest Regressor

  • Extremely Randomized Trees Regressor

  • AdaBoost Regressor

  • GBM Regressor

  • XGBoost Regressor

Classification

Leverage advanced classification algorithms to categorize data and enhance decision-making.

  • AdaBoost Classifier

  • Logistic Regression

  • Decision Tree Classifier

  • Random Forest Classifier

  • SVC

  • XGBoost Classifier

  • Bagging Classifier

  • GBM Classifier

  • Extremely Randomized Trees Classifier

  • Bayes Classifier

  • LGBM Classifier

  • Catboost Classifier

  • KNN Classifier

Forecasting

Accurately anticipate trends and future outcomes using cutting-edge forecasting algorithms.

  • ARIMA(X)

  • SARIMA (X)

  • Auto ARIMA

  • Exponential Smoothing

  • N-BEATS

  • Prophet

  • Random Forest
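Of these, Exponential Smoothing is the easiest to sketch: each smoothed value is a weighted average of the latest observation and the previous smoothed value. A minimal single-exponential-smoothing illustration (a toy, not the DS Lab implementation):

```python
def exponential_smoothing(series, alpha=0.5):
    """Single exponential smoothing: s[t] = alpha*x[t] + (1 - alpha)*s[t-1]."""
    smoothed = [series[0]]  # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

forecast = exponential_smoothing([10, 12, 14, 16], alpha=0.5)
print(forecast)  # [10, 11.0, 12.5, 14.25]
```

A higher alpha weights recent observations more heavily; a lower alpha produces a smoother, slower-reacting series.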

Unsupervised Learning

These algorithms are mainly used to discover hidden patterns in data without pre-labeled outcomes.

  • Clustering

    • KMeans

    • KMeans++

    • Spectral Clustering

    • Agglomerative Clustering

    • DBSCAN

    • OPTICS

  • Anomaly Detection

    • Elliptic Envelope

    • Local Outlier Factor

    • One Class SVM

    • SGD One Class SVM

    • Isolation Forest
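For intuition on what clustering does, here is a tiny pure-Python KMeans on one-dimensional data (a toy illustration only, not the DS Lab implementation): points are assigned to their nearest center, then each center moves to the mean of its cluster.

```python
def kmeans_1d(points, centers, iterations=10):
    """Toy one-dimensional KMeans: assign each point to the nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Recompute each center; keep it in place if its cluster is empty.
        centers = [sum(ps) / len(ps) if ps else c for c, ps in clusters.items()]
    return sorted(centers)

centers = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[2.0, 11.0])
print(centers)  # [2.0, 11.0]
```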

Natural Language Processing (NLP)

Harness the power of NLP to derive meaningful insights from unstructured text data.

The user can apply the listed NLP algorithms to perform text analysis and derive meaningful output from it.

  • Sequence classification: Sentiment Analysis, Topic Labelling, Zero-shot Classification

  • Token Classification: Named Entity Recognition, Part of Speech Tagging

  • Summarization
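For intuition, sequence classification assigns one label to a whole piece of text. The deliberately naive lexicon-based sentiment sketch below illustrates the idea only; the actual DS Lab NLP algorithms use trained models, not word lists:

```python
# Toy sequence classifier: label a whole text by counting lexicon hits.
# Real sentiment analysis uses trained models; this is intuition only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def naive_sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(naive_sentiment("I love this great product"))   # positive
print(naive_sentiment("terrible support and bad docs"))  # negative
```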

Please Note: Access to the sub-categories Forecasting, Unsupervised Learning, and Natural Language Processing requires administrator enablement. By default, all users can view and access Regression and Classification algorithms.


List Projects

All the Data Science Lab projects created by the logged-in user are listed on this page, with various Actions that can be applied to them.

The Projects page displays a list of all the existing projects for a logged-in user. The user can use the List Project icon to access the Projects page.

Click on a project from the displayed list to display more details about the project such as Project Details, Project Configurations, and External Libraries below the Project entry.

Please Note: This section of the documentation focuses on describing all the Actions applicable to a DSL Project.

Viewing a Project

The user can view the selected Project.

Check out the illustration to understand the steps to View a selected Project.

  • Navigate to the Project List page.

  • Select a Project from the displayed list.

  • Click the View icon.

  • The user gets redirected to the Workspace tab of the selected Project.

Please Note:

  • The Workspace tab opens by default for a Data Science Lab project while opening any project.

  • Viewing a Project after activating it is recommended, so that all the functionalities available for a project can be used.

Migrating a Project

The Repo Sync Projects can be migrated to GitHub or GitLab using this functionality.

Pre-requisite:

  • The administrator must configure the settings for the DS Lab plugin before you use this functionality.

  • The user-level token must be configured to the target account before using the Git Migration functionality for a Repo Sync Project.

Check out the given illustration to understand the steps to migrate a Repo Sync project by exporting and importing it from one user to another user.

Please Note: Make sure all the changes made to a Repo Sync Project in your local system are committed and pushed to the selected remote Git branch before initiating the Migration process.

Exporting a Project

The user can migrate or do Git export for a Repo Sync Project.

  • Login to the BDB Platform using registered credentials.

  • Select a space out of the multiple spaces available for the logged-in user.

  • Select the DS Lab module using the Apps menu.

  • The Projects list page opens.

  • Select a Repo Sync Project from the displayed list.

  • Click the Push into VCS icon for the Project.

  • The Push into Version Controlling System drawer appears.

  • Select the Git Export/ Migration option using the given checkbox.

  • Provide a Commit Message in the given space.

  • Click the Push option.

  • A notification message appears confirming that the selected project is migrated.

Importing a Project

The user can import an exported Repo Sync project into a different space by following these steps.

  • Navigate to the Login page of the BDB platform and use your registered credentials to access the Platform homepage (Select a space other than the one from where you migrated the project).

  • Choose the Admin module from the Apps menu.

  • The user will get redirected to the Admin module.

  • Select the Git Migration option using the menu bar.

  • Select the Import File option from the Git Migration context menu.

  • Click the Import option from the Import Document page.

  • The Migration Document Import page opens.

  • The New VCS option comes pre-selected (it is the default option).

  • Choose the Dslabs module using the drop-down menu.

  • The Project, Notebook, and Modules menus appear with the drop-down icon under the Migration- Document Import page.

  • Click the Project menu from the Migration- Document Import drop-down menu.

  • Use the Search bar to search for a specific Project from the displayed list.

  • Choose a Project from the displayed list.

  • Available versions of the selected Project appear on the right side of the page.

Please Note: The DS Lab module appears as Dslabs on the Import File page.

  • Select a version of the Project by putting a checkmark in the given checkbox.

  • The Project name and Project URL will be displayed below.

  • Select a branch using the Branch drop-down option.

  • Click the Migrate option.

  • A notification message confirms that the file is migrated successfully.

  • Navigate to the Projects page of the Data Science Lab module.

  • The migrated Project gets added at the top of the Projects list.

Keep Multiple Versions of a Project

Pre-requisite: Ensure your administrator configures the settings for the DSL plugin before using this feature.

Pushing a Project to the VCS

Check out the illustration on how to Push a Project to the VCS.

  • Navigate to the Projects page of the DS Lab plugin.

  • Select a Project.

  • Click the Push into VCS icon for the Project.

  • The Push into Version Controlling System dialog box appears.

  • Provide a Commit Message.

  • Click the Push option.

  • The DSL Project version gets pushed into the Version Controlling System, and a notification message appears to inform the same.

Pulling a Project from the VCS

Check out the illustration on how to Pull a Project from the VCS.

  • Navigate to the Projects page of the DS Lab plugin.

  • Select a Project.

  • Click the Pull from VCS icon for the project.

  • The Pull from Version Controlling System dialog box opens.

  • Select the version that you wish to pull by using the checkbox.

  • Click the Pull option.

  • The pulled version of the selected Project gets updated in the Project list.

  • A notification message informs the same.

Sharing a Project

Check out the illustration on the Share Project functionality.

  • Navigate to the Projects page of the DS Lab module.

  • Select a project from the list.

  • Click the Share icon.

  • The Manage Access page opens for the selected Project.

  • Grant permissions using the checkboxes.

  • Select a tab from the Users and User Groups tabs. The Exclude Users functionality can be used to remove the user from the privileges to access a project.

  • Search for a specific user or user group from the Users or User Groups tabs to share the Project.

  • Select the user(s) or group(s) using the checkboxes.

  • Click the Save option.

  • A notification message appears indicating the Share Project action has been completed.

  • The selected user will be listed under the Granted Permissions list.

Accessing a Shared Project

Check out the illustration on how to access a shared Project.

  • Login to the user account and access the DS Lab module where the Project is shared.

  • Navigate to the Project list inside the DS Lab module.

  • The shared Project gets listed under the Projects list and is indicated as shared.

  • A shared project will have limited Actions permissions.

Shared Project with View Permission

When a Project is shared from User A to User B with View Permission, User B will have the following privileges:

  • View Project details.

Shared Project with Edit Permission

When a Project is shared from User A to User B with Edit Permission, User B will have the following privileges:

  • View & Edit the Project details.

  • Update the Project details and save.

Shared Project with Execute Permission

When a Project is shared from User A to User B with Execute Permission, User B will have the following privileges:

  • View & Activate the Project container.

  • View & Deactivate the Project container.

Please Note: A targeted share user cannot re-share or delete a shared DSL Project regardless of the permission level (View/ Edit/Execute).

Revoking the Privilege(s)

The user can revoke the shared privileges on a project using the Revoke Privileges option.

Check out the illustration on revoking the privileges for a shared project.

  • Navigate to the Manage Access page.

  • Go to the Granted Permissions section.

  • Select a user/ user group from the list.

  • Click the Revoke Privileges icon.

  • A confirmation dialog box appears to revoke the granted privilege(s).

  • Click the Yes option to revoke the privilege(s).

  • A notification message will appear, and the privilege(s) will be revoked from the user or user group.

  • The selected user/ user group will be removed from the Granted Permissions list.

Excluding a User

Check out the illustration on excluding a user from the share permissions.

  • Navigate to the Manage Access window for a Project.

  • Grant permissions to the User(s)/ User Group(s) using the checkboxes.

  • Open the User Groups tab.

  • Use the Search bar to search for a specific user group.

  • Select a user group from the displayed list using the checkbox.

  • Open the Exclude Users tab.

  • Select a user from the list using the checkbox.

  • Click the Save option.

  • A notification message appears.

  • The user will be listed under the Excluded Users section, whereas the user group will be listed under the Granted Permissions section.

Please Note: If the project is shared with a user group, all the users under that group appear under the Exclude Users tab.

Including a User

Check out the illustration to include an excluded user under the group where the Project is shared.

  • Navigate to the Manage Access window for a shared project.

  • The Excluded Users section will list the excluded user(s).

  • Select an excluded user from the list.

  • Click the Include User icon.

  • A confirmation dialog box opens.

  • Click the Yes option.

  • A notification message appears.

  • The Excluded Users list will be modified as the user is included under the Granted Permissions list.

Editing a Project

  • Navigate to the Projects page of the DS Lab module.

  • Select a project from the list.

  • Click the Edit icon.

  • The Update Project page opens.

  • Edit or modify the given details.

  • Click the Update option.

  • The user gets redirected to the Projects page (the modified information gets saved for the project).

  • A notification message appears to convey the completion of the Edit action.

Activating a Project

Check out the illustration on how to activate a Project.

  • Navigate to the Projects page.

  • Select a project from the list.

  • Click the Activate option.

  • A dialog window confirms the Activation.

  • Click the Yes option.

  • The project gets activated and a notification message appears to communicate the completion of the action.

  • The Activate option will be changed to the Deactivate option for the concerned project.

Please Note: The user can Preview an existing workspace without activating a Project.

Deactivating a Project

Check out the given illustration on how to deactivate a Project.

  • Navigate to the Projects page.

  • Select a project that you wish to deactivate.

  • Click the Deactivate option to deactivate the desired project.

  • A dialog box opens to confirm the action.

  • Click the Yes option.

  • The concerned project gets deactivated, and a notification message appears to inform the same.

  • After the project gets deactivated the Activate option appears for the project name.

Disclaimer: The user won’t be able to edit the Workspace created under a project if the project status is Deactivated.

Deleting a Project

Check out the illustration on how to delete a project.

  • Navigate to the Projects page.

  • Select a Project from the list that you wish to delete.

  • Click the Delete icon.

  • The Delete Project dialog box appears.

  • Click the Yes option.

  • A notification message appears to inform about the deletion of the selected Project.

  • The Project gets removed from the list.

  • The deleted project will be moved to the Trash page. The user can restore it or delete it permanently from this page.


Pull from Git

You can bring your Python script to the Notebook framework to carry forward your Data Science experiment.

The Import functionality contains two ways to import a Notebook.

  • Import Notebook

  • Pull from Git

Please Note: Only .ipynb files will be supported.

Pulling from Git

Pre-requisite:

  • The user needs to configure the admin-level settings with an authentication token.

  • The user needs to do Project-level configuration of Git Project and Git branch.

  • The user needs to Pull a version of the file from Git before using the Pull and Push functionality for the projects where the source files are available in Git Repo.

Please Note: The user can generate an authentication token from their GitLab or GitHub repositories.

Admin Level Configuration

Check out the given illustration to understand the Admin configuration part with the authentication token under the platform.

  • Navigate to the Admin module.

  • Open the Version Control from the Configuration options.

  • Select the Token type as a private token.

  • Provide the authentication token in the given space.

  • Click the Test option.

  • A notification message appears to inform the user that authentication has been established.

  • Click the Save option.

  • A notification message appears to inform that the version control has been updated.

Pulling a file from Git

Projects and Branches created in GitLab/GitHub can be accessed inside the DS Lab using the access token, and the (.ipynb) files can be pulled using the Pull from Git functionality.

Check out the given illustration to understand the Pull from Git functionality.

Once the initial pull (import) has happened, the user can pull the latest version of the Python file from Git using the Pull from VCS functionality available in the Notebook List.

  • Navigate to an activated Project.

  • Open the Notebook tab (It opens by default).

  • Click the Import option.

  • The Import Notebook page opens.

  • Select the Pull from Git option.

  • All the available versions appear.

  • Click the Save option.

  • A notification message appears confirming that the selected file is pulled.

  • Consecutive notifications confirm that the Notebook is started, imported, and saved.

  • Access the Notebook script from the Git repository.

  • Open the same script from Git.

  • Click the Edit option.

  • The script opens in Edit file mode.

  • Modify the script.

  • Click the Commit changes option.

  • A notification message appears confirming that the changes are successfully committed.

  • The modification committed in the script gets saved.

  • Navigate to the same Notebook.

  • Click the ellipsis icon to get the Notebook options.

  • Click the Pull from VCS option.

  • The Pull from Git dialog box opens.

  • Click the Yes option.

  • A notification informs the user that the latest file version is pulled.

  • Another notification message informs the user that the pulled Notebook is started.

  • The latest/ modified script gets updated.

Pushing to Git

Related Settings for the Git Token

Push pre-requisites:

  1. The user's branch should have Developer and Maintainer permissions to push the latest code into the main branch.

  2. The User token has to be set in the user profile using the Custom Fields setting available at the Admin level.

  3. The user token key name has to be 'bdbvcstoken'.

  • Navigate to the Admin module.

  • Open the Custom Field Settings under the Configurations option.

  • The Custom Field Information appears.

  • Provide the following information for the custom field:

    • Key- bdbvcstoken

    • Input type - Manual

    • Description - Git User Token

    • Mandatory - No

  • Click the Save option to save the modified Custom Field Settings.

  • A notification message appears to inform the user that the custom field settings are updated.

  • Navigate to the Security module.

  • Go to the Users list.

  • Select the user from the list.

  • Click the Edit icon.

  • The Update User page opens.

  • Check out the Custom Fields section.

  • Verify whether the Git User Token is valid. If not, provide a valid Git User Token.

  • Click the Save option.

  • A notification message appears to inform that the user is updated successfully.

  • Open the User Profile.

  • Select the My Account option.

  • The My Account details are displayed.

  • Open the Configuration option.

  • The token provided under the Custom Fields gets updated under the Git Token section of the Configuration option.

Pushing a File to Git

Please Note: Before using the Push a file to Git functionality make sure that the following requirements are fulfilled:

  1. The latest file is pulled into DS Lab before modifying and pushing back to the Git branch.

  2. The user should have access to the Git branch for pushing a change.

  • Navigate to the Workspace tab (it opens by default) for an activated project.

  • Select a Notebook from the displayed list to open the content/script.

  • The existing content gets displayed.

  • Modify the script to create a new version of the Notebook.

  • Click the Save icon to save the latest version of the Notebook.

  • A notification confirms that the Notebook is saved.

  • Click the Push into VCS option from the Notebook options.

  • The Push into Git drawer opens.

  • Select the Version control option.

  • Provide a commit message.

  • Click the Push option.

  • A notification message confirms that the latest file version is pushed to Git.

  • Navigate to the Git repository and access the pushed version of the Notebook script.

  • Open the script. The latest changes will be reflected in the script.

Please Note: The Pull from Git functionality supports GitLab and GitHub.

Pull & Push Functionality for Repo Sync Projects

Pull from Git

Check out the illustration explaining the Pull from Git functionality for a Repo Sync Project.

Push into Git

Check out the illustration explaining the Push into Git functionality for a Repo Sync Project.


Linter

This release provides Linter support to analyze source code and identify programming errors, bugs, and other potential issues.

The Linter functionality helps developers maintain high code quality by enforcing coding standards and best practices.

A linter helps in data science by:

  1. Improving Code Quality: Enforces coding standards and best practices.

  2. Detecting Errors Early: Identifies syntax errors, logical mistakes, and potential bugs before execution.

  3. Enhancing Maintainability: Catches issues like unused variables, making code easier to maintain.

  4. Facilitating Collaboration: Ensures consistent coding conventions across team members.

  5. Optimizing Performance: Highlights inefficient code patterns for better performance in data processing and analysis.
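The "detecting errors early" point rests on the fact that a linter parses source code without executing it. A minimal sketch of that first pass using Python's standard-library ast module (real linters such as pylint or flake8 go far beyond this):

```python
import ast

def quick_lint(source):
    """Report syntax errors without executing the code, the way a
    linter's parsing pass does (toy illustration only)."""
    try:
        ast.parse(source)
        return []  # parsed cleanly: no syntax-level findings
    except SyntaxError as err:
        return [f"line {err.lineno}: {err.msg}"]

print(quick_lint("x = 1\ny = x + 2\n"))      # [] -- clean source
print(quick_lint("def broken(:\n    pass"))  # one syntax error reported
```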

Please Note: The Linter functionality is available for normal and Repo Sync projects. Repo Sync Projects also display the Git Console in the drawer that appears while using the Linter functionality.

Check out the illustration on how Linter functionality works.


Variable Explorer

Get the Variables information listed under this tab.

The Variable Explorer tab displays the Name column and Explore icon for all the variables created and executed within the Notebook cells.

  • Navigate to the Notebook page.

  • Write and run code using the Code cells.

  • Open the Variable Explorer tab.

  • The variables will be listed below under the Name column.

  • By hovering the cursor on a variable, you can get a mention of the name, type, and shape details of the selected variable.

  • Click the Explore icon.

  • The Preview Variable Details page opens.

  • Select a Variable from the displayed list.

  • Click the Preview icon provided for the selected Variable.

  • The Preview Variable Value page opens.

  • All the values of the selected Variable are displayed in a tabular format.​
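The name, type, and shape details shown on hover can be reproduced in a notebook cell. A rough sketch of how such metadata might be derived (illustrative only; this is not the Variable Explorer's actual code):

```python
def describe(value):
    """Return the type and (where available) shape of a variable,
    similar to what a variable explorer surfaces (toy sketch)."""
    shape = getattr(value, "shape", None)  # e.g. NumPy arrays, DataFrames
    if shape is None and isinstance(value, (list, tuple)):
        shape = (len(value),)
    return {"type": type(value).__name__, "shape": shape}

scores = [0.1, 0.4, 0.9]
threshold = 0.5
print(describe(scores))     # {'type': 'list', 'shape': (3,)}
print(describe(threshold))  # {'type': 'float', 'shape': None}
```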

Save as Notebook

This section explains Save as Notebook functionality for the .ipynb files.

A dialog box opens whenever the user closes an open .ipynb file, prompting to save the recent changes. The user can click the Yes option to save the Notebook.

  • Navigate to an opened Data Science Notebook (.ipynb file) and modify the notebook content.

  • Click the Close icon provided to close the Notebook infrastructure.

  • The Save as Notebook dialog box opens.

  • Click the Yes option.

  • The current Notebook gets closed, and a notification message assures the user that all the recent changes are saved.


Artifacts

This page explains how to save Artifacts. Users can save plots and datasets inside a DS Notebook as Artifacts.

Check out the walk-through on how to Save Artifacts.

Saving Artifacts

  • Navigate to a Data Science Notebook.

  • Add a new cell.

  • Provide Data set.

  • Define DataFrame and execute the cell.

  • A new cell will be added below.

  • Click the Ellipsis icon to access more options.

  • Select the Save Artifacts option.

  • Provide a proper DataFrame name and a name for the Artifact (with extension: .csv, .txt, or .json).

  • Execute the cell.

  • The Artifacts get saved under the Artifacts tab.
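The steps above can be sketched in code. This is a hedged stand-in for the snippet the Save Artifacts option generates: a named DataFrame is written out with an explicit artifact extension (.csv, .txt, or .json). The DataFrame contents and file names are hypothetical, and a temporary folder stands in for the Artifacts storage.

```python
import os
import tempfile

import pandas as pd

# Temporary folder standing in for the platform's Artifacts storage.
artifacts_dir = tempfile.mkdtemp()

# Hypothetical DataFrame to be saved as an Artifact.
df = pd.DataFrame({"city": ["Pune", "Delhi"], "sales": [150, 200]})

# Save the Artifact with a .csv extension.
csv_path = os.path.join(artifacts_dir, "sales_summary.csv")
df.to_csv(csv_path, index=False)

# Save the same data as a .json Artifact.
json_path = os.path.join(artifacts_dir, "sales_summary.json")
df.to_json(json_path, orient="records")

print(sorted(os.listdir(artifacts_dir)))
```

Once executed inside the Notebook, the equivalent generated snippet places the files under the Artifacts tab instead of a local folder.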

Please Note:

  • The saved Artifacts can be downloaded as well.

  • The user can also get an instant visual depiction of the data based on their executed scripts.

Preview Artifacts

  • Navigate to the Artifacts tab inside a DS Notebook page.

  • Select a saved Artifact from the right side panel.

  • Click the vertical ellipsis icon for the saved Artifact.

  • Click the Preview option from the context menu.

  • The Artifact Preview gets displayed.

Please Note:

  • The selected Artifact can be deleted from the list by clicking the Delete option.

Repo Folder Attributes for a Repo Sync Project

A Repo Sync Project has only a Repo folder, allowing users to create various Data Science experiments for the project.

A Repo folder available inside a Repo Sync Project contains the following attributes:

  1. Add File

  2. Add Folder

  3. Copy path

  4. Import

Repo Folder Attributes for a Repo Sync Project

Add File

The user can add a file to the Repo folder using the Add File option.

Follow the steps demonstrated in the walk-through to add a file to the Repo Folder of a Repo Sync Project.

Add Folder

The user can create a folder inside the Repo folder of a Repo Sync project using this functionality.

Copy path

Check out the illustration on the Copy path functionality provided for the Repo folder of a Repo Sync Project.

Import

The import functionality allows the users to import a file from the local directory to the repository.

Check out the illustration on the import functionality provided for the Repo folder of a Repo Sync Project.

Please Note: All the folders or sub-folders created inside the Repo Folder also contain the same set of attributes as explained here for the Repo Folder.

Forecasting Model Explainer

This page provides model explainer dashboards for Forecasting Models.

Check out the given walk-through to understand the Model Explainer dashboard for the Forecasting models.

The forecasting model stats are displayed through Timeseries visualizations that present the values generated over the selected time period.

Predictions

This chart displays the predicted values generated by the timeseries model over a specific time period.

Predicted Vs Actual

This chart displays a comparison of the predicted values with the actual observed values over a specific period of time.

Residual

It depicts the difference between the predicted and actual values (residuals) over a period of time.

Predicted Vs Actual Scatter Plot

A Scatter Plot chart is displayed depicting how well the predicted values align with the actual values.

Please Note: Refer to the Data Science Lab Quick Start Flow page to get an overview of the Data Science Lab module in a nutshell.

Importing Notebook

You can bring your Python script to the Notebook framework to carry forward your Data Science experiment.

Please Note: The Import option appears for the Repo folder.

The Import functionality contains two ways to import a Notebook.

  • Import Notebook

The users can seamlessly import Notebooks created using other tools and saved in their systems.

Please Note: Only downloaded files in the .ipynb format are supported by the Import Notebook option.

Check out the given illustration on how to import a Notebook.

  • Navigate to the Projects tab.

  • Click the View icon for an activated project.

  • The next page opens displaying all the related tabs.

  • The Workspace tab opens by default.

  • Click the Import option from the Workspace tab.

  • The Import Notebook page opens.

  • Select the Import Notebook option.

  • Provide the following information.

    • Notebook Name

    • Description (optional)

  • Click the Choose File option.

  • Select the IPYNB file from the system and upload it.

  • The selected file appears next to the Choose File option.

  • Click the Save option.

  • The Notebook infrastructure opens with the given name for the recently uploaded Notebook file. It may take a few seconds to save the uploaded Notebook and start Kernel for the same.

  • Consecutive notification messages appear to confirm that the Notebook is saved, uploaded, and started.

  • The same is mentioned by the status updates on the Notebook header (as highlighted in the given image).

  • The uploaded Notebook is listed on the left side of the page.

Please Note: The imported Notebook will be credited with some actions. Refer to the page to know them in detail.

Copy Path Functionality

This page explains the Copy Path functionality for the added data.

The Copy Path operation can access Sandbox files uploaded with various file types inside the Data Science Notebook.

The Copy Path functionality generates a file path based on the Data Sandbox environment variable (@SYS.DATASANDBOX_PATH), which can then be accessed inside the Data Science Notebooks.

Please Note: The Copy Path functionality can be used to read Sandbox files. The supported File types for the Copy Path functionality are txt, png, jpg, jpeg, xls, xlsm, and mp4.

Check out the walk-through on using the Copy Path functionality inside a Data Science Notebook.

  • Navigate to a Data Science Notebook page.

  • Select a Code cell.

  • Open the Data tab.

  • Select a Sandbox file with the supported file types (txt, png, jpg, jpeg, xls, xlsm, and mp4).

  • Click the Ellipsis icon.

  • Choose the Copy Path option.

  • It will provide the file path in the new code cell with the Data Sandbox Environment Variable.

  • Run the cell.

  • It will display the same path below, after the successful run.

  • Provide the code to read the file data from the file path.

  • Run the cell.

  • The file data will be accessed and displayed below.
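The read step above can be sketched as plain Python. In the Notebook, the generated path uses the @SYS.DATASANDBOX_PATH environment variable; in this hedged sketch a temporary directory and a hypothetical file name stand in for the Sandbox location.

```python
import os
import tempfile

# Temporary directory standing in for the Data Sandbox location.
sandbox_dir = tempfile.mkdtemp()
file_path = os.path.join(sandbox_dir, "notes.txt")  # hypothetical .txt file

# Create the sample Sandbox file for the sketch.
with open(file_path, "w") as f:
    f.write("sample sandbox content")

# Code to read the file data from the copied file path.
with open(file_path) as f:
    data = f.read()
print(data)
```

In an actual cell, the first part is replaced by the path pasted via Copy Path; only the reading code is written by the user.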


Model Summary

The Model Summary (Run Summary) displays the basic information about the trained top model. It opens by default when the View Explanation option is clicked for the selected Auto ML model.

The Model Summary page displays the details based on the selected Algorithm types:

Summary Details for a Regression Model

  • Algorithm Name

    • Model Status

    • Created Date

    • Started Date

    • Duration

  • Performance Metrics are described by displaying the below-given metrics:

    • Root Mean Squared Error (RMSE): RMSE is the square root of the mean squared error. It is more interpretable than MSE and is often used to compare models with different units.

    • Median Absolute Error (MedAE): MedAE is a performance metric for regression models that measures the median of the absolute differences between the predicted values and the actual values.

    • R-squared (R2): R-squared measures the proportion of the variance in the dependent variable that is explained by the independent variables in the model. It is a popular metric for linear regression problems.

    • Pearsonr: pearsonr is a function in the scipy.stats module that calculates the Pearson correlation coefficient and its p-value between two arrays of data. The Pearson correlation coefficient is a measure of the linear relationship between two variables.

    • Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted values and the actual values in the dataset. It is less sensitive to outliers than MSE and is a popular metric for regression problems.
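As a concrete illustration, the regression metrics above can be computed by hand on a toy set of predictions (all numbers below are made up for illustration):

```python
import math

actual    = [3.0, 5.0, 7.0]   # illustrative ground-truth values
predicted = [2.0, 5.0, 9.0]   # illustrative model predictions

errors = [p - a for p, a in zip(predicted, actual)]
mse  = sum(e * e for e in errors) / len(errors)        # mean squared error
rmse = math.sqrt(mse)                                  # root mean squared error
mae  = sum(abs(e) for e in errors) / len(errors)       # mean absolute error

mean_actual = sum(actual) / len(actual)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variance of the target
r2 = 1 - sum(e * e for e in errors) / ss_tot           # R-squared

print(round(rmse, 3), mae, r2)                         # 1.291 1.0 0.375
```

A lower RMSE/MAE and an R2 closer to 1 indicate a better regression fit.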

Displaying the Model Summary tab for a Regression Model

Summary Details for a Forecasting Model

  • Algorithm Name

    • Model Status

    • Created Date

    • Started Date

    • Duration

  • Performance Metrics are described by displaying the below-given metrics:

    • Root Mean Squared Error (RMSE): RMSE is the square root of the mean squared error. It is more interpretable than MSE and is often used to compare models with different units.

    • Mean Squared Error (MSE): MSE measures the average squared difference between the predicted values and the actual values in the dataset. It is a popular metric for regression problems and is sensitive to outliers.

    • Percentage Error (PE): PE can provide insight into the relative accuracy of the predictions. It tells the user how much, on average, the predictions deviate from the actual values in percentage terms.

    • Root Mean Absolute Error (RMAE): RMAE is the square root of the mean absolute error. It provides an interpretable error measure on the same scale as the data.

    • Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted values and the actual values in the dataset. It is less sensitive to outliers than MSE and is a popular metric for regression problems.

Displaying the Model Summary tab for a Forecasting Model

Summary Details for a Classification Model

  • Algorithm Name

    • Model Status

    • Created Date

    • Started Date

    • Duration

  • Performance Metrics are described by displaying the below-given metrics:

    • Precision: Precision is the percentage of correctly classified positive instances out of all the instances that were predicted as positive by the model. In other words, it measures how often the model correctly predicts the positive class.

    • Recall: Recall is the percentage of correctly classified positive instances out of all the actual positive instances in the dataset. In other words, it measures how well the model captures the actual positive instances.

    • F1-score: The F1-score is the harmonic mean of precision and recall. It is a balance between precision and recall and is a better metric than accuracy when the dataset is imbalanced.

    • Support: Support is the number of instances in each class in the dataset. It can be used to identify imbalanced datasets where one class has significantly fewer instances than the others.
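The classification metrics above can likewise be computed by hand on a toy binary example (the labels below are illustrative):

```python
y_true = [1, 1, 1, 0, 0]   # actual labels
y_pred = [1, 1, 0, 1, 0]   # predicted labels

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))   # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))   # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))   # false negatives

precision = tp / (tp + fp)                          # correct among predicted positives
recall    = tp / (tp + fn)                          # correct among actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
support = {c: y_true.count(c) for c in (0, 1)}      # instances per class

print(precision, recall, round(f1, 4), support)
```

Here precision and recall are both 2/3, so the F1-score is also 2/3; the support dictionary shows the class imbalance (three positives against two negatives).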

Displaying Model Summary tab for a Classification Model

Actions Icons from Header

This page covers all the actions provided for a Data Science Notebook file.

The Notebook Action icons (as provided below) help to apply various actions to the code/ code cell when clicked. They are provided on the top right side of the Data Science Notebook page.

The list given below covers all the Actions available in the Notebook Menu Bar (the icons themselves appear on the Notebook header):

  • Expand / Collapse: Expands or collapses the Actions Menu Bar.

  • Run Current cell: Runs the code given for a specific cell.

  • Linter: Opens the Linter panel.

  • Add Pre cell: Adds a code cell before the first cell.

  • Save: Saves the Notebook updates.

  • Delete cell: Removes the selected cell.

  • Restart kernel: Restarts the kernel by killing the current session and creating a new session.

  • Interrupt cell: Interrupts the running cell.

  • Logs: Opens the Logs window to display logs.

  • Undo Delete cell: Reverts the deleted cell.

  • Cut cell: Cuts the code from a specific cell.

  • Copy cell: Copies the code from a specific cell.

  • Paste cell: Pastes the cut or copied code to the selected cell.

  • Auto Save: Auto-saves the Notebook updates when enabled.

  • Run all cells: Runs the code for all the cells.

  • Shutdown: Stops the Kernel and disconnects the instance and allocated resources.

Please Note: The Actions icons will be disabled for other file types under a Repo Sync Project.


Reading Data

This section explains the steps to read the added Data inside a Data Science Notebook.

Reading the Added Data inside DSL Notebook

Please Note: Datasets and Data Sandbox files (.csv and .xlsx) can be read using the get_data function.

  • Add a new Code cell to Notebook or access an empty Code cell.

  • Select a dataset from the Data tab.

  • The get_data function appears in the code cell.

  • Provide the df (DataFrame) to print the data from the selected Dataset. A Dataset can be an added dataset, data sandbox file, or feature store.

  • Run the cell.

  • The Data preview appears below after the cell run is completed.
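The flow above can be sketched as a stand-in for the Notebook's auto-generated get_data snippet. The signature get_data("<dataset name>") returning a pandas DataFrame is an assumption for this sketch, and the dataset name and contents below are hypothetical.

```python
import pandas as pd

# Stand-in for the DSL-provided get_data helper; in the actual Notebook
# this call is generated automatically when a dataset is selected.
def get_data(dataset_name):
    # Hypothetical sample data standing in for the platform's data service.
    samples = {
        "sales_data": pd.DataFrame(
            {"region": ["North", "South"], "revenue": [1200, 950]}
        )
    }
    return samples[dataset_name]

df = get_data("sales_data")   # assign to df to preview the dataset
print(df.head())
```

Running the cell prints the first rows of the DataFrame, matching the data preview described above.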

Project Level Data Tab

The Data Sets/ Sandbox files/ Feature Stores added to a Data Science Notebook will also be listed under the Data tab provided under the same project. Hence, the added datasets will be available for all the Data Science Notebooks created or imported under the same project.

Reading Multiple Sheets inside an Excel Sheet

Check out the illustration to read multiple sheets in a Notebook cell.

  • Add an Excel file with multiple sheets to a DS Project.

  • Insert a Markdown cell with the names of the Excel sheets.

  • Insert a new code cell.

  • Use the checkbox next to the dataset to read its data.

  • The get_data function appears in the code cell.

  • Run the code cell.

  • The data preview will appear below.

  • Select another datasheet name and copy it from the markdown cell.

  • Paste the copied datasheet name in the code cell that contains the get_data function.

  • Run the code cell.

  • The data preview will be displayed below.
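Conceptually, switching sheets works like pandas reading a multi-sheet workbook as a dict keyed by sheet name (pd.read_excel(path, sheet_name=None)). In this hedged sketch, an in-memory dict stands in for the workbook, and the sheet names are hypothetical.

```python
import pandas as pd

# Dict of DataFrames standing in for a multi-sheet Excel workbook,
# i.e. what pd.read_excel(path, sheet_name=None) would return.
sheets = {
    "Sales2023": pd.DataFrame({"id": [1, 2], "value": [10, 20]}),
    "Sales2024": pd.DataFrame({"id": [3, 4], "value": [30, 40]}),
}

# Paste a different sheet name here to switch sheets, as in the steps above.
df = sheets["Sales2024"]
print(df)
```

Replacing the sheet name and re-running the cell displays the preview for the newly selected sheet.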

Model Creation using Data Science Notebook

This section breaks down the process of creating, saving, and loading a Data Science model using the Notebook infrastructure provided inside the Data Science Lab module.

Once the Notebook script is executed successfully, the users can save the trained object as a model. The saved model can then be loaded back into the Notebook.

Check out the illustration on saving and loading a Data Science Model.

Saving a Data Science Lab Model

  • Navigate to a Data Science Notebook.

  • Write code using the following sequence:

    • Read DataFrame

    • Define test and train data

    • Create a model

  • Execute the script by running the code cell.

  • Get a new cell.

  • Click the Save model option.

  • A code gets generated in the newly added code cell.

  • Provide a model name to specify the model, and set the model type as ml.

  • Execute the code cell.

  • After the code gets executed, the Model gets saved under the Models tab.

Please Note: The newly saved model gets saved under the unregistered category inside the Models tab.

Function Parameters

  • model - Trained model variable name.

  • modelName - The desired name given by the user for the trained model.

  • modelType - Type in which model can be saved.

  • X - This array contains the input features or predictors used to train the model. Each row in the X_train array represents a sample or observation in the training set, and each column represents a feature or variable.

  • Y - This array contains the corresponding output or response variable for each sample in the training set. It is also called the target variable, dependent variable, or label. The Y_train array has the same number of rows as the X_train array.

  • estimator_type - The estimator_type of a data science model refers to the type of estimator used.

Loading a Data Science Lab Model

  • Open the Models tab.

  • Access the Unregistered category.

  • The saved model will be available under the Models tab. Please select the model by using the given checkbox to load it.

  • The model gets loaded into a new cell.

  • Run the cell.
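The save/load flow above can be sketched end to end. This is a hedged stand-in for the Notebook's generated Save model / Load model snippets: the trivial mean-predictor "model", the model name, and the pickle file are all illustrative replacements for the platform-managed pieces (modelName, modelType="ml").

```python
import os
import pickle
import statistics
import tempfile

# Trivial "model" standing in for a real trained estimator: it follows the
# fit(X, Y)/predict(X) convention described in the function parameters above.
class MeanPredictor:
    def fit(self, X, Y):
        self.mean_ = statistics.fmean(Y)   # "training": remember the mean of Y
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

# Read data, define train inputs, and create the model (toy values).
model = MeanPredictor().fit(X=[[1], [2], [3]], Y=[10.0, 20.0, 30.0])

# Save the trained model (hypothetical file name and location).
model_path = os.path.join(tempfile.mkdtemp(), "sales_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load the saved model back and use it.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)
print(loaded.predict([[4]]))   # [20.0]
```

In the actual Notebook, the generated snippet handles storage under the Models tab; the user only supplies the model variable, name, and type.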

A saved model under the Models tab of the Data Science Notebook gets a set of additional options.

Sample Script for a Data Science Model
the model gets saved
Specify a Data Science Lab Model by giving a Model name & Model Type
Loading a saved Data Science Lab Model

AutoML List Page

This section describes the Actions provided for the created AutoML experiments on the AutoML List page.

Once the initiated AutoML experiment is completed, it gets two Actions. The allotted Actions for an AutoML Experiment are:

  • Delete

  • View Report

    • It is indicated in Green color for the Completed (successful) Experiments.

    • It is indicated in Red color for the Failed Experiments.

View Report for a Completed Experiment

This option provides the summary of the experiment (completed or failed) along with the details of the recommended model (in case of a completed experiment).

  • Navigate to the Auto ML tab.

  • All the created Experiments will be listed.

  • Select a Completed experiment.

  • Click the View Report option from the Actions column.

  • The Details tab opens for the selected completed experiment.

Details Tab

Details

The Details tab opens while clicking the View Report icon for an experiment with Completed status.

  • Click the View Report option for a completed experiment.

  • The Details tab opens by default displaying the following details for the model:

    • Recommended Model: This will be the most suitable model determined based on the metric score of the model.

      • Model Name: Name of the model

      • Model Score: Score of the model

      • Metric Value: On which basis the model was considered

      • Created On: Date of model creation

    • Run Summary: This portion will have the basic information about the experiment and trained model.

      • Task Type: It displays the selected algorithm type used to complete the experiment.

      • Experiment Status: It indicates the status of the AutoML model.

      • Created By: Name of the creator.

      • Dataset: It mentions the dataset used for the experiment.

      • Target Column: It indicates the target column.

Details Tab for a Forecasting AutoML Model
Details Tab for a Classification AutoML Model
Details Tab for a Regression AutoML Model

Models

The Models tab lists the top three models based on their metrics score. The user gets the View Explanation option for each of the selected top three models to explain the details of that model.

  • Navigate to the Models tab of a completed Auto ML experiment.

  • Select a Model from the displayed list and click the View Explanation option. The View Explanation option allows the users to check details about each of the top 3 models.

Model Tab for an AutoML Experiment
  • A new page opens displaying the various information for the selected Model.

  • The following options are displayed for a selected model:

    • Model Summary: This tab displays the model summary for the selected model. It opens by default.

    • Model Interpretation: This tab contains the Model Explainer dashboard displaying the various details for the model.

    • Dataset Explainer: This tab displays the Data Profile of the dataset for the selected model.

View Explanation for an AutoML

Please Note: Refer to this document's View Explanation section for more details.

View Report for a Failed Experiment

If the user opens the View Report option for a failed Experiment, it will display the Model Logs and mention the reason for the model's failure.

  • Navigate to the Auto ML tab.

  • Select a Failed experiment.

  • Click the View Report option from the Actions column.

  • The Logs tab opens for the selected failed experiment.

  • The Model Logs are displayed with the reason for failure.

Delete

The Delete option helps the user to remove the selected AutoML from the list.

Check out the walk-through to understand the steps to Delete an AutoML.

  • Navigate to the Auto ML list page.

  • Select a model/experiment from the list. (It can be any experiment irrespective of the Status).

  • Click the Delete icon for the model.

Delete icon for the AutoML
  • A dialog box opens to confirm the deletion.

  • Click the Yes option.​

  • The selected experiment gets removed from the list.

Please Note: The user can remove any Auto ML experiment irrespective of its status.


Git Console

Git Console functionality helps the data scientists apply various Git commands to their Notebook scripts inside the Repo Sync projects.

Check out the illustration on using the Git Console option inside the Data Science Lab repo sync projects.

Using Git Console

  • Navigate to the Workspace tab of an activated Repo Sync Project.

  • Select a .ipynb file from the Repo Sync Project.

  • The file content opens.

  • Edit the displayed script.

  • Click the Save icon.

  • A notification ensures that the script is saved with the recent changes.

  • Open the same script in the Git repository.

  • Click the Edit option and access the script in the editable format.

  • Click the Commit changes option.

  • The Commit changes dialog box opens.

  • Provide a commit message.

  • Choose a checkbox to select how the changes should be committed.

  • Click the Commit changes option.

  • The script in the Git repository will be modified.

  • Navigate to the Workspace tab of the Notebook and click the Git Console icon.

  • The Git Console panel opens where you can put the Git command to be performed on the selected script.

  • Use the Resize panel icon to resize the Git Console panel.

  • Use git status command to reflect the changes.

  • The next commands that can be used are git add and git commit to acknowledge new changes in the file.

  • The git commit command generates the information inside the panel about the new changes.

  • The git push command is used to push the new changes to the Git Repository. In this example, the git push command has been rejected because the repository version of the same file has changed, and the console suggests using the git pull command.

  • The git pull command has been used to pull the distant changes from the repository.

  • At the end of the git pull command output, the console hints to use git config pull.rebase false as the default (merge) strategy.

  • The git config pull.rebase false command is committed.

  • The auto merge failed due to the merge conflict in the selected file.

  • Navigate to the Workspace tab.

  • The file title appears in red to indicate the conflict.

  • The cells containing conflicted content are highlighted in the script.

  • Click the Delete icon for the conflicted cells.

  • The Delete Cell window appears.

  • Click the Yes option.

  • A notification message appears to ensure that the conflicted cell is removed from the script.

  • Click the Save icon for the script.

Please Note: The user must resolve all the conflicts in the selected file, before saving it.

  • A notification ensures that the script is saved.

  • The saved script reflects the remote changes.

  • The color of the selected file title also gets changed.

  • By hovering on the file name, it displays the current status of the file. For example, the given image shows that for the current file conflicts are resolved, but it is in uncommitted status.

Please Note: The user can click the Refresh icon to refresh the status of the file.

  • Click the Git Console icon.

  • The Git Console space gets displayed.

  • The Git commands used in the example are git add, git commit, and git push.

  • Navigate to the script saved remotely (in the Git repository).

  • The script displays the recent changes committed using the Git Console space for a Repo Sync Project.

Commonly used Git Commands

All the Git commands will be supported in the Git Console. Please find some of the commonly used Git commands listed below.

1. git init: #Initializes a new Git repository in the current directory.
2. git status: #Displays the status of changes as untracked, modified, or staged.
3. git log: #Displays the commit history with commit IDs, authors, dates, and messages.
4. git log --stat: #Displays commit logs with the list of modified files and the number of lines added or removed in each file.
5. git config --list: #Displays the Git configuration settings.
6. git add file1 file2 directory/: #Stages specific files or directories for the next commit.
7. git add --all: #Stages all changes, including untracked files, for the next commit.
8. git commit -m "Your commit message": #Commits the staged changes with a descriptive message.
9. git push origin branch_name: #Pushes commits from your local branch to the remote repository's branch.
10. git fetch: #Fetches changes from the remote repository, but does not merge them into your local branch.
11. git remote -v: #Lists remote repositories linked to the local repository.
12. git merge branch_name: #Merges changes from another branch into your current branch.
13. git pull origin branch_name: #Fetches changes from the remote repository and merges them into your current branch.
14. git branch: #Lists all local branches.
15. git checkout branch_name: #Switches to an existing branch.
16. git switch branch_name: #Switches to an existing branch.
17. git checkout -b new_branch_name: #Creates a new branch and switches to it in one step.
18. git diff: #Shows changes between commits, between a commit and the working tree, etc.
19. git reset HEAD file: #Unstages changes for a specific file, but keeps the changes in your working directory.
20. git reset --soft HEAD^: #Undoes the last commit, but keeps the changes from that commit staged.
21. git reset --hard HEAD^: #Undoes the last commit and discards all changes made in that commit.
22. git rm: #Removes a file from the working directory and stages the removal.
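The everyday commands above can be exercised end to end in a throwaway local repository. This is an illustrative sketch (the file names, commit messages, and identity are hypothetical), not tied to any particular remote:

```shell
# Walk through init -> status -> add -> commit -> diff -> log locally.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q                                   # initialize a new repository
git config user.email "dev@example.com"       # identity for commits (illustrative)
git config user.name "Dev"
echo "print('hello')" > analysis.py
git status --short                            # shows analysis.py as untracked (??)
git add analysis.py                           # stage the new file
git commit -qm "Add analysis script"          # first commit
echo "print('updated')" > analysis.py
git diff --stat                               # summarize the unstaged change
git add --all                                 # stage all changes
git commit -qm "Update script"                # second commit
git log --oneline                             # lists both commits
```

The same sequence typed into the Git Console produces the per-command output described in the walk-through above.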

Create Feature Store

What is a Feature Store?

A Feature Store is a centralized repository for storing, managing, and sharing machine learning (ML) features or attributes used to train models. It is a scalable solution for organizing and cataloging features, making them easily accessible to data scientists and ML engineers across an organization. Feature Stores facilitate collaboration, version control, and reusability of features, streamlining the ML development process and improving model quality and efficiency.

Check out the illustration to create a new Feature Store.

Steps to Create A Feature Store

  • Navigate to the Homepage of the Data Science Lab module.

  • Click the Create icon from the homepage.

  • The Create Project or Feature Store drawer opens.

  • Click the Create option provided for the Feature Store.

  • The Create Feature Store page opens.

  • Provide a name for the Feature Store.

  • Select a Data Connector from the drop-down list.

  • The Table info/ metadata panel will appear on the right side of the page.

  • Click on a table name to select it.

  • An SQL query will be generated in the given place.

  • Click the Validate option.

  • A notification message informs the user that the query has been executed successfully.

  • A preview of the table appears below.

  • Click the Create option.

  • A notification message ensures the user that the intended Feature Store is being created.

  • The user gets redirected to the Feature Stores page.

  • The newly created Feature Store gets added at the top of the list.

Please Note:

  • Click the Refresh icon to get the status level updates for the newly created Feature Store.

  • A Feature Store passes through the Initializing, Started, and Completed statuses.

Scheduling a Feature Store

Check out the illustration on scheduling a Feature Store.

  • Navigate to the Data Science Lab module.

  • Click the Create option provided for Feature Store.

  • The Create Feature Store form opens.

  • Provide the Featureset Name.

  • Select a connector using the drop-down menu.

  • Write an SQL query, or generate one by selecting a table from the Table Info/Metadata panel.

  • Validate the query using the Validate option.

  • A notification appears after the query is validated.

  • Click the Schedule option.

  • The Schedule page appears.

  • Select an option for the Concurrency Policy. The following options are provided:

    • Allow (Parallel): Multiple instances run simultaneously. No concurrency restrictions. Suitable for independent tasks.

    • Forbid (Prevent, Deny): Only one instance runs simultaneously. New instances are skipped if a previous one is running. Suitable for tasks that can't run in parallel.

    • Replace (Terminate, ReplaceOlder): A new instance starts, and the previous one is terminated. Suitable when the latest instance should take priority. Ensures no overlap.

  • Navigate to the Cron Generator section.

  • Choose the Monthly or Yearly option and provide the required information.

  • Based on the selection from the Cron Generator the Scheduler Time will be added.

  • Click the Apply option.

  • The user gets redirected to the Create Feature Store page, and a notification confirms that the Feature Store is scheduled.

  • The same will be indicated through a green mark in the Scheduler option.

  • Click the Create option.

  • The user gets redirected to the Feature Stores page.

  • The newly created Feature Store is added at the top of the page.

  • A notification message ensures that the Feature store job is initialized. The same is suggested through the Status column.

  • Click the Refresh icon.

  • The feature store status gets changed to Started.

  • Click the Refresh icon.

  • The Feature Store status gets changed to Completed.

  • The Stop Scheduling icon gets enabled for the feature store.

Please Note: The Stop Schedule option will remain enabled when a scheduled Feature Store reaches the scheduled time limit. The user can click the Stop Schedule icon during this period to stop the schedule.
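Behind the scenes, the Monthly or Yearly selections in the Cron Generator resolve to a standard five-field cron expression. The times below are illustrative examples, not defaults:

```
# minute hour day-of-month month day-of-week
0 6 1 * *    # Monthly: run at 06:00 on the 1st of every month
0 6 1 1 *    # Yearly: run at 06:00 on January 1st
```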

Accessing the Create icon
Create Project/ Feature Store window
Feature Stores List with newly created Feature Store

Utility Actions

Every imported utility script gets a set of actions that can be applied to it. This page aims to describe them all.

Accessing Utility Actions

Navigate to the Workspace tab for an activated project. Open the Utils folder. Select a Utils file and click on the ellipsis icon to access the Actions context menu for the utility script.

Please Note:

  • The imported Utility files from the system will not support Push to VCS and Pull from VCS actions.

  • The same Actions functionality is available for a .py file under a Repo Sync project.

Edit

The user can modify the content of a utility script using the Edit option.

  • Navigate to the Workspace tab for a normal Data Science Lab project.

  • Click on the Utils folder.

  • Select the Ellipsis icon provided for a Utility file.

  • Select the Edit option from the context menu that opens for a Utility file.

  • The Edit Utility File window opens displaying the Utility script content.

  • Modify the script content.

  • Click the Validate option.

  • A notification confirms that the script is valid after the modification of the script content.

  • Click the Update option.

  • A notification ensures that the utility script is updated.

Push into VCS & Pull from VCS

Push Pre-requisites

Pre-requisites:

  1. The user branch should have Developer and Maintainer permissions to push the latest code into the Main branch.

  2. The User token has to be set in the user profile using the Custom Fields setting available at the Admin level.

  3. The user token key name has to be 'bdbvcstoken'.

  • Navigate to the Admin module.

  • Open the Custom Field Settings under the Configurations option.

  • The Custom Field Information appears.

  • Provide the following information for the custom field:

    • Key- bdbvcstoken

    • Input type - Manual

    • Description - Git User Token

    • Mandatory - No

  • Click the Save option to save the modified Custom Field Settings.

  • A notification message informs the user that the custom field settings are updated.

  • Navigate to the Security module.

  • Go to the Users list.

  • Select the user from the list.

  • Click the Edit icon.

  • The Update User page opens.

  • Check out the Custom Fields section.

  • Validate whether the Git User Token is valid. If not, provide a valid Git User Token.

  • Click the Save option.

A notification message appears to inform that the user is updated successfully.

  • Open the User Profile.

  • Select the My Account option.

  • The My Account details are displayed.

  • Open the Configuration option.

  • The same token appears in the Git Token field under the Configuration section.

Pull Pre-requisites

The user can also pull a utility script or an updated version of the utility script from the Git Repository.

Pre-requisite:

  • An Admin-level configuration with an authentication token is required.

  • The Git Project and Git branch must be configured at the Project level.

Please Note:

  • A normal Data Science Project with a Git branch configured to it supports the Pull from Git import functionality for a utility file.

  • Users can generate an authentication token from their GitLab or GitHub repositories.

Admin Level Configuration for Git Pull

  • Navigate to the Admin module.

  • Open the Version Control from the Configuration options.

  • Select Private Token as the Token type.

  • Provide the authentication token in the given space.

  • Click the Test option.

  • A notification message appears to inform the user that authentication has been established.

  • Click the Save option.

  • A notification message appears to inform that the version control has been updated.

Pushing into and Pulling a Utility File

A user can make changes in the pulled Python file and push it back into Git using the user token set in the user profile.

Please Note: Before using the Push into VCS functionality for a utility file, make sure that the following requirements are fulfilled:

  1. The latest file is pulled into DS Lab before modifying and pushing back to the Git branch.

  2. The user should have access to the Git branch to push a change.

Check out the illustration for a utility script to understand the Pull from VCS and Push into VCS functionalities. It shows that a fresh pull from VCS is required each time before the Push into VCS functionality can be used.

  • Navigate to the Workspace tab for an activated project that has a Git branch configured.

  • Click the Utils folder to get the Import option.

  • Click the Import option.

  • The Import Utility File drawer appears.

  • Select the Pull from Git option.

  • Select a file using the checkbox.

  • Click the Save option.

  • A notification message informs that the selected file is pulled.

  • The pulled file gets listed under the Utils folder.

  • A notification message appears ensuring that the utility file is saved.

  • Modify the content of the saved Utility file.

  • Click the pulled utility file from the Utils folder to open the context menu.

  • Select the Push into VCS option.

  • The Push into Git drawer opens.

  • Provide a commit message.

  • Click the Push option.

  • A notification ensures that the latest file version is pushed.

  • You can open the Git repository and verify the script version.

  • Navigate to the same Utility file.

  • Modify the script.

  • Save the script.

  • Select the Push into VCS option from the Utility action context menu.

  • The Push into Git drawer opens.

  • Provide a commit message.

  • Click the Push option.

  • An error message states that the current file doesn't contain the latest version, and suggests taking the latest pull.

  • Click the Pull from VCS option for the same utility file.

  • A notification ensures that the latest file is pulled.

  • Use the Save as Notebook option to save it.

  • Click the Yes option.

  • Consecutive success notifications appear to ensure that the file is started and saved.

  • Click the Push into VCS option for the same utility file.

  • The Push into Git drawer opens.

  • Provide the commit message.

  • Click the Push option.

  • A notification ensures that the latest file version is pushed.

  • The same can be verified in the Git repository.
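The rejection shown in the steps above happens because a push is only accepted when it is based on the repository's latest version. The following is a hypothetical sketch of that rule; `RemoteRepo` and its methods are illustrative names, not the platform's or Git's actual API.

```python
# Hypothetical sketch of the pull-before-push rule illustrated above.
# A push is rejected when the file it modifies is not based on the
# repository's latest version, forcing a fresh pull first.

class RemoteRepo:
    def __init__(self):
        self.head = 0          # latest version number on the server
        self.content = ""

    def pull(self):
        """Return the latest version number and its content."""
        return self.head, self.content

    def push(self, base_version: int, new_content: str) -> str:
        """Accept the push only if it was based on the latest version."""
        if base_version != self.head:
            return "Error: file is not the latest version, take a pull first"
        self.head += 1
        self.content = new_content
        return "Pushed"

repo = RemoteRepo()
v, _ = repo.pull()                  # pull the latest file
print(repo.push(v, "edit A"))       # → Pushed (based on latest)
print(repo.push(v, "edit B"))       # rejected: stale base version
v2, _ = repo.pull()                 # fresh pull picks up edit A
print(repo.push(v2, "edit B"))      # → Pushed
```

This is the same optimistic-concurrency idea behind Git's fast-forward check: the second push in the walkthrough fails until a new pull brings the local copy up to date.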

Copy Path

The user can copy the utility file's path using this action.

  • Navigate to the Workspace tab for a normal Data Science Project.

  • Open the Utils folder to get the list of utility files.

  • Access the Utility Actions context menu.

  • Click the Copy path option from the Utility Actions context menu.

  • Open a .ipynb file using the Repo folder.

  • Add a new code cell.

  • Use the Ctrl+V action to paste the copied path of the utility file in the code cell.
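Once pasted, the copied path can be used to load the utility as a module inside the Notebook. Below is a minimal sketch using Python's standard importlib; the `load_utility` helper and the example path are assumptions for illustration, not a documented DS Lab API.

```python
# Minimal sketch: load a utility script as a module from its copied path.
# The path in the usage comment is a placeholder; replace it with the
# path obtained via the Copy Path action.
import importlib.util

def load_utility(path: str, module_name: str = "my_utility"):
    """Load a Python file at `path` and return it as a module object."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# utils = load_utility("/path/to/utils/my_utility.py")
# utils.some_helper(...)
```

The utility's functions then become available as attributes of the returned module object, just as with a regular import.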

Delete

  • Navigate to the Utils folder for a normal DSL project.

  • Select a utility file and open the Actions context menu for the selected file.

  • Click the Delete option from the action context menu.

  • The Delete Utility dialog box appears to confirm the action.

  • Click the Yes option.

  • A notification appears to ensure that the selected Utility script is deleted. The utility script gets removed from the list.

Information

The Information action displays details for the Utility file, whether it is imported from Git or from the local system.

  • Navigate to the Utils folder for a normal DSL project.

  • Select a utility file and open the Actions context menu for the selected file.

  • Click the Information option from the action context menu.

    • Description: For utility files imported from the system, the description entered while importing the file is displayed.

    • Last updated & Description: The last updated date and description are displayed for the utility scripts imported from Git.

Information context menu for an imported Utility file
Information context menu for a Utility file pulled from Git