
Data Science Lab Quick Start Flow

This page provides all the major steps in a concise manner to help users kick-start their Data Science experiments.


Last updated 1 year ago

The Data Science Lab module allows users to create Data Science Experiments and productionize them. This page presents the entire Data Science flow in a nutshell so that users can quickly begin their Data Science experiment journey.

Project Creation

A Data Science Project created inside the Data Science Lab is like a Workspace inside which the user can create and store multiple data science experiments.

Pre-requisite: It is mandatory to configure the DS Lab Settings option before beginning with the Data Science Project creation. Also, use the Algorithms field in that section to select the algorithms you wish to use inside your Data Science Lab project.

Creating a New Project
Project List

Once a Data Science Project is created, it gets listed on the Projects page. The following Actions can be applied to each Project in the list:

Supported Environment

The following environments are supported inside a Data Science Lab Project.

  • TensorFlow: Sklearn commands can be executed in the notebook by default. Users who select the TensorFlow environment do not need to install packages such as TensorFlow and Keras explicitly in the notebook; these packages can simply be imported.

  • PyTorch: Users who select the PyTorch environment do not need to install packages such as Torch and Torchvision explicitly in the notebook; these packages can simply be imported.

  • PySpark: Users who select the PySpark environment do not need to install PySpark explicitly in the notebook; it can simply be imported.
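Since each environment pre-installs its own packages, a notebook cell can check what is importable before use. The following is a minimal standard-library sketch; the package lists simply mirror the environments described above, and no DSL-specific API is assumed:

```python
# Check which environment-provided packages are importable in the
# current notebook kernel (package lists mirror the environments
# described above: TensorFlow, PyTorch, PySpark).
import importlib.util

ENV_PACKAGES = {
    "TensorFlow": ["sklearn", "tensorflow", "keras"],
    "PyTorch": ["sklearn", "torch", "torchvision"],
    "PySpark": ["pyspark"],
}

def available(packages):
    """Return the subset of packages that can be imported without installing."""
    return [p for p in packages if importlib.util.find_spec(p) is not None]

for env, pkgs in ENV_PACKAGES.items():
    print(f"{env}: {available(pkgs)}")
```

Running this in a notebook confirms which environment-specific packages are ready to import in the selected environment.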

Please Note:

  • The Sklearn environment is the default environment for a Data Science Lab Project.

  • The Project-level tabs provided for the TensorFlow and PyTorch environments remain the same, so the current document presents content for them together.

Dataset

Data is the first requirement for any Data Science Project. The Dataset tab allows users to add the required datasets and view the added datasets under a specific Project.

Adding Data Sets

Under this tab, the user can get a list of the Data Sets and Data Sandbox files uploaded from the Data Center module.

The Add Datasets page offers the following options to add as datasets: Data Sets and Data Sandbox.

Check out the given illustrations to understand the Adding Dataset (Data Service) and Adding Data Sandbox steps in detail.

Please Note:

  1. The user can add Datasets by using the Dataset tab or the Notebook page.

  2. Based on the selected environment, only the supported Dataset types can be added to a Project or Notebook. E.g., the PySpark environment does not support a Data Service as a Dataset.

  3. Refer to the Adding Data Sets section and its sub-pages for details.

Data Set List

All the uploaded and added datasets get various Actions that help users create more accurate Data Science Experiments. The following major Actions are provided for an added Data Set.

Please Note: The user can click each of the above-given Action options to open detailed information.

Data Science Experiment

Once the user creates a Project and adds the required Data Sets to it, the Project is ready to hold a Data Science Experiment. The Data Science Lab user gets the following ways to proceed with their Data Science Experiments:

Data Science Model

Use the Notebook to:

  • Save a Data Science Model

  • Load a Data Science Model

  • Save Artifacts for a saved Data Science Model

The Notebook Operations section of the current documentation provides details on the above-stated aspects of Data Science Models.

  • Algorithms and Transforms are also available for the Data Science models inside the Notebook Page as Notebook Operations.

  • The Notebook Page may contain a customized Notebook Operations list based on the selected environment. E.g., Data Science Projects created using the PySpark environment contain the following Notebook Operations: Datasets, Secrets, Variable Explorer, Writers, and Find and Replace.

  • Refer to the environment-specific Notebook Operations using the following options: Notebook Operations for the TensorFlow & PyTorch, and Notebook Operations for the PySpark.
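The save, load, and predict flow for a Data Science model can be sketched generically. The DSL Notebook provides its own operations for this, so the model class and file name below are purely illustrative; only the underlying Python pattern (pickle serialization) is shown:

```python
# Generic save / load / predict sketch (the DSL Notebook has its own
# operations for this; the MeanModel class and file name are hypothetical).
import os
import pickle
import tempfile

class MeanModel:
    """Toy model: predicts the mean of the training targets."""
    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self, n):
        return [self.mean_] * n

model = MeanModel().fit([1.0, 2.0, 3.0])

# Save the trained model as an artifact file.
path = os.path.join(tempfile.gettempdir(), "mean_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back and predict.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded.predict(2))  # → [2.0, 2.0]
```

The same pattern applies regardless of the chosen environment, though real models would typically come from Sklearn, TensorFlow, PyTorch, or PySpark rather than a hand-rolled class.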

Notebook List Page

The Notebook List page lists all the Notebooks created and saved inside a Data Science Project. From the Notebook List page, the user can apply the following Actions on a Notebook: Export to Pipeline, Export to GIT, Register as Job, Notebook Version Control, Sharing a Notebook, and Deleting a Notebook.

AutoML
  • Access the Create Experiment option under the Dataset List for Projects created under the supported environments such as PyTorch and TensorFlow.

  • Once the AutoML experiment is created successfully, the user gets directed to the AutoML List.

  • The following Actions are provided on the AutoML List page:

      • View Report – Opens the experiment report, which includes the following tabs:

          • Details

          • Models – This option provides the detailed model explanation.

      • Delete – Deletes the selected experiment.

The View Explanation option is provided for both manually created Data Science models and AutoML-generated models.

Model Explainability

The Model Explainer dashboard is available for a Data Science Lab model.

Please Note:

  • The AutoML functionality is not supported at present for Projects created within the PySpark environment.

  • The Model As API functionality is not available for AutoML Models.

  • The Model Explainability and Model As API functionalities are not available for Imported models.
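Where the Model As API functionality is available, values are passed to the registered model service over HTTP, as in the Pass Model Values in Postman flow. The sketch below builds such a request with Python's standard library; the endpoint URL, token, header names, and payload schema are assumptions for illustration, not the product's actual API contract:

```python
# Hypothetical sketch of passing model values to a registered Model API.
# URL, auth scheme, and payload schema are assumptions, not the real contract.
import json
from urllib import request

def build_scoring_request(url, token, rows):
    """Build a POST request carrying input rows as a JSON payload."""
    payload = json.dumps({"data": rows}).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
        method="POST",
    )

req = build_scoring_request(
    "https://example.com/model/score", "API-TOKEN", [[5.1, 3.5, 1.4, 0.2]]
)
print(req.get_full_url(), req.get_method())
```

The same request shape can be reproduced in Postman by setting the method, headers, and raw JSON body to matching values.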

Repo Sync Project

Refer to the page links given below to get directed to the various functionalities provided under the Repo Sync Project: Creating a Repo Sync Project, Repo Sync Project List, Repo Sync Project in the Python Environment, and Repo Sync Project in the PySpark Environment.

Open the Data Science Lab module and access the Create Project option to begin the Project creation. Refer to the Creating a Project page to understand the steps involved in Project creation in detail.

The following Actions are provided for each Project on the Project List page:

  • View

  • Push to VCS (only available for an activated Project)

  • Pull from VCS (only available for an activated Project)

  • Share

  • Edit

  • Activate Project

  • Deactivate Project

  • Delete

Please Note: Refer to the Project List page to understand all the above-listed options in detail.

The user needs to click the Dataset tab on the Project List page to access the Add Datasets option.

  • Data Sets – These are the uploaded data sets (data services) from the Data Center module.

  • Data Sandbox – This option lists all the available/uploaded Data Sandbox files.

Refer to the Data Preparation page to understand how the user can apply the required Data Preparation steps on a specific dataset from the Data Set List page.

  • Preview – Opens a preview of the selected dataset.

  • Data Profile – Displays the detailed profile of the data, covering data quality, structure, and consistency.

  • Create Experiment – Creates an AutoML experiment on the selected Dataset.

  • Data Preparation – Cleans the data to enhance quality and accuracy, which directly impacts the reliability of the results.

  • Delete – Deletes the selected Dataset.
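As a rough illustration of what a Data Profile reports, a per-column summary of nulls, distinct values, and inferred type can be computed as below. This is a minimal standard-library sketch; the product's Data Profile output is considerably richer:

```python
# Minimal sketch of a per-column data profile (null counts, distinct
# values, inferred type). The product's Data Profile is far more detailed.
def profile(rows, columns):
    """Summarize each column of row-oriented data."""
    summary = {}
    for i, col in enumerate(columns):
        values = [r[i] for r in rows]
        non_null = [v for v in values if v is not None]
        summary[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "type": type(non_null[0]).__name__ if non_null else "unknown",
        }
    return summary

rows = [("a", 1), ("b", None), ("a", 3)]
print(profile(rows, ["letter", "number"]))
```

Checks like these are what make a profile useful before model training: a column full of nulls or with a single distinct value rarely contributes to a reliable experiment.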

Use the Notebook infrastructure provided under the Project to create a script, save it as a model or script, and load and predict with a model. It is also possible to save Artifacts for a saved Model. Refer to the Notebook section for more details.

Use the AutoML functionality to get auto-trained Data Science models. The user can use the Create Experiment option provided under the Dataset List page to begin the AutoML model creation. Refer to the AutoML section of this documentation for more details.

The View Explanation option carries the following flow to explain a model:

  • Model Summary – The Model Summary/Run Summary displays the basic information about the trained top model.

  • Model Interpretation – Provides the Model Explainer dashboards for an AutoML model.

      • Classification Model Explainer – Provides the explainer dashboards for Classification Models.

      • Regression Model Explainer – Provides the explainer dashboards for Regression Models.

      • Forecasting Model Explainer – Provides the model explainer dashboards for Forecasting Models.

  • Dataset Explainer – Provides a high-level preview of the dataset that has been used for the experiment.
