Data Science Lab Quick Start Flow

This page provides the major steps in a concise manner so that users can kick-start their Data Science experiments.

The Data Science module allows the user to create Data Science experiments and productionize them. This page presents the entire Data Science flow in a nutshell so that users can quickly begin their Data Science experiment journey.

Project Creation

A Data Science Project created inside the Data Science Lab is like a Workspace inside which the user can create and store multiple data science experiments.


Prerequisite: It is mandatory to configure the DS Lab Settings option before beginning with Data Science Project creation. Also, use the Algorithms field in the DS Lab Settings section to select the algorithms you wish to use inside your Data Science Lab project.

Creating a Data Science Lab Project

Creating a New Project

Open the Data Science Lab module and access the Create Project option to begin with Project creation. Refer to the Create Project page to understand the steps involved in Project creation in detail.

Project List

Once a Data Science Project is created, it gets listed under the Projects page. The following Actions can be applied to each Project in the list:

  1. Push to VCS (only available for an activated Project)

  2. Pull from VCS (only available for an activated Project)

Please Note: Refer to the Project List page to understand the concept in detail.

Supported Environments

The following environments are supported inside a Data Science Lab Project.

  • TensorFlow: Users can execute Sklearn commands in the notebook by default. If the users select the TensorFlow environment, they do not need to install packages such as TensorFlow and Keras explicitly in the notebook; these packages can simply be imported inside the notebook.

  • PyTorch: If the users select the PyTorch environment, they do not need to install packages such as Torch and Torchvision explicitly in the notebook; these packages can simply be imported inside the notebook.

  • PySpark: If the users select the PySpark environment, they do not need to install the PySpark package explicitly in the notebook; it can simply be imported inside the notebook.

Please Note: The Sklearn environment is the default environment for a Data Science Lab Project.
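As a rough illustration of the default Sklearn environment, a notebook cell can import and use scikit-learn directly, with no installation step. The dataset and model below are generic scikit-learn examples, not artifacts shipped with the Data Science Lab:

```python
# In the default Sklearn environment, scikit-learn can be imported
# without installing it first. A minimal train/score sketch:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```

In the TensorFlow, PyTorch, or PySpark environments, the corresponding imports (e.g., `import tensorflow as tf` or `import torch`) work the same way, without an explicit install.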

Dataset

Data is the first requirement for any Data Science Project. The user can add the required datasets and view the added datasets under a specific Project by using the Dataset tab.

The user needs to click on the Dataset tab from the Project List page to access the Add Datasets option.

Adding Data Sets

The user can get a list of uploaded Data Sets and Data Sandbox from the Data Center module under this tab.

The Add Datasets page offers the following data service options to add as datasets:

  1. Data Sets – These are the uploaded data sets (data services) from the Data Center module.

  2. Data Sandbox – This option lists all the available/uploaded Data Sandbox files.

Adding Dataset (Data Service) as Dataset
Adding Data Sandbox as Dataset

Please Note:

  1. The user can add Datasets by using the Dataset tab or Notebook page.

  2. Based on the selected Environment, the supported Data Set types can be added to a Project or Notebook. E.g., the PySpark environment does not support a Data Service as a Dataset.

  3. Refer to the Adding Data Sets section and its sub-pages to understand it in detail.

  4. Refer to the Data Preparation page to understand how the user can apply the required Data Preparation steps to a specific dataset from the Data Set List page.
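Once a dataset has been added to a Project, a typical first step in the notebook is to load it and inspect its shape and contents. The snippet below is a hypothetical sketch using pandas with inline CSV content standing in for an added dataset; the column names are illustrative, not part of the Data Science Lab:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a dataset added via the Dataset tab.
csv_data = io.StringIO(
    "order_id,region,amount\n"
    "1,East,120\n"
    "2,West,80\n"
    "3,East,150\n"
)
df = pd.read_csv(csv_data)

# Typical first checks on a newly added dataset
print(df.shape)            # number of rows and columns
print(df["amount"].sum())  # quick sanity check on a numeric column
```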

Data Set List

All the uploaded and added datasets get various Actions that help users create more accurate Data Science Experiments. The following major Actions are provided for an added Data Set.

Please Note: The user can click each of the above-given Action options to open detailed information.

Data Science Experiment

Once the user creates a Project and adds the required Data Sets to it, the Project is ready to hold a Data Science Experiment. The Data Science Lab user can proceed with their Data Science Experiments in the following ways:

Use the Notebook infrastructure provided under the Project to create a script, save it as a model or script, and load and predict with a model. It is also possible to save the Artifacts for a Saved Model. Refer to the Notebook section for more details.

Creating a Notebook
Uploading a Notebook
Data Science Model

The Notebook Operations section of this documentation provides details on the above-stated aspects of Data Science Models.


Notebook List Page

The Notebook List page lists all the created and saved Notebooks inside one Data Science Project. The user can apply the following Actions to a Notebook from the Notebook List page:

Use the Auto ML functionality to get auto-trained Data Science models. The user can use the Create Experiment option provided under the Dataset List page to begin AutoML model creation. Refer to the AutoML section of this documentation for more details.

Data Science Experiment
  • Access the Create Experiment option under the Dataset List for the Projects that are created under the supported environments such as PyTorch and TensorFlow.

  • Once the AutoML experiment gets created successfully, the user gets directed to the AutoML List.

  • The following Actions are provided on the AutoML List page:

    • View Report

      • Details

      • Models - This option provides the detailed model explanation.

    • Delete

The View Explanation option is provided for both manually created Data Science models and AutoML-generated models.

Model Explainability

The Model Explainer dashboard explains a Data Science Lab model. The View Explanation option carries the following flow to explain a Model.
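As general background (this is not the platform's own implementation), model-explanation views of this kind typically rank input features by their influence on predictions. A minimal sketch of the idea using scikit-learn's permutation importance, with a synthetic dataset standing in for a real experiment:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for a dataset added to a Data Science Lab Project
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X, y)

# Shuffle each feature in turn and measure the drop in score;
# larger drops indicate more influential features.
result = permutation_importance(model, X, y, n_repeats=5, random_state=42)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: {score:.3f}")
```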


Please Note:

  • The Auto ML functionality is not supported at present for Projects created within the PySpark environment.

  • Model As API functionality is not available for the AutoML Models.

  • Model Explainability and Model As API functionalities are not available for the Imported models.
