Start your Data Science Experiment with Data Science Lab

Follow the sequence below to begin your experimentation with the Data Science Lab:

  • Navigate to the Projects Page of the Data Science Lab plugin.

  • Click the Create Project option to create a new project.

  • A form opens asking for the following Project-related details:

    • Project Name: Give a name to the new project.

    • Project Description: Describe the project.

    • Select Algorithms: Choose the algorithms using the drop-down menu.

    • Environment: Allows users to select the environment they want to work in. Currently, the supported Python frameworks are PySpark, TensorFlow, and PyTorch.

      • If the users select the TensorFlow environment, they do not need to install packages like TensorFlow and Keras explicitly in the notebook. These packages can simply be imported inside the notebook.

      • If the users select the PyTorch environment, they do not need to install packages like Torch and Torchvision explicitly in the notebook. These packages can simply be imported inside the notebook.

      The user can select one of the given choices: 1. Python TensorFlow, 2. Python PyTorch.

    • Resource Allocation: Allows the users to allocate CPU/GPU and memory to be used by the Notebook container inside a given project. The currently supported Resource Allocation options are Low, Medium, and High.

    • Idle Shutdown: Allows the users to specify the idle time limit after which the notebook session gets disconnected and the project gets deactivated. To use the notebook again, the project must be reactivated. The supported Idle Shutdown options are 30m, 1h, and 2h.

    • External Libraries: Provide the external libraries’ links required for the project.

  • Based on the selection in the Resource Allocation field, the following fields appear with pre-selected values:

    • Image Name

    • Image Version

    • Limit (CPU)

    • Limit (Memory)

    • Request (CPU)

    • Request (Memory)

  • Select nvidia from the GPU Type field to improve the performance of the project.

  • Click the Save option.

  • The newly created project gets saved and appears on the screen.

  • A notification message confirms the successful creation of the project.
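As noted above, the TensorFlow and PyTorch environments ship with their deep-learning packages pre-installed, so a notebook cell can import them directly. The sketch below shows the imports a user would write, plus a small helper for checking whether a package is available in the running kernel; the helper is illustrative and not part of the Data Science Lab product.

```python
# In a TensorFlow-environment notebook, the packages can be imported
# directly, with no pip install step:
#
#   import tensorflow as tf
#   from tensorflow import keras
#
# In a PyTorch-environment notebook:
#
#   import torch
#   import torchvision
#
# A quick, generic way to confirm a package is pre-installed in the
# running kernel (works in any Python environment):
import importlib.util

def is_preinstalled(package: str) -> bool:
    """Return True if the package can be imported without installation."""
    return importlib.util.find_spec(package) is not None

for pkg in ("tensorflow", "keras", "torch", "torchvision"):
    print(pkg, "available" if is_preinstalled(pkg) else "not installed")
```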

Please Note:

  • The user can also open the Project list by clicking the View Project option.

    • Click the View Project option.

    • The user gets redirected to the Project list.

  • Once listed under the Project list, a project gets the Share, Edit, Delete, and Activate/Deactivate actions.

  • A Data Science Project gets related tabs based on the selection of the environment.

    • PySpark environment currently supports only Notebook, Dataset, and Utility tabs.

    • PyTorch & TensorFlow environments support Notebook, Dataset, Model, Utility, and AutoML tabs.

Pre-requisite: Data Science Lab projects also get the Push to VCS and Pull from VCS functionalities, but these get enabled only for activated DSL projects.

  • Navigate to the Projects page.

  • Select a project from the list.

  • Click the Activate option.

  • A dialog window appears to confirm the Activation.

  • Click the Yes option.

  • The project gets activated and a notification message appears to communicate the completion of the action.

  • The Activate option changes to Deactivate for the concerned project.

  • Click the Create Notebook option from the Notebook tab.

  • A new Notebook gets created; the user gets a notification message informing the same.

  • Click the Back icon.

  • The Notebook gets saved under the Notebook list.

Please Note: The following options are available on the Notebook page:

  1. Edit the Notebook name by using the Edit Notebook Name icon.

  2. The accessible datasets, models, and artifacts are listed under the Datasets, Models, and Artifacts menus.

  3. The Find/Replace menu lets the user find and replace specific text in the notebook code.

  4. Add a description for the created Notebook by using the same page.

The users can seamlessly upload Notebooks created using other tools and saved in their systems.

  • Navigate to the landing page of an activated Project.

  • Click the Upload Notebook option.

  • Specify a Notebook from the system.

  • Click the Open option to upload the Notebook.

  • The selected Notebook gets uploaded under the Project.

  • The same gets confirmed by a notification message.

  • Another notification message appears to inform the status of the Notebook (it gets saved by default).

  • Click the Back icon.

  • The uploaded Notebook gets listed on the landing page of the Project.

Once the Notebook script is executed successfully, the user can save it as a model. The saved model can then be loaded into the Notebook.

Save your Data Science Lab Model

  • Navigate to a Notebook.

  • Write code using the following sequence:

    • Read DataFrame

    • Define test and train data

    • Create a model

  • Execute the script.

  • Add a new cell.

  • Give a model name to specify the model.

  • Execute the cell.

  • After the code gets executed, use the Save Model option in a new cell.

  • The saved model gets listed under the Models list.
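The coding sequence above (read DataFrame, define test and train data, create a model) can be sketched with scikit-learn, which is the model family supported for publishing. This is a minimal illustration; the dataset, model type, and model name here are examples, and the actual Save Model option inserts its own platform-specific code into the new cell.

```python
# Minimal sketch of the sequence above using scikit-learn.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Read DataFrame (built here from a bundled sample dataset for illustration)
iris = load_iris(as_frame=True)
df = iris.frame

# Define test and train data
X_train, X_test, y_train, y_test = train_test_split(
    df[iris.feature_names], df["target"], test_size=0.25, random_state=42
)

# Create a model, and give it a name so the Save Model step can refer to it
model_name = "iris_classifier"  # example model name
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

print(model_name, "accuracy:", round(model.score(X_test, y_test), 3))
```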

Load your Data Science Lab Model

  • Click on a new cell and select the model by using the given checkbox to load it.

  • The model gets loaded into a new cell.

The data scientist can get the predicted array from a loaded DSL model for a given DataFrame.

  • Add a new cell.

  • Click the Predict option.

  • Provide the model and DataFrame.

  • Execute the code.

  • The predicted output of the given DataFrame appears as an array.

  • The default comments on how to define the predicted output for a DS Lab model appear as well.
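What the Predict step computes boils down to calling the loaded model's `predict()` on the supplied DataFrame, which returns a NumPy array. The model and DataFrame below are stand-ins built inline so the sketch is self-contained.

```python
# Illustration of the Predict step: predict() applied to a DataFrame
# returns the "predicted array" (a NumPy ndarray).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-ins for the loaded DSL model and the input DataFrame
df = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [1.0, 0.0, 1.0, 0.0]})
labels = pd.Series([0, 0, 1, 1])
model = LogisticRegression().fit(df, labels)

predicted = model.predict(df)       # the predicted array
print(type(predicted).__name__)     # ndarray
print(predicted.shape)              # one prediction per DataFrame row
```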

Please Note:

  • The user can save the artifacts of a predicted model using the Save Artifacts option.

  • The saved Artifacts can be downloaded as well.

  • Add a new cell.

  • Click the Save Artifacts option.

  • Provide a proper DataFrame name and a name for the artifacts (with one of the supported extensions: .csv/.txt/.json).

  • Execute the cell.

  • The Artifacts get saved.
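The three supported artifact formats can be sketched with pandas as below. The DataFrame contents and file names are illustrative; in the product, the artifact name is the one provided in the Save Artifacts cell.

```python
# Sketch of writing prediction artifacts in the supported formats
# (.csv / .txt / .json); data and file names here are examples only.
import pandas as pd

artifact_df = pd.DataFrame({"row_id": [1, 2, 3], "prediction": [0, 1, 1]})

artifact_df.to_csv("predictions.csv", index=False)         # .csv artifact
artifact_df.to_json("predictions.json", orient="records")  # .json artifact
with open("predictions.txt", "w") as fh:                   # .txt artifact
    fh.write(artifact_df.to_string(index=False))
```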

The Data Scientist can deploy a saved DSL model to the Data Pipeline plugin by using the Model tab.

  • Navigate to the Model tab.

  • Select a model from the list.

  • Click the Deploy to Pipeline icon for the model.

  • The Deploy to Pipeline dialog box appears to confirm the action.

  • Click the Yes option.

  • The selected model gets published and deployed to the Data Pipeline (It disappears from the Unpublished model list).

  • A notification message appears to inform the same.

  1. The published/deployed model gets listed under the Published filter.

  2. The Publish option provided under the Notebook tab and the Deploy to Pipeline option provided under the Model tab perform the same task.

This function gets completed in three steps:

  1. Publish a Model as an API

  2. Register an API Client

  3. Pass the Model values in Postman

Publish a Model as an API

You can publish a DSL model as an API using the Model tab. Only the published models get this option.

  • Navigate to the Model tab.

  • Filter the model list by using the Published filter option.

  • Select a model from the list.

  • Click the Publish as API option.

  • The Update model page opens.

  • Provide Max instance limit.

  • Click the Save and Publish option.

Please Note: Use the Save option to save the data which can be published later.

  • The model gets saved and published as an API service. A notification message appears to inform the same.

Register an API Client

  • Navigate to the Admin module.

  • Click the API Client Registration option.

  • The API Client Registration page opens.

  • Click the New option.

  • Select the Client type as internal.

  • Provide the following client specific information:

    • Client Name

    • Client Email

    • App Name

    • Request Per Hour

    • Request Per Day

    • Select API Type: Select the Model as API option.

    • Select the Services Entitled: Select the published DSL model from the drop-down menu.

  • Click the Save option.

  • The client details get registered.

  • A notification message appears to inform the same.

Please Note: Once the client gets registered, open the registered client details using the Edit option to get the Client Id and Client Secret key.

Pass the Model values in Postman

  • Navigate to Postman.

  • Add a new POST request.

  • Pass the URL with the model name (only Sklearn models are supported at present).

  • Provide the required parameters under the Params tab:

    a. Client Id

    b. Client Secret Key

    c. App Name

  • Open the Body tab.

  • Select the raw option.

  • Provide the input DataFrame.

  • Click the Send option.

  • The response will appear below.

  • You can save the response by using the Save Response option.

Please Note: The model published as an API service can be easily consumed under various apps.
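The Postman steps above can also be reproduced in code. The sketch below builds the same POST request with the Python standard library; the base URL, parameter names, and credential values are placeholders, to be replaced with the endpoint and the Client Id / Client Secret Key / App Name obtained during API Client Registration.

```python
# Sketch of the Postman call built programmatically. All endpoint and
# credential values below are placeholders, not real platform values.
import json
import urllib.parse

base_url = "https://your-platform-host/model/iris_classifier"  # placeholder
params = {
    "clientId": "YOUR_CLIENT_ID",          # from the registered client
    "clientSecret": "YOUR_CLIENT_SECRET",  # from the registered client
    "appName": "YOUR_APP_NAME",
}
body = json.dumps([{"x1": 1.0, "x2": 0.0}])  # raw input DataFrame rows

url = base_url + "?" + urllib.parse.urlencode(params)
print("POST", url)
print("body:", body)
# To actually send it:
# urllib.request.urlopen(urllib.request.Request(
#     url, data=body.encode(), method="POST",
#     headers={"Content-Type": "application/json"}))
```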

The Dataset tab offers a list of uploaded datasets which can be added to a project. Under this tab, the user can get a list of the uploaded Data Sets and Data Sandboxes from the Data Center module. The Add Datasets page offers the following data service options to add as datasets:

  1. Data Service – These are the uploaded data sets from the Data Center module.

  2. Data Sandbox – This option lists all the available Data Sandbox from the Data Center module.

  • Navigate to a Project-specific page and click the Dataset tab.

  • Click the Add Datasets button.

  • The Add Datasets page opens offering two options to choose data:

    • Data service (gets selected by default)

    • Data Sandbox

  • Use the Search space to search through the displayed data service list.

  • Select the required data service(s) using the checkboxes provided next to them.

  • Click the Add option.

  • The selected data service(s) gets added to the concerned project.

  • A notification message appears to inform the same.

  • Open the Dataset tab from a specific project.

  • Click the Add Datasets option.

  • You get redirected to the Add Datasets page.

  • Select the Data Sandbox option from the Data Service drop-down menu.

  • Use the Search space to search for a specific Data Sandbox.

  • Select the required Data Sandbox(es) using the checkboxes provided next to them.

  • Click the Add option.

  • The selected data sandbox(es) gets added to the concerned project.

  • A notification message appears to inform the same.

Pre-requisite: Make sure that the Version control settings for the DSL plugin are configured by your administrator before you use this functionality.

Pushing a Project to the VCS

  • Navigate to the Projects page of the DS Lab plugin.

  • Select an activated project.

  • Click the Push into VCS icon for the project.

  • The Push into Version Controlling System dialog box appears.

  • Provide a Commit Message.

  • Click the Push option.

  • The DSL Project version gets pushed into the Version Controlling System, and a notification message appears to inform the same.

Pulling a Project from the VCS

  • Navigate to the Projects page of the DS Lab plugin.

  • Select an activated project.

  • Click the Pull from VCS icon for the project.

  • The Pull from Version Controlling System dialog box opens.

  • Select the version that you wish to pull by using the checkbox.

  • Click the Pull option.

  • The pulled version of the selected Project gets updated in the Project list.

  • A notification message appears to inform the same.

Please Note: The Push to and Pull from VCS functionalities will not be enabled for a deactivated project.

Pre-requisite: Make sure that the Version control settings for the DSL plugin are configured by your administrator before you use this functionality.

Pushing a Notebook to the VCS

  • Navigate to the Notebook list of a Project.

  • Select a Notebook.

  • Click the Push into VCS icon for the Notebook.

  • The Push into Version Controlling System dialog box appears.

  • Provide a Commit Message.

  • Click the Push option.

  • The Notebook version gets pushed into the Version Controlling System and the Notebook list gets updated with the latest version.

  • A notification message appears to inform the success of the action.

Pulling a Notebook from the VCS

  • Navigate to the Notebook list given under a Project.

  • Select a Notebook.

  • Click the Pull from VCS icon for the Notebook.

  • The Pull from Version Controlling System dialog box opens.

  • Select the version that you wish to pull by using the checkbox.

  • Click the Pull option.

  • The pulled version of the selected Notebook gets updated in the Notebook list.

  • A notification message appears to inform the success of the action.
