This section describes the actions available for data added to a Data Science Lab project.
The Data Preview option displays a sample of the actual data so that users can better understand the data values.
Navigate to the Dataset list inside a Project.
Select either a Data Sandbox or Dataset from the displayed list.
Click the Preview icon for the selected data entity.
The Preview Data Sandbox or Preview Dataset page opens based on the selected data.
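Conceptually, a preview is the first few rows of the data shown next to the column names. The following is a minimal sketch of that idea in Python, assuming the data is available as CSV text. The function name `preview_rows` and the sample values are hypothetical and are not part of the platform.

```python
import csv
import io
from itertools import islice


def preview_rows(csv_text, limit=5):
    """Return the header and at most `limit` data rows from CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])          # first line holds the column names
    rows = list(islice(reader, limit))  # read only the rows needed for the preview
    return header, rows


# Hypothetical sample data, for illustration only.
sample = "id,city,amount\n1,Pune,120.5\n2,Delhi,80\n3,Mumbai,\n"
header, rows = preview_rows(sample, limit=2)
print(header)
for row in rows:
    print(row)
```

Reading only the first rows keeps the preview fast even when the underlying file is large.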
The Data Profile action shows a detailed profile of the data so that users can assess its quality, structure, and consistency. A data profile is a summary of the characteristics of a dataset. It is created as a preliminary step in data analysis to better understand the data before performing an in-depth analysis.
Check out the illustration provided at the beginning to get the full view of the Data Profile page.
Navigate to the Data list page.
Select a data entity from the list. It can be a Dataset, a Data Sandbox file, or a Feature Store.
Click the Data Profile icon.
The Data Profile drawer opens displaying the Data Set information, Variable Types, Warnings, Variables, Correlation chart, missing values, and sample.
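To make the idea of a data profile concrete, the sketch below computes a few of the characteristics mentioned above (row count, missing values, distinct values, and an inferred variable type) for each column. It is an illustration under simplifying assumptions, not the platform's profiling logic: the helper names `profile_column` and `profile_table`, the sample rows, and the two-way numeric/categorical split are all hypothetical.

```python
def is_number(value):
    """Return True if the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile_column(values):
    """Summarize one column: count, missing, distinct, inferred type."""
    present = [v for v in values if v not in (None, "")]
    if present and all(is_number(v) for v in present):
        kind = "numeric"
    else:
        kind = "categorical"
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "type": kind,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = list(rows[0].keys()) if rows else []
    return {col: profile_column([r.get(col) for r in rows]) for col in columns}


# Hypothetical sample rows, for illustration only.
rows = [
    {"age": "34", "city": "Pune"},
    {"age": "", "city": "Delhi"},
    {"age": "29", "city": "Pune"},
]
for name, stats in profile_table(rows).items():
    print(name, stats)
```

A high missing count or an unexpected inferred type is the kind of finding that a profile is meant to surface before deeper analysis.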
The users can create a supervised learning (Auto ML) experiment using the Create Experiment option.
Check out the illustration to create an Auto ML experiment.
Navigate to the Dataset List page.
Select a Dataset from the list.
Click the Create Experiment icon.
Please Note: An experiment contains two steps:
Configure: Enter the Experiment name, Description, and Target column.
Select Experiment Type: Select an algorithm type from the drop-down menu.
A Classification experiment can be created for discrete data when the user wants to predict one of several categories.
A Regression experiment can be created for continuous numeric values.
A Forecasting experiment can be created to predict future values based on historical data.
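The choice between the three experiment types depends mainly on the target column. The sketch below expresses that rule of thumb in Python. It is a heuristic for illustration only and does not describe how the platform validates the selection: the function name `suggest_experiment_type`, the `has_time_index` flag, and the `max_classes` threshold of 20 are assumptions.

```python
def is_number(value):
    """Return True if the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def suggest_experiment_type(target_values, has_time_index=False, max_classes=20):
    """Suggest an experiment type from the values of the target column.

    has_time_index: assumed flag meaning the rows are ordered in time and
    the goal is to predict future values.
    max_classes: assumed cut-off above which a numeric target is treated
    as continuous rather than as a set of categories.
    """
    if has_time_index:
        return "Forecasting"
    distinct = set(target_values)
    if all(is_number(v) for v in target_values) and len(distinct) > max_classes:
        return "Regression"
    return "Classification"


print(suggest_experiment_type(["yes", "no", "yes"]))
print(suggest_experiment_type([i * 1.5 for i in range(100)]))
print(suggest_experiment_type([10, 12, 15], has_time_index=True))
```

Note that a numeric target with only a few distinct values (for example, 0 and 1) is treated as categorical here, which is why the threshold exists.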
The Configure tab opens by default when the Create Experiment form opens.
Provide the following information:
Provide a name for the experiment.
Provide Description (optional).
Select a Target Column.
Select a Data Preparation from the drop-down menu.
Use the checkbox to select a Data Preparation from the displayed drop-down.
Select columns that need to be excluded from the experiment.
Use the checkbox to select a field to be excluded from the experiment.
Please Note: The selected fields will not be considered while training the Auto ML model experiment.
Click the Next option.
The user gets redirected to the Select Experiment Type tab.
Select a prediction model using the checkbox.
A validation notification message appears based on the selected experiment type.
Click the Done option.
A notification message appears.
The user gets redirected to the Auto ML list page.
The newly created experiment gets added to the list with the Status mentioned as Started.
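The effect of the Target Column and excluded-column selections in the Configure step can be pictured as splitting each row into the value to predict and the fields used for training. The following sketch assumes the data is a list of dictionaries; the function name `split_features_target` and the sample columns are hypothetical, and the platform performs this step internally.

```python
def split_features_target(rows, target, excluded=()):
    """Separate the target column and drop excluded columns from each row."""
    skip = set(excluded) | {target}
    features = [{k: v for k, v in row.items() if k not in skip} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


# Hypothetical sample rows, for illustration only.
rows = [
    {"customer_id": 101, "age": 34, "plan": "basic", "churned": "no"},
    {"customer_id": 102, "age": 51, "plan": "premium", "churned": "yes"},
]
features, labels = split_features_target(
    rows, target="churned", excluded=["customer_id"]
)
print(features)
print(labels)
```

Identifier-like columns such as `customer_id` in this example are typical candidates for exclusion because they carry no predictive signal.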
Data Preparation involves gathering, refining, and converting raw data into refined data. It is a critical step in data analysis and machine learning, as the quality of the data directly impacts the accuracy and reliability of the results. Data preparation ensures that the data is accurate, complete, consistent, and relevant to the analysis. With the help of the Data Preparation option, the data scientist can make more informed decisions, extract valuable insights, and uncover hidden trends and patterns within the raw data.
Navigate to the Data tab.
Select a Dataset from the list.
Click the Data Preparation icon.
The Preparation List window displays the preparations based on the selected Excel file. The user may use any of the displayed data preparations from the list.
The user can select a sheet name from the given drop-down menu.
Click the Data Preparation option to create a new preparation.
The Data Preparation page opens displaying the dataset in the grid format.
Click the Auto Prep option to apply the default set of transforms under the Auto Prep.
The Transformation List window opens.
Select or deselect the transforms using the given checkboxes.
Click the Proceed option.
The selected AutoPrep transforms are applied to the dataset. Provide a name for the Data Preparation.
Click the SAVE option.
A notification message informs the users that the data preparation has been saved.
The user gets redirected to the Preparation List window.
Click the Refresh icon.
The newly created Data Preparation gets added to the Preparation List.
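The Auto Prep flow above amounts to applying a chosen list of transforms to the dataset in order. The sketch below illustrates that pattern in Python. The three transforms shown (trimming whitespace, dropping duplicate rows, filling missing numeric values with the median) are common data-cleaning steps picked for illustration; they are assumptions and not a list of the transforms that Auto Prep actually applies. All function names are hypothetical.

```python
from statistics import median


def trim_whitespace(rows):
    """Strip leading and trailing spaces from every string value."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def drop_duplicates(rows):
    """Keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing_with_median(rows, column):
    """Replace None in a numeric column with the median of the known values."""
    present = [row[column] for row in rows if row[column] is not None]
    fill = median(present) if present else None
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def apply_transforms(rows, transforms):
    """Apply the selected transforms one after another."""
    for transform in transforms:
        rows = transform(rows)
    return rows


# Hypothetical sample rows, for illustration only.
raw = [
    {"name": " Asha ", "score": 80},
    {"name": "Asha", "score": 80},
    {"name": "Ravi", "score": None},
    {"name": "Meera", "score": 90},
]
# The list plays the role of the checkboxes in the Transformation List window.
selected = [
    trim_whitespace,
    drop_duplicates,
    lambda rows: fill_missing_with_median(rows, "score"),
]
prepared = apply_transforms(raw, selected)
for row in prepared:
    print(row)
```

The order matters in this sketch: whitespace is trimmed first so that rows differing only by stray spaces are recognized as duplicates.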
Navigate to the Data tab.
Select a Dataset from the list.
Click the Delete icon.
A dialog box opens to confirm the deletion.
Click the Yes option.
A notification message confirms that the deletion is complete.
The selected Dataset is removed from the list.
Please Note: The Preview, Create Experiment, and Data Preparation Actions are not supported for the Datasets based on a Feature Store.
This page describes the steps to add data to your DSL project.
Pre-requisites:
The users must have permission to access the Data Center module of the Platform.
The users must have the required data sets listed under the Data Center module.
Check out the illustration to understand the steps for adding Datasets to a DSL Project.
Open a Project.
Click the Data tab to open it.
Click the Add Data option from the Data tab.
The Add Data page opens offering three options to choose data:
Data service (the default selection)
Data Sandbox Files
Feature Stores
Select the Data Sets option from the Data Source drop-down menu.
Use the Search space to search through the displayed data service list.
Select the required data set(s) using the checkboxes provided next to it.
Click the Add option.
The selected data set(s) get added to the project.
A notification message appears to confirm the addition.
Prerequisite: The user must configure the Sandbox Settings to access the Data Sandbox option under the Data Science Lab.
Check out the illustration to understand the steps for uploading and adding Data Sandbox files to a DSL Project.
Open a DSL Project.
Click on the Data tab.
Click the Add Data option.
The user gets redirected to the Add Data page.
Select the Data Sandbox option from the Data Source drop-down menu.
Click the Upload option to upload a Data Sandbox file.
The user gets redirected to the Upload Data Sandbox page.
Provide a Sandbox Name.
Provide a Description of the Data Sandbox.
Click the Choose File option to select a file.
Choose a file from your system and upload it. The supported files are
Click the Save option to begin the file upload.
Wait until the file upload reaches 100%.
The uploaded sandbox file gets added under the Add Datasets page.
A notification message appears to indicate that the file has been uploaded.
The user gets redirected to the Add Data page.
Select the Data Sandbox option from the Data Source drop-down menu.
Use the search space to search a specific data sandbox.
Select Data Sandbox files using the checkbox given next to the Sandbox entry (The uploaded Data Sandbox file appears at the top of the list).
Click the Add option that appears after selecting the uploaded Sandbox file.
The user gets redirected to the Dataset tab where the added dataset file gets listed.
A notification message appears to inform that the selected Dataset (in this case, the selected Data Sandbox file) has been updated.
Please Note: The users get a search bar to search across the multiple Datasets options on the Add Datasets page.
Check out the illustration to understand the steps for adding Feature Stores to a DSL Project.
Navigate to a DSL Project.
Click the Data tab to open it.
Click the Add Data option from the Data tab.
The Add Data page opens offering three options to choose data.
Select the Feature Stores option from the Data Source drop-down menu.
Use the Search space to search through the displayed Feature Store list.
Select the required feature store(s) using the checkboxes provided next to it.
Click the Add option.
The selected feature store(s) get added to the project.
A notification message appears to confirm the addition.
Check out the illustration to understand adding a Feature Store with Data Preparation.
Navigate to the Data Science Lab module.
Click the Create option provided for the Feature Store.
The Create Feature Store page opens.
Provide a name for the Feature Set.
Select a connector from the drop-down menu.
Select a query from the table info. / Metadata list or write an SQL Query.
Click the Validate option.
A notification message confirms that the query is validated.
The Preview of the data appears below.
Click the Data Prep option.
The user gets redirected to the Data Preparation page.
Navigate to the Transforms tab.
Choose a transform from the list. Here, the Label Encoding transform is selected from the ML category.
A warning appears to remind the users that if the SQL query is changed, the applied data preparations or transformations will be lost.
The Data Prep option will have a green mark suggesting that the Data Preparation is applied to the selected Feature Store.
Click the Create option.
A notification confirms that the Feature Store job is initiated.
The user gets redirected to the Feature Stores page.
The newly created feature store gets added at the top of the list.
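The Label Encoding transform used in the steps above converts a categorical column into integer codes so that it can be used as model input. The sketch below shows the general technique in Python. Assigning codes in sorted order of the category names is one common convention and is an assumption here; the platform's transform may assign codes differently. The function name `label_encode` and the sample values are hypothetical.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical sample column, for illustration only.
codes, mapping = label_encode(["red", "green", "blue", "green"])
print(codes)
print(mapping)
```

Keeping the mapping alongside the encoded values makes it possible to translate predictions back into the original category names.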
Open a Project.
The Workspace tab opens by default.
Open the Data tab.
Click the Add Data icon.
The Add Data page opens.
Select Feature Stores as an option using the Data Source filter menu.
The available Feature Stores are listed.
Select a feature store using the checkbox.
Click the Add option.
A notification appears stating that the feature store has been added.
The recently added feature store appears under the Data section of the selected project.
Add a new code cell and select the checkbox next to the recently added Feature Store to use it as data for the project.
The Data gets loaded in the code cell.
Run the code cell with the loaded feature store.
The data preview appears below the code cell.
This section focuses on how to add or upload datasets to your DSL Projects. The Dataset tab lists all the data added to a Project.
The Add Data option provided under the Data tab redirects the users to add various types of data to a DSL Project. The users can also upload sandbox files or create feature stores using this functionality.
Please Note: Users can add Datasets by using the Data tab or Notebook page provided under the Workspace tab.
Open a Data Science Lab Project.
Click on the Data tab from the opened Project.
The Data tab opens displaying the Add Data option.
Click the Add Data option. The Add Data page opens, listing the uploaded and added data sources for the selected DSL Project.
The Add Data page offers the following Data source options to add as datasets:
Data Sets – These are the uploaded data sets from the Data Center module.
Data Sandbox – This option lists all the available/ uploaded Data Sandbox files.
Feature Store – This option lists all the available Feature Stores under the selected DSL Project.
Please Note: Refer to the section of the module for more details.