
Data Preparation

Data preparation is the process of collecting, cleaning, and transforming raw data into a format that can be easily analyzed and used for various applications.


Last updated 1 year ago

Data Preparation, which involves gathering, refining, and converting raw data, is a critical step in data analysis and machine learning, because the quality of the data directly affects the accuracy and reliability of the results. The goal of data preparation is to ensure that the data is accurate, complete, consistent, and relevant to the analysis. With well-prepared data, the data scientist can make more informed decisions, extract valuable insights, and uncover hidden trends and patterns in the raw data.
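As a rough illustration of what "accurate, complete, and consistent" can mean in practice, the sketch below cleans a small list of records in plain Python. It is a generic example only and does not represent how the Data Preparation framework or its transforms are implemented; the `prepare` function, the field names (`id`, `city`, `age`), and the sample rows are all hypothetical.

```python
from statistics import median

def prepare(rows):
    """Clean a list of record dicts: normalise text, drop duplicates, fill gaps."""
    cleaned, seen = [], set()
    for row in rows:
        # Consistency: trim whitespace and normalise case in the text field.
        city = (row.get("city") or "").strip().title()
        age = row.get("age")
        key = (row["id"], city)
        # Accuracy: skip rows with no city and repeats of an earlier (id, city) pair.
        if not city or key in seen:
            continue
        seen.add(key)
        cleaned.append({"id": row["id"], "city": city, "age": age})

    # Completeness: fill missing ages with the median of the known ages.
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = median(known) if known else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned

# Hypothetical raw data with typical quality problems.
raw = [
    {"id": 1, "city": " pune ", "age": 30},
    {"id": 1, "city": "Pune", "age": 30},      # duplicate after normalisation
    {"id": 2, "city": "MUMBAI", "age": None},  # missing value
    {"id": 3, "city": "", "age": 41},          # unusable record (no city)
    {"id": 4, "city": "delhi", "age": 50},
]

prepared = prepare(raw)
for record in prepared:
    print(record)
```

Each step maps to one of the goals named above: normalising text addresses consistency, dropping duplicates and unusable rows addresses accuracy, and filling missing values addresses completeness.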

Please Note: The BDB Data Science Lab provides the Data Preparation icon on the Dataset List page to instantly access the Data Preparation framework.

The following steps show how to access the Data Preparation framework from the Dataset List page.

  • Navigate to the Dataset List page.

  • Select a Dataset from the list.

  • Click the Data Preparation icon.

  • The Data Preparation page opens, displaying the dataset in a grid format.

  • Open the Transforms tab to get the list of the Data Transforms.

  • Apply the required transforms to the dataset. For example, to apply the Auto Prep option (any other transform from the Transforms tab can be used instead):

    • Click the Auto Prep option.

    • The Transformation List appears with pre-selected transforms. Modify the selection if required.

    • Click the Proceed option.

  • Provide a name for the Data Preparation.

  • Click the Back icon to go back.

  • On clicking the Back icon, a notification message appears to inform the user that the data preparation has been saved.

  • The user is redirected to the Dataset List page.

  • Select the Dataset for which the Preparation was saved.

  • Click the Data Preparation icon for the same Dataset.

  • The Preparation List opens, displaying the saved data preparation.

  • Click the Create Preparation option to create a new Data Preparation.

Please Note:

  • The details on how to use the Data Preparation option are described in the Data Preparation section under the Data Center.

Refer to the Data Science Lab Quick Start Flow page for an overview of the Data Science Lab module in a nutshell.

Figures on this page:

  • Accessing Data Preparation Framework
  • Data Preparation icon for a Dataset
  • Applying Auto Prep Transforms and saving the Data Preparation
  • Notification message while saving the Data Preparation