Data Management Actions
The actions associated with data added to a Data Science Lab (DSLab) project are collectively referred to as Data Management Actions or Data Operations. These actions allow users to explore, prepare, manage, and organize data within the workspace.
Data Operations
Preview
Provides a quick view of the dataset contents without modifying the underlying data.
Displays the first few rows of the dataset, allowing users to inspect values and structure.
Useful for verifying data correctness before further processing.
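For readers who want the same quick check inside a notebook, the sketch below approximates a preview using pandas. It is an illustration under assumptions, not the DSLab implementation: the sample CSV, the column names, and the preview helper are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a dataset added to a project.
SAMPLE_CSV = """customer_id,age,plan,monthly_spend
1,34,basic,20.5
2,,premium,75.0
3,45,basic,
4,29,standard,40.0
5,52,premium,80.0
6,41,standard,38.5
"""


def preview(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return a copy of the first n rows; the input frame is left unchanged."""
    return df.head(n).copy()


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
first_rows = preview(df, n=3)

print(first_rows)   # inspect values in the first few rows
print(df.dtypes)    # inspect structure: column names and inferred types
```

Because the helper returns a copy, edits made to the previewed rows do not affect the full dataset.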
Data Preparation
Opens tools to clean, transform, and prepare the dataset for analysis or model training.
Includes operations such as handling missing values, filtering records, encoding categorical variables, and scaling features.
Helps ensure that the dataset is ready for machine learning workflows.
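The four preparation operations listed above can be sketched in pandas as follows. This is a minimal example under stated assumptions and does not reflect how the Data Preparation tool works internally: the sample data, the column names, the median fill, the age filter, and the min-max scaling are all choices made for illustration.

```python
import io

import pandas as pd

# Hypothetical raw data with missing values and a categorical column.
RAW_CSV = """customer_id,age,plan,monthly_spend
1,34,basic,20.0
2,,premium,80.0
3,46,basic,
4,28,standard,40.0
5,17,premium,60.0
"""

NUMERIC_COLS = ["age", "monthly_spend"]


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df; the original frame is not modified."""
    out = df.copy()

    # 1. Handle missing values: fill numeric gaps with the column median.
    for col in NUMERIC_COLS:
        out[col] = out[col].fillna(out[col].median())

    # 2. Filter records: keep rows with age >= 18 (an assumed rule).
    out = out[out["age"] >= 18].reset_index(drop=True)

    # 3. Encode categorical variables: one-hot encode the plan column.
    out = pd.get_dummies(out, columns=["plan"], dtype=int)

    # 4. Scale features: min-max scale numeric columns into [0, 1].
    for col in NUMERIC_COLS:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0

    return out


raw = pd.read_csv(io.StringIO(RAW_CSV))
prepared = prepare(raw)
print(prepared)
```

The order of the steps matters in this sketch: missing values are filled before filtering so that rows with a missing age are not silently dropped by the comparison.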
Data Profile
Generates a statistical and structural profile of the dataset.
Provides summary statistics, distributions, and data quality metrics.
Helps users understand dataset characteristics, detect anomalies, and identify potential preprocessing steps.
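A comparable profile can be approximated in a notebook with pandas, as sketched below. The sample data and the profile and iqr_outliers helpers are hypothetical, and the 1.5 x IQR rule is just one common heuristic for flagging anomalies; the metrics actually produced by Data Profile may differ.

```python
import io

import pandas as pd

# Hypothetical sample data containing one missing value and one extreme value.
SAMPLE_CSV = """customer_id,age,plan,monthly_spend
1,34,basic,20.0
2,41,premium,22.0
3,45,basic,25.0
4,29,standard,21.0
5,52,premium,500.0
6,38,standard,
"""


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Build a per-column structural and data quality summary."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),
    })


def iqr_outliers(series: pd.Series) -> pd.Series:
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    return series[mask]


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = profile(df)
outliers = iqr_outliers(df["monthly_spend"])

print(df.describe())  # summary statistics for numeric columns
print(summary)        # types, missing counts, and cardinality per column
print(outliers)       # candidate anomalies in monthly_spend
```

In this sample, the summary surfaces the missing monthly_spend value and the outlier check flags the 500.0 entry, which points to imputation and outlier handling as likely preprocessing steps.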
Copy Path
Copies the internal path or reference of the dataset for use in notebooks or scripts.
Enables users to access the dataset programmatically using functions such as get_data().
Supports reproducibility by allowing consistent referencing across notebooks.
Delete
Removes the selected dataset from the project workspace.
Prompts a confirmation dialog to prevent accidental deletion.
Once confirmed, the dataset is permanently removed and no longer accessible within the project.
Notes
These operations are accessible from the ellipsis icon next to each dataset added to a workspace.
Proper use of these operations helps keep datasets well-managed, clean, and ready for modeling tasks.
Dataset operations may vary slightly depending on the type of data source (e.g., Data Center datasets, Data Sandbox files, or Feature Store entries).