Create Data Preparation
Users can access the Data Preparation landing page by selecting this option from the Data Sandbox list page.
The Data Sandbox provides an isolated workspace where users can upload, explore, and prepare datasets before moving them into enterprise data pipelines. Applying Data Preparation steps to a Data Sandbox file transforms raw uploads, such as CSV, Excel, Parquet, or JSON files, into high-quality, analysis-ready data suitable for modeling, reporting, and further processing.
1. File Upload & Initial Exploration
Users upload a dataset into the Sandbox and perform initial inspection:
Previewing columns and sample rows
Detecting schema and automatic data types
Checking file integrity, row counts, and basic patterns
This step provides a quick understanding of the structure and quality of the uploaded file.
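The initial inspection described above can be sketched in pandas. This is a minimal illustration, assuming a CSV upload; the column names and sample values are hypothetical, and the actual Sandbox performs these checks through its UI:

```python
import io
import pandas as pd

# Hypothetical in-memory sample standing in for an uploaded Sandbox file.
raw_csv = io.StringIO(
    "order_id,region,sales,order_date\n"
    "1,North,120.5,2024-01-05\n"
    "2,South,87.0,2024-01-06\n"
    "3,North,,2024-01-07\n"
)

df = pd.read_csv(raw_csv)

# Preview columns and sample rows
preview = df.head(3)

# Detect schema and automatically inferred data types
schema = df.dtypes.astype(str).to_dict()

# Basic integrity check: row and column counts
row_count, col_count = df.shape
```

Note that pandas infers `sales` as a float column because of the empty cell in row 3, which is exactly the kind of detail this first pass is meant to surface.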
2. Data Profiling
The Sandbox automatically scans the dataset to generate profiling statistics:
Missing value percentages
Min/Max/Mean values
Unique counts and frequency distributions
Data type mismatches and anomalies
Profiling helps identify what transformations or cleaning steps are required.
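The profiling statistics listed above can be approximated with a few pandas calls. A minimal sketch on hypothetical data (the real Sandbox generates these automatically on scan):

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO(
    "customer,amount\n"
    "A,10\n"
    "B,\n"
    "A,30\n"
    "C,20\n"
))

# Missing value percentages per column
missing_pct = (df.isna().mean() * 100).round(1).to_dict()

# Min / max / mean for a numeric column
stats = {
    "min": df["amount"].min(),
    "max": df["amount"].max(),
    "mean": df["amount"].mean(),
}

# Unique counts and frequency distributions
unique_counts = df.nunique().to_dict()
freq = df["customer"].value_counts().to_dict()
```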
3. Data Cleaning Operations
Cleaning steps can be applied directly on the Sandbox file, including:
Removing duplicate records
Handling missing values (fill, drop, or replace)
Standardizing date/time formats
Fixing inconsistent or invalid entries
Trimming whitespace and normalizing text fields
Converting data types (e.g., string → integer, string → date)
These steps convert raw files into consistent, trustworthy data.
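The cleaning operations above map onto standard dataframe calls. A hedged sketch with made-up data, showing duplicate removal, whitespace trimming, missing-value handling, date standardization, and type conversion in sequence:

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO(
    "name,qty,signup\n"
    "  alice ,5,2024/01/05\n"
    "bob,,2024/01/06\n"
    "bob,,2024/01/06\n"
))

# Remove duplicate records
df = df.drop_duplicates()

# Trim whitespace and normalize text fields
df["name"] = df["name"].str.strip().str.title()

# Handle missing values (fill with a default)
df["qty"] = df["qty"].fillna(0)

# Standardize date/time formats
df["signup"] = pd.to_datetime(df["signup"], format="%Y/%m/%d")

# Convert data types (string/float → integer)
df["qty"] = df["qty"].astype(int)
```

Order matters here: duplicates are dropped before filling missing values, so a fill default cannot mask two rows that were only identical because both were blank.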
4. Transformations & Derivations
Users can apply a wide range of transformations to refine and derive new insights:
Column creation (KPIs, ratios, flags, classifications)
Aggregation and grouping
Filtering based on business rules
Splitting or merging columns
Conditional logic (IF/ELSE)
Reordering or dropping columns
Renaming fields for clarity and standardization
This step reshapes the Sandbox file into a usable analytical dataset.
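Several of the transformations listed above can be sketched together in pandas. The column names and business rule are illustrative assumptions, not the platform's actual syntax:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South"],
    "sales": [120.0, 80.0, 200.0],
    "cost": [60.0, 50.0, 150.0],
})

# Column creation: derive a margin ratio, plus a flag via conditional logic
df["margin"] = (df["sales"] - df["cost"]) / df["sales"]
df["high_value"] = df["sales"].apply(lambda s: "yes" if s >= 100 else "no")

# Filtering based on a business rule
profitable = df[df["margin"] > 0]

# Aggregation and grouping
by_region = profitable.groupby("region")["sales"].sum().to_dict()

# Renaming fields for clarity; dropping a column no longer needed
df = df.rename(columns={"sales": "revenue"}).drop(columns=["cost"])
```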
5. Data Enrichment
The Sandbox allows enriching the file using:
Reference/master data available in the environment
Lookup mappings (regions, codes, taxonomy values)
Historical data from previous Sandbox sessions
Lightweight AI/ML-generated features (optional)
Enrichment adds depth and business relevance to the dataset.
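A lookup-mapping enrichment of the kind described above amounts to a left join against reference data. A minimal sketch, with a hypothetical region-code mapping standing in for the environment's master data:

```python
import pandas as pd

# Sandbox file with raw region codes
df = pd.DataFrame({"order_id": [1, 2, 3], "region_code": ["N", "S", "X"]})

# Hypothetical lookup / master data mapping codes to region names
lookup = pd.DataFrame({
    "region_code": ["N", "S", "E", "W"],
    "region_name": ["North", "South", "East", "West"],
})

# Left join so unmatched codes survive (as nulls) and can be reviewed
enriched = df.merge(lookup, on="region_code", how="left")
unmatched = int(enriched["region_name"].isna().sum())
```

A left join (rather than an inner join) is the safer default here: codes missing from the master data are flagged instead of silently dropping rows.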
6. Validation & Quality Checks
Before publishing, users can run validation rules:
Schema validation
Threshold checks (e.g., sales > 0, dates < today)
Null/duplicate checks
Referential integrity validation (if master data is used)
Validation ensures the Sandbox file is reliable before it moves to production workflows.
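The validation rules above can be expressed as simple checks that accumulate failure counts. This is an assumed sketch of the pattern, not the platform's rule engine; the sample data is deliberately seeded with one failure of each kind:

```python
import pandas as pd

df = pd.DataFrame({
    "sale_id": [1, 2, 2],
    "sales": [150.0, -20.0, 90.0],
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-06", "2099-01-01"]),
})

failures = {}

# Schema validation: required columns present
expected = {"sale_id", "sales", "order_date"}
failures["schema"] = not expected.issubset(df.columns)

# Threshold checks (e.g., sales > 0, dates < today)
failures["sales_positive"] = int((df["sales"] <= 0).sum())
failures["future_dates"] = int((df["order_date"] >= pd.Timestamp.today()).sum())

# Null / duplicate checks on the key column
failures["null_ids"] = int(df["sale_id"].isna().sum())
failures["duplicate_ids"] = int(df["sale_id"].duplicated().sum())

# Dataset passes only when no rule reports a violation
passed = failures["schema"] is False and all(
    v == 0 for k, v in failures.items() if k != "schema"
)
```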
7. Publishing the Prepared Sandbox Dataset
Once prepared, the dataset can be:
Exported back to the platform under the Data Sandbox list
Pushed into the Data Pipeline as a source for scheduled workflows
Consumed by Data Agents or Data Science Lab notebooks
It becomes an enterprise-ready dataset that can power dashboards, ML models, and automated processes.