Data Preparation Landing Page

The Data Preparation Landing Page is a workspace for exploring, transforming, and cleaning datasets. Users can work with data in the Data Grid view, apply transformations, run Auto Prep, filter data, and monitor data quality. The landing page serves as the central hub for preparing datasets before analytics or machine learning workflows.

The Data Grid displays either a sample or the full dataset, depending on the data volume, and provides visual indicators, column headers, and transformation tools to streamline data preparation.

Best Situations to Use

Use the Data Preparation Landing Page when you want to:

  • Clean and standardize datasets for downstream analytics.

  • Apply transformations to columns, rows, or specific values.

  • Quickly assess data quality using visual indicators.

  • Preview and sample large datasets before applying transformations.

  • Automate repetitive cleaning tasks using Auto Prep.

Key Features

Data Grid

  • Displays datasets in a tabular format.

  • Shows column names, types, and visual summaries via column charts.

  • Supports dropdown context menus on each column with options like:

    • Rename Column

    • Hide Column

    • Delete Column

    • Delete All Others

    • Duplicate Columns

    • Get Character Length

    • Change Data Type (for Integer columns)

Data Type Indicators

  • Integer: Min and Max values

  • String: Number of unique values or categories

  • Date: Min and Max dates

  • Supported types: Integer, Double, String, Date, Timestamp, Long, Email, Boolean, Gender, URL
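
The indicators listed above can be pictured as a small per-column summary. The following is a minimal sketch in Python, not the product's implementation; the function name column_indicator and the dictionary keys are hypothetical, and only the three types whose indicators are described here (Integer, String, Date) are handled.

```python
from datetime import date

def column_indicator(data_type, values):
    """Return the summary shown for a column header (hypothetical helper)."""
    # Blank cells (None) are ignored when computing the summary.
    present = [v for v in values if v is not None]
    if not present:
        return {}
    if data_type in ("Integer", "Date"):
        # Integer and Date columns show their minimum and maximum.
        return {"min": min(present), "max": max(present)}
    if data_type == "String":
        # String columns show the number of unique values.
        return {"unique": len(set(present))}
    raise ValueError(f"no indicator described for type {data_type!r}")

print(column_indicator("Integer", [7, None, 42, 3]))  # {'min': 3, 'max': 42}
print(column_indicator("String", ["a", "b", "a"]))    # {'unique': 2}
print(column_indicator("Date", [date(2024, 5, 1), date(2023, 1, 15)]))
```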

Handling Repetitive Column Names

  • Excel: Columns with the same name receive _0, _1, _2 suffixes.

  • CSV: When a column name is repeated, the second and later occurrences receive .1, .2, .3 suffixes.
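
The two suffix schemes can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: for Excel, every column sharing a name is suffixed, starting at _0; for CSV, the first occurrence keeps its original name. The function names are hypothetical.

```python
from collections import Counter

def dedupe_excel_style(names):
    """Append _0, _1, _2 ... to every column whose name occurs more than once.

    Assumption: the first occurrence is also suffixed (with _0).
    """
    totals = Counter(names)
    seen = Counter()
    result = []
    for name in names:
        if totals[name] > 1:
            result.append(f"{name}_{seen[name]}")
            seen[name] += 1
        else:
            result.append(name)
    return result

def dedupe_csv_style(names):
    """Keep the first occurrence as-is; append .1, .2, .3 ... to later ones."""
    seen = Counter()
    result = []
    for name in names:
        result.append(name if seen[name] == 0 else f"{name}.{seen[name]}")
        seen[name] += 1
    return result

headers = ["id", "amount", "amount", "amount"]
print(dedupe_excel_style(headers))  # ['id', 'amount_0', 'amount_1', 'amount_2']
print(dedupe_csv_style(headers))    # ['id', 'amount', 'amount.1', 'amount.2']
```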

Data Quality Bar

  • Displays valid, invalid, and blank data with color coding:

    • Dark Blue: Valid Data

    • Orange: Invalid Data

    • Light Blue: Blank Data

  • Color-coded bars appear when a column is selected in the Data Grid.
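
The three segments of the bar correspond to shares of valid, invalid, and blank cells in the selected column. As a rough sketch (the validation rules the product applies per data type are not documented here, so the email pattern and the function name quality_summary below are illustrative assumptions), the proportions could be computed like this:

```python
import re

# Deliberately loose email check, used only for illustration.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_summary(values, is_valid):
    """Return the share of valid, invalid, and blank cells in a column."""
    counts = {"valid": 0, "invalid": 0, "blank": 0}
    for value in values:
        if value is None or str(value).strip() == "":
            counts["blank"] += 1      # Light Blue segment
        elif is_valid(value):
            counts["valid"] += 1      # Dark Blue segment
        else:
            counts["invalid"] += 1    # Orange segment
    total = len(values)
    return {key: (count / total if total else 0.0) for key, count in counts.items()}

emails = ["a@example.com", "not-an-email", "", None, "b@example.org"]
summary = quality_summary(emails, lambda v: bool(EMAIL_PATTERN.match(v)))
print(summary)  # {'valid': 0.4, 'invalid': 0.2, 'blank': 0.4}
```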

Settings

Skip Rows

  • Skip rows from a specified index in the dataset.

  • Useful for large datasets (>1,000 rows) to improve performance.

  • Skipped rows are excluded during transformations.

Total Rows

  • Defines the number of rows displayed in the Data Grid (default 1,000).

  • Options: 2K, 3K, 4K, 5K rows.

  • Pagination is automatically adjusted (200 rows per page by default).

Show/Hide Columns

  • Allows users to hide or display specific columns.

  • Columns can also be managed via the column header context menu.

Auto Prep

  • Automated data cleaning for datasets.

  • Performs transformations such as:

    • Cast to Types: Corrects mismatched data types.

    • Remove Special Characters from Metadata: Cleans column headers.

    • Fill Empty Cells: Fills empty cells based on data type (String → NA, Numeric → 0, Date → NaT).

    • Remove Special Characters: Deletes characters such as @, #, %, and _.

    • Remove Accents: Normalizes accented characters.

    • Delete Rows with Empty or Invalid Cells: Optional for advanced cleaning.

  • Auto Prep steps are listed under the Steps tab and saved as AUTO DATAPREP.
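
To make a few of these steps concrete, here is a minimal Python sketch of Fill Empty Cells, Remove Special Characters, and Remove Accents. It is an approximation under stated assumptions, not Auto Prep's actual logic: the helper names are hypothetical, the special-character set contains only the four characters named above, and the Date fill value is represented by the string "NaT".

```python
import unicodedata

# Fill values per data type, as listed for Fill Empty Cells.
# "NaT" is kept as a plain string placeholder in this sketch.
FILL_VALUES = {"String": "NA", "Numeric": 0, "Date": "NaT"}

# Only the characters named in the text; the real set may be larger.
SPECIAL_CHARACTERS = "@#%_"

def fill_empty(value, data_type):
    """Replace an empty cell with the fill value for its data type."""
    if value is None or value == "":
        return FILL_VALUES[data_type]
    return value

def remove_special_characters(text, characters=SPECIAL_CHARACTERS):
    """Delete every occurrence of the given characters."""
    return text.translate(str.maketrans("", "", characters))

def remove_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fill_empty("", "String"))                               # NA
print(fill_empty(None, "Numeric"))                            # 0
print(remove_accents(remove_special_characters("Café_#1")))   # Cafe1
```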

Filter

  • Allows filtering by:

    • Data Types

    • Column Name

    • Row Value

  • Filters are applied via a Filter drawer and can be customized per column or row.

Saving a Data Preparation

  • Save transformations with the Save option.

  • The Save option is enabled only after at least one transformation or Auto Prep has been applied.

  • Unnamed Data Preparations are auto-saved with generated names.

Pagination

  • The Data Grid displays 200 rows per page by default.

  • Adjust the total number of rows shown across pages via the Total Rows setting.

  • Pagination adapts to dataset size, adding pages as needed.
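
The relationship between Total Rows and the number of pages is simple arithmetic. The sketch below assumes the page size stays at the default of 200 rows; the function names are hypothetical.

```python
import math

ROWS_PER_PAGE = 200  # default page size described above

def page_count(total_rows, rows_per_page=ROWS_PER_PAGE):
    """Number of pages needed to show total_rows rows."""
    return math.ceil(total_rows / rows_per_page)

def page_bounds(page, total_rows, rows_per_page=ROWS_PER_PAGE):
    """Zero-based [start, end) row range for a 1-based page number."""
    start = (page - 1) * rows_per_page
    end = min(start + rows_per_page, total_rows)
    return start, end

for total in (1000, 2000, 3000, 4000, 5000):
    print(total, page_count(total))  # 5, 10, 15, 20, 25 pages respectively

print(page_bounds(5, 1000))  # (800, 1000)
```

With the default of 1,000 rows this gives 5 pages, and the largest option (5K) gives 25.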

Key Metrics

Displayed at the bottom of the Data Grid for quick insights:

  • Column Count: Total number of columns

  • Data Type Count: Number of distinct data types

  • Source: Name of the source dataset

  • Sample Row Count: Total number of rows displayed
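
As an illustration of how these four values relate to the data shown in the grid, the following Python sketch derives them from a column-to-type mapping and a list of sample rows. The function name, the schema layout, and the sample file name are all hypothetical.

```python
def key_metrics(source_name, schema, sample_rows):
    """Compute the footer metrics from a {column: data type} schema and sample rows."""
    return {
        "Column Count": len(schema),
        "Data Type Count": len(set(schema.values())),  # distinct types only
        "Source": source_name,
        "Sample Row Count": len(sample_rows),
    }

schema = {"id": "Integer", "name": "String", "email": "Email", "city": "String"}
rows = [
    {"id": 1, "name": "Ana", "email": "ana@example.com", "city": "Lima"},
    {"id": 2, "name": "Bo", "email": "bo@example.com", "city": "Oslo"},
]
print(key_metrics("customers.csv", schema, rows))
# {'Column Count': 4, 'Data Type Count': 3, 'Source': 'customers.csv', 'Sample Row Count': 2}
```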

Best Situations to Use Specific Features

  • Data Grid: Preview datasets, inspect columns and types, and visualize column distributions.

  • Data Quality Bar: Quickly identify invalid or missing data before transformations.

  • Skip Rows / Total Rows: Optimize performance for large datasets; control visible sample size.

  • Show/Hide Columns: Focus on relevant columns for analysis; reduce clutter.

  • Auto Prep: Automate cleaning and standardization for large or messy datasets.

  • Filter: Quickly isolate rows or columns of interest for targeted analysis.

  • Save Data Preparation: Persist transformations for reuse or sharing across the platform.
