Create Data Preparation

Users can create Preparations on top of a Data Set using this option.

Data Preparation is the foundational step in any analytics, machine learning, or data-driven decision workflow. It transforms raw or semi-structured data into a clean, enriched, and analysis-ready dataset that can be reliably consumed by downstream applications like BI dashboards, AI models, and Data Agents.

Creating a Data Preparation layer on top of a dataset typically involves the following processes:

1. Data Ingestion & Profiling

Data is first ingested from source systems—databases, files, APIs, or pipelines—and profiled to understand its structure, data types, missing values, patterns, and data quality issues. Profiling helps identify anomalies, duplicates, outliers, and incorrect data types.
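
The profiling step above can be sketched in plain Python. This is only an illustration and not the platform's own implementation: the sample rows, the column names, and the profile function are all hypothetical. It counts missing values and observed data types per column, and counts exact duplicate rows.

```python
from collections import Counter

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": None, "amount": 80.5},
    {"id": 2, "city": None, "amount": 80.5},      # exact duplicate of the previous row
    {"id": 3, "city": "Delhi", "amount": "n/a"},  # wrong data type for amount
]

def profile(rows):
    """Return per-column missing counts and observed types, plus a duplicate-row count."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            stats = columns.setdefault(name, {"missing": 0, "types": Counter()})
            if value is None:
                stats["missing"] += 1
            else:
                stats["types"][type(value).__name__] += 1
    # Sort each row's items by column name so identical rows map to the same key.
    seen = Counter(tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows)
    # Count every repeat beyond the first occurrence of a row.
    duplicates = sum(count - 1 for count in seen.values())
    return {"columns": columns, "duplicate_rows": duplicates}

report = profile(rows)
```

In this sample the report flags two missing city values, one duplicate row, and a mixed float/str type in the amount column, which are the kinds of issues the cleaning step then addresses.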

2. Data Cleaning

The dataset undergoes systematic cleaning operations such as:

Removing duplicates
Handling missing values (drop/impute)
Standardizing formats (dates, currencies, phone numbers)
Correcting inconsistent entries
Fixing schema or column-level issues (renaming, type casting)
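
As a rough sketch of the cleaning operations listed above, the following Python removes duplicates, imputes a missing value, standardizes dates, and casts column types. The field names, the two accepted date formats, and the default used for imputation are assumptions made for this example, not behavior of the product.

```python
from datetime import datetime

# Hypothetical raw records, all values arriving as text.
raw = [
    {"order_id": "1", "order_date": "2024-01-05", "amount": "100.50"},
    {"order_id": "1", "order_date": "2024-01-05", "amount": "100.50"},  # duplicate
    {"order_id": "2", "order_date": "05/02/2024", "amount": None},      # day/month/year, missing amount
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats

def to_iso_date(text):
    """Standardize a date string to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def clean(rows, default_amount=0.0):
    """Drop exact duplicates, impute missing amounts, and cast column types."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row["order_id"], row["order_date"], row["amount"])
        if key in seen:
            continue  # skip rows already emitted
        seen.add(key)
        cleaned.append({
            "order_id": int(row["order_id"]),
            "order_date": to_iso_date(row["order_date"]),
            "amount": float(row["amount"]) if row["amount"] is not None else default_amount,
        })
    return cleaned

cleaned = clean(raw)
```

Imputing with a fixed default is the simplest choice; dropping the row or imputing with a column mean or median are equally valid alternatives depending on the business rule.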

3. Data Transformation

Transformations are applied to reshape and enhance the dataset, including:

Column derivation (KPIs, flags, ratios, classifications)
Normalization, aggregation, and filtering
Joins and merges with reference/master tables
Splitting or combining columns
Converting raw attributes into meaningful business metrics
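
The transformations above can be illustrated with a minimal Python sketch that derives a column and a flag, filters rows, and aggregates by a key. The orders data and the add_margin and revenue_by_region helpers are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical order-level records.
orders = [
    {"region": "North", "revenue": 200.0, "cost": 150.0},
    {"region": "North", "revenue": 100.0, "cost": 120.0},
    {"region": "South", "revenue": 300.0, "cost": 180.0},
]

def add_margin(row):
    """Derive a margin column and a profitability flag from raw attributes."""
    margin = row["revenue"] - row["cost"]
    return {**row, "margin": margin, "is_profitable": margin > 0}

def revenue_by_region(rows):
    """Aggregate revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)

derived = [add_margin(r) for r in orders]                 # column derivation
profitable = [r for r in derived if r["is_profitable"]]   # filtering
totals = revenue_by_region(profitable)                    # aggregation
```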

4. Data Enrichment

The dataset is augmented using additional internal or external data sources:

Master data (Customer, Product, Location)
Lookup tables (Codes, Categories)
Behavioral or transactional histories
AI-generated features (sentiment, affinity scores, churn likelihood)

The added attributes give the dataset more context, which supports richer insights.
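
Enrichment with master data or a lookup table amounts to a keyed join. The sketch below shows one way to do this in Python, keeping every source row and filling defaults when no match exists (a left join). The products lookup, the sales rows, and the UNKNOWN defaults are assumptions for illustration only.

```python
# Hypothetical master data keyed by product code.
products = {
    "P1": {"product_name": "Laptop", "category": "Electronics"},
    "P2": {"product_name": "Desk", "category": "Furniture"},
}

# Hypothetical transactional rows to enrich.
sales = [
    {"sale_id": 1, "product_code": "P1", "qty": 2},
    {"sale_id": 2, "product_code": "P9", "qty": 1},  # no matching master record
]

def enrich(rows, lookup, key, defaults):
    """Left-join rows to a lookup table; unmatched rows receive the defaults."""
    enriched = []
    for row in rows:
        extra = lookup.get(row[key], defaults)
        enriched.append({**row, **extra})
    return enriched

UNKNOWN = {"product_name": None, "category": "Unknown"}
enriched = enrich(sales, products, "product_code", UNKNOWN)
```

Keeping unmatched rows rather than dropping them makes gaps in the master data visible to the validation step that follows.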

5. Validation & Quality Checks

Before publishing, validation and rule-based checks ensure:

Schema accuracy
Business rule conformance
Referential integrity
Threshold-based data quality scores

Any rule that fails raises an alert or is logged so the issue can be corrected.
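
A rule-based check with a threshold-based quality score could look like the following Python sketch. The rules, the customers reference set, the scoring formula (share of rows that pass every rule), and the 0.9 threshold are all assumptions chosen for this example rather than the product's actual logic.

```python
# Hypothetical reference set used for the referential integrity rule.
customers = {101, 102}

records = [
    {"order_id": 1, "customer_id": 101, "amount": 50.0},
    {"order_id": 2, "customer_id": 999, "amount": 20.0},  # unknown customer
    {"order_id": 3, "customer_id": 102, "amount": -5.0},  # negative amount
    {"order_id": 4, "customer_id": 102, "amount": 75.0},
]

# Each rule maps a name to a check that returns True when the row passes.
RULES = {
    "schema": lambda r: isinstance(r["order_id"], int) and isinstance(r["amount"], float),
    "business_rule": lambda r: r["amount"] >= 0,
    "referential_integrity": lambda r: r["customer_id"] in customers,
}

def validate(rows, rules, threshold):
    """Run every rule on every row and compare the pass rate to a threshold."""
    failures = []
    for row in rows:
        for name, check in rules.items():
            if not check(row):
                failures.append((row["order_id"], name))  # logged for correction
    failed_ids = {order_id for order_id, _ in failures}
    # Score is the share of rows with no failed rule; an empty input scores 1.0.
    score = (len(rows) - len(failed_ids)) / len(rows) if rows else 1.0
    return {"score": score, "passed": score >= threshold, "failures": failures}

result = validate(records, RULES, threshold=0.9)
```

Here two of the four rows fail a rule, so the score is 0.5 and the dataset would not pass a 0.9 threshold; the failures list records which rule each row broke.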

6. Publishing the Prepared Dataset

The final, cleaned, transformed, and enriched dataset is stored in a structured zone—often a Gold Layer, DataMart, or Semantic Layer—and can be consumed by:

BI Dashboards
Data Science Models
Data Agents
Pipeline automation
APIs for external applications

It becomes the “single source of truth” for business analytics.

Note:

Refer to the Data Preparation section of this documentation for a comprehensive understanding of the entire process.
See Launch Data Preparation in the Data Set list section to learn how to access the Data Preparation landing page for a Data Set.

