Columns Transforms

The Column transforms section provides tools for modifying, cleaning, and manipulating data within individual columns of a dataset.

The Column Transforms allow users to manipulate, clean, and enrich individual columns in a dataset. These transforms are accessible via the Transforms tab or the column menu in the Data Grid, and they support operations such as adding blank columns, splitting arrays, type casting, and converting data types.

Best Situations to Use

  • Add new columns for data enrichment or calculation purposes.

  • Split the array columns into individual columns for detailed analysis.

  • Ensure columns have consistent and correct data types.

  • Convert data types to String for downstream processing or standardization.

You can access some Column transforms directly from the Menu icon provided for each column in the Data Grid display of the dataset.

Add Blank Column

Creates a new column with no values, allowing users to populate it with custom data later. Steps:

  1. Navigate to the Data Preparation page with the Data Grid.

  2. Open Transforms > Columns > Add Blank Column.

  3. Provide the following:

    • New Column Name

    • Position of the blank column in the dataset.

  4. Click Submit.

Result: A new empty column is added at the specified position.

Notes:

  • Values can later be updated using transforms like Fill Empty or Search & Replace.

  • The transform can also be accessed via the column menu in the Data Grid.

Array to Column

Splits array-type columns into multiple individual columns. Steps:

  1. Select a column containing array data.

  2. Open Transforms > Columns > Array to Columns.

  3. Click Submit.

Result: Array elements are separated into multiple columns.

  • The number of resulting columns is based on the maximum count of values in the array.

  • Column names are auto-generated (e.g., Column_1, Column_2).

Notes: Useful for handling list or array data types where individual elements need to be analyzed separately.

Cast to Types

Ensures columns have consistent data types by profiling the majority type and converting invalid values to null. Steps:

  1. Select a column with mixed data types (e.g., integers with some strings).

  2. Open Transforms > Columns > Cast to Types.

  3. Click Submit.

Result: Invalid values are removed or converted to null.

  • Example: Column with 4 integers and 1 string → profiled as integer, string converted to null.

Notes:

  • May result in data loss for invalid values.

  • Can also be accessed via the column menu in the Data Grid.

Change to String

Converts the datatype of a column into a String. Steps:

  1. Select a column to convert.

  2. Open Transforms > Columns > Change to String.

  3. Click Submit.

Result: Column datatype is updated to String.

Notes:

  • Date columns cannot be converted to a String.

  • Accessible via the column menu in the Data Grid for convenience.

Collect Set

The Collect Set transform generates a list of all unique values in a selected column, optionally creating a new column for the output. It is useful for summarizing categorical data or preparing values for further processing.

Best Situations to Use

  • Consolidating all distinct values in a column.

  • Creating lookup lists or reference sets for data cleaning.

  • Grouping values for aggregation or analysis.

Steps

  1. Select a column in the dataset.

  2. Open Transforms > Columns > Collect Set.

  3. Enable Create New Column if required.

  4. Select a Partitioning Column (optional).

  5. Click Submit.

Note: Collect Set is also accessible via the column menu in the Data Grid.

Concatenate Arrays

The Concatenate Arrays transform combines two or more array-type columns into a single array column.

Best Situations to Use

  • Merging list-type columns into a unified array.

  • Preparing data for feature engineering or analysis where combined array values are needed.

Steps

  1. Select array-type columns.

  2. Open Transforms > Columns > Concatenate Arrays.

  3. Provide a New Column Name.

  4. Click Submit.

Example:

Array1
Array2
Result Array

[1,2]

[3,4]

[1,2,3,4]

Note: Works only for List data type columns.

Concatenate With

The Concatenate With transform appends prefixes, suffixes, or another column’s values to a selected column.

Best Situations to Use

  • Adding prefixes/suffixes for identifiers or codes.

  • Combining columns into a single value for analysis.

Steps

  1. Select the target column.

  2. Open Transforms > Columns > Concatenate With.

  3. Configure options:

    • Prefix: Value to add before the column value.

    • Use With: Choose either Other Column or Value.

    • Separator: Character to separate values.

    • Suffix: Value to add after the column value.

  4. Click Submit.

Note: Select Use with Other Column to concatenate two columns; select Use with Value to add prefix/suffix.

Copy Column

The Copy Column transform duplicates the contents of a column into a new column.

Best Situations to Use

  • Creating a backup of a column before transformations.

  • Preparing a derived column while preserving original data.

Steps

  1. Select a source column.

  2. Open Transforms > Functions > Copy Column.

  3. Provide a New Column Name.

  4. Click Submit.

Delete All Other

The Delete All Other transform removes all columns except the selected column(s).

Best Situations to Use

  • Focusing analysis on specific columns.

  • Reducing dataset size for targeted transformations.

Steps

  1. Select the column(s) to retain.

  2. Open Transforms > Columns > Delete All Other.

  3. Click Submit.

Note: Also accessible via the column menu in the Data Grid.

Delete Column

The Delete Column transform removes a selected column from the dataset.

Best Situations to Use

  • Removing irrelevant or redundant columns.

  • Cleaning up temporary or intermediate columns.

Steps

  1. Select a column to delete.

  2. Open Transforms > Columns > Delete Column.

  3. Click Submit.

Note: Can also be accessed from the column menu for convenience.

Duplicate Column

The Duplicate Column transform creates an exact copy of a selected column in the dataset.

Best Situations to Use

  • Preserve original data while creating derived or modified columns.

  • Create backup copies of columns before applying transformations.

Steps

  1. Select a column from the dataset.

  2. Open Transforms > Columns > Duplicate Column.

  3. Click Submit.

Note: Also accessible via the column menu in the Data Grid.

Fill Empty

The Fill Empty transform populates null or empty cells using values from the row above or below.

Best Situations to Use

  • Impute missing data in numeric or categorical columns.

  • Prepare clean datasets for analysis or modeling.

Steps

  1. Select a column with empty cells.

  2. Open Transforms > Columns > Fill Empty.

  3. Configure:

    • Create New Column (optional)

    • Use From Above or From Below

  4. Click Submit.

Note: Cells with no reference value above or below remain empty.

Generate Primary Key

The Generate Primary Key transform adds a primary key column to a dataset.

Best Situations to Use

  • Create unique identifiers for rows in a table.

  • Prepare datasets for joins, merges, or relational operations.

Steps

  1. Select the first column from the dataset.

  2. Open Transforms > Columns > Generate Primary Key.

  3. Choose Contiguous (auto-increment from 1) or Non-Contiguous (random unique integers).

  4. Click Submit.

Note: This is a table-based operation applied to all rows.

Get Character Length

The Get Character Length transform adds a new column showing the number of characters in each cell.

Best Situations to Use

  • Analyze text length distributions for string columns.

  • Detect unusually long or short entries for data quality checks.

Steps

  1. Select a column.

  2. Open Transforms > Columns > Get Character Length.

  3. Click Submit.

Note: Spaces are counted as characters; empty cells remain unchanged.

Get JSON Objects

The Get JSON Objects transform extracts parameters from JSON data in a column and outputs them as separate columns.

Best Situations to Use

  • Flatten JSON columns for easier analysis.

  • Extract specific fields from nested JSON structures.

Steps

  1. Select a JSON column.

  2. Open Transforms > Columns > Get JSON Objects.

  3. Provide parameters to extract.

  4. Click Submit.

Note: Use commas to extract multiple nested values.

Keep Column

The Keep Column transform retains selected columns and deletes all other columns from the dataset.

Best Situations to Use

  • Focus on specific columns for analysis.

  • Reduce dataset size by removing irrelevant columns.

Steps

  1. Select the columns to keep.

  2. Open Transforms > Columns > Keep Column.

  3. Click Submit.

Merge Columns

The Merge Columns transform combines two or more columns into a single column, optionally using a separator.

Best Situations to Use

  • Create composite identifiers from multiple columns.

  • Combine multiple text or categorical columns for feature engineering.

Steps

  1. Select columns to merge.

  2. Open Transforms > Columns > Merge Columns.

  3. Provide a New Column Name and Separator (e.g., _).

  4. Click Submit.

Example:

Name
Job
Gender
MergedColumn

Ram

Teacher

Male

Ram_Teacher_Male

Nest Column to Object / Array

The Nest Column to Object/Array transform combines one or more columns into a single column as either an object or an array.

Best Situations to Use

  • Combine multiple related columns into structured data for JSON-like processing.

  • Prepare datasets for APIs or downstream data models that require nested formats.

Steps

  1. Select one or multiple columns.

  2. Open Transforms > Columns > Nest Column to Object/Array.

  3. Provide a New Column Name.

  4. Choose Nest Column to Object or Nest Column to Array.

  5. Click Submit.

Result:

  • Array option: Displays comma-separated values in the new column (List datatype).

  • Object option: Displays values within curly brackets ({}).

Note: Array datatype cannot be loaded directly into the datastore.

Pivot

The Pivot transform reshapes data into a pivot table based on a selected pivot column and group-by column. Aggregations (Sum, Avg, Max, Min, Count) can be applied to numeric data.

Best Situations to Use

  • Convert long-format data into wide-format for analysis or reporting.

  • Aggregate numeric values by groups.

Steps

  1. Select a dataset.

  2. Open Transforms > Columns > Pivot.

  3. Select the Group By column (String datatype).

  4. Select the Pivot column (String datatype).

  5. Choose an aggregation operation for a numeric column.

  6. Click Submit.

Note: Group By and Pivot columns can be interchanged to change the result.

Random Number Between 0 and 1 (RAND)

Generates a random decimal number between 0 and 1. A seed value can be provided to generate consistent numbers.

Best Situations to Use

  • Add random values for simulation or testing purposes.

  • Generate stochastic features for machine learning workflows.

Steps

  1. Navigate to a dataset.

  2. Open Transforms > Columns > Random Number Between 0 and 1.

  3. Provide New Column Name and optional seed value.

  4. Click Submit.

Note: Decimal values have 15 digits of precision. If no seed is provided, system timestamp is used.

Relocate Column

Changes the position of a column in the dataset.

Best Situations to Use

  • Organize columns for better readability.

  • Move important columns to the start or end for downstream workflows.

Steps

  1. Select a column.

  2. Open Transforms > Columns > Relocate Column.

  3. Choose a position: At End, At Start, After Column, Before Column.

  4. Click Submit.

Rename Column

Allows users to rename a column in the dataset.

Best Situations to Use

  • Correct misnamed columns for clarity.

  • Standardize column names across datasets.

Steps

  1. Select a column.

  2. Open Transforms > Columns > Rename Column or use the column menu.

  3. Provide a new name.

  4. Click Submit.

Return Non-Null Column Values

Creates a new column with the first non-null value from a list of specified columns.

Best Situations to Use

  • Combine multiple columns into a single consolidated column.

  • Fill gaps in data using the first available value.

Steps

  1. Select columns to check for null values.

  2. Specify the new column name for the result.

  3. Click Submit.

Sorting

Sorts the dataset based on the selected column in Ascending or Descending order.

Best Situations to Use

  • Organize datasets for better readability or analysis.

  • Prepare data for downstream transformations or visualization.

Steps

  1. Select a column.

  2. Open Transforms > Columns > Sorting.

  3. Choose Ascending or Descending.

  4. Click Submit.

Note: Multiple sorts are applied in the order of execution; the last sort determines the final order.

Last updated