Function Transforms

Function Transforms apply a mathematical or programmatic function to data to convert its format or structure, making it ready for processing.

The Functions transform enables users to execute advanced operations on dataset columns beyond basic manipulation. This includes:

  • Encoding: Converting categorical or text data into numerical formats (e.g., one-hot encoding) for use in machine learning models.

  • Counting: Calculating frequency or count statistics for values within a column.

  • Geospatial Point Generation: Creating latitude and longitude points or other geographical data from existing location information for spatial analysis.

Essentially, these transforms leverage pre-defined computational functions to enrich the dataset and prepare it for complex analytical or modeling tasks.

Base64 Decode

The Base64 Decode transform converts Base64-encoded text into its original string format. The output type is String.

Best Situations to Use

  • Decode Base64-encrypted values in string columns.

  • Convert encoded data for analysis or reporting.

Steps

  1. Select a string column containing Base64 data.

  2. Open Transforms > Functions > Base64 Decode.

  3. Enable Create New Column (optional).

  4. Provide a New Column Name.

  5. Click Submit.

Base64 Encode

The Base64 Encode transform converts input values to Base64-encoded text, optionally padded with =. The input can be of any type, and the output is a String.

Best Situations to Use

  • Encode sensitive or textual data for secure transmission or storage.

  • Prepare data for systems requiring Base64 input.

Steps

  1. Select a string column.

  2. Open Transforms > Functions > Base64 Encode.

  3. Enable Create New Column (optional).

  4. Provide a New Column Name.

  5. Click Submit.

Note: Encoded columns are added at the end of the dataset by default.

Count Distinct

The Count Distinct transform calculates the number of unique values in a column, optionally grouped by other columns.

Best Situations to Use

  • Determine the distinct count of categorical values.

  • Summarize unique entries by group for aggregation or reporting.

Steps

  1. Select a dataset column.

  2. Open Transforms > Functions > Count Distinct.

  3. Provide a New Column Name.

  4. Select Group By Columns (optional).

  5. Click Submit.

Note: Multiple columns can be selected as group-by columns.

Count Occurrences

The Count Occurrences transform counts the occurrences of a specified word, letter, or digit in a string column.

Best Situations to Use

  • Analyze text patterns within a dataset.

  • Generate features for text analytics or NLP.

Steps

  1. Select a string column.

  2. Open Transforms > Functions > Count Occurrences.

  3. Provide a New Column Name.

  4. Enter the Value to count.

  5. Enable Ignore Case if required.

  6. Click Submit.

Create a Geopoint from Latitude and Longitude

The Create Geopoint transform generates a geographical point column from latitude and longitude columns. The output format is POINT(Longitude Latitude).

Best Situations to Use

  • Prepare datasets for geospatial analysis.

  • Generate coordinates for mapping or location-based analytics.

Steps

  1. Select a dataset containing latitude and longitude columns.

  2. Open Transforms > Functions > Create Geopoint from Latitude and Longitude.

  3. Select the Latitude column.

  4. Select the Longitude column.

  5. Provide a New Column Name.

  6. Click Submit.

Distance Between Geo Points

Calculates the distance between two sets of latitude and longitude points. Results can be returned in kilometers or miles.

Best Situations to Use

  • Calculate distances for geospatial analysis or mapping applications.

  • Prepare features for routing, logistics, or location-based analytics.

Steps

  1. Select a dataset with two sets of latitude and longitude columns.

  2. Open Transforms > Functions > Distance Between Geo Points.

  3. Configure:

    • Latitude and Longitude columns for both points

    • Unit: Kilometers or Miles

    • New Column Name

  4. Click Submit.

Exact

The Exact transform performs exact matching between two columns and outputs a Boolean result.

Best Situations to Use

  • Identify duplicate or matching values across columns.

  • Verify data consistency for data quality or integration tasks.

Steps

  1. Select two columns to compare.

  2. Open Transforms > Functions > Exact.

  3. Provide:

    • New Column Name

    • Enable Ignore Case (optional)

  4. Click Submit.

Extract Data with GROK

Extracts structured data from text columns using GROK patterns or regular expressions. Output is a JSON-like structure.

Best Situations to Use

  • Parse log files, semi-structured text, or complex strings.

  • Extract multiple fields from single text entries for analytics.

Steps

  1. Select the column containing GROK-patterned text.

  2. Open Transforms > Functions > Extract Data with GROK.

  3. Provide:

    • New Column Name

    • GROK Pattern (e.g., %{IPORHOST:clientip}--[%{HTTPDATE:timestamp}])

  4. Click Submit.

Example Output:

{
  "clientip": "192.168.1.1",
  "timestamp": "20/Jul/2023:10:00:00 +0000",
  "method": "GET",
  "request": "/index.html",
  "httpversion": "1.1",
  "response": "200"
}

Extract Latitude and Longitude from a Geopoint

Extracts latitude and longitude values from a Geo point column.

Best Situations to Use

  • Split Geo point columns for geospatial calculations or mapping.

  • Generate separate latitude/longitude features for modeling or visualization.

Steps

  1. Select a Geo point column.

  2. Open Transforms > Functions > Extract Latitude and Longitude from a Geopoint.

  3. Click Submit.

Extract N-grams

Generates contiguous sequences of words (N-grams) from a text column. Useful for text analysis and NLP feature creation.

Best Situations to Use

  • Feature engineering for text analytics or machine learning.

  • Analyze word sequences in descriptive or narrative columns.

Steps

  1. Select a string/text column.

  2. Open Transforms > Functions > Extract Ngrams.

  3. Provide:

    • New Column Name

    • Size: Number of words in each N-gram

  4. Click Submit.

Example: For a size of 2, the text “data analysis workflow” → [data analysis, analysis workflow].

Extract User Agent Information

Parses a browser’s User-Agent string to extract information into separate columns: device_family, browser_family, browser_version, os, os_version.

Best Situations to Use

  • Analyze browser or device usage patterns.

  • Enrich web analytics datasets with structured device information.

Steps

  1. Select a column containing User-Agent strings.

  2. Open Transforms > Functions > Extract User Agent Information.

  3. Click Submit.

Extract with Regular Expression (Regex)

Extracts specific patterns or substrings from textual data using regular expressions.

Best Situations to Use

  • Parse emails, phone numbers, or structured strings.

  • Extract multiple fields from semi-structured text for analytics.

Steps

  1. Select a text column.

  2. Open Transforms > Functions > Extract with Regular Expression.

  3. Provide:

    • New Column Name

    • Regex Pattern (e.g., [\w.-]+@[\w.-]+\.\w+)

  4. Click Submit.

Example: "Reach out at [email protected]"["[email protected]"].

Formula-Based Transform

Applies mathematical operations on numeric columns to generate a new calculated column.

Best Situations to Use

  • Create derived metrics from numeric columns.

  • Perform custom calculations not available through standard transforms.

Steps

  1. Open Transforms > Functions > Formula-Based Transform.

  2. Select numeric columns from the Select Columns drop-down.

  3. Create a formula using the selected columns.

  4. Provide a New Column Name.

  5. Click Submit.

Generate Numerical Combination Transformation

Generates all possible combinations of a set of numbers without repetition from a list column.

Best Situations to Use

  • Feature engineering for combinatorial analysis.

  • Generate permutation-based datasets for testing or modeling.

Steps

  1. Select a list-type column.

  2. Open Transforms > Functions > General Numerical Combination Transformation.

  3. Provide New Column Name and Combination Length.

  4. Click Submit.

Get URL Parameters

Extracts query parameters from a URL column into a JSON object with key-value pairs.

Best Situations to Use

  • Parse URL query strings for analytics, tracking, or feature extraction.

Steps

  1. Select a URL or String column.

  2. Open Transforms > Functions > Get URL Parameters.

  3. Click Submit.

Result: A new column is added containing a JSON object of extracted parameters.

Example: http://example.com?color=blue&shape=square{"color": "blue", "shape": "square"}

If Missing

Handles missing values by replacing empty cells with a specified value or column value.

Best Situations to Use

  • Standardize incomplete datasets before analysis or modeling.

  • Fill missing categorical or numeric values consistently.

Steps

  1. Select a column with missing values.

  2. Open Transforms > Functions > If Missing.

  3. Configure:

    • Enable Create New Column (optional).

    • Select the Columns to apply.

    • Specify the value to fill missing cells.

  4. Click Submit.

If NULL

The If NULL transform replaces null values in a column with a specified value or creates a new column for the transformed output.

Best Situations to Use

  • Fill null or missing values in a column consistently.

  • Prepare datasets for analysis or modeling where nulls are not allowed.

Steps

  1. Select a column containing null values.

  2. Open Transforms > Functions > If NULL.

  3. Enable Create New Column (optional).

  4. Specify the value to fill null cells.

  5. Click Submit.

If Then Else

Applies conditional logic to return values based on a test expression:

  • Executes true_expression if the condition is true

  • Executes false_expression if the condition is false

Best Situations to Use

  • Apply custom conditional logic for categorical or numeric columns.

  • Generate derived columns based on logical rules.

Steps

  1. Select a column (binary or categorical).

  2. Open Transforms > Functions > If Then Else.

  3. Provide:

    • New Column Name

    • Logical Expression (e.g., ColumnName == "Value" for strings, ColumnName == 5 for numbers)

    • Then Value

    • Else Value

  4. Click Submit.

Impute with Computed Values

Fills missing or empty values using computed statistics such as mean, median, or mode.

Best Situations to Use

  • Handle missing numeric or categorical data before analysis or modeling.

  • Standardize missing values consistently across a dataset.

Steps

  1. Select a column with numeric values.

  2. Open Transforms > Functions > Impute with Computed Values.

  3. Enable Create New Column (optional).

  4. Provide New Column Name.

  5. Select a Computed Value Method (Mean, Median, Mode).

  6. Click Submit.

IN

Checks whether column values exist within a specified list and returns a Boolean result.

Best Situations to Use

  • Filter or flag rows that match a list of target values.

  • Generate Boolean features for modeling.

Steps

  1. Select a column.

  2. Open Transforms > Functions > IN.

  3. Provide:

    • New Column Name

    • Values (comma-separated)

  4. Click Submit.

Is Even

Checks if an integer value is even.

Best Situations to Use

  • Feature engineering for numerical columns.

  • Create Boolean flags for even/odd categorization.

Steps

  1. Select an integer column.

  2. Open Transforms > Functions > Is Even.

  3. Click Submit.

Is Mismatched

Tests whether a column’s values do not match a specified datatype, returning Boolean results.

Best Situations to Use

  • Validate datatype consistency in datasets.

  • Detect errors before aggregation or modeling.

Steps

  1. Select a column.

  2. Open Transforms > Functions > Is Mismatched.

  3. Provide the datatype to validate against.

  4. Click Submit.

Is Missing

Checks if column values are missing or null, returning Boolean results.

Best Situations to Use

  • Identify incomplete rows or columns.

  • Generate flags for missing data handling.

Steps

  1. Select one or more columns with potential missing values.

  2. Open Transforms > Functions > Is Missing.

  3. Click Submit.

Is NULL

The Is NULL transform checks whether a column contains null values and returns a Boolean result (True if the cell is null, False otherwise).

Best Situations to Use

  • Identify missing or incomplete data in datasets.

  • Generate flags for null values to handle them in further transformations or analysis.

  • Validate data quality before aggregation, modeling, or reporting.

Steps

  1. Navigate to a dataset in the Data Preparation framework.

  2. Open Transforms > Functions > Is NULL.

  3. Select one or multiple columns (supports all data types).

  4. Click Submit.

Is Odd

Checks whether a column’s integer values are odd and returns a Boolean column (True if odd, False if even).

Best Situations to Use

  • Generate feature flags for numeric analyses.

  • Identify and separate odd vs. even values in integer columns.

Steps

  1. Select an integer column.

  2. Open Transforms > Functions > Is Odd.

  3. Click Submit.

Is Valid

The Is Valid transform checks whether the values in a column match a specified datatype and flags them as valid or invalid.

Best Situations to Use

  • Validate the datatype consistency in a column.

  • Detect and flag erroneous or inconsistent data.

Steps

  1. Select a column.

  2. Open Transforms > Functions > Is Valid.

  3. Provide the datatype to validate against.

  4. Click Submit.

Log of Column

Calculates the natural logarithm (ln) of a numeric column.

Best Situations to Use

  • Transform skewed numeric distributions for analysis.

  • Prepare features for statistical modeling or ML algorithms.

Steps

  1. Select a numeric column.

  2. Open Transforms > Functions > Log of Column.

  3. Enable Create New Column (optional).

  4. Click Submit.

Note: Formula format: ln(value), e.g., ln(100).

Merge Long Tail Values

Merges values that occur below a threshold into a single value. Useful for reducing sparsity in categorical columns.

Best Situations to Use

  • Group rare categories for analysis or modeling.

  • Simplify categorical variables with many low-frequency values.

Steps

  1. Select a column.

  2. Open Transforms > Functions > Merge Long Tail Values.

  3. Specify Count Threshold and Value to Merge.

  4. Click Submit.

Normalization

Normalization is scaling a measure or variable to a common range to facilitate comparisons or analysis. It is commonly used when dealing with measures with different units or scales to bring them consistency. It scales numeric columns to a common range for consistent comparisons.

Best Situations to Use

  • Compare features with different scales or units.

  • Prepare numeric data for ML algorithms requiring standardized input.

Methods

There are various methods to normalize measures, and the choice of method depends on the nature of the data and the specific requirements of the analysis. Here are a few commonly used normalization techniques:

1. Min-Max Normalization

  • Formula: (x - min) / (max - min)

  • Rescales values to [0,1].

2. Z-Score Normalization (Standardization)

  • Formula: (x - mean) / standard deviation

  • Standardizes data with the mean 0 and standard deviation 1.

Steps

  1. Select one or more numeric columns.

  2. Open Transforms > Functions > Normalization.

  3. Choose a Normalization Method: Min-Max or Z-Score.

  4. Click Submit.

Note: Min-Max retains relative ordering; Z-Score may produce negative values.

PI

Generates a column containing the mathematical constant π (pi) to 15 decimal places, optionally multiplied by a specified value.

Best Situations to Use

  • Create mathematical constants for calculations or derived columns.

  • Generate features involving circle, geometry, or trigonometry computations.

Steps

  1. Select a column (optional multiplier).

  2. Open Transforms > Functions > PI.

  3. Provide New Column Name.

  4. Enter a value to multiply by π (optional).

  5. Click Submit.

Power of Column

Raises numeric column values to a specified power.

Best Situations to Use

  • Transform numeric data for scaling or feature engineering.

  • Amplify or attenuate values for analysis.

Steps

  1. Select a numeric column.

  2. Open Transforms > Functions > Power of Column.

  3. Enable Create New Column (optional).

  4. Enter the power value.

  5. Click Submit.

Radians to Degree

Converts values in radians to degrees using the formula degrees = radians × 57.2957795.

Best Situations to Use

  • Transform angle measurements for analysis or visualization.

  • Prepare features for geospatial or trigonometric computations.

Steps

  1. Select numeric columns containing radians.

  2. Open Transforms > Functions > Radians to Degree.

  3. Enable Create New Column (optional).

  4. Click Submit.

Round Values for Columns

Rounds numeric values to the nearest integer or specified precision.

Best Situations to Use

  • Simplify numeric columns for reporting or grouping.

  • Prepare numeric features for discrete analysis or modeling.

Steps

  1. Select numeric column(s).

  2. Open Transforms > Functions > Round Values.

  3. Specify precision.

  4. Click Submit.

Sign of Columns

Determines the sign of numeric values:

  • Positive → 1

  • Negative → -1

  • Zero → 0

Best Situations to Use

  • Feature engineering for numeric columns to classify the sign.

  • Data analysis or modeling requiring positive/negative indicators.

Steps

  1. Select numeric column(s).

  2. Open Transforms > Functions > Sign of Columns.

  3. Click Submit.

Simplify Text

Performs text standardization and processing with the following options:

  1. Normalize text: Lowercase, remove punctuation/accents, Unicode NFD normalization.

  2. Stem words: Reduce words to their grammatical root.

  3. Stop words: Remove common words like “the,” “a,” etc.

  4. Sort words alphabetically: Standardize the order of words.

Best Situations to Use

  • Preprocess text for NLP, search, or matching.

  • Standardize textual data for analysis or deduplication.

Steps

  1. Select a string column.

  2. Open Transforms > Functions > Simplify Text.

  3. Provide New Column Name.

  4. Select desired options.

  5. Click Submit.

Split Email

Splits an email address into the local part and the domain part.

Best Situations to Use

  • Extract username and domain for analysis.

  • Prepare features for email-related metrics or validation.

Steps

  1. Select a column containing email addresses.

  2. Open Transforms > Functions > Split Email.

  3. Click Submit.

Note: Invalid email addresses may cause errors.

Split HTTP Query String

Splits a URL’s query parameters into separate columns. Each key-value pair becomes a new column prefixed by the specified output column prefix.

Best Situations to Use

  • Extract parameters from web URLs for analytics.

  • Prepare feature columns from query strings for modeling or reporting.

Steps

  1. Select a URL column with query parameters.

  2. Open Transforms > Functions > Split HTTP Query String.

  3. Provide a prefix for output columns.

  4. Click Submit.

Example: Prefix SplitURL_SplitURL_modal, SplitURL_selectIssue.

Split Invalid

Identifies and extracts invalid or inconsistent values from a column.

Best Situations to Use

  • Detect invalid entries in numeric or string columns.

  • Separate inconsistent data for cleaning or reporting.

Steps

  1. Select a column.

  2. Open Transforms > Functions > Split Invalid.

  3. Configure:

    • Data Type

    • New Column Name

  4. Click Submit.

Split URL

Splits a URL into its components: scheme, domain, path, and query string. Columns are prefixed by the input column name.

Best Situations to Use

  • Parse URL data for analytics or reporting.

  • Extract host, path, or query parameters as separate features.

Steps

  1. Select a URL column.

  2. Open Transforms > Functions > Split URL.

  3. Click Submit.

  • URL_scheme

  • URL_domain

  • URL_path

  • URL_querystring

Note: Columns are prefixed by the input column name.

Sqrt of Column

Computes the square root of numeric column values.

Best Situations to Use

  • Transform skewed distributions.

  • Prepare features for mathematical modeling.

Steps

  1. Select a numeric column.

  2. Open Transforms > Functions > Sqrt of Column.

  3. Enable Create New Column (optional).

  4. Click Submit.

Trigonometric Functions

Applies trigonometric transformations to numeric columns. Supported functions: SIN, COS, TAN, SINH, COSH, TANH.

Best Situations to Use

  • Transform angle measurements or trigonometric features.

  • Generate derived features for analytics or modeling.

Steps

  1. Select numeric column(s).

  2. Open Transforms > Functions > Trigonometric Functions.

  3. Choose a method (SIN, COS, TAN, SINH, COSH, TANH).

  4. Click Submit.

Vectorization

Converts text columns into numeric vectors for machine learning or analytical models. Methods include Count Vectorizer and TFIDF Vectorizer.

Best Situations to Use

  • Prepare text data for NLP or ML models.

  • Transform descriptive text into feature vectors for analysis.

Steps

  1. Select a text column.

  2. Open Transforms > Functions > Vectorization.

  3. Select a Vectorization method: Count or TFIDF.

  4. Click Submit.

Example: "The sun is shining" → Count Vectorizer: [0,0,0,0,0,1,0,...] TFIDF Vectorizer: [0,0,0,0,0,0.4278,0,...

Last updated