Function Transforms
Function Transforms apply a mathematical or programmatic function to data to convert its format or structure, making it ready for processing.
The Functions transform enables users to execute advanced operations on dataset columns beyond basic manipulation. This includes:
Encoding: Converting categorical or text data into numerical formats (e.g., one-hot encoding) for use in machine learning models.
Counting: Calculating frequency or count statistics for values within a column.
Geospatial Point Generation: Creating latitude and longitude points or other geographical data from existing location information for spatial analysis.
Essentially, these transforms leverage pre-defined computational functions to enrich the dataset and prepare it for complex analytical or modeling tasks.
Base64 Decode
The Base64 Decode transform converts Base64-encoded text into its original string format. The output type is String.
Best Situations to Use
Decode Base64-encrypted values in string columns.
Convert encoded data for analysis or reporting.
Steps
Select a string column containing Base64 data.
Open Transforms > Functions > Base64 Decode.
Enable Create New Column (optional).
Provide a New Column Name.
Click Submit.
Base64 Encode
The Base64 Encode transform converts input values to Base64-encoded text, optionally padded with =
. The input can be of any type, and the output is a String.
Best Situations to Use
Encode sensitive or textual data for secure transmission or storage.
Prepare data for systems requiring Base64 input.
Steps
Select a string column.
Open Transforms > Functions > Base64 Encode.
Enable Create New Column (optional).
Provide a New Column Name.
Click Submit.
Count Distinct
The Count Distinct transform calculates the number of unique values in a column, optionally grouped by other columns.
Best Situations to Use
Determine the distinct count of categorical values.
Summarize unique entries by group for aggregation or reporting.
Steps
Select a dataset column.
Open Transforms > Functions > Count Distinct.
Provide a New Column Name.
Select Group By Columns (optional).
Click Submit.
Count Occurrences
The Count Occurrences transform counts the occurrences of a specified word, letter, or digit in a string column.
Best Situations to Use
Analyze text patterns within a dataset.
Generate features for text analytics or NLP.
Steps
Select a string column.
Open Transforms > Functions > Count Occurrences.
Provide a New Column Name.
Enter the Value to count.
Enable Ignore Case if required.
Click Submit.
Create a Geopoint from Latitude and Longitude
The Create Geopoint transform generates a geographical point column from latitude and longitude columns. The output format is POINT(Longitude Latitude)
.
Best Situations to Use
Prepare datasets for geospatial analysis.
Generate coordinates for mapping or location-based analytics.
Steps
Select a dataset containing latitude and longitude columns.
Open Transforms > Functions > Create Geopoint from Latitude and Longitude.
Select the Latitude column.
Select the Longitude column.
Provide a New Column Name.
Click Submit.
Distance Between Geo Points
Calculates the distance between two sets of latitude and longitude points. Results can be returned in kilometers or miles.
Best Situations to Use
Calculate distances for geospatial analysis or mapping applications.
Prepare features for routing, logistics, or location-based analytics.
Steps
Select a dataset with two sets of latitude and longitude columns.
Open Transforms > Functions > Distance Between Geo Points.
Configure:
Latitude and Longitude columns for both points
Unit: Kilometers or Miles
New Column Name
Click Submit.
Exact
The Exact transform performs exact matching between two columns and outputs a Boolean result.
Best Situations to Use
Identify duplicate or matching values across columns.
Verify data consistency for data quality or integration tasks.
Steps
Select two columns to compare.
Open Transforms > Functions > Exact.
Provide:
New Column Name
Enable Ignore Case (optional)
Click Submit.
Extract Data with GROK
Extracts structured data from text columns using GROK patterns or regular expressions. Output is a JSON-like structure.
Best Situations to Use
Parse log files, semi-structured text, or complex strings.
Extract multiple fields from single text entries for analytics.
Steps
Select the column containing GROK-patterned text.
Open Transforms > Functions > Extract Data with GROK.
Provide:
New Column Name
GROK Pattern (e.g.,
%{IPORHOST:clientip}--[%{HTTPDATE:timestamp}]
)
Click Submit.
Example Output:
{
"clientip": "192.168.1.1",
"timestamp": "20/Jul/2023:10:00:00 +0000",
"method": "GET",
"request": "/index.html",
"httpversion": "1.1",
"response": "200"
}
Extract Latitude and Longitude from a Geopoint
Extracts latitude and longitude values from a Geo point column.
Best Situations to Use
Split Geo point columns for geospatial calculations or mapping.
Generate separate latitude/longitude features for modeling or visualization.
Steps
Select a Geo point column.
Open Transforms > Functions > Extract Latitude and Longitude from a Geopoint.
Click Submit.
Extract N-grams
Generates contiguous sequences of words (N-grams) from a text column. Useful for text analysis and NLP feature creation.
Best Situations to Use
Feature engineering for text analytics or machine learning.
Analyze word sequences in descriptive or narrative columns.
Steps
Select a string/text column.
Open Transforms > Functions > Extract Ngrams.
Provide:
New Column Name
Size: Number of words in each N-gram
Click Submit.
Example: For a size of 2, the text “data analysis workflow” → [data analysis, analysis workflow]
.
Extract User Agent Information
Parses a browser’s User-Agent string to extract information into separate columns: device_family
, browser_family
, browser_version
, os
, os_version
.
Best Situations to Use
Analyze browser or device usage patterns.
Enrich web analytics datasets with structured device information.
Steps
Select a column containing User-Agent strings.
Open Transforms > Functions > Extract User Agent Information.
Click Submit.
Extract with Regular Expression (Regex)
Extracts specific patterns or substrings from textual data using regular expressions.
Best Situations to Use
Parse emails, phone numbers, or structured strings.
Extract multiple fields from semi-structured text for analytics.
Steps
Select a text column.
Open Transforms > Functions > Extract with Regular Expression.
Provide:
New Column Name
Regex Pattern (e.g.,
[\w.-]+@[\w.-]+\.\w+
)
Click Submit.
Example: "Reach out at [email protected]"
→ ["[email protected]"]
.
Formula-Based Transform
Applies mathematical operations on numeric columns to generate a new calculated column.
Best Situations to Use
Create derived metrics from numeric columns.
Perform custom calculations not available through standard transforms.
Steps
Open Transforms > Functions > Formula-Based Transform.
Select numeric columns from the Select Columns drop-down.
Create a formula using the selected columns.
Provide a New Column Name.
Click Submit.
Generate Numerical Combination Transformation
Generates all possible combinations of a set of numbers without repetition from a list column.
Best Situations to Use
Feature engineering for combinatorial analysis.
Generate permutation-based datasets for testing or modeling.
Steps
Select a list-type column.
Open Transforms > Functions > General Numerical Combination Transformation.
Provide New Column Name and Combination Length.
Click Submit.
Get URL Parameters
Extracts query parameters from a URL column into a JSON object with key-value pairs.
Best Situations to Use
Parse URL query strings for analytics, tracking, or feature extraction.
Steps
Select a URL or String column.
Open Transforms > Functions > Get URL Parameters.
Click Submit.
Result: A new column is added containing a JSON object of extracted parameters.
Example: http://example.com?color=blue&shape=square
→ {"color": "blue", "shape": "square"}
If Missing
Handles missing values by replacing empty cells with a specified value or column value.
Best Situations to Use
Standardize incomplete datasets before analysis or modeling.
Fill missing categorical or numeric values consistently.
Steps
Select a column with missing values.
Open Transforms > Functions > If Missing.
Configure:
Enable Create New Column (optional).
Select the Columns to apply.
Specify the value to fill missing cells.
Click Submit.
If NULL
The If NULL transform replaces null values in a column with a specified value or creates a new column for the transformed output.
Best Situations to Use
Fill null or missing values in a column consistently.
Prepare datasets for analysis or modeling where nulls are not allowed.
Steps
Select a column containing null values.
Open Transforms > Functions > If NULL.
Enable Create New Column (optional).
Specify the value to fill null cells.
Click Submit.
If Then Else
Applies conditional logic to return values based on a test expression:
Executes true_expression if the condition is true
Executes false_expression if the condition is false
Best Situations to Use
Apply custom conditional logic for categorical or numeric columns.
Generate derived columns based on logical rules.
Steps
Select a column (binary or categorical).
Open Transforms > Functions > If Then Else.
Provide:
New Column Name
Logical Expression (e.g.,
ColumnName == "Value"
for strings,ColumnName == 5
for numbers)Then Value
Else Value
Click Submit.
Impute with Computed Values
Fills missing or empty values using computed statistics such as mean, median, or mode.
Best Situations to Use
Handle missing numeric or categorical data before analysis or modeling.
Standardize missing values consistently across a dataset.
Steps
Select a column with numeric values.
Open Transforms > Functions > Impute with Computed Values.
Enable Create New Column (optional).
Provide New Column Name.
Select a Computed Value Method (Mean, Median, Mode).
Click Submit.
IN
Checks whether column values exist within a specified list and returns a Boolean result.
Best Situations to Use
Filter or flag rows that match a list of target values.
Generate Boolean features for modeling.
Steps
Select a column.
Open Transforms > Functions > IN.
Provide:
New Column Name
Values (comma-separated)
Click Submit.
Is Even
Checks if an integer value is even.
Best Situations to Use
Feature engineering for numerical columns.
Create Boolean flags for even/odd categorization.
Steps
Select an integer column.
Open Transforms > Functions > Is Even.
Click Submit.
Is Mismatched
Tests whether a column’s values do not match a specified datatype, returning Boolean results.
Best Situations to Use
Validate datatype consistency in datasets.
Detect errors before aggregation or modeling.
Steps
Select a column.
Open Transforms > Functions > Is Mismatched.
Provide the datatype to validate against.
Click Submit.
Is Missing
Checks if column values are missing or null, returning Boolean results.
Best Situations to Use
Identify incomplete rows or columns.
Generate flags for missing data handling.
Steps
Select one or more columns with potential missing values.
Open Transforms > Functions > Is Missing.
Click Submit.
Is NULL
The Is NULL transform checks whether a column contains null values and returns a Boolean result (True
if the cell is null, False
otherwise).
Best Situations to Use
Identify missing or incomplete data in datasets.
Generate flags for null values to handle them in further transformations or analysis.
Validate data quality before aggregation, modeling, or reporting.
Steps
Navigate to a dataset in the Data Preparation framework.
Open Transforms > Functions > Is NULL.
Select one or multiple columns (supports all data types).
Click Submit.
Is Odd
Checks whether a column’s integer values are odd and returns a Boolean column (True
if odd, False
if even).
Best Situations to Use
Generate feature flags for numeric analyses.
Identify and separate odd vs. even values in integer columns.
Steps
Select an integer column.
Open Transforms > Functions > Is Odd.
Click Submit.
Is Valid
The Is Valid transform checks whether the values in a column match a specified datatype and flags them as valid or invalid.
Best Situations to Use
Validate the datatype consistency in a column.
Detect and flag erroneous or inconsistent data.
Steps
Select a column.
Open Transforms > Functions > Is Valid.
Provide the datatype to validate against.
Click Submit.
Log of Column
Calculates the natural logarithm (ln) of a numeric column.
Best Situations to Use
Transform skewed numeric distributions for analysis.
Prepare features for statistical modeling or ML algorithms.
Steps
Select a numeric column.
Open Transforms > Functions > Log of Column.
Enable Create New Column (optional).
Click Submit.
Merge Long Tail Values
Merges values that occur below a threshold into a single value. Useful for reducing sparsity in categorical columns.
Best Situations to Use
Group rare categories for analysis or modeling.
Simplify categorical variables with many low-frequency values.
Steps
Select a column.
Open Transforms > Functions > Merge Long Tail Values.
Specify Count Threshold and Value to Merge.
Click Submit.
Normalization
Normalization is scaling a measure or variable to a common range to facilitate comparisons or analysis. It is commonly used when dealing with measures with different units or scales to bring them consistency. It scales numeric columns to a common range for consistent comparisons.
Best Situations to Use
Compare features with different scales or units.
Prepare numeric data for ML algorithms requiring standardized input.
Methods
There are various methods to normalize measures, and the choice of method depends on the nature of the data and the specific requirements of the analysis. Here are a few commonly used normalization techniques:
1. Min-Max Normalization
Formula:
(x - min) / (max - min)
Rescales values to
[0,1]
.
2. Z-Score Normalization (Standardization)
Formula:
(x - mean) / standard deviation
Standardizes data with the mean
0
and standard deviation1
.
Steps
Select one or more numeric columns.
Open Transforms > Functions > Normalization.
Choose a Normalization Method: Min-Max or Z-Score.
Click Submit.
PI
Generates a column containing the mathematical constant π (pi) to 15 decimal places, optionally multiplied by a specified value.
Best Situations to Use
Create mathematical constants for calculations or derived columns.
Generate features involving circle, geometry, or trigonometry computations.
Steps
Select a column (optional multiplier).
Open Transforms > Functions > PI.
Provide New Column Name.
Enter a value to multiply by π (optional).
Click Submit.
Power of Column
Raises numeric column values to a specified power.
Best Situations to Use
Transform numeric data for scaling or feature engineering.
Amplify or attenuate values for analysis.
Steps
Select a numeric column.
Open Transforms > Functions > Power of Column.
Enable Create New Column (optional).
Enter the power value.
Click Submit.
Radians to Degree
Converts values in radians to degrees using the formula degrees = radians × 57.2957795
.
Best Situations to Use
Transform angle measurements for analysis or visualization.
Prepare features for geospatial or trigonometric computations.
Steps
Select numeric columns containing radians.
Open Transforms > Functions > Radians to Degree.
Enable Create New Column (optional).
Click Submit.
Round Values for Columns
Rounds numeric values to the nearest integer or specified precision.
Best Situations to Use
Simplify numeric columns for reporting or grouping.
Prepare numeric features for discrete analysis or modeling.
Steps
Select numeric column(s).
Open Transforms > Functions > Round Values.
Specify precision.
Click Submit.
Sign of Columns
Determines the sign of numeric values:
Positive → 1
Negative → -1
Zero → 0
Best Situations to Use
Feature engineering for numeric columns to classify the sign.
Data analysis or modeling requiring positive/negative indicators.
Steps
Select numeric column(s).
Open Transforms > Functions > Sign of Columns.
Click Submit.
Simplify Text
Performs text standardization and processing with the following options:
Normalize text: Lowercase, remove punctuation/accents, Unicode NFD normalization.
Stem words: Reduce words to their grammatical root.
Stop words: Remove common words like “the,” “a,” etc.
Sort words alphabetically: Standardize the order of words.
Best Situations to Use
Preprocess text for NLP, search, or matching.
Standardize textual data for analysis or deduplication.
Steps
Select a string column.
Open Transforms > Functions > Simplify Text.
Provide New Column Name.
Select desired options.
Click Submit.
Split Email
Splits an email address into the local part and the domain part.
Best Situations to Use
Extract username and domain for analysis.
Prepare features for email-related metrics or validation.
Steps
Select a column containing email addresses.
Open Transforms > Functions > Split Email.
Click Submit.
Split HTTP Query String
Splits a URL’s query parameters into separate columns. Each key-value pair becomes a new column prefixed by the specified output column prefix.
Best Situations to Use
Extract parameters from web URLs for analytics.
Prepare feature columns from query strings for modeling or reporting.
Steps
Select a URL column with query parameters.
Open Transforms > Functions > Split HTTP Query String.
Provide a prefix for output columns.
Click Submit.
Example: Prefix SplitURL_
→ SplitURL_modal
, SplitURL_selectIssue
.
Split Invalid
Identifies and extracts invalid or inconsistent values from a column.
Best Situations to Use
Detect invalid entries in numeric or string columns.
Separate inconsistent data for cleaning or reporting.
Steps
Select a column.
Open Transforms > Functions > Split Invalid.
Configure:
Data Type
New Column Name
Click Submit.
Split URL
Splits a URL into its components: scheme, domain, path, and query string. Columns are prefixed by the input column name.
Best Situations to Use
Parse URL data for analytics or reporting.
Extract host, path, or query parameters as separate features.
Steps
Select a URL column.
Open Transforms > Functions > Split URL.
Click Submit.
URL_scheme
URL_domain
URL_path
URL_querystring
Sqrt of Column
Computes the square root of numeric column values.
Best Situations to Use
Transform skewed distributions.
Prepare features for mathematical modeling.
Steps
Select a numeric column.
Open Transforms > Functions > Sqrt of Column.
Enable Create New Column (optional).
Click Submit.
Trigonometric Functions
Applies trigonometric transformations to numeric columns. Supported functions: SIN, COS, TAN, SINH, COSH, TANH.
Best Situations to Use
Transform angle measurements or trigonometric features.
Generate derived features for analytics or modeling.
Steps
Select numeric column(s).
Open Transforms > Functions > Trigonometric Functions.
Choose a method (SIN, COS, TAN, SINH, COSH, TANH).
Click Submit.
Vectorization
Converts text columns into numeric vectors for machine learning or analytical models. Methods include Count Vectorizer and TFIDF Vectorizer.
Best Situations to Use
Prepare text data for NLP or ML models.
Transform descriptive text into feature vectors for analysis.
Steps
Select a text column.
Open Transforms > Functions > Vectorization.
Select a Vectorization method: Count or TFIDF.
Click Submit.
Example: "The sun is shining"
→
Count Vectorizer: [0,0,0,0,0,1,0,...]
TFIDF Vectorizer: [0,0,0,0,0,0.4278,0,...
Last updated