Advanced
The Cluster & Edit transform finds clusters of values that sound alike when pronounced and lets you edit them in bulk with a single click.
Check out the illustration on the Cluster & Edit transform.
Steps to perform the Cluster & Edit transform:
Select a column from the given dataset.
Open the Transforms tab.
Select the Cluster & Edit transform from the Advanced category.
The Cluster & Edit window opens.
The Method drop-down uses Soundex, a phonetic algorithm that indexes names by their sound as pronounced in English.
The Values found column lists the values in the dataset that share a specific sound. For example, the Values found column in the given image displays 5 categories.
The Replace Value column lists the suggested replacement for the values found.
Use the checkboxes to select the values that need to be modified. In the given example, 'tesi', 'test', and 'test3' are selected.
Search for a replace value, or enter the value you want to use as the replace value, in the drop-down menu of the Replace Value column.
Click the Submit option.
The selected values in the column are replaced in the dataset.
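The Soundex grouping behind the Method drop-down can be sketched in Python as follows. This is a minimal illustration that assumes the standard American Soundex rules (keep the first letter, map consonants to digits, drop vowels, h, and w, collapse adjacent duplicate codes, pad to four characters) and assumes a cluster is simply two or more distinct values that share a code; the product's exact variant and clustering logic may differ. The names soundex, cluster_values, and apply_replacements are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Standard American Soundex digit for each consonant group (assumed variant).
SOUNDEX_CODES: Dict[str, str] = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character Soundex code of a word (non-letters are ignored)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    prev = SOUNDEX_CODES.get(letters[0], "")  # the first letter's code still blocks duplicates
    digits: List[str] = []
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = SOUNDEX_CODES.get(ch, "")  # vowels and y map to "" and reset the run
        if code and code != prev:
            digits.append(code)
        prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def cluster_values(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group distinct values by Soundex code, keeping only codes shared by 2+ values."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        members = clusters[soundex(value)]
        if value not in members:
            members.append(value)
    return {code: members for code, members in clusters.items() if len(members) > 1}


def apply_replacements(column: List[str], selected: Set[str], replacement: str) -> List[str]:
    """Replace every checked value in the column with the chosen Replace Value."""
    return [replacement if value in selected else value for value in column]
```

For example, cluster_values(["Robert", "Rupert", "Smith", "Smyth", "Lee"]) groups Robert and Rupert under R163 and Smith and Smyth under S530, and apply_replacements then writes the chosen Replace Value over the checked values only.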
The Expression Editor transform helps to execute expressions.
Check out the given illustration to understand the Expression Editor transform.
Steps to perform the Expression Editor transform:
Navigate to a dataset within the Data Preparation framework.
Navigate to the Transforms tab.
Open the Expression Editor from the Advanced transforms.
The Expression Editor window opens displaying the following columns:
Functions: The first column lists the available functions and lets the user search for one. Double-click a function to add it to the space provided for creating a formula.
Columns: The second column lists all the column names available in the selected dataset.
The Formula space is provided to create and execute formulas/expressions. Click a formula name to display its expression on the right side.
Use either of the following ways to apply the created expression or formula to the dataset:
Update a selected column by using the Update column option. The selected column will be updated with the chosen expression.
Create a new column with the created expression by selecting the Create a New Column option.
Provide a column name for the New Column.
Click the Submit option to either add a new column or update the selected column based on the executed formula/expression.
The new or updated column, computed with the formula, appears in the dataset.
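The two ways of applying an expression (updating a selected column or creating a new column) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the product's engine: rows are dictionaries, column names are assumed to be valid Python identifiers, and the FUNCTIONS whitelist is a hypothetical stand-in for the editor's Functions column. apply_expression and FUNCTIONS are hypothetical names, and restricting builtins is not a security sandbox. In pandas, DataFrame.eval offers a comparable syntax such as "total = price * qty".

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical stand-in for the editor's Functions column.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "abs": abs, "round": round, "min": min, "max": max,
    "len": len, "upper": str.upper, "lower": str.lower,
}


def apply_expression(rows: List[Row], expression: str, target: str, create_new: bool) -> List[Row]:
    """Evaluate `expression` per row and write the result to column `target`.

    create_new=True mimics "Create a New Column"; create_new=False mimics "Update column".
    Column names must be valid Python identifiers for this sketch to work.
    """
    existing = set(rows[0]) if rows else set()
    if create_new and target in existing:
        raise ValueError(f"column {target!r} already exists")
    if not create_new and rows and target not in existing:
        raise KeyError(f"column {target!r} not found")
    code = compile(expression, "<formula>", "eval")  # fail fast on syntax errors
    result: List[Row] = []
    for row in rows:
        scope = {**FUNCTIONS, **row}  # column values shadow function names
        value = eval(code, {"__builtins__": {}}, scope)  # not a security sandbox
        result.append({**row, target: value})  # copy, so the input rows stay unchanged
    return result
```

Returning new rows instead of mutating the input keeps the original dataset intact until the caller decides to submit the result.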
Anomaly detection identifies anomalies, i.e., outliers, present in the data. Instead of modeling the usual points in the data, it looks directly for anomalies. It uses the Isolation Forest algorithm.
Check out the given walk-through on the Find Anomaly transform.
Steps to perform the Find Anomaly transform:
Select a dataset within the Data Preparation framework.
Navigate to the Transforms tab.
Select the Find Anomaly transform from the Advanced category.
The Find Anomaly window opens.
Configure the following information:
Select Feature Columns: Select one or more columns in which you want to find anomalies.
Maximum Sample Size: The number of rows the Isolation Forest algorithm takes from the dataset as training data to learn what normal values look like.
Contamination (%): The proportion of observations expected to be outliers, entered as a value from 0 to 1 (both inclusive).
Anomaly Flag Name: The name of the new column that stores the result for each row. The result is either 1 or -1: 1 means the row is normal, and -1 means it is an outlier.
Click the Submit option after the required details are provided.
Each row is flagged in the new column named with the Anomaly Flag Name option.
Please Note: Other parameters, such as the number of estimators and the seed value, use their default values when the Isolation Forest logic runs on the selected dataset sample.
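To show what Maximum Sample Size, Contamination, and Anomaly Flag Name control, here is a minimal from-scratch sketch of the Isolation Forest idea: random axis-aligned splits isolate outliers in fewer steps, so a short average path length means a high anomaly score. It assumes defaults borrowed from scikit-learn's IsolationForest (100 estimators, a sample size of 256 capped at the number of rows) and assumes Contamination is applied by flagging the highest-scoring fraction of rows; the product's actual implementation, defaults, and thresholding may differ. find_anomaly and its parameter names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649015329


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows: List[Point], depth: int, limit: int, rng: random.Random) -> Dict:
    """Split on a random feature at a random value until rows are isolated or the depth limit is hit."""
    if depth >= limit or len(rows) <= 1:
        return {"size": len(rows)}
    splittable = [
        f for f in range(len(rows[0]))
        if min(r[f] for r in rows) < max(r[f] for r in rows)
    ]
    if not splittable:  # all remaining rows are identical
        return {"size": len(rows)}
    f = rng.choice(splittable)
    split = rng.uniform(min(r[f] for r in rows), max(r[f] for r in rows))
    return {
        "feature": f,
        "split": split,
        "left": build_tree([r for r in rows if r[f] < split], depth + 1, limit, rng),
        "right": build_tree([r for r in rows if r[f] >= split], depth + 1, limit, rng),
    }


def path_length(row: Point, node: Dict) -> float:
    """Depth at which the row reaches a leaf, plus c(size) for points left unresolved there."""
    depth = 0
    while "size" not in node:
        node = node["left"] if row[node["feature"]] < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def find_anomaly(
    rows: List[Point],
    max_sample_size: int = 256,
    contamination: float = 0.1,
    n_estimators: int = 100,
    seed: Optional[int] = None,
) -> List[int]:
    """Return 1 (normal) or -1 (outlier) per row, like the Anomaly Flag Name column."""
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    if not rows:
        return []
    rng = random.Random(seed)
    psi = min(max_sample_size, len(rows))  # Maximum Sample Size, capped at the row count
    limit = max(1, math.ceil(math.log2(psi)))  # depth limit from the original paper
    trees = [build_tree(rng.sample(rows, psi), 0, limit, rng) for _ in range(n_estimators)]
    norm = average_path_length(psi) or 1.0
    # Anomaly score s(x) = 2 ** (-mean path length / c(psi)); higher means easier to isolate.
    scores = [
        2.0 ** (-sum(path_length(r, t) for t in trees) / n_estimators / norm)
        for r in rows
    ]
    n_outliers = round(contamination * len(rows))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:n_outliers])
    return [-1 if i in outliers else 1 for i in range(len(rows))]
```

Here each row holds only the selected feature columns. With rows clustered around 10 to 12 plus one row at 1000 and contamination=0.1, the row at 1000 is the one flagged -1, because the first random split almost always isolates it.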
The SQL transform helps to perform SQL queries, so the user can customize the data with a query.
Check out the illustration on the SQL Transform.
Steps to perform the SQL transform:
Select a dataset within the Data Preparation framework.
Navigate to the Transforms tab.
Open the SQL Transform from the Advanced transforms.
The SQL Editor page opens displaying the Functions and Columns from the selected dataset.
Search for a function and click it to add its default syntax suggestion to the text space provided for writing a query.
Select the columns from the Columns list.
Write a query with valid syntax by selecting an SQL function and the related columns from the dataset. The displayed example can serve as a guide to valid SQL query syntax.
Click the Submit option.
Please Note: Double-click a function name to display its syntax and a short example at the bottom of the window.
The SQL query is applied to the dataset, and the displayed dataset is customized based on the query.
Please Note: The SQL Transform & Expression Editor support only Pandas SQL queries.
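Assuming "Pandas SQL" refers to something like the pandasql library, which runs queries in SQLite syntax against an in-memory copy of the data, the SQL transform can be approximated with Python's built-in sqlite3 module. The sketch below loads the rows into an in-memory table named df (a hypothetical table name; how the product names the dataset inside a query is not documented here) and returns the query result as new rows. run_sql_transform is a hypothetical name.

```python
import sqlite3
from typing import Any, Dict, List

Row = Dict[str, Any]


def run_sql_transform(rows: List[Row], query: str, table: str = "df") -> List[Row]:
    """Load rows into an in-memory SQLite table and return the SELECT result as rows.

    The table name "df" is a hypothetical choice; column names must not contain double quotes.
    """
    if not rows:
        return []
    columns = list(rows[0])
    conn = sqlite3.connect(":memory:")
    try:
        column_list = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({column_list})')  # untyped columns keep values as given
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
        cursor = conn.execute(query)  # assumed to be a SELECT, so description is set
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, record)) for record in cursor.fetchall()]
    finally:
        conn.close()
```

For example, "SELECT city, SUM(sales) AS total FROM df GROUP BY city" collapses the rows to one per city, which mirrors how a query reshapes the displayed dataset.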