Advanced


Last updated 3 months ago

Cluster & Edit

This transform identifies clusters of values based on their pronunciation and lets you edit the clustered data in bulk with a single click.

Check out the illustration on the Cluster & Edit transform.

Steps to perform the Cluster & Edit transform:

  • Select a column from the given dataset.

  • Open the Transforms tab.

  • Select the Cluster and Edit transform from the Advanced category.

  • The Cluster & Edit window opens.

  • The Method drop-down uses the Soundex phonetic algorithm for indexing names by sound as pronounced in English.

  • The Values found column lists the number of values found in the data set for a specific sound. E.g., in the given image, Values found displays 5 categories.

  • The Replace Value column lists the anticipated replacements of the values found.

  • Use the checkboxes to select the values that need to be modified or changed. E.g., 'tesi', 'test', and 'test3' are selected in the given example.

  • Search for a replace value, or enter a new value you wish to use as the replacement, using the drop-down menu in the Replace Value column.

  • Click the Submit option.

  • The selected values from the column get modified in the data set.
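The clustering step can be sketched in Python. This is a minimal, illustrative Soundex implementation and bulk replacement, not the platform's internal code; the sample values are hypothetical.

```python
from collections import defaultdict

def soundex(word: str) -> str:
    """Encode a word with the classic Soundex algorithm (letter + 3 digits)."""
    codes = {"b": "1", "f": "1", "p": "1", "v": "1",
             "c": "2", "g": "2", "j": "2", "k": "2", "q": "2",
             "s": "2", "x": "2", "z": "2",
             "d": "3", "t": "3", "l": "4", "m": "5", "n": "5", "r": "6"}
    word = word.lower()
    out = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h and w do not separate duplicate codes
            prev = code
    return (out + "000")[:4]

# Group column values by their Soundex code (one cluster per sound)
values = ["Smith", "Smyth", "Smithe", "Jones", "Jonas"]
clusters = defaultdict(list)
for v in values:
    clusters[soundex(v)].append(v)

# Bulk edit: replace every value in one cluster with a chosen value
replace_with = "Smith"
mapping = {v: replace_with for v in clusters[soundex(replace_with)]}
cleaned = [mapping.get(v, v) for v in values]
# cleaned == ["Smith", "Smith", "Smith", "Jones", "Jonas"]
```

Here "Smith", "Smyth", and "Smithe" all encode to the same Soundex code, so they form one cluster and are replaced in a single operation, mirroring the checkbox-and-replace flow above.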

Expression Editor

This transform helps to create and execute expressions on the dataset.

Check out the given illustration to understand the Expression Editor transform.

Steps to perform the Expression Editor transform:

  • Navigate to a Dataset within the Data Preparation framework.

  • Navigate to the Transforms tab.

  • Open the Expression Editor from the Advanced transforms.

  • The Expression Editor window opens displaying the following columns:

    • Functions: The first column lets the user search for a function. Double-click a function to add it to the space provided for creating a formula.

    • Columns: The second column lists all the column names available in the selected dataset.

  • The Formula space is provided to create and execute formulas/expressions. Click a formula name to display its expression on the right side.

  • Use either of the following ways to consume the created expression or formula in the dataset.

    • Update a selected column by using the Update column option. The selected column will be updated with the chosen expression.

    • Create a new column with the created expression by selecting the Create a New Column option.

      • Provide a column name for the New Column.

  • Click the Submit option to either add a new column or update the selected column based on the executed formula/ expression.

  • The recently created or updated column with Formula gets added to the dataset.
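The two outcomes of the Expression Editor (updating a column or creating a new one) can be sketched with pandas. This is an illustrative stand-in, assuming a pandas-style expression engine; the column names and expressions are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"price": [100, 250, 80], "qty": [2, 1, 5]})

# "Update column": the selected column is overwritten with the expression result
df["price"] = df.eval("price + 10")

# "Create a New Column": the expression result is stored under a new column name
df["total"] = df.eval("price * qty")
# df now has columns price, qty, and the newly created total
```

`DataFrame.eval` evaluates a string expression against the dataset's own column names, which matches how the editor lets you build formulas from the Columns list.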

Find Anomaly

Anomaly detection is used to identify anomalies (i.e., outliers) present in the data. Instead of profiling the usual points in the data, it looks for the points that deviate from them. It uses the Isolation Forest algorithm.

Check out the given walk-through on the Find Anomaly transform.

Steps to perform the Find Anomaly transform:

  • Select a dataset within the Data Preparation framework.

  • Navigate to the Transforms tab.

  • Select the Find Anomaly transform from the ADVANCED category.

  • The Find Anomaly window opens.

  • Configure the following information:

    • Select Feature Columns: Select one or more columns where you want to find the anomaly.

    • Maximum Sample Size: The Isolation Forest algorithm takes the training data of a given sample size to find the normal value in the dataset.

    • Contamination (%): The expected proportion of observations believed to be outliers, expressed as a value from 0 to 1 (both inclusive).

    • Anomaly Flag Name: The result is either 1 or -1: 1 means the observation is normal, and -1 means it is an outlier. This result gets stored in a new column with the given anomaly flag name.

  • Click the Submit option after the required details are provided.

  • The anomaly gets flagged under the column that has been named using the Anomaly Flag Name option.

Please Note: The other required parameters, such as the number of estimators and the seed value, use their default values when the Isolation Forest logic runs on the selected dataset sample.
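The transform's fields map directly onto scikit-learn's `IsolationForest`. The sketch below is illustrative, not the platform's implementation; the column name, sample data, and the fixed seed are assumptions (the platform uses its own defaults for estimators and seed).

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.DataFrame({"amount": [12, 14, 13, 15, 11, 500, 13, 12]})

model = IsolationForest(
    max_samples=8,        # Maximum Sample Size: training sample drawn per tree
    contamination=0.125,  # Contamination: expected fraction of outliers (0-0.5]
    random_state=42,      # fixed seed here for a reproducible sketch
)

# Anomaly Flag Name: store 1 (normal) / -1 (outlier) in a new column
df["anomaly_flag"] = model.fit_predict(df[["amount"]])
# the extreme value 500 gets flagged -1; the rest are flagged 1
```

With a contamination of 0.125 on eight rows, the algorithm flags the single most isolated observation, which is the 500 in this sample.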

SQL Transform

This transform helps to perform SQL queries. The user can customize the data with SQL queries written in the SQL Editor.

Check out the illustration on the SQL Transform.

Steps to perform the SQL transform:

  • Select a dataset within the Data Preparation framework.

  • Navigate to the Transforms tab.

  • Open the SQL Transform from the Advanced transforms.

  • The SQL Editor page opens displaying the Functions and Columns from the selected dataset.

  • Search for a function and click it to get its default syntax suggestion added to the text space provided for writing a query.

  • Select the columns from the Columns list.

  • Write a query with valid syntax by selecting an SQL function and the related columns from the dataset. The displayed example may help suggest a valid SQL query syntax.

  • Click the Submit option.

Please Note: Function syntax and small examples are displayed at the bottom of the window when you double-click a function name.

  • The SQL query gets applied to the data set, and the displayed dataset is customized based on the query.

Please Note: The SQL Transform & Expression Editor supports only Pandas SQL Queries.
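To show the shape of such a query, here is a minimal sketch that runs SQL against a pandas DataFrame. The platform executes pandas-SQL internally; this sketch uses an in-memory SQLite database as a stand-in, and the table, column names, and data are hypothetical.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"region": ["east", "west", "east"],
                   "sales": [10, 20, 30]})

# Load the dataset into an in-memory SQL engine, then query it
con = sqlite3.connect(":memory:")
df.to_sql("df", con, index=False)

result = pd.read_sql(
    "SELECT region, SUM(sales) AS total FROM df GROUP BY region", con
)
# result holds the customized dataset: one row per region with summed sales
con.close()
```

The query text is what you would build in the SQL Editor; aggregate functions such as `SUM` combined with a `GROUP BY` over a selected column reshape the displayed dataset on Submit.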
