Columns

The below-mentioned transforms are available under the Columns category:

Add Blank Column

This transform will create a new blank column in the selected data set. The datatype for the newly added blank column will be as None.

  • Navigate to the Data Preparation page displaying data grid.

  • Open the Transforms tab.

  • Select the Add Blank Column transform from the Column transforms type.

  • Provide the required details:

    • New Column Name

    • Position (of the blank column)

  • Click the Submit option.

  • The empty column gets added in the dataset (at the selected position with the provided name).

Please Note:

  • The user can update values in this column by using the transforms like Fill empty, Search & Replace.

  • The Add Blank Column transform can also be accessed by using the menu icon provided for the each column in the Data Grid display of the dataset.

Cast to Types

It is a table-based operation. The profiling of a column is done based on the data type present in the majority.

Let’s say in column A; there are four integer values and one string value, then the data type of the column gets profiled as the integer despite one string value present in it.

The Cast to Types transform removes the value with the invalid data type. In this case, it converts data with a string data type to the null value.

  • Navigate to the Data Preparation displaying data grid.

  • Select the field that has some invalid values in string data type.

  • Open the Transforms tab.

  • Click on the Cast to Types transform from the COLUMNS category of transforms.

  • The invalid values from the selected column gets removed.

Please Note:

  • There is a possibility of some data loss while applying this transform to any dataset.

  • The Caste to Types transform can also be accessed by using the menu icon provided for the each column in the Data Grid display of the dataset.

Change To String

This transformation helps to change the datatype of a selected column into a String.

  • Navigate to the Data Preparation displaying data grid.

  • Select the field that has some invalid values in string data type.

  • Open the Transform tab.

  • Navigate to the Columns type of transforms.

  • Click on the Change to String transform.

  • The data type of the selected column gets changed into String.

Please Note:

  • It will not support Date columns to change into String.

  • The Change To String transform can also be accessed by using the menu icon provided for the each column in the Data Grid display of the dataset.

Collect Set

The Collect Set transform generates the list of all the unique values of the column based on the selected column. It performs group concatenation.

  • Select a column from the dataset within the Data Preparation framework.

  • Navigate to the Transforms tab.

  • Select the Collect Set transform from the COLUMNS transforms.

  • Enable the Create new column to create a new column with the result of the transformation.

  • Select a Partitioning column using the drop-down menu. E.g., The selected Column is source.

  • Click the Submit option.

  • It generates a list of all unique values under the source_collect_set_1 as displayed in the below image:

Please Note: The Collect Set Transform option can also be accessed from menu icon provided for each column in the Data Grid display of the dataset.

Concatenate with

The users can concatenate a column value with some other column or with some prefix/suffix. To perform the transform, select the column to which data must be concatenated and select the Concatenate with transform. The available options are:

  • Select a column from the Data set.

  • Open the Transforms tab.

  • Select the Concatenate with transform from the COLUMNS category.

  • Prefix: Specify the value to be prefixed to the selected column value

  • Use with:

    • Select the Value to add a Prefix/Suffix

    • Select the Other column to concatenate two columns

  • Separator: Provide a separator to separate the targeted prefix or suffix from the original column values.

  • Add Separator: Select an option either Always or Both values not empty when you want to add separator.

  • Suffix: Provide a value to be used as a suffix to the selected column value.

  • Click the Submit option.

  • A new column gets added based on the configured details.

Please Note: The users must select the Use with Other Column option to concatenate a value with another column and select the Use with Value option to add prefix/suffix.

Delete All Other

The Delete All Other transform is used to delete all other columns than the selected column.

  • Select a column from the Data Grid.

  • Open the Transforms tab.

  • Open the COLUMNS transform category.

  • Select the Delete All Other transform.

  • All the other columns except the selected column get deleted.

Please Note: The Delete All Other transform can also be accessed by using the menu icon provided for the each column in the Data Grid display of the dataset.

Delete Column

The Delete Column deletes any selected column from the dataset.

  • Select a column from the dataset within the Data Preparation framework.

  • Navigate to the Transforms tab.

  • Select and click on the Delete Column transform from the COLUMNS transform category.

  • The selected column gets deleted from the dataset.

Please Note: The Delete Column transform can also be accessed by using the menu icon provided for the each column in the Data Grid display of the dataset.

Delete Columns Except

This transform is to delete the columns other than the selected ones.

  • Navigate to the Data Grid display of the dataset.

  • Open the Transforms tab.

  • Select COLUMNS transform type.

  • Select the Delete Column Except transform.

  • Click the drop-down icon provided for the Select Column option.

  • Use check boxes provided next to the column names to select those columns (You can select multiple columns).

  • The selected columns appear.

  • Click the Submit option to apply the transform.

  • The columns except the selected columns get removed from the Dataset.

Duplicate Columns

The Duplicate Columns transform creates another column containing the duplicate data of the selected column.

  • Select a column from the Dataset in the Data Preparation framework.

  • Open the Transforms tab.

  • Select and click the Duplicate transform from the COLUMNS category.

  • It duplicates the selected column data and inserts it in the dataset.

Please Note: The Duplicate Columns transform can also be accessed by using the menu icon provided for the each column in the Data Grid display of the dataset.

Ends with

The Ends with transform is ideal for matching value based on patterns for the String data type.

  • It returns true if the rightmost set of characters of a column of values matches a pattern.

  • The source value can be a String data type, and the pattern can be a Pattern, regular expression, or a string.

Steps to perform the Ends with transform:

  • Select a String column using the Data Grid.

  • Navigate to the Transforms tab.

  • Open the Columns transforms.

  • Select the Ends with transform.

  • Mode: Select a Mode by using the drop-down icon. The given choices for Mode are: REGEX and String.

    • If the selected Mode is REGEX then pass a specific values in the Value field by following the suggested patterns:

      • In Regex mode for a single column using single value for example, “MimiTolkin” → the user should pass '*kin$' (In the below-given image, it is passed as '.*ope$' for "Europe")

      • In Regex mode for a single column and using multiple values like “MimiTolkin” & “Burger” respectively then → .*(kin|ger)$

    • If the selected Mode is String then pass a specific value from the selected column. [E.g., "Mimitolkin"] (In the below-given image, it is passed as "Europe")

  • The Ignore Case option can be used to make string comparison case-insensitive.

  • Click the Submit option.

  • Result will come as True or False for a single column by checking the words in the column that ends with the given pattern (E.g., All the values that ends with the 'ope' pattern will display True in the newly added column with the REGEX mode. It will also display True for the rows with the "Europe" value with the String mode).

Fill Empty

The Fill Empty transform is used to fill the null/empty value of cell using either above or below values available in the column.

  • Select a column with empty rows from the dataset within the Data Preparation framework.

  • Open the Transforms tab.

  • Configure the Fill Empty transform from the COLUMNS category:

    • Create new column- Click the checkbox to create a new column or else the currently selected column gets updated.

    • Use with-The user can use either of the options from the provided choices:

      • From Above: To fill the empty cells and replace them by the value of the cells given above the empty cells.

      • From Below: To fill the empty cells and replace them by the value of the cells given below the empty cells.

    The Fill Empty transform has been applied to the Gender column by filling the empty cells with the values of the immediate below cells.

As a result, a new column gets created with some of the empty cells filled by the values of the immediate below given cells as shown in the below image:

Please Note: The Fill Empty transform fills the immediate empty cell with either above or below cell's values. It does not fill the cells for which both above and below cells are empty.

Generate Primary Key

It generates the primary key for the table. The user can see the Primary_column added as the first column in the selected Dataset.

Please Note: Generate Primary Key is a table-based operation.

  • Navigate to the dataset in the Data Preparation format.

  • Select the first column from the displayed dataset.

  • Open the Transforms tab.

  • Select the Generate Primary Key transform from the COLUMNS category.

  • Use with: The user gets two options to generate the primary key:

    • Contiguous- It generates the auto-incremented value starting from 1.

    • Non_contiguous- It generates a unique and random integer value.​​

  • Click the Submit option.

  • A new column with primary values gets added to the data grid.

Get Character Length

The transform Get Character Length when applied adds a new column with numbers displaying the length of character present in that cell.

  • Select a column from the dataset.

  • Navigate to the Transforms tab.

  • Click the Get Character Length transform from the COLUMNS category. E.g., The Get Character Length transform has been applied to the designation column.

  • As a result, a new column gets added next to the designation column displaying the character length of each cell value as displayed below:

Please Note:

  • This transform counts the space provided between two words as a character. So, to get the exact count of the character length the cell should not have any space between two words.

  • The empty cells are kept as it is in the column.

  • The Get Character Length transform can also be accessed by using the menu icon provided for the each column in the Data Grid display of the dataset.

Get JSON Objects

The Get JSON Objects transform extracts any parameter from a given column with JSON data. When the parameter in the JSON is specified, the transform extracts all parameter values as columns into a tabular format.

Please Note: To extract the nested JSON the ‘,’ mark can be used to specify the multiple values.

  • Select a column from the dataset within the Data Preparation framework.

  • Open the Transforms tab.

  • Select the GET JSON Object transform from the COLUMNS category.

  • Provide Parameters to be extracted from the selected column.

  • Click the Submit option.

  • The given parameters are extracted into tabular structure as displayed-below:

Merge Columns

This transformation helps to merge one or more columns and create a new column. By using a separator the values will be separated. E.g., In the following examples the used Separator is '_' (underscore).

Name

Job

Gender

MergedColumn

Ram

Teacher

Male

Ram_Teacher_Male

  • Navigate to the Data Preparation grid page.

  • Navigate to the Transforms tab.

  • Open the Columns transform type.

  • Select the Merge Columns transform.

  • Select the Columns using the Select Columns drop-down menu.

  • Select a separator.

  • Provide name for the New Column.

  • Click the Submit option.

  • As a result, a new column gets added to the data grid with the merged values of the selected columns.

Pivot

When applied, the Pivot transform converts the data into a Pivot table based on the selected Pivot Column and Group of the selected columns.

The pivot data transform is a data manipulation operation used to reshape a dataset, primarily in the context of data analysis and data visualization. It involves converting data from a "long" format to a "wide" format, or vice versa, based on the structure of the original dataset.

  • Navigate to the Data Preparation page with a dataset.

  • Open the Transforms tab.

  • Select the Pivot transform from the COLUMNS category of transforms.

  • Select a column to be used for Group By function (It supports String datatype).

  • Select a Pivot column (It supports String datatype).

  • Choose an aggregation operation out of Avg, Count, Max, Min, and Sum options with an integer column.

  • Click the Submit option.

Transform Selection: In the given sample data, the Students Names is selected as the Pivot Column, and Max aggregation is selected for the Marks column. The selected Group by column is Subjects.​​

As a result, it returns the Max Marks for both the subjects get displayed for each Student in the following manner.

Please Note: The user can change the data display by interchanging the selected columns used for Group by and Pivot.

Random Number Between Zero and One

The RAND transform generates a random real number between 0 and 1.

  • The function accepts an optional integer parameter (seed value), which causes the same set of random numbers to be generated with each execution. When the browser is refreshed, the random numbers remain consistent when the seed value is present.

  • This function generates values of Decimal type with fifteen digits of precision after the decimal point.

Steps to perform the Random Number between Zero and One transform:

  • Navigate to the Data Preparation landing page with a dataset.

  • Open the Transforms tab.

  • Select the COLUMNS category of transforms.

  • Select the Random Number between Zero and One transform .

  • Provide the New Column Name.

  • Provide seed value (use any integer parameter).

  • The output will generate random numbers between 0 & 1 in the new column.

Please Note: If none is provided, a seed is generated based on the system timestamp.

Relocate of Column

This transform helps to relocate/ change of the position of a column in the data preparation. It will be helpful to the user when there are more columns and the user can change the position as per the given options:

  • At End

  • At Start

  • After Column

  • Before Column

Steps to perform the Relocate of Column transform:

  • Navigate to the Data Preparation landing page with a dataset.

  • Open the Transforms tab.

  • Select the COLUMNS category of transforms.

  • Select the Relocate of Column transform .

  • Select a Position option using the drop-down icon.

    • The supported positions are: At End, At Start, After Column, Before Column

  • Click the Submit option.

  • As per the selected position the column gets displayed in the Data Grid.

Remove Trail And Lead Quotes

This transform removes leading and trailing quotes from a text. It can be applied to both single quotes (') and double quotes (").

Steps to perform the transform:

  • Navigate to the Data Preparation landing page.

  • Select a String datatype Column where single or double quotes in the values as trail & lead part.

  • Open the Transform tab.

  • Navigate to the Columns transform type.

  • Click the Remove Trail and Lead Quotes transform.

  • As a result, the quotes will get removed and changes will be reflected in the same source column.

Remove Trail And Lead Whitespace

This transform removes Trail and all Whitespaces found at the beginning and end of the text.

Steps to perform the transform:

  • Select a String datatype Column where whitespace is present in the values as trail & lead part.

  • Click on the Remove trail and lead whitespace transform.

  • Whitespace will get removed from the trail and lead places and changes will be reflected in the same source column.

Remove Whitespace

This transforms removes all whitespace from a string, including leading and trailing whitespace and all whitespace within the string.

Steps to perform the transform:

  • Navigate to the Data Preparation landing page displaying the Data.

  • Select a String datatype Column where whitespace is present in the values as trail, between & lead part.

  • Open the Column

  • Click the Remove Whitespace transform.

    • Use checkbox to select the Create New Column option.

    • Provide a name for the New Column.

  • Click the Submit option.

  • Whitespace will get removed from the lead, in-between, and trail spaces of the content of the source column and displayed as a new column in the data set.

Removing Whitespace from the Source Column

The users can avoid creating a new column while using the Remove Whitespace transform in that case, the whitespaces will get removed from the source column itself.

Rename Column

The Rename Column transform allows the user to rename the selected column.

  • Select a column from the data grid that needs to be renamed.

  • Open the Transforms tab.

  • Choose the Rename Column transform.

Or

  • Select a column from the data grid that needs to be renamed

  • Click the Menu icon provided next to the Column name.

  • Select the Rename Column option from the context menu.

  • The Rename Column dialog box opens (In both the scenarios).

  • Provide a name that you wish to use as a rename for the selected column.

  • Click the Submit option.

  • The column gets renamed.

Return Non-Null Column Values

The transform returns the first non-null value from the list of columns specified to a new column. To perform the transform, select the columns which must be checked for null and specify a column name for the result.

  • Select Column: Select the columns to be checked for null

  • Column name: The name for the new result column returns

The Return Non Null Column Values transform has been applied to the monthly_salary and cur_monthly_payment columns. As a result, the Payment column gets added to the dataset with the values based on the applied transform.

Sorting

The Sorting transformation helps to sort the column in Ascending/ Descending order for the ease of the user in data preparation. There are two options are given:

  • Ascending

  • Descending

Steps to use the Sorting transform.

  • Navigate to a column using the grid view of the Data Preparation.

  • Open the Transform tab.

  • Select the Column transform category.

  • Click on the Sorting transform.

  • Select order using the drop-down option.

  • Click the Submit option.

  • Order of the entire dataset gets changed based on the selected source column.

Please Note: If the Sorting transform is used multiple time, the dataset will be ordered based on the last selected sorting order.

Split Email

This transformation splits an e-mail address into the local part (before the @) and the domain part (after the @).

Steps to perform the transform:

  • Select a Column containing the Email values.

  • Click on the Split Email transform.

  • It will create two new columns as local_part and domain_part.

Please Note: If the input does not contain a valid email address, it will throw an error.

Split HTTP Query String

This transformation splits the elements of an HTTP query string. The query string is the part of the URL that comes after the `?` in the string.

Consider the following URL as an example in this context:

URL: https://bdbizviz.atlassian.net/jira/software/c/projects/BDPC/boards/75?modal=detail&selectedIssue=BDPC-541

Query String: modal=detail&selectedIssue=BDPC-541

Output Column prefix: SplitURL_

Then the Output comes as: SplitURL_modal SplitURL_selectIssue

detail BDPC-541

Steps to perform the Split HTTP Query String transform:

  • Select a URL column with query string.

  • Open the Transforms tab.

  • Navigate to the Column transforms.

  • Select the Split HTTP Query String transform.

  • Provide a Prefix for the Output Column.

  • Click the Submit option.

  • New columns will get created with the given prefix and the key of the HTTP Query string chunk.

    • Given Prefix for the output column: Split HTTP_

      Then the Output comes as: Split HTTP_modal Split HTTP_selectIssue

Split URL

This transform splits the elements of an URL into multiple columns. A valid URL contains the following format scheme ://hostname[:port][/path][?query string][#anchor]

Output -> It will split into multiple columns prefixed by the input column name.

E.g., URL -> https://www.google.com/search?q=query#results

Then Output as:

URL_scheme

URL_domain

URL_path

URL_querystring

http

www.google.com

/search

q=query

Steps to perform the Split URL transformation:

  • Select a URL column.

  • Click the Split URL transform.

  • The source column with URL will get splitted into four columns displaying URL_scheme, URL_domain, URL_path, and URL_querystring.

Please Note: The output of the Split URL will be displayed into multiple columns prefixed by the input column name.

Starts with

The STARTS WITH function is ideal for matching based on patterns for a String data type. It returns true if the leftmost set of characters of a column of values matches a pattern. The source value can be a String data type, and the pattern can be a Pattern, regular expression, or a string.

Steps to perform the transformation:

  • Select a column from the displayed dataset.

  • Select the Starts with transform from the Column category.

  • Select a Mode using the drop-down menu. The given choices are: REGEX and String.

  • Provide name for the new column.

  • Provide value in the proper format.

Please Note:

  • Pass the values in the below-given format for the REGEX mode

    • for single value ^(value),

    • for multiple values ^(value1|value2)

  • The user can use the value without any format while using the String mode. E.g., Pass only abc to get the result for the columns starting with the abc letters.

  • Use checkbox to enable Ignore Case to make string comparison case-insensitive.

  • Click the Submit option.

  • Result will come as True or False for a single column after checking whether the configured words come in the beginning of the concerned row of the selected column.

Check out the below given walk-through on the Starts with transform.

Last updated