Data Cleansing

Delete Rows with Invalid Cell

  • The transform deletes any row which has an invalid value in the selected column. The transform does not have form.

  • When we do the transform on the column,

  • It deletes all rows marked by the invalid values as displayed below:

Delete Rows with Empty Cell

  • The transform deletes any row which has a blank value in the selected column. The transform does not have a form.

  • When we perform the transform on the Bol column,

  • It deletes all the rows which have an empty value in that column returning the data as below:

Delete Rows with Negative Values

  • It deletes the rows which have a negative value in the selected column. This transform does not have a form.

  • When this transform is applied to experience column, it deletes all rows with negative, as displayed below:

  • It deletes the row with the negative value and returns the data as displayed below:

The rows with the negative values get removed.

Delete Rows on Matching Value

Delete the rows on matching the condition specified for that column. Operators include contains, equals, starts with, ends with, and regex match.

  • Operator: Select the operator required for matching from the list.

  • Value: The value or pattern to be searched for in the selected column.

The row with given value from the id column should get deleted.

The row with -7 value has gets deleted from the id column.

Clear Cells on Matching Value

Clear the cell value on matching the condition specified. Operators include contains, equals, starts with, end with, and regex match. Transform applies in the same column.

  • Operator: Select the operator required for matching from the list.

  • Value: The value or pattern to be searched for in the selected column.

The value selected in the sal1 column clears the cell with 0 value.

It removes the values from the cells with 0 value and returns blanks cells:

Fill Empty Cells with Text

It helps to fill the empty cells of a selected column with a value or a value from another column if the destination column is empty.

Use with: Specify whether to fill with a value or another column value.

Column/ Value: The value with which the column must be filled, or the column with which the value must be replaced.

When the transform is applied to the below data on the empty cell of the Bol column,

it fills the empty cell with the given text:

Fill Cells with Value

It fills the selected column with a value or a value from another column.

Use with: Specify whether to fill with a value or another column value

Column/ Value: The value with which the column must be filled, or the column with which the value must be replaced.

When the above transform is applied to the below data on the column timecol2, it fills the column with the selected value that is 30.

Please Note: The user can also fill the column value with another column's values if the selected option is Column. E.g., the following image mentions the values of the timecol3 is provided for the timecol2.

Flag Duplicates in Tables

This transform adds a new Boolean column based on duplicate rows in the table. For original value it gives false, and for the duplicate value, it provides true value.

When applied on a column it inserts a new column named with the 'flag_' prefix and mentions the duplicated values as true.

Remove Numbers

It removes any number present in the selected column. We can either add a new column with the transformed value or overwrite the same column.

When the Remove Numbers, transform gets performed on a selected column,

It removes numbers from the selected column.

Remove Letters

It removes any letter present in the selected column. The users can either add a new column with the transformed value or overwrite the same column.

The selected column when applied the Remove Letters transform.

It removes the numbers from the newly inserted column.

Remove Special Characters

It removes any special character present in the selected column. Only letters, numbers, and spaces are retained. We can either add a new column with the transformed value or overwrite the same column.

When the transform Remove Special Characters gets performed on the selected column, the punctuations get removed from the column.

Flag Duplicates in Columns

This transform adds a new Boolean column based on duplicate values in the column. For original value it gives false, and for the duplicate value, it provides true value.

When the Flag Duplicates in Columns is applied to dup_uppercase_1

It inserts a new column by flagging the duplicated values.

Remove Duplicates from Table

It removes all duplicate rows from the table.

Remove Duplicates from Column

It removes duplicate values from the selected columns. This transform can be performed on a single as well as on multiple columns.

The duplicated values get removed from the column.

Find Anomaly

Anomaly detection is used to identify any anomaly present in the data. i.e., Outlier. Instead of looking for usual points in the data, it looks for any anomaly. It uses the Isolation Forest algorithm.

The anomaly gets stored in the new column under the anomaly flag name (In this case, it is displayed under the outlier column).

Last updated