Data Cleansing
Last updated
Last updated
The transform deletes any row which has an invalid value in the selected column. The transform does not have form.
When we do the transform on the column,
It deletes all rows marked by the invalid values as displayed below:
The transform deletes any row which has a blank value in the selected column. The transform does not have a form.
When we perform the transform on the Bol column,
It deletes all the rows which have an empty value in that column returning the data as below:
It deletes the rows which have a negative value in the selected column. This transform does not have a form.
When this transform is applied to experience column, it deletes all rows with negative, as displayed below:
It deletes the row with the negative value and returns the data as displayed below:
The rows with the negative values get removed.
Delete the rows on matching the condition specified for that column. Operators include contains, equals, starts with, ends with, and regex match.
Operator: Select the operator required for matching from the list.
Value: The value or pattern to be searched for in the selected column.
The row with given value from the id column should get deleted.
The row with -7 value has gets deleted from the id column.
Clear the cell value on matching the condition specified. Operators include contains, equals, starts with, end with, and regex match. Transform applies in the same column.
Operator: Select the operator required for matching from the list.
Value: The value or pattern to be searched for in the selected column.
The value selected in the sal1 column clears the cell with 0 value.
It removes the values from the cells with 0 value and returns blanks cells:
It helps to fill the empty cells of a selected column with a value or a value from another column if the destination column is empty.
Use with: Specify whether to fill with a value or another column value.
Column/ Value: The value with which the column must be filled, or the column with which the value must be replaced.
When the transform is applied to the below data on the empty cell of the Bol column,
it fills the empty cell with the given text:
It fills the selected column with a value or a value from another column.
Use with: Specify whether to fill with a value or another column value
Column/ Value: The value with which the column must be filled, or the column with which the value must be replaced.
When the above transform is applied to the below data on the column timecol2, it fills the column with the selected value that is 30.
Please Note: The user can also fill the column value with another column's values if the selected option is Column. E.g., the following image mentions the values of the timecol3 is provided for the timecol2.
This transform adds a new Boolean column based on duplicate rows in the table. For original value it gives false, and for the duplicate value, it provides true value.
When applied on a column it inserts a new column named with the 'flag_' prefix and mentions the duplicated values as true.
It removes any number present in the selected column. We can either add a new column with the transformed value or overwrite the same column.
When the Remove Numbers, transform gets performed on a selected column,
It removes numbers from the selected column.
It removes any letter present in the selected column. The users can either add a new column with the transformed value or overwrite the same column.
The selected column when applied the Remove Letters transform.
It removes the numbers from the newly inserted column.
It removes any special character present in the selected column. Only letters, numbers, and spaces are retained. We can either add a new column with the transformed value or overwrite the same column.
When the transform Remove Special Characters gets performed on the selected column, the punctuations get removed from the column.
This transform adds a new Boolean column based on duplicate values in the column. For original value it gives false, and for the duplicate value, it provides true value.
When the Flag Duplicates in Columns is applied to dup_uppercase_1
It inserts a new column by flagging the duplicated values.
It removes all duplicate rows from the table.
It removes duplicate values from the selected columns. This transform can be performed on a single as well as on multiple columns.
The duplicated values get removed from the column.
Anomaly detection is used to identify any anomaly present in the data. i.e., Outlier. Instead of looking for usual points in the data, it looks for any anomaly. It uses the Isolation Forest algorithm.
The anomaly gets stored in the new column under the anomaly flag name (In this case, it is displayed under the outlier column).