# Data Cleansing

![](https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FgRzExEBe5Un0AWL1Ls4e%2Fimage.png?alt=media\&token=9531cb34-e16c-4e20-a89d-2066ec09c7e3)

## **Delete Rows with Invalid Cell**

* The transform deletes any row which has an invalid value in the selected column. The transform does not have form.
* When we do the transform on the column,&#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FfqK7gOGkIaVMUOSNhWl3%2Fimage.png?alt=media&#x26;token=01e09147-4461-46db-add1-ed790fadaa92" alt=""><figcaption></figcaption></figure>

* It deletes all rows marked by the invalid values as displayed below:

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FDiXtOtyBU9xa5co66ZH2%2Fimage.png?alt=media&#x26;token=1749cc32-31d1-4024-a432-dae2a1876288" alt=""><figcaption></figcaption></figure>

## **Delete Rows with Empty Cell**

* The transform deletes any row which has a blank value in the selected column. The transform does not have a form.                                 &#x20;
* When we perform the transform on the ***Bol*** column,&#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FWoLDh0bR5bygUvWxlU4U%2Fimage.png?alt=media&#x26;token=517d3409-24e9-4b11-b92f-361bb48304b7" alt=""><figcaption><p>There are 2 empty rows in the Bol column</p></figcaption></figure>

* It deletes all the rows which have an empty value in that column returning the data as below:

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FAl4OdtryIUbf2BEuJIv9%2Fimage.png?alt=media&#x26;token=9fb0e1b8-7521-4e94-a238-29e32fe89cfd" alt=""><figcaption></figcaption></figure>

## **Delete Rows with Negative Values**

* It deletes the rows which have a negative value in the selected column. This transform does not have a form.
* When this transform is applied to experience column, it deletes all rows with negative, as displayed below:                              &#x20;
* It deletes the row with the negative value and returns the data as displayed below:     &#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2F49tQ9ZTcSzyAEUOGr7yl%2Fimage.png?alt=media&#x26;token=ac3071bf-f146-48f8-908e-5ed4a477ae77" alt=""><figcaption><p>There are 2 rows with negative values in the id column.</p></figcaption></figure>

The rows with the negative values get removed.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FKCkIpYhGzipRCs8KIGGP%2Fimage.png?alt=media&#x26;token=7eeaeb98-d2e0-4005-8c88-4840a137445e" alt=""><figcaption><p>2 rows with negative values are removed.</p></figcaption></figure>

## **Delete Rows on Matching Value**

Delete the rows on matching the condition specified for that column. Operators include contains, equals, starts with, ends with, and regex match.

* **Operator:** Select the operator required for matching from the list.
* **Value:** The value or pattern to be searched for in the selected column.

&#x20;   The row with given value from the ***id*** column should get deleted.     &#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FUW3AymuxEraowrbY6rD8%2Fimage.png?alt=media&#x26;token=a049ba5a-869b-4981-bba3-eeeb4e621b46" alt=""><figcaption></figcaption></figure>

The row with ***-7*** value has gets deleted from the ***id*** column.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FU9woC6OytuSurweIgsea%2Fimage.png?alt=media&#x26;token=04d763aa-4196-4db1-bf8b-6c77a16e9cc6" alt=""><figcaption></figcaption></figure>

## **Clear Cells on Matching Value**

Clear the cell value on matching the condition specified. Operators include contains, equals, starts with, end with, and regex match. Transform applies in the same column.&#x20;

* **Operator:** Select the operator required for matching from the list.
* **Value:** The value or pattern to be searched for in the selected column. &#x20;

The value selected in the ***sal1*** column clears the cell with ***0*** value.                        &#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2F6TescCce2KoAVgbc3RzR%2Fimage.png?alt=media&#x26;token=53dba3fe-75e9-4b58-8923-53dee4929035" alt=""><figcaption></figcaption></figure>

It removes the values from the cells with 0 value and returns blanks cells:

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FQ4yvSTljoYho7OWSjGR6%2Fimage.png?alt=media&#x26;token=0803ff1e-b049-42d1-be28-2ded566203fb" alt=""><figcaption></figcaption></figure>

## **Fill Empty Cells with Text**

It helps to fill the empty cells of a selected column with a value or a value from another column if the destination column is empty.                              &#x20;

**Use with:** Specify whether to fill with a value or another column value.

**Column/ Value:** The value with which the column must be filled, or the column with which the value must be replaced.

When the transform is applied to the below data on the empty cell of the ***Bol*** column,&#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FhkO63a802f4RnODJDWGO%2Fimage.png?alt=media&#x26;token=d7bf1250-0d89-4380-8273-83c95c57a7e0" alt=""><figcaption></figcaption></figure>

&#x20; it fills the empty cell with the given text:               &#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FRfXBK3MyUV24sHEz44p2%2Fimage.png?alt=media&#x26;token=9b30113c-03a4-40dc-8776-e34ce1e21e5d" alt=""><figcaption></figcaption></figure>

## **Fill Cells with Value**

It fills the selected column with a value or a value from another column.                           &#x20;

**Use with:** Specify whether to fill with a value or another column value

**Column/ Value:** The value with which the column must be filled, or the column with which the value must be replaced.&#x20;

When the above transform is applied to the below data on the column ***timecol2***, it fills the column with the selected value that is 30.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FPTgT6TyYryq86TqxvBa5%2Fimage.png?alt=media&#x26;token=4c6d0f2b-e218-427e-8797-e6532ad41dc7" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> The user can also fill the column value with another column's values if the selected option is **Column**. E.g., the following image mentions the values of the **timecol3** is provided for the **timecol2**.*

{% endhint %}

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FzkVu7W8ekP3NUNSfgoAP%2Fimage.png?alt=media&#x26;token=fc259667-f056-4832-b1d0-4743eaf16d6e" alt=""><figcaption></figcaption></figure>

## **Flag Duplicates in Tables**

This transform adds a new Boolean column based on duplicate rows in the table. For original value it gives false, and for the duplicate value, it provides true value.

When applied on a column it inserts a new column named with the '***flag***\_' prefix and mentions the duplicated values as ***true***.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FW83fqhkwRbX8L0rSzxG9%2Fimage.png?alt=media&#x26;token=dfe1acf2-d913-46d2-bc6b-2679ac2aa1a6" alt=""><figcaption><p>The duplicated values get notified as <em><strong>True</strong></em> in the <em><strong>flag_duplicates_table</strong></em> column </p></figcaption></figure>

## **Remove Numbers**

It removes any number present in the selected column. We can either add a new column with the transformed value or overwrite the same column.                                                &#x20;

When the ***Remove Numbers***, transform gets performed on a selected column,    &#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FK2nEDbMzfqqXy4DaZTCJ%2Fimage.png?alt=media&#x26;token=423d7c2f-eccf-4777-9e30-a8150402abc7" alt=""><figcaption></figcaption></figure>

It removes numbers from the selected column.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2F0fG8s32Vb3WGdptZMNKw%2Fimage.png?alt=media&#x26;token=c93974fe-aff5-4e45-9dca-8e4647328003" alt=""><figcaption></figcaption></figure>

## **Remove Letters**

It removes any letter present in the selected column. The users can either add a new column with the transformed value or overwrite the same column.                           &#x20;

The selected column when applied the ***Remove Letters*** transform.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FYsukQQX4bHamOZ90TWow%2Fimage.png?alt=media&#x26;token=672411ff-6669-4bd2-8982-a536f47b58e3" alt=""><figcaption></figcaption></figure>

It removes the numbers from the newly inserted column.&#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FTENE5n7BKj12qIfnNPxn%2Fimage.png?alt=media&#x26;token=6a88ebd5-6e48-4e2a-9e20-7d2c9b7f0ff0" alt=""><figcaption></figcaption></figure>

## **Remove Special Characters**

It removes any special character present in the selected column. Only letters, numbers, and spaces are retained. We can either add a new column with the transformed value or overwrite the same column.                        &#x20;

When the transform ***Remove Special Characters*** gets performed on the selected column, the punctuations get removed from the column.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FdEVBLW6xVOxugJxqvhyT%2Fimage.png?alt=media&#x26;token=73368369-11f0-4aba-9b44-064baed0c621" alt=""><figcaption><p>Special character gets removed from the newly added column</p></figcaption></figure>

## **Flag Duplicates in Columns**

This transform adds a new Boolean column based on duplicate values in the column. For original value it gives false, and for the duplicate value, it provides true value.

When the Flag Duplicates in Columns is applied to dup\_uppercase\_1&#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2Fe9UuRRVWjQQEEIXxY6aG%2Fimage.png?alt=media&#x26;token=6ba78213-069b-4a9e-a4e8-1d52b23a58e6" alt=""><figcaption></figcaption></figure>

It inserts a new column by flagging the duplicated values.

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2Fig8Ae92o4osNJJOPvgNY%2Fimage.png?alt=media&#x26;token=b1fa75aa-3adf-446a-a46d-03b701f27819" alt=""><figcaption></figcaption></figure>

## **Remove Duplicates from Table**

It removes all duplicate rows from the table.

## **Remove Duplicates from Column**

It removes duplicate values from the selected columns. This transform can be performed on a single as well as on multiple columns.&#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2Fkl8gGKfBJLJnZo159DCN%2Fimage.png?alt=media&#x26;token=663d9910-fff1-4dd3-9c37-b31d55511801" alt=""><figcaption></figcaption></figure>

&#x20; The duplicated values get removed from the column.  &#x20;

<figure><img src="https://4128708524-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FqpN7KxYcqaaJWydM1GGY%2Fimage.png?alt=media&#x26;token=fa7b7368-7edf-4926-a356-f4e0e0c0d4b7" alt=""><figcaption></figcaption></figure>

## **Find Anomaly**

Anomaly detection is used to identify any anomaly present in the data. i.e., Outlier.  Instead of looking for usual points in the data, it looks for any anomaly. It uses the **Isolation Forest** algorithm.

The anomaly gets stored in the new column under the anomaly flag name (In this case, it is displayed under the ***outlier*** column).

&#x20;
