> For the complete documentation index, see [llms.txt](https://docs.bdb.ai/data-center-3/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.bdb.ai/data-center-3/data-center/data-preparation/data-preparation-workspace/transforms/data-cleansing.md).

# Data Cleansing

## **Clear Cells on Matching Value** <a href="#clear-cells-on-matching-value" id="clear-cells-on-matching-value"></a>

Clear the cell value on matching the condition specified. Operators include contains, equals, starts with, end with, and regex match. Transform applies in the same column.

The ***Clear Cells on Matching Value*** data transform is a process used to remove or delete the contents of specific cells in a dataset or spreadsheet based on a given condition or matching value. This transformation is commonly employed to clean or manipulate data by selectively clearing cells that meet certain criteria.&#x20;

Here's an overview of how the ***Clear Cells on Matching Value*** data transform works:

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Clear Cells on Matching Value*** transform from the ***Data Cleansing*** category.
* **Operator:** Select the operator required for matching from the list.
* **Value:** The value or pattern to be searched for in the selected column.
* Click the ***Submit*** option.

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> The supported Operators for this Transform are: equals, starts with, end with, and regex.*
{% endhint %}

The value selected in the ***Quantity*** column clears the cell with ***0*** value.

<figure><img src="/files/nUg0BQbKbLZHKw5OXPWr" alt=""><figcaption></figcaption></figure>

It removes the values from the cells with 7 value and returns blanks cells:

<figure><img src="/files/2BQYUL7Ql3uQm3WOcot4" alt=""><figcaption></figcaption></figure>

## **Delete Rows on Matching Value** <a href="#delete-rows-on-matching-value" id="delete-rows-on-matching-value"></a>

Delete the rows on matching the condition specified for that column.&#x20;

The ***Delete Rows on Matching Value*** transform is a data manipulation process used to remove rows from a dataset based on a specified condition or matching value. This transformation allows the user to selectively delete rows that meet certain criteria.&#x20;

Here's an overview of how the ***Delete Rows on Matching Value*** transform works:

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Delete Rows on Matching Value*** transform from the ***Data Cleansing*** category.
* **Operator:** Select the operator required for matching from the list.
* **Value:** The value or pattern to be searched for in the selected column.
* Click the ***Submit*** option.

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> The supported operators are: contains, equals, starts with, ends with, and regex match.*
{% endhint %}

The row with given value from the ***Sl. No.*** column gets deleted.

<figure><img src="/files/wBQnfjcgD8AD57IhOozY" alt=""><figcaption><p>Applying the <strong>Delete Rows on Matching Value transform on the Sl. No. column</strong></p></figcaption></figure>

The row with ***5*** value has got deleted from the ***id*** column.

<figure><img src="/files/8MPCBQutmmupQoKeTljY" alt=""><figcaption><p>The selected Row get deleted</p></figcaption></figure>

## **Delete Rows with Empty Cell** <a href="#delete-rows-with-empty-cell" id="delete-rows-with-empty-cell"></a>

The ***Delete Rows with Empty Cell*** transform is a data manipulation operation commonly used in spreadsheet software or data processing tools to remove rows that contain empty cells within a specified column or across multiple columns. The transform helps clean and filter data by eliminating rows that lack essential information or have incomplete records.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Click the ***Delete Rows with Empty Cell*** transform from the ***Data Cleansing*** category.

<figure><img src="/files/QmfbbI50CIJywsbjZ6hW" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> This transform does not have a form to configure, it gets applied by clicking the transform name.*
{% endhint %}

* It deletes all the rows with empty cell in that column returning the data as below:

<figure><img src="/files/hU5GMDCFqQG99o9iFlxY" alt=""><figcaption></figcaption></figure>

## **Delete Rows with Invalid Cell** <a href="#delete-rows-with-invalid-cell" id="delete-rows-with-invalid-cell"></a>

The ***Delete Rows with Invalid Cell*** transform is a data manipulation operation used to remove rows that contain invalid or inconsistent data in a specified column or across multiple columns. This transform helps clean and filter data by eliminating rows that do not meet specific validation criteria or fail to comply with predefined rules.

Refer to the following steps on how the ***Delete Rows with Invalid Cell*** transform works:

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Delete Rows with Invalid Cell*** transform from the ***Data Cleansing*** category.

<figure><img src="/files/IpJV5ly6k79M9I561a5H" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> This transform does not have a form to configure, it gets applied by clicking the transform name.*
{% endhint %}

* The transform deletes rows with an invalid value in the selected column.&#x20;

<figure><img src="/files/iAMpos0drc3XZdWRFbbQ" alt=""><figcaption></figcaption></figure>

## **Delete Rows with Negative Values** <a href="#delete-rows-with-negative-values" id="delete-rows-with-negative-values"></a>

The ***Delete Rows with Negative Values*** transform is a data manipulation operation used to remove rows that contain negative values in a specified column. This transform helps to filter and clean data by eliminating rows that have undesired negative values.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Delete Rows with Negative Values*** transform from the ***Data Cleansing*** category.

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> This transform does not have a form to configure, it gets applied by clicking the transform name.*
{% endhint %}

<figure><img src="https://859511478-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2F49tQ9ZTcSzyAEUOGr7yl%2Fimage.png?alt=media&#x26;token=ac3071bf-f146-48f8-908e-5ed4a477ae77" alt=""><figcaption><p>There are 2 rows with negative values in the id column.</p></figcaption></figure>

* It deletes the row with the negative value and returns the data as displayed below:

<figure><img src="https://859511478-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FKCkIpYhGzipRCs8KIGGP%2Fimage.png?alt=media&#x26;token=7eeaeb98-d2e0-4005-8c88-4840a137445e" alt=""><figcaption><p>2 rows with negative values are removed.</p></figcaption></figure>

## **Fill Cells with Value** <a href="#fill-cells-with-value" id="fill-cells-with-value"></a>

The ***Fill Cells with Value*** transform is a data manipulation operation used to replace empty or missing cells with a specified value in a column or across multiple columns. This transform helps to ensure data consistency and completeness by filling in gaps or replacing missing values.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Fill Cells with Value*** transform from the ***Data Cleansing*** category.
* **Use with:** Specify whether to fill with a value or another column value&#x20;
* **Column/ Value:** The value with which the column must be filled, or the column with which the value must be replaced. When the above transform is applied to the below data on the column ***timecol2***, it fills the column with the selected value that is 30.
* Click the ***Submit*** option.

<figure><img src="https://859511478-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FPTgT6TyYryq86TqxvBa5%2Fimage.png?alt=media&#x26;token=4c6d0f2b-e218-427e-8797-e6532ad41dc7" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark>* *The user can also fill the column value with another column's values if the selected option is **Column**. E.g., the following image mentions the values of the **timecol3** is provided for the **timecol2**.​*
{% endhint %}

* It fills the selected column with a value or a value from another column.&#x20;

<figure><img src="https://859511478-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FzkVu7W8ekP3NUNSfgoAP%2Fimage.png?alt=media&#x26;token=fc259667-f056-4832-b1d0-4743eaf16d6e" alt=""><figcaption></figcaption></figure>

## **Fill Empty Cells with Text** <a href="#fill-empty-cells-with-text" id="fill-empty-cells-with-text"></a>

It helps to fill the empty cells of a selected column with a value or a value from another column if the destination column is empty.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Fill Empty Cells with Text*** transform from the ***Data Cleansing*** category.
* **Use with:** Specify whether to fill with a value or another column value.
* **Column/ Value:** The value with which the column must be filled, or the column with which the value must be replaced.
* Click the ***Submit*** option.

When the transform is applied to the below data on the empty cell of the ***Bol*** column,

<figure><img src="https://859511478-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGDmsjfjJBNqow7Fo97cO%2Fuploads%2FhkO63a802f4RnODJDWGO%2Fimage.png?alt=media&#x26;token=d7bf1250-0d89-4380-8273-83c95c57a7e0" alt=""><figcaption></figcaption></figure>

It fills the empty cell with the chosen text:

<figure><img src="/files/FI9XYtGa9DcTcug1PCcO" alt=""><figcaption></figcaption></figure>

## **Flag Duplicates in Columns** <a href="#flag-duplicates-in-columns" id="flag-duplicates-in-columns"></a>

This transform adds a new Boolean column based on duplicate values in the column. For original value it gives false, and for the duplicate value, it provides true value.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Flag Duplicates in Columns*** transform from the ***Data Cleansing*** category.
* Select the column that contains duplicate values.
* Click the ***Submit*** option.

<figure><img src="/files/TZaOLhjj2t0U4mxoTQSm" alt=""><figcaption></figcaption></figure>

* It inserts a new column by flagging the duplicated values as ***true***.

<figure><img src="/files/V8lW8yXoewRnJ9drFNSh" alt=""><figcaption></figcaption></figure>

## **Flag Duplicates in Table** <a href="#flag-duplicates-in-tables" id="flag-duplicates-in-tables"></a>

This transform adds a new Boolean column based on duplicate rows in the table. For original value it gives false, and for the duplicate value, it provides true value.

* Select a column that contains duplicate values.
* Navigate to the ***Transforms*** tab.
* Click the ***Flag Duplicates in Table*** transform from the ***Data Cleansing*** category.

<figure><img src="/files/HkF1ULhm5RRtInOyQk9O" alt=""><figcaption></figcaption></figure>

* When applied on a column it inserts a new column named with the '***flag***\_' prefix and mentions the duplicated values as ***true***.
  * The duplicated values get notified as ***true*** in the ***flag\_duplicates\_table*** column.

<figure><img src="/files/hb6Vixwweky04EDanE2E" alt=""><figcaption></figcaption></figure>

## **Remove Duplicates from Column** <a href="#remove-duplicates-from-column" id="remove-duplicates-from-column"></a>

It removes duplicate values from the selected columns. This transform can be performed on a single as well as on multiple columns.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Remove Duplicates*** ***from Column*** transform from the ***Data Cleansing*** category.
* Select the column that contains duplicate values.
* Click the ***Submit*** option.

<figure><img src="/files/pe5PqCT5V8HVsxmBNg05" alt=""><figcaption></figcaption></figure>

* The duplicated values get removed from the column.

<figure><img src="/files/ypCVfZpgxin3W7Wb6Dc3" alt=""><figcaption></figcaption></figure>

## **Remove Duplicates from Table** <a href="#remove-duplicates-from-table" id="remove-duplicates-from-table"></a>

It removes all duplicate rows from the table.​

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Remove Duplicates from Table*** transform from the ***Data Cleansing*** category.
* The duplicated data from the selected column gets deleted.

## **Remove Letters** <a href="#remove-letters" id="remove-letters"></a>

It removes any letter present in the selected column. The users can either add a new column with the transformed value or overwrite the same column.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Remove Letter*** transform from the ***Data Cleansing*** category.
* Enable create new column to create a new column with the transform result.
* Click the ***Submit*** option.

<figure><img src="/files/5YW8W5eqdyPTmcUtR3ip" alt=""><figcaption></figcaption></figure>

* It removes the numbers from the newly inserted column.

<figure><img src="/files/776JaR0R381FxdLSxiSJ" alt=""><figcaption></figcaption></figure>

## **Remove Numbers** <a href="#remove-numbers" id="remove-numbers"></a>

It removes any number present in the selected column. We can either add a new column with the transformed value or overwrite the same column.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Remove Letter*** transform from the ***Data Cleansing*** category.
* Enable create new column to create a new column with the transform result.
* Click the ***Submit*** option.

When the ***Remove Numbers***, transform gets performed on a selected column,

<figure><img src="/files/jzxontyIj0cA4E2DRg6l" alt=""><figcaption></figcaption></figure>

It removes numbers from the selected column.

<figure><img src="/files/DZ94oNnk76yDuJb7FXVU" alt=""><figcaption></figcaption></figure>

## **Remove Special Characters from Metadata** <a href="#remove-special-characters" id="remove-special-characters"></a>

This transform helps to remove the special characters from the metadata (column headers) of a dataset & make it useful in other modules.

{% hint style="success" %}
*Check out the given illustration on how to Remove Special Characters from Metadata.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FIctE5LjGWDD6zEdW4vpJ%2Fuploads%2FNTytFtiWC8VBWTreIQuR%2FDataCleansing_RemovespecialCharafromMetadata.mp4?alt=media&token=24c80db6-283e-4689-b48e-1173c21d13f3>" %}
***Remove Special Characters from Metadata***
{% endembed %}

## **Remove Special Characters** <a href="#remove-special-characters" id="remove-special-characters"></a>

It removes any special character present in the selected column. Only letters, numbers, and spaces are retained. We can either add a new column with the transformed value or overwrite the same column.

When the transform ***Remove Special Characters*** gets performed on the selected column, the punctuations get removed from the column.

* Select a column.
* Navigate to the ***Transforms*** tab.
* Select the ***Remove Special Characters*** transform from the ***Data Cleansing*** category.
* Enable the ***Create new column*** option.
* Click the ***Submit*** option.
* The special character gets removed from the newly added column.

<figure><img src="/files/FEsbJaB3hUHGiJVi3dzd" alt=""><figcaption></figcaption></figure>

## **Remove Trail And Lead Quotes**

This transform removes leading and trailing quotes from a text. It can be applied to both single quotes (') and double quotes (").

Steps to perform the transform:

* Navigate to the Data Preparation landing page.
* Select a String datatype Column where single or double quotes in the values as trail & lead part.
* Open the ***Transform*** tab.
* Navigate to the ***Data Cleansing*** transform type.
* Click the ***Remove Trail and Lead Quotes*** transform.&#x20;
* As a result, the quotes will get removed and changes will be reflected in the same source column.

## **Remove Trail And Lead Whitespace**

This transform r*emoves **Trail** and **all Whitespaces*** found at the beginning and end of the text.

Steps to perform the transform:

* Select a String datatype Column where whitespace is present in the values as trail & lead part.
* Click on the ***Remove trail and lead whitespace*** transform.&#x20;
* Whitespace will get removed from the trail and lead places and changes will be reflected in the same source column.

## **Remove Whitespace**

This transforms removes ***all whitespace*** from a string, including leading and trailing whitespace and all whitespace within the string.

Steps to perform the transform:

* Navigate to the Data Preparation landing page displaying the Data.
* Select a String datatype Column where whitespace is present in the values as trail, between & lead part.
* Open the Column
* Click the ***Remove Whitespace*** transform.&#x20;
  * Use checkbox to select the Create New Column option.
  * &#x20;Provide a name for the New Column.
* Click the ***Submit*** option.
* Whitespace will get removed from the lead, in-between, and trail spaces of the content of the source column and displayed as a new column in the data set.

### Removing Whitespace from the Source Column <a href="#rename-column" id="rename-column"></a>

The users can avoid creating a new column while using the ***Remove Whitespace*** transform in that case, the whitespaces will get removed from the source column itself.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.bdb.ai/data-center-3/data-center/data-preparation/data-preparation-workspace/transforms/data-cleansing.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
