# Anonymization

Anonymization is a type of information sanitization whose intent is privacy protection. It is a data processing technique that removes or modifies personally identifiable information.

The following transforms are available under the Anonymization category:

![](https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FyHrswR1BHsVGyYwayBcN%2Fimage.png?alt=media\&token=2bf06c1d-9ed7-459d-b1d3-07d99afdae7a)

## **Data Hashing** <a href="#data-hashing" id="data-hashing"></a>

Data Hashing uses an algorithm to map data of any size to a fixed-length representation called a hash value. The same input always produces the same hash value. Two different inputs can in principle produce the same hash value (a collision), but this is very unlikely with modern algorithms.

The transform is often applied during data preprocessing, before the data is used for analysis, machine learning, or storage. Its main objective is to provide a more efficient and secure way to handle and process sensitive or large datasets.
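The fixed-length and repeatability properties described above can be illustrated with a minimal sketch using Python's standard `hashlib` module. The helper name `hash_value` and the choice of SHA-256 are assumptions made for illustration; this is not the platform's internal implementation.

```python
import hashlib


def hash_value(value: str) -> str:
    """Return the SHA-256 hex digest of a string (illustrative helper)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


short_digest = hash_value("a")
long_digest = hash_value("a much longer value " * 100)

# Inputs of very different sizes produce digests of the same length.
print(len(short_digest), len(long_digest))

# Hashing the same input twice gives the same digest.
print(hash_value("alice@example.com") == hash_value("alice@example.com"))
```

Because the digest length does not depend on the input size, a hashed column has a uniform width regardless of how long the original values were.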

{% hint style="success" %}
*Check out the given illustration on how to use Data Hashing transform.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FK0OLWco0GmBCRiPn1kKI%2FData%20Hashing_Shap%202%20for%20the%20Anonymization.mp4?alt=media&token=b0a53d67-e94e-472e-a9ba-7cbab9a69822>" %}
***Anonymization Transform***
{% endembed %}

Steps to perform the ***Data Hashing*** transform:

* Navigate to a dataset within the Data Preparation framework, and select the column that needs to be protected.
* Select the ***Transforms*** tab.
* Select the ***Data Hashing*** transform from the ***Anonymization*** category.

  <figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FQty1j5mbflueNiRo8Mg9%2Fimage.png?alt=media&#x26;token=8e80e395-f185-4f88-9ebf-faf13a460ba8" alt=""><figcaption></figcaption></figure>
* The ***Data Hashing*** window opens.
* Use the drop-down menu to view the available hashing options.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FhQEAJoHG8SpEMMUVqLd4%2Fimage.png?alt=media&#x26;token=b0325475-4c5f-42c0-821f-1c1a52c98175" alt=""><figcaption></figcaption></figure>

* Select a Hash option (for example, ***Sha2***).
* Set the ***Hash Value*** using the drop-down menu. The default Hash Value for the Sha2 hash option is 256.
* Click the ***Submit*** option.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FT1V51siGJLZFUVSMcap2%2Fimage.png?alt=media&#x26;token=03389c5f-8ff2-4467-931a-e9947d368965" alt=""><figcaption></figcaption></figure>

* The data in the selected column is replaced with its hashed values.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2F9BfFXL3yfsXlXs4uQYRA%2Fimage.png?alt=media&#x26;token=6e5c88be-6b35-4685-9743-e4547eef0338" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark>*&#x20;

* Choose a hashing algorithm from the ***Hash Options*** based on your specific requirements and security considerations. *The supported Hash options are **Hash**, **Sha1**, **Sha2**, and **MD5**.*

![](https://content.gitbook.com/content/Kg5pfnNkTs1b1YNYX7rD/blobs/YnhXgPnC68GCFxnOSqD6/image.png)&#x20;

* The hash options displayed in the UI map to the following actual hashing algorithms on the backend:
  * Sha1 (UI) → SHA-256 (Backend)
  * Sha2 (UI) → SHA-512 (Backend)
  * Hash (UI) → MD-5 (Backend)
  * MD5 (UI) → MD-5 (Backend)
    {% endhint %}
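The UI-to-backend mapping listed in the note above can be expressed as a small lookup table. The sketch below is only an illustration built on Python's `hashlib`; the names `UI_TO_BACKEND` and `hash_with_ui_option` are hypothetical and do not come from the product.

```python
import hashlib

# UI option -> hashlib algorithm name, copied from the mapping in the note.
UI_TO_BACKEND = {
    "Sha1": "sha256",
    "Sha2": "sha512",
    "Hash": "md5",
    "MD5": "md5",
}


def hash_with_ui_option(value: str, ui_option: str) -> str:
    """Hash a value with the backend algorithm mapped to a UI option."""
    algorithm = UI_TO_BACKEND[ui_option]
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


for option in UI_TO_BACKEND:
    digest = hash_with_ui_option("sample", option)
    print(option, len(digest))
```

Under this mapping, the **Hash** and **MD5** options produce identical output for the same input, since both resolve to MD-5.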

<details>

<summary>Data Hashing with Sha1 Hash Option</summary>

* Select a column from the given dataset within the ***Data Preparation*** framework.
* Open the ***Transforms*** tab.
* Select the ***Data Hashing*** transform from the ***Anonymization*** category.
* Select the column to transform from the data grid.
* Select ***Sha1*** as the Hash Option.
* Click the ***Submit*** option.

&#x20;      ![](https://content.gitbook.com/content/Kg5pfnNkTs1b1YNYX7rD/blobs/mBvf4jVL18r5OLZCB6uD/image.png)

* The selected column gets converted based on the hashing option.

&#x20;      ![](https://content.gitbook.com/content/Kg5pfnNkTs1b1YNYX7rD/blobs/284q6YAEXh8XwWsUUaWc/image.png)

</details>

<details>

<summary>Data Hashing with Sha2 Hash Option</summary>

* Select a column from the given dataset within the ***Data Preparation*** framework.
* Open the ***Transforms*** tab.
* Select the ***Data Hashing*** transform from the ***Anonymization*** category.
* Select the column to transform from the data grid.
* Select ***Sha2*** as the Hash Option.
* Select a Hash Value from the drop-down menu (the supported values are 256, 384, and 512).
* Click the ***Submit*** option.

&#x20;      ![](https://content.gitbook.com/content/Kg5pfnNkTs1b1YNYX7rD/blobs/THEEDqcFvlXuBJmoAHe3/image.png)

* The selected column gets converted based on the hashing option.

&#x20;      ![](https://content.gitbook.com/content/Kg5pfnNkTs1b1YNYX7rD/blobs/yjoy083wc1MdBtORxPgd/image.png)

</details>
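The Hash Values offered for the Sha2 option (256, 384, and 512) match the digest sizes, in bits, of the standard SHA-2 family. The sketch below assumes that the Hash Value selects one of these variants; that is an assumption for illustration, and the backend mapping noted earlier may behave differently. The names `SHA2_VARIANTS` and `sha2_hex` are hypothetical.

```python
import hashlib

# Assumed meaning of the Hash Value: SHA-2 digest size in bits.
SHA2_VARIANTS = {
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha2_hex(value: str, bits: int = 256) -> str:
    """Return the hex digest of value using the SHA-2 variant of the given size."""
    return SHA2_VARIANTS[bits](value.encode("utf-8")).hexdigest()


for bits in (256, 384, 512):
    # Each hex character encodes 4 bits, so the digest has bits / 4 characters.
    print(bits, len(sha2_hex("sample", bits)))
```

A larger value produces a longer hashed string, which matters if the target column or downstream system has a width limit.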

<details>

<summary>Data Hashing with MD5 Hash Option</summary>

* Select a column from the given dataset within the ***Data Preparation*** framework.
* Open the ***Transforms*** tab.
* Select the ***Data Hashing*** transform from the ***Anonymization*** category.
* Select the column to transform from the data grid.
* Select ***MD5*** as the Hash Option.
* Click the ***Submit*** option.

&#x20;     ![](https://content.gitbook.com/content/Kg5pfnNkTs1b1YNYX7rD/blobs/pzCF7zgR89NDKvbl82E1/image.png)

* The selected column gets converted based on the hashing option.

&#x20;      ![](https://content.gitbook.com/content/Kg5pfnNkTs1b1YNYX7rD/blobs/2z6futG752LtreYALgp3/image.png)

</details>

## **Data Masking** <a href="#data-masking" id="data-masking"></a>

The Data Masking transform hides original data by replacing it with modified content. It creates a structurally similar but inauthentic version of the actual data.

{% hint style="success" %}
*Check out the given walk-through on the Data Masking transform.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2Flc1HnNgfaqqvYhGiGaBN%2FData%20Masking_Anonymization.mp4?alt=media&token=1db5c757-e64f-4b79-906d-c7c761022aec>" %}
Data Masking
{% endembed %}

Steps to perform the ***Data Masking*** Transform:

* Navigate to a dataset within the Data Preparation framework, and select the column that needs to be protected.
* Select the ***Transforms*** tab.
* Select the ***Data Masking*** transform from the ***Anonymization*** category.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FkBvcpiLGamTiyeop7YFY%2Fimage.png?alt=media&#x26;token=41068ab2-b08e-4911-8646-8d2399ba4a70" alt=""><figcaption></figcaption></figure>

* The ***Data Masking*** dialog box opens.
* Provide the ***Start Index*** and ***End Index*** to mask the selected data.
* Click the ***Submit*** option.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2F3NQYJW4jPZT2WRFyGtPY%2Fimage.png?alt=media&#x26;token=e28c96ad-5a7d-41f9-958e-5dab1d7870ce" alt=""><figcaption></figcaption></figure>

* The following image shows how the ***Data Masking*** transform converts the selected data:

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FfjcogHtsEQ8BuCELWlew%2Fimage.png?alt=media&#x26;token=63f7d9a8-2da9-4ba4-bd9b-6d6f5eddd1a1" alt=""><figcaption></figcaption></figure>
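The effect of the ***Start Index*** and ***End Index*** settings can be sketched as follows. This is an illustration only: the function name `mask_value`, the `*` mask character, zero-based indexing, and the exclusive end index are all assumptions, and the product may count positions differently.

```python
def mask_value(value: str, start: int, end: int, mask_char: str = "*") -> str:
    """Replace characters from start up to (not including) end with mask_char.

    Indices are clamped to the length of the value, so the result always
    has the same length as the input.
    """
    start = max(0, min(start, len(value)))
    end = max(start, min(end, len(value)))
    return value[:start] + mask_char * (end - start) + value[end:]


print(mask_value("9876543210", 2, 8))  # 98******10
```

Keeping the length and the unmasked characters intact is what makes the masked value structurally similar to the original.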

## **Data Variance** <a href="#data-variance" id="data-variance"></a>

The Data Variance transform lets users apply variance to Numeric and Date columns.

### Applying the Data Variance transform to a Number Column

{% hint style="success" %}
*Check out the illustration on how to use the Data Variance on a numeric column.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FxUrjs2jm7g8ucBW1GcF3%2FData%20Variance%20on%20Numeric%20Column.mp4?alt=media&token=70b591f1-f21b-48e9-ba89-c496f619e94f>" %}
***Data Variance on a Numeric column***
{% endembed %}

* Select a numeric column within the Data Preparation framework.
* Open the ***Transforms*** tab.
* Select the ***Data Variance*** transform from the ***Anonymization*** category.&#x20;

  <figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2Fsft4OlvQgDu6ppupXPWr%2Fimage.png?alt=media&#x26;token=a69cf59a-74ba-4502-865c-51ca145b104d" alt=""><figcaption></figcaption></figure>
* The ***Data Variance*** dialog box opens.
* Configure the following information:
  * Select ***Numeric*** as the ***Value Type***.
  * Select an ***Operator*** using the drop-down menu.
  * Set the percentage.
  * Provide a comment in the given section.
* Click the ***Submit*** option.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FSe6aasYKauUVrzR8y5Bf%2Fimage.png?alt=media&#x26;token=099ed677-c2e0-4c93-b852-db9618a8d9af" alt=""><figcaption></figcaption></figure>

* The data in the selected column is transformed based on the configured operator and percentage.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2F51p7nw7m1ouOPfJJr3vg%2Fimage.png?alt=media&#x26;token=4a424088-3447-4eef-aceb-5c6e0b973a98" alt=""><figcaption></figcaption></figure>
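One plausible reading of the operator and percentage settings is that each value is shifted up or down by the given percentage of itself. The sketch below illustrates that reading only; the function name `apply_variance` and the `"+"` and `"-"` operator symbols are assumptions, and the actual transform may compute the variance differently.

```python
def apply_variance(value: float, operator: str, percentage: float) -> float:
    """Shift a value by a percentage of itself (assumed behavior)."""
    delta = value * percentage / 100.0
    if operator == "+":
        return value + delta
    if operator == "-":
        return value - delta
    raise ValueError(f"Unsupported operator: {operator!r}")


salaries = [50000.0, 72000.0, 120000.0]
print([apply_variance(salary, "+", 10) for salary in salaries])
```

Under this reading, the relative ordering of values in the column is preserved while the exact original figures are no longer exposed.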

### Applying the Data Variance transform to a Date Column

{% hint style="success" %}
*Check out the illustration on how to use the Data Variance on a date column.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FCeTBvy2N7WMChyhnxQlb%2FData%20Variance%20on%20Date%20Column.mp4?alt=media&token=6bcfda48-6aa3-409a-b30e-21ce6cc76738>" %}
Data Variance Transform on a Date Column
{% endembed %}

* Select a column containing Date values from the given dataset within the Data Preparation framework.
* Open the ***Transforms*** tab.
* Select the ***Data Variance*** transform from the ***Anonymization*** category.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FbWfsZrXzACn6yzYvIxdi%2Fimage.png?alt=media&#x26;token=854f30d3-3c81-44f1-a954-44b1ff16b294" alt=""><figcaption></figcaption></figure>

* The ***Data Variance*** dialog box opens.
* Configure the following information:
  * Select ***Date*** as the ***Value Type***.
  * Select an ***Input Format*** from the drop-down list.
  * Select a ***Start Date*** using the Calendar option.
  * Select an ***End Date*** using the Calendar option.
  * Provide a comment in the given section.
* Click the ***Submit*** option.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2F1LyVvLwNjX6nUvvx9oEW%2Fimage.png?alt=media&#x26;token=3572bc67-9589-4a08-8f9b-1a029bb38194" alt=""><figcaption></figcaption></figure>

* The selected Date column will display random dates from the selected date range.

  <figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FjMl3WFZjUFnXHVZ7bpta%2Fimage.png?alt=media&#x26;token=6b47e32f-699a-4e0e-8cc9-171b4acfd839" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> The **Data Variance** transform also provides a field to add a description while configuring the transformation information.*
{% endhint %}
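For Date columns, the transform replaces values with random dates from the selected range. A minimal sketch of that idea is shown below, assuming that both the ***Start Date*** and ***End Date*** are included in the range; the function name `random_date_between` and the sample dates are hypothetical.

```python
import random
from datetime import date, timedelta


def random_date_between(start: date, end: date, rng: random.Random) -> date:
    """Pick a random date between start and end, both inclusive."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


rng = random.Random(42)  # Seeded only so the sketch is repeatable.
start, end = date(2024, 1, 1), date(2024, 12, 31)

original_dates = [date(1990, 5, 17), date(1985, 11, 2), date(2001, 3, 9)]
varied_dates = [random_date_between(start, end, rng) for _ in original_dates]

# Every replacement falls inside the configured range.
print(all(start <= d <= end for d in varied_dates))
```

In this sketch the replacement dates do not depend on the original values, so the original dates cannot be recovered from the output.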

## **Hashing Anonymization (using Salt and Pepper technique)**

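This transformation uses the ***Salt and Pepper*** technique to protect sensitive data: each value is combined with additional inputs (a salt and a pepper) before it is hashed. Because these extra inputs change the resulting hash, the output is harder to reverse with precomputed lookup tables than a plain hash of the value.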

{% hint style="success" %}
*Check out the illustration on the Hashing Anonymization transform.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2Fo7Se60DVlzWvsJFoEBEx%2FHashing%20Anonymization%20(Salt%20Pepper).mp4?alt=media&token=fa371cbb-9fbd-4d4d-950f-af83ed005f57>" %}
Hashing Anonymization (using Salt and Pepper Technique)
{% endembed %}

* Navigate to a dataset within the Data Preparation framework, and select the column that needs to be protected.
* Select the ***Transforms*** tab.
* Select the ***Hashing Anonymization (using salt & pepper technique)*** transform from the ***Anonymization*** category.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FrQcQQAgmJr1kCGe0qzhI%2Fimage.png?alt=media&#x26;token=65b75a93-a88d-4425-925d-6aa1fef674e8" alt=""><figcaption></figcaption></figure>

* The ***Hashing Anonymization (Using the salt & pepper technique)*** dialog window opens.
* Provide a value using the ***Set Values*** space.
* Select a field using the ***Set Fields*** drop-down menu.
* Select a hashing option using the ***Hash Option*** drop-down menu.
* Click the ***Submit*** option.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2FrnjbW33hzntADltnv0K3%2Fimage.png?alt=media&#x26;token=c9f83b38-3eb7-4ea6-8c57-5b9acaa99e25" alt=""><figcaption></figcaption></figure>

* The target column data will be displayed after applying the selected hashing option.

<figure><img src="https://2657181281-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKg5pfnNkTs1b1YNYX7rD%2Fuploads%2Fv5gRv6eP9KyOcFYo70Zi%2Fimage.png?alt=media&#x26;token=b0a6318f-a44c-4bf3-ae88-9a2464280953" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
<mark style="color:green;">Please Note:</mark>

1. *The first user-provided value (entered in the "Set Values" field) acts as the pepper.*
2. *Selected column values will act as the salt.*
3. *The hash options displayed in the UI map to the following actual hashing algorithms on the backend:*
   1. Sha1 (UI) → SHA-256 (Backend)
   2. Sha2 (UI) → SHA-512 (Backend)
   3. Hash (UI) → MD-5 (Backend)
   4. MD5 (UI) → MD-5 (Backend)
      {% endhint %}
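The roles described in the note above (the ***Set Values*** entry as the pepper, the values of the field selected in ***Set Fields*** as the salt) can be sketched with Python's `hashlib`. The concatenation order, the use of SHA-256, the function name `salted_peppered_hash`, and the sample row are all assumptions for illustration; the platform may combine the inputs differently.

```python
import hashlib


def salted_peppered_hash(value: str, salt: str, pepper: str) -> str:
    """Hash a value together with a per-row salt and a shared pepper.

    The salt + value + pepper order is an assumption, not documented behavior.
    """
    combined = f"{salt}{value}{pepper}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


# Hypothetical row: "email" is the target column, "customer_id" supplies the salt.
row = {"email": "alice@example.com", "customer_id": "C-1001"}
pepper = "user-provided-value"  # Stands in for the value entered under Set Values.

hashed = salted_peppered_hash(row["email"], row["customer_id"], pepper)
plain = hashlib.sha256(row["email"].encode("utf-8")).hexdigest()

# The salted and peppered hash differs from a plain hash of the same value.
print(hashed != plain)
```

Because the salt comes from another column, two rows with the same target value but different salt values produce different hashes.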
