Data Loss Protection

The Data Loss Protection (DLP) Component helps secure sensitive information in pipelines by applying masking, hashing, redaction, or generalization techniques. This ensures that personally identifiable information (PII), financial records, or other confidential data is protected before downstream processing or sharing.

Key Capabilities

  • Protect sensitive fields such as PII, dates, financial records, and IDs.

  • Apply multiple protection techniques: Redaction, Masking, Hashing, and Date Generalization.

  • Configure rules at a column level to ensure fine-grained data control.

  • Maintain compliance with data privacy and governance policies.

Configuration Overview

All Data Loss Protection configurations are grouped into three sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

Configuring Meta Information

Column Name

  • Enter the column name containing sensitive data to be protected.

Rule Type

Select the rule type to determine how data will be protected. Available options:

1. Redaction

  • Removes or substitutes all or part of a field’s value.

  • Example: CreditCardNumber → XXXX-XXXX-XXXX-1234.

2. Masking

  • Replace sensitive data with a masking character.

  • Configuration Options:

    • Masking Character – The character to replace sensitive values (e.g., * or #).

    • Characters to Ignore – Characters exempt from masking (e.g., - in phone numbers).

    • Type – Choose:

      • Full → Entire field masked.

      • Partial → Mask only part of the field (e.g., first 6 digits of SSN).

  • Example: 987654321 → *****4321.

3. Hashing

  • Transforms input values into a fixed-length cryptographic hash.

  • Supported algorithms:

    • SHA-256

    • SHA-384

    • SHA-512

  • Example: password123 → ef92b778bafe... (SHA-256).

4. Date Generalization

  • Protects sensitive date values by reducing precision.

  • Options:

    • Year – Keep only year (e.g., 2024-07-15 → 2024).

    • Month – Keep only year and month (e.g., 2024-07-15 → 2024-07).

    • Quarter – Generalize by quarter (e.g., 2024-07-15 → Q3 2024).

    • Week – Generalize by week number (e.g., 2024-07-15 → Week 28, 2024).

Example Use Cases

  • Redaction – Mask customer names in audit reports.

  • Masking – Hide middle digits of credit card or phone numbers.

  • Hashing – Protect user passwords or API keys.

  • Date Generalization – Reduce granularity of patient birth dates for compliance.

Best Practices

  • Use masking or redaction for display-only fields.

  • Use hashing for authentication-related fields (irreversible protection).

  • Use date generalization for compliance with anonymization requirements.

  • Test rules in a non-production environment before applying to live data.