Data Loss Protection
The Data Loss Protection (DLP) Component helps secure sensitive information in pipelines by applying masking, hashing, redaction, or generalization techniques. This ensures that personally identifiable information (PII), financial records, or other confidential data is protected before downstream processing or sharing.
Key Capabilities
Protect sensitive fields such as PII, dates, financial records, and IDs.
Apply multiple protection techniques: Redaction, Masking, Hashing, and Date Generalization.
Configure rules at a column level to ensure fine-grained data control.
Maintain compliance with data privacy and governance policies.
Configuration Overview
All Data Loss Protection configurations are grouped into three sections:
Basic Information
Meta Information
Resource Configuration
Configuring Meta Information
Column Name
Enter the column name containing sensitive data to be protected.
Rule Type
Select the rule type to determine how data will be protected. Available options:
1. Redaction
Removes or substitutes all or part of a field’s value.
Example: CreditCardNumber → XXXX-XXXX-XXXX-1234
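The redaction behavior described above can be sketched in Python. This is only an illustration, not the component's implementation: the function names, the `keep` and `placeholder` parameters, and the `[REDACTED]` substitute are all assumptions made for the example.

```python
def redact_keep_last(value: str, keep: int = 4, placeholder: str = "X") -> str:
    """Partial redaction: substitute every digit except the last `keep` digits.

    Non-digit characters such as '-' are left untouched (an assumption
    chosen to reproduce the credit card example in the text).
    """
    total_digits = sum(ch.isdigit() for ch in value)
    to_redact = max(total_digits - keep, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_redact > 0:
            out.append(placeholder)
            to_redact -= 1
        else:
            out.append(ch)
    return "".join(out)


def redact_full(value: str, replacement: str = "[REDACTED]") -> str:
    """Full redaction: discard the original value entirely."""
    return replacement


print(redact_keep_last("4111-1111-1111-1234"))  # XXXX-XXXX-XXXX-1234
print(redact_full("Jane Doe"))                  # [REDACTED]
```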
2. Masking
Replaces sensitive data with a masking character.
Configuration Options:
Masking Character – The character to replace sensitive values (e.g., * or #).
Characters to Ignore – Characters exempt from masking (e.g., - in phone numbers).
Type – Choose:
Full → Entire field masked.
Partial → Mask only part of the field (e.g., first 6 digits of SSN).
Example: 987654321 → *****4321
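As a rough sketch of how the three masking options might interact, the Python function below takes a masking character, a set of characters to ignore, and a full or partial type. The function name and the `visible_suffix` parameter are hypothetical. The source does not say how the partially masked range is selected, so this sketch simply leaves the trailing characters visible.

```python
def mask_value(value: str, mask_char: str = "*", ignore: str = "",
               mode: str = "full", visible_suffix: int = 4) -> str:
    """Mask `value`, skipping any character listed in `ignore`.

    mode="full"    masks every non-ignored character.
    mode="partial" masks all but the last `visible_suffix` non-ignored
                   characters (an assumed way of choosing the range).
    """
    if mode not in ("full", "partial"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Positions of characters that are eligible for masking.
    maskable = [i for i, ch in enumerate(value) if ch not in ignore]
    if mode == "partial":
        maskable = maskable[: max(len(maskable) - visible_suffix, 0)]
    to_mask = set(maskable)
    return "".join(mask_char if i in to_mask else ch
                   for i, ch in enumerate(value))


print(mask_value("987654321", mode="partial"))               # *****4321
print(mask_value("555-123-4567", mask_char="#", ignore="-"))  # ###-###-####
```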
3. Hashing
Transforms input values into a fixed-length cryptographic hash.
Supported algorithms:
SHA-256
SHA-384
SHA-512
Example (SHA-256): password123 → ef92b778bafe...
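The hashing rule can be approximated with Python's standard `hashlib` module. The sketch assumes the input is encoded as UTF-8 and the digest is rendered as lowercase hexadecimal; the source states neither, and the `hash_value` helper and `SUPPORTED` table are names invented for this example.

```python
import hashlib

# Algorithms listed in the text, mapped to their hashlib constructors.
SUPPORTED = {
    "SHA-256": hashlib.sha256,
    "SHA-384": hashlib.sha384,
    "SHA-512": hashlib.sha512,
}


def hash_value(value: str, algorithm: str = "SHA-256") -> str:
    """Return the hex digest of `value` (UTF-8 encoding is an assumption)."""
    try:
        hasher = SUPPORTED[algorithm]
    except KeyError:
        raise ValueError(f"unsupported algorithm: {algorithm!r}") from None
    return hasher(value.encode("utf-8")).hexdigest()


digest = hash_value("password123")
print(digest[:12])  # ef92b778bafe
print(len(digest))  # 64 hex characters, whatever the input length
```

The digest length depends only on the algorithm: 64 hex characters for SHA-256, 96 for SHA-384, and 128 for SHA-512.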
4. Date Generalization
Protects sensitive date values by reducing precision.
Options:
Year – Keep only year (e.g., 2024-07-15 → 2024).
Month – Keep only year and month (e.g., 2024-07-15 → 2024-07).
Quarter – Generalize by quarter (e.g., 2024-07-15 → Q3 2024).
Week – Generalize by week number (e.g., 2024-07-15 → Week 28, 2024).
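The four options can be illustrated with a short Python sketch. The function name and the lowercase `level` values are assumptions. The source does not state which week-numbering convention the component uses, so this sketch uses Sunday-based numbering (`%U` in `strftime`) because it reproduces the Week 28 example; ISO 8601 numbering would place the same date in week 29.

```python
from datetime import date


def generalize_date(d: date, level: str) -> str:
    """Reduce the precision of `d` to the requested level."""
    if level == "year":
        return f"{d.year}"
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "quarter":
        # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4.
        return f"Q{(d.month - 1) // 3 + 1} {d.year}"
    if level == "week":
        # Assumption: Sunday-based week number; ISO weeks would differ.
        return f"Week {int(d.strftime('%U'))}, {d.year}"
    raise ValueError(f"unknown level: {level!r}")


sample = date(2024, 7, 15)
for level in ("year", "month", "quarter", "week"):
    print(level, "->", generalize_date(sample, level))
```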
Example Use Cases
Redaction – Remove or replace customer names in audit reports.
Masking – Hide middle digits of credit card or phone numbers.
Hashing – Protect user passwords or API keys.
Date Generalization – Reduce granularity of patient birth dates for compliance.
Best Practices
Use masking or redaction for display-only fields.
Use hashing for authentication-related fields (irreversible protection).
Use date generalization for compliance with anonymization requirements.
Test rules in a non-production environment before applying to live data.