Ingestion Process


| Requirement | Evaluation | Remarks |
| --- | --- | --- |
| Change data capture | Medium | Integration with third-party CDC tools such as Debezium can be provided (see the consumer sketch after the table). |
| Scheduled ingestion | Very high | Yes, it is available off the shelf. |
| Minimise fields | Very high | Field minimization can be defined through the Data Preparation tool and published for use in live Data Pipelines. |
| Filter by lookup | Very high | Yes, it is a standard component. |
| Filter by consent | Very high | It can be achieved via API integration with the consent system, or through a consent database lookup. |
| Anonymise fields | Very high | Standard anonymization is available via the Data Preparation option or the Spark SQL component (pattern sketched after the table). |
| Compose ingestion processors | Very high | Processors are composed on a drag-and-drop, low-code canvas. |
| Ingestion fault tolerance | Very high | Faults can be tracked and a recovery sub-process initiated. |
| Bootstrap + updates | Very high | A pipeline can be defined to load historic data and then apply subsequent updates, as per the data load strategy. |
| Reports + metrics | High | Data Pipeline generates a metric report for every process: memory used, CPU used, number of records processed, etc. |
| Performance impact threshold | High | Compute resource allocation is configurable, and instances can be scaled up. |
| Secret management integration | Very high | All secrets are stored in Kubernetes Secrets, and the platform integrates with them directly (consumption pattern sketched after the table). |
| Data catalogue integration | Very high | The platform automatically generates a data catalogue from the underlying metadata. |
| Visual interface | Very high | Pipeline Studio offers a drag-and-drop visual interface built on a no-code/low-code approach. |
| Ingestion manifest file | Very high | Achievable via internal metadata. |
| CI/CD pipeline integration | High | Yes, it provides the facility to check Pipeline definitions and metadata in and out of GitLab. |
| Ingestion access management | Very high | Data Pipeline supports role-based access control (RBAC). |
| Ingestion audit logs | Very high | Logs can be pushed to third-party monitoring systems such as Datadog and Prometheus (a metrics sketch follows the table). |
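For context on the change data capture row, the sketch below shows what consuming Debezium change events could look like on the ingestion side. It assumes a Debezium connector publishing to Kafka with the default JSON envelope; the broker address, consumer group, and topic name are hypothetical placeholders, not BDB's actual integration code.

```python
# Hypothetical sketch: consuming Debezium CDC events from Kafka.
# Broker, group id, and topic are illustrative placeholders.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",   # hypothetical broker
    "group.id": "ingestion-cdc-demo",    # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
# Debezium names topics as <server>.<schema>.<table>.
consumer.subscribe(["dbserver1.inventory.customers"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        payload = event.get("payload", {})
        # 'op' is c/u/d/r (create/update/delete/snapshot read);
        # 'after' holds the row state following the change.
        print(payload.get("op"), payload.get("after"))
finally:
    consumer.close()
```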
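To illustrate the anonymise-fields row, here is a minimal PySpark sketch of anonymization and field minimization expressed in Spark SQL. The dataset and column names are hypothetical; this shows only the general Spark SQL pattern, not the platform's own component.

```python
# Illustrative Spark SQL anonymization; dataset and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("anonymize-sketch").getOrCreate()

df = spark.createDataFrame(
    [("alice@example.com", "Alice", 34), ("bob@example.com", "Bob", 41)],
    ["email", "name", "age"],
)
df.createOrReplaceTempView("customers")

# Hash the direct identifier, drop the free-text name (field minimization),
# and keep only the attribute needed downstream.
anonymized = spark.sql("""
    SELECT sha2(email, 256) AS email_hash,  -- irreversible pseudonym
           age
    FROM customers
""")
anonymized.show(truncate=False)
```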
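For the secret management row, the standard Kubernetes pattern is to project Secrets into pipeline pods as files (or inject them as environment variables). The sketch below shows that consumption pattern; the mount path and key name are hypothetical and not BDB-specific.

```python
# Minimal sketch of consuming a Kubernetes Secret from inside a pod.
# The mount directory and secret key are hypothetical placeholders.
import os
from pathlib import Path

def read_secret(key: str, mount_dir: str = "/var/run/secrets/pipeline") -> str:
    """Prefer the file projected by a Secret volume; fall back to an env var."""
    secret_file = Path(mount_dir) / key
    if secret_file.is_file():
        return secret_file.read_text().strip()
    env_name = key.upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is None:
        raise KeyError(f"secret {key!r} not found in {mount_dir} or ${env_name}")
    return value

db_password = read_secret("db-password")  # hypothetical secret key
```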
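Finally, for the reports/metrics and audit log rows, this is a hedged sketch of exposing per-component ingestion counters in the Prometheus exposition format using the prometheus_client library. The metric names, labels, and scrape port are illustrative, not the platform's actual metric schema.

```python
# Illustrative Prometheus instrumentation for an ingestion component.
# Metric names, labels, and the scrape port are hypothetical.
import time
from prometheus_client import Counter, Gauge, start_http_server

RECORDS = Counter("ingest_records_total",
                  "Records processed by the component", ["component"])
BATCH_SECONDS = Gauge("ingest_last_batch_seconds",
                      "Wall-clock duration of the last batch", ["component"])

def process_batch(component: str, batch) -> None:
    start = time.monotonic()
    for _ in batch:          # stand-in for real per-record work
        pass
    RECORDS.labels(component).inc(len(batch))
    BATCH_SECONDS.labels(component).set(time.monotonic() - start)

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<pod>:8000/metrics
    while True:
        process_batch("cdc-source", range(100))
        time.sleep(5)
```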