Use the scheduling option in the pipelines/jobs configuration
Consider time zones for daily batch processing
Implement overlap handling for late-arriving data
Maintain separate configurations for dev/staging/prod
Set up monitoring for pipeline failures
Monitor S3 storage costs and usage
Track data freshness and completeness
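The scheduling and overlap points above can be sketched as a small helper that decides which daily partitions a run should process. This is a minimal sketch, not the pipeline's actual code: the function name, the time-zone parameter, and the two-day overlap window are illustrative assumptions.

```python
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo


def partitions_to_process(run_time: datetime,
                          tz: str = "UTC",
                          overlap_days: int = 2) -> list[date]:
    """Return the insertion_date partitions a daily run should (re)process.

    The run timestamp is converted to the pipeline's business time zone
    before taking the calendar date, so a job scheduled shortly after
    midnight UTC still targets the correct local day. The last
    `overlap_days` partitions are reprocessed to pick up late-arriving data.
    """
    local_day = run_time.astimezone(ZoneInfo(tz)).date()
    # The freshest complete partition is yesterday in the local time zone.
    newest = local_day - timedelta(days=1)
    return [newest - timedelta(days=i) for i in range(overlap_days)]
```

Reprocessing a fixed trailing window like this is idempotent as long as each partition overwrite is atomic, which is one common way to absorb late data without a separate backfill job.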
Parquet files stored in s3://<bucket>/<prefix>/<table_name>/insertion_date=<YYYY-MM-DD>/file.parquet
Each table has data partitioned by insertion_date
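The layout above can be expressed as a small key builder. A minimal sketch, assuming the bucket and prefix are supplied by configuration; the function name and example values are illustrative, not taken from the pipeline.

```python
from datetime import date


def parquet_key(prefix: str, table_name: str, insertion_date: date) -> str:
    """Build the S3 object key for one table's daily Parquet partition,
    following the <prefix>/<table_name>/insertion_date=<YYYY-MM-DD>/ layout."""
    return (f"{prefix}/{table_name}/"
            f"insertion_date={insertion_date:%Y-%m-%d}/file.parquet")
```

Keeping `insertion_date=` in the key follows Hive-style partitioning, so query engines that understand that convention can prune partitions by date without listing every object.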