Deployment Considerations

Scheduling

Use the scheduling option in the pipelines/jobs.

  • Consider time zones for daily batch processing

  • Implement overlap handling for late-arriving data
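The two points above can be combined in one small helper. The following is a minimal sketch, not the product's own scheduling logic: it assumes a daily batch keyed to a local calendar day, and the function name daily_window and the overlap_hours parameter are hypothetical. It converts the local day to a UTC window and extends the start backwards so that late-arriving records are picked up again on the next run.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def daily_window(run_date: date, tz: tzinfo, overlap_hours: int = 0):
    """Return the (start, end) UTC window for one local calendar day.

    overlap_hours (hypothetical parameter) pulls the start earlier so that
    records arriving late for the previous day are reprocessed.
    """
    # Midnight at the start and end of the local day, in the given time zone.
    local_start = datetime.combine(run_date, time.min, tzinfo=tz)
    local_end = datetime.combine(run_date + timedelta(days=1), time.min, tzinfo=tz)

    # Convert to UTC first, then apply the overlap to the start only.
    start_utc = local_start.astimezone(timezone.utc) - timedelta(hours=overlap_hours)
    end_utc = local_end.astimezone(timezone.utc)
    return start_utc, end_utc


# Example: a fixed UTC-5 offset with a two-hour overlap.
start, end = daily_window(date(2024, 1, 15), timezone(timedelta(hours=-5)), overlap_hours=2)
```

For zones with daylight saving time, a zoneinfo.ZoneInfo instance such as ZoneInfo("America/New_York") can be passed as tz instead of a fixed offset. Because overlapping windows reprocess some records, the downstream write should be idempotent (for example, overwrite or deduplicate on a key).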

Environment Management

  • Maintain separate configurations for dev/staging/prod
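One way to keep the environments separate is a small per-environment configuration table selected at startup. This is only an illustrative sketch: the PIPELINE_ENV variable, the PipelineConfig fields, and the bucket names are all assumptions, not settings defined by this document.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineConfig:
    bucket: str             # hypothetical: target S3 bucket
    prefix: str             # hypothetical: key prefix under the bucket
    schedule_enabled: bool  # hypothetical: whether scheduled runs are active


# Placeholder values; replace with the real settings for each environment.
CONFIGS = {
    "dev": PipelineConfig("example-dev-bucket", "pipelines", False),
    "staging": PipelineConfig("example-staging-bucket", "pipelines", True),
    "prod": PipelineConfig("example-prod-bucket", "pipelines", True),
}


def load_config(env: Optional[str] = None) -> PipelineConfig:
    """Pick the config for env, falling back to PIPELINE_ENV, then 'dev'."""
    name = env or os.environ.get("PIPELINE_ENV", "dev")
    try:
        return CONFIGS[name]
    except KeyError:
        raise ValueError(f"unknown environment: {name!r}") from None
```

Failing loudly on an unknown environment name avoids a production job silently running with development settings.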

Monitoring and Alerting

  • Set up monitoring for pipeline failures

  • Monitor S3 storage costs and usage

  • Track data freshness and completeness
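Freshness and completeness checks can be expressed against the daily partition dates. The sketch below assumes the set of partition dates present for a table has already been listed by some other means; the function names and the max_lag_days threshold are hypothetical.

```python
from datetime import date, timedelta


def is_stale(latest_partition: date, today: date, max_lag_days: int = 1) -> bool:
    """Freshness: True if the newest partition is older than the allowed lag."""
    return (today - latest_partition).days > max_lag_days


def missing_partitions(present: set, start: date, end: date) -> list:
    """Completeness: list the days in [start, end] that have no partition."""
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in expected if d not in present]


# Example: a gap on 2024-01-02 within a three-day range.
gaps = missing_partitions({date(2024, 1, 1), date(2024, 1, 3)},
                          date(2024, 1, 1), date(2024, 1, 3))
```

Either result can feed an alert: a stale table or a non-empty list of gaps indicates a failed or skipped run.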

Amazon S3

  • Parquet files are stored at s3://<bucket>/<prefix>/<table_name>/insertion_date=<YYYY-MM-DD>/file.parquet

  • Each table has data partitioned by insertion_date
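The path layout above can be built and parsed with a few lines of standard-library code. This is a sketch under the assumption that insertion_date is always an ISO date (YYYY-MM-DD); the helper names are hypothetical, and the bucket, prefix, and table values in the example are placeholders.

```python
import re
from datetime import date
from typing import Optional


def partition_uri(bucket: str, prefix: str, table_name: str, insertion_date: date) -> str:
    """Build the S3 URI of one insertion_date partition for a table."""
    return (f"s3://{bucket}/{prefix}/{table_name}/"
            f"insertion_date={insertion_date.isoformat()}/")


def parse_insertion_date(key: str) -> Optional[date]:
    """Extract the partition date from an object key, or None if absent."""
    match = re.search(r"insertion_date=(\d{4}-\d{2}-\d{2})", key)
    # fromisoformat raises ValueError for impossible dates such as 2024-13-40.
    return date.fromisoformat(match.group(1)) if match else None


# Example with placeholder names.
uri = partition_uri("my-bucket", "raw", "orders", date(2024, 1, 15))
```

Because the partition key follows the key=value directory convention, readers that understand Hive-style partitioning can typically expose insertion_date as a column when reading a table's prefix.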
