Deployment Considerations
Scheduling
Use the built-in scheduling option in the pipeline or job settings
Consider time zones for daily batch processing
Implement overlap handling for late-arriving data, for example by reprocessing a trailing window of recent partitions on each run
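The time zone and overlap points above can be sketched together. The function below is a minimal illustration, not part of any pipeline tool: partitions_to_process, business_tz and overlap_days are hypothetical names, and the two-day overlap is an assumed default. It works out which daily partitions a run should load or reload, treating "yesterday" in the business time zone as the newest complete day.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def partitions_to_process(now_utc: datetime, business_tz: tzinfo, overlap_days: int = 2) -> list[str]:
    """Return the YYYY-MM-DD partitions a daily run should load or reload.

    overlap_days (an assumed default of 2) is how many older partitions
    are reprocessed to pick up late-arriving records.
    """
    # Convert to the business time zone before taking the date, so the
    # day boundary matches the business day rather than the UTC day.
    local_today = now_utc.astimezone(business_tz).date()
    newest_complete = local_today - timedelta(days=1)
    return [
        (newest_complete - timedelta(days=offset)).isoformat()
        for offset in range(overlap_days, -1, -1)
    ]


if __name__ == "__main__":
    # A fixed UTC-5 offset keeps the sketch dependency-free; in practice a
    # named zone such as zoneinfo.ZoneInfo("America/New_York") handles DST.
    eastern = timezone(timedelta(hours=-5))
    print(partitions_to_process(datetime.now(timezone.utc), eastern))
```

Reloading a partition is only safe if the write is idempotent, for example by overwriting the whole partition rather than appending to it.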
Environment Management
Maintain separate configurations for dev/staging/prod
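One possible way to keep the environments apart is a small configuration table selected by an environment variable. Everything named here is an assumption for illustration: the PIPELINE_ENV variable, the PipelineConfig fields, the bucket names and the cron strings are placeholders, not values from this project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    bucket: str          # target S3 bucket (placeholder names below)
    prefix: str          # key prefix under the bucket
    schedule_cron: str   # cron expression for the daily run


# Hypothetical per-environment settings; replace with real values.
CONFIGS = {
    "dev": PipelineConfig("example-dev-bucket", "pipeline", "0 6 * * *"),
    "staging": PipelineConfig("example-staging-bucket", "pipeline", "0 5 * * *"),
    "prod": PipelineConfig("example-prod-bucket", "pipeline", "0 4 * * *"),
}


def load_config(env=None):
    """Pick the config for env, falling back to PIPELINE_ENV, then 'dev'."""
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    if env not in CONFIGS:
        raise ValueError(f"unknown environment {env!r}; expected one of {sorted(CONFIGS)}")
    return CONFIGS[env]


if __name__ == "__main__":
    print(load_config())
```

Failing loudly on an unknown environment name avoids a typo silently writing to the wrong bucket.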
Monitoring and Alerting
Set up monitoring for pipeline failures
Monitor S3 storage costs and usage
Track data freshness and completeness
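Freshness and completeness can both be derived from the set of partition dates that exist for a table. The sketch below assumes that set has already been collected (for example by listing the table's prefix); check_partitions, lookback_days and max_lag_days are hypothetical names and thresholds chosen for illustration.

```python
from datetime import date, timedelta


def check_partitions(present, today, lookback_days=7, max_lag_days=1):
    """Summarise freshness and completeness for one table.

    present: set of datetime.date values, one per existing partition.
    Returns the lag of the newest partition in days, whether that lag is
    within max_lag_days, and any dates missing from the lookback window.
    """
    # Days that should exist: the lookback window ending yesterday.
    expected = [today - timedelta(days=n) for n in range(lookback_days, 0, -1)]
    missing = [d for d in expected if d not in present]

    newest = max(present) if present else None
    lag_days = (today - newest).days if newest is not None else None
    is_fresh = lag_days is not None and lag_days <= max_lag_days
    return {"lag_days": lag_days, "is_fresh": is_fresh, "missing": missing}


if __name__ == "__main__":
    seen = {date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)}
    print(check_partitions(seen, today=date(2024, 1, 5), lookback_days=4))
```

The returned dictionary can be fed to whatever alerting mechanism is in use; an empty table reports is_fresh as False rather than raising.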
Amazon S3
Parquet files are stored at s3://<bucket>/<prefix>/<table_name>/insertion_date=<YYYY-MM-DD>/file.parquet
Each table has data partitioned by insertion_date
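The layout above can be built and parsed with plain string handling. This is a small sketch of that path convention only; the helper names partition_uri and insertion_date_of are hypothetical, and the example bucket, prefix and table values are placeholders.

```python
import re
from datetime import date


def partition_uri(bucket, prefix, table_name, insertion_date):
    """Build the S3 URI of one table partition (trailing slash included)."""
    return (
        f"s3://{bucket}/{prefix.strip('/')}/{table_name}/"
        f"insertion_date={insertion_date.isoformat()}/"
    )


_PARTITION_RE = re.compile(r"insertion_date=(\d{4}-\d{2}-\d{2})/")


def insertion_date_of(key):
    """Extract the insertion_date from an object key, or None if absent."""
    match = _PARTITION_RE.search(key)
    return date.fromisoformat(match.group(1)) if match else None


if __name__ == "__main__":
    print(partition_uri("example-bucket", "raw", "orders", date(2024, 1, 5)))
    print(insertion_date_of("raw/orders/insertion_date=2024-01-05/file.parquet"))
```

Because the partition directory uses the key=value form, engines that understand Hive-style partitioning can typically expose insertion_date as a column without reading it from the files.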