Data Load to Gold layer (PostgreSQL to PostgreSQL)

The data load from the Silver layer to the Gold layer is managed through parameterized ETL jobs. These jobs are designed to follow a controlled and auditable execution process.

To facilitate this, we maintain three core tables:

  1. etl_parameter_lookup - parameter table, for sequential job execution

  2. master_etl_audit_tbl – As audit master

  3. main_etl_audit_tbl- As the audit table tracks the load of each table

etl_parameter_lookup

This parameter table maintains a list of Gold Layer tables scheduled for loading. The ETL job reads from this table and executes the loads sequentially, following the specified order. Separate jobs are configured for dimension tables, fact tables, and aggregate tables, ensuring structured and organized data processing.

Column Name

Data Type

Nullable

id

serial4

NO

gold_layer_table_name

varchar(100)

NO

gold_layer_table_schema

varchar(50)

NO

gold_layer_table_type

varchar(50)

NO

entity_description

varchar(255)

YES

load_sequence

int4

NO

load_type

varchar(50)

NO

extract_key

varchar(225)

YES

load_key

varchar(225)

YES

dependency_tables

text

YES

full_load_query

text

NO

incremental_load_query

text

YES

parameter_format

varchar(50)

YES

db_connection_type

varchar(100)

YES

active

bool

YES

created_at

timestamp

YES

last_updated

timestamp

YES

main_etl_audit_tbl

This table captures detailed audit information for each table load during a given ETL run. It logs metadata such as the start time, end time, row counts, and load status (success or failure).The load mechanism is incremental and fault-tolerant:

  • Each table’s next load is determined based on the etl_end_time of the most recent successful run.

  • If a table load fails in a previous cycle, the subsequent job will automatically resume or retry from the last valid etl_end_time.

  • This ensures no data loss and enables efficient error recovery in a multi-table pipeline.

Column Name

Data Type

Nullable

id

serial4

NO

process_id

int4

YES

data_source

varchar(50)

YES

source_table_name

_text

YES

destination_table_name

varchar(100)

YES

operation

varchar(50)

YES

time_from

timestamp

YES

time_to

timestamp

YES

etl_start_time

timestamp

YES

etl_end_time

timestamp

YES

input_row

int4

YES

output_row

int4

YES

operation_result

varchar(50)

YES

errorcode

varchar(50)

YES

result

text

YES

master_etl_audit_tbl

This table serves as the master audit log for each ETL job execution cycle. For example, if the ETL process runs hourly, a single entry is recorded in this table for each hourly load, summarizing the overall status of that run.

Column Name

Data Type

Nullable

process_id

serial4

NO

schedule_name

varchar(100)

NO

schedule_timestamp

timestamp

NO

status

varchar(20)

NO

start_time

timestamp

NO

end_time

timestamp

YES

tables_processed

int4

NO

tables_failed

int4

NO

tables_successful

int4

NO

error_message

text

YES

The gold layer was designed with a focus on optimizing reporting performance. Entities in the gold layer were organized into dimension tables, fact tables, and aggregate tables to support efficient analytical queries.

Date and DateTime Dimension Tables:

In addition to the entities sourced from the customer system, a date dimension table and a datetime dimension table were introduced.

These tables are designed to support custom formatting of date and time fields, enable date-time-based analysis, facilitate joining fact tables to calendar attributes, and simplify reporting. The metadata for these tables is provided below. They can be customized to store dates in any format required across the solution.

These tables help eliminate the need for complex date formatting functions in reporting queries. For example, instead of using expressions like:

which generates a formatted output like "Week 1-2022" from the date 2022-01-06

which generates a "2022-01" format for monthly aggregations, the required format can be precomputed and stored in the date dimension table.

This approach improves query readability and performance while ensuring consistency across reports.

Table: date_dim

Field Name

Data Type

Field Description

date_id

int

e.g., 20250506 (YYYYMMDD)

full_date

date

Calendar date

day_of_week

int

ISO day of week (1 = Monday, 7 = Sunday)

day_name

varchar(10)

Name of the day (e.g., "Monday")

day_of_month

int

Day of the month (1–31)

day_of_year

int

Day number in the year (1–365/366)

week_of_year

int

ISO week number (1–53)

month_number

int

Month number (1–12)

month_name

varchar(10)

Name of the month (e.g., "January")

quarter_number

int

Quarter number (1–4)

year

int

Calendar year

fiscal_period

varchar(20)

Fiscal period (e.g., "FY2025-Q2")

weekly_label

varchar(20)

Label for weekly granularity (e.g., "Week 24-2025")

monthly_label

varchar(20)

Label for monthly granularity (e.g., "June 2025")

quarterly_label

varchar(20)

Label for quarterly granularity (e.g., "Q2 2025")

record_last_updated

timestamp

Datetime when the entry was made

Table: datetime_dim

Field Name

Data Type

Field Description

datetime_id

int

e.g., 2025050613 (YYYYMMDDHH)

datetime_val

datetime

Actual timestamp at hour-level granularity

date_val

date

date portion of the datetime

hour_val

int

0 to 23

day_of_week

varchar(20)

Monday, Tuesday, etc.

day_of_week_num

int

1 (Monday) to 7 (Sunday)

is_weekend

boolean

TRUE if Saturday or Sunday

is_holiday

boolean

TRUE if it’s a holiday (if you maintain a calendar)

week_number

int

ISO week number

month

int

Month number (1–12)

month_name

varchar(20)

January, February, etc.

quarter

int

1 to 4

year

int

Year portion

weekly_label

varchar(20)

Label for weekly granularity (e.g., "Week 24-2025")

monthly_label

varchar(20)

Label for monthly granularity (e.g., "June 2025")

quarterly_label

varchar(20)

Label for quarterly granularity (e.g., "Q2 2025")

Last updated