Monitoring
Monitoring of Data Ingestion
Monitoring can be set up using open-source tools such as Prometheus (metrics collection), Fluentd (application log aggregation), Grafana (visualization and alerting), and Zabbix, or licensed tools such as Datadog (DevOps can share more details on each of these options).

Email Alerts
Email alerts can be configured in the chosen monitoring tool to notify you in scenarios such as the following (a minimal check for the first scenario is sketched after the list):
A pod that is supposed to be up and running goes down.
A component that is not processing or receiving data within its expected interval.
Logs showing errors, either system-related or programmatic (errors from external data sources, APIs, or connectors).
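Where the stack exposes a Prometheus endpoint, the pod-down scenario can be checked with a small poller. This is a minimal sketch, assuming a Prometheus instance scraping kube-state-metrics; the Prometheus URL, namespace, SMTP host, and email addresses are placeholder assumptions. In practice, Alertmanager or the tool's built-in alerting would usually handle this.

# Minimal sketch: poll Prometheus for pods stuck in Failed/Pending and email
# an alert. kube_pod_status_phase comes from kube-state-metrics; all URLs,
# addresses, and the namespace below are hypothetical placeholders.
import smtplib
from email.message import EmailMessage

import requests  # third-party: pip install requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # assumed endpoint

def find_down_pods(namespace: str = "ingestion") -> list[str]:
    """Return names of pods in the namespace that are not running."""
    query = f'kube_pod_status_phase{{namespace="{namespace}", phase=~"Failed|Pending"}} == 1'
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query",
                        params={"query": query}, timeout=10)
    resp.raise_for_status()
    return [r["metric"]["pod"] for r in resp.json()["data"]["result"]]

def send_alert(pods: list[str]) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {len(pods)} ingestion pod(s) down"
    msg["From"] = "monitoring@example.com"   # placeholder address
    msg["To"] = "oncall@example.com"         # placeholder address
    msg.set_content("Pods not running:\n" + "\n".join(pods))
    with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)

if __name__ == "__main__":
    down = find_down_pods()
    if down:
        send_alert(down)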
Monitoring the Frequency & Volume
By monitoring the frequency and volume of data loaded per day/week, we can identify whether there is an issue with the data source (e.g., the source is down or credentials have expired).
Track historical trends in ingestion performance to identify recurring problems, optimize resource usage, and predict future resource needs.
We can achieve this by creating a monitoring dashboard that tracks the count of records loaded into the main collections/tables over the last 1-2 weeks, giving the loaded record count at a glance.
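A minimal sketch of the count query that would feed such a dashboard, assuming a SQLite store with a main_collection table and a loaded_at timestamp column; substitute your actual warehouse connection and table/column names.

# Daily record counts for the last N days, to plot on the dashboard.
import sqlite3

def daily_record_counts(db_path: str, days: int = 14) -> list[tuple[str, int]]:
    """Return (day, record_count) pairs for the last `days` days."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            """
            SELECT date(loaded_at) AS day, COUNT(*) AS records
            FROM main_collection
            WHERE loaded_at >= date('now', ?)
            GROUP BY day
            ORDER BY day
            """,
            (f"-{days} days",),
        ).fetchall()
    finally:
        conn.close()
    return rows

# A sudden drop to zero for a day usually points at a source outage or
# expired credentials rather than a pipeline bug.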
Pipeline Health Monitoring
Error Detection: Automatically detect failures or anomalies in the ingestion pipeline. These could be due to network issues, data format mismatches, missing data, or connectivity problems. Format mismatches in particular must be captured by the failure-event/auditing logic.
Standard audit templates can be defined for the audit mechanism to track jobs and pipelines.
Alerting: Send real-time alerts (via email, SMS, or integrations with incident-management systems like PagerDuty) when issues arise in the ingestion process.
Retries and Backoff Logic: Ensure that failed tasks or ingestion jobs are automatically retried according to a backoff strategy. This requires a proper auditing mechanism (sketched after this list) that tracks:
The number of records received in an ingestion batch versus the number written to the destination.
The timestamp/ID of the last record that was written.
A unique ID for each batch that is written, enabling reprocessing of that batch and verification against the source and destination.
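A minimal sketch of the retry/backoff loop and the audit fields listed above. The run_batch callable is a hypothetical stand-in for the actual ingestion job and would be replaced by your pipeline's entry point; the audit record would normally be persisted to an audit table rather than returned.

# Exponential backoff around an ingestion batch, with an audit record that
# captures the batch ID, received-vs-written counts, and last-written timestamp.
import time
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class BatchAudit:
    batch_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    records_received: int = 0
    records_written: int = 0
    last_written_ts: str = ""   # timestamp of the last record written
    attempts: int = 0
    status: str = "pending"

def ingest_with_backoff(run_batch, max_retries: int = 5,
                        base_delay: float = 2.0) -> BatchAudit:
    """Run an ingestion batch, retrying with exponential backoff on failure."""
    audit = BatchAudit()
    for attempt in range(1, max_retries + 1):
        audit.attempts = attempt
        try:
            received, written = run_batch(audit.batch_id)  # hypothetical job
            audit.records_received = received
            audit.records_written = written
            audit.last_written_ts = datetime.now(timezone.utc).isoformat()
            audit.status = "success" if received == written else "partial"
            return audit
        except Exception:
            if attempt == max_retries:
                audit.status = "failed"
                return audit
            time.sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s, ...
    return audit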
Data Quality Monitoring
Schema Validation: Ensure the ingested data matches the expected schema in terms of structure, types, and constraints (e.g., mandatory fields are present and columns required for transformation exist).
Completeness: Check whether the data ingested is complete, especially for critical datasets; missing or incomplete data can cause downstream issues (a sketch of both checks follows).
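A minimal sketch of both checks. The expected schema below (field names, types, mandatory fields) and the 99% completeness threshold are illustrative assumptions, not a fixed contract.

# Per-record schema validation plus a batch-level completeness ratio.
EXPECTED_SCHEMA = {      # field name -> required type (assumed example)
    "id": int,
    "event_time": str,
    "amount": float,
}
MANDATORY_FIELDS = {"id", "event_time"}

def validate_record(record: dict) -> list[str]:
    """Return a list of schema violations for one record (empty if valid)."""
    errors = []
    for name in MANDATORY_FIELDS:
        if record.get(name) is None:
            errors.append(f"missing mandatory field: {name}")
    for name, expected in EXPECTED_SCHEMA.items():
        value = record.get(name)
        if value is not None and not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, "
                          f"got {type(value).__name__}")
    return errors

def completeness_ratio(source_count: int, destination_count: int) -> float:
    """Fraction of source records that reached the destination."""
    return destination_count / source_count if source_count else 1.0

# Example: flag the batch if fewer than 99% of source records landed.
assert completeness_ratio(1000, 998) >= 0.99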
Dashboard Monitoring
Usage Metrics
Audit reports help you track who is using your reports and how often.
This includes information about:
Users - active, blocked, etc.
License information - total licenses, expiring licenses, allocated licenses
Group details - number of users, number of documents
Group-wise document access and user status
User interactions - interactive sessions, session duration, documents viewed, latest login, etc.
Performance Monitoring (Query and Report Performance)
Monitor the response time of services using browser developer tools.
Check whether any service has failed.
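Beyond manual checks in developer tools, a minimal automated version of both checks is sketched below, using only the standard library; the endpoint URL and the 2-second slowness threshold are illustrative assumptions.

# Time a service endpoint and flag failures or slow responses.
import time
import urllib.request

def check_service(url: str, slow_threshold_s: float = 2.0) -> None:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            elapsed = time.perf_counter() - start
            status = resp.status
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"FAILED  {url}: {exc}")
        return
    flag = "SLOW" if elapsed > slow_threshold_s else "OK"
    print(f"{flag:6s}  {url}: HTTP {status} in {elapsed:.2f}s")

check_service("http://reports.example.internal/api/health")  # placeholder URL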