Deployment Prerequisites

BDB uses Terraform to provision the required infrastructure before the platform deployment begins.

The BDB Core Platform requires a Kubernetes cluster. The BDB Data Platform supports Kubernetes versions 1.27 through 1.30. Unlike earlier Kubernetes releases, these versions do not require Docker to be installed on the nodes: Kubernetes removed its built-in Docker integration (dockershim) in version 1.24 and uses a CRI-compatible container runtime such as containerd instead. You can use managed Kubernetes services such as EKS, GKE, or AKS in cloud environments, and bare-metal clusters for on-premises deployments. All other services required for the BDB Data Platform deployment can be installed on bare-metal servers or virtual machines, or deployed on Kubernetes.
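Before deployment, it can help to confirm that the target cluster falls within the supported 1.27 to 1.30 range. The following is a minimal sketch, not part of BDB tooling: the helper names are hypothetical, and it assumes the version string is taken from the `serverVersion.gitVersion` field of `kubectl version -o json`. Only the major and minor numbers are compared, so patch numbers and vendor suffixes (for example `-eks-5e0fdde` or `-gke.1000`) are ignored.

```python
import json
import re

# Supported range as stated in this guide: Kubernetes 1.27 through 1.30 (inclusive).
SUPPORTED_MIN = (1, 27)
SUPPORTED_MAX = (1, 30)


def server_version_from_kubectl_json(raw: str) -> str:
    """Extract serverVersion.gitVersion from `kubectl version -o json` output."""
    return json.loads(raw)["serverVersion"]["gitVersion"]


def parse_k8s_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as 'v1.29.3' or 'v1.28.5-eks-5e0fdde'."""
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str) -> bool:
    """True if the cluster's major.minor lies within the supported range."""
    return SUPPORTED_MIN <= parse_k8s_version(version) <= SUPPORTED_MAX


if __name__ == "__main__":
    # Hypothetical sample versions; replace with the value read from your cluster.
    for v in ["v1.26.9", "v1.27.0", "v1.30.2-gke.1000", "v1.31.1"]:
        print(v, "supported" if is_supported(v) else "not supported")
```

Comparing `(major, minor)` tuples keeps the range check to a single chained comparison and avoids string-ordering mistakes such as `"1.9" > "1.27"`.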

The following services must also be deployed and configured before the BDB Core Platform is deployed:

  • Repository Database – The BDB team works with the customer's IT team to decide which database types to use for each component. Commonly used databases include MySQL and MongoDB.

  • Monitoring & Alert Mechanism – The BDB team works with the customer's IT team to finalize the monitoring and alerting framework based on who owns the infrastructure. A responsibility matrix defines what each team monitors, how alerts flow, and how monitoring results are reported. Commonly used monitoring and alerting tools include:

    • Fluentd – BDB pushes platform logs to Fluentd, which can forward them to Datadog, the ELK stack, or any other log collector; metrics derived from these logs can also be exposed to Prometheus for real-time monitoring. Refer to the Fluentd deployment guide for installation & configuration.

    • Prometheus – In addition to the logs, Prometheus can scrape Kubernetes time-series metrics, which can be used for alerting and usage reporting. Refer to the Prometheus deployment guide for installation & configuration.

    • Grafana – Grafana is an open-source interactive data visualization platform that lets users view their data as charts and graphs combined into one or more dashboards for easier interpretation. Refer to the Grafana deployment guide for installation & configuration.

    • Datadog – Datadog is an observability service for cloud-scale applications that monitors servers, databases, tools, and services through a SaaS-based data analytics platform. It supports multiple cloud providers, including Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and Red Hat OpenShift.

    • Zabbix – Zabbix is an open-source tool for monitoring IT infrastructure such as networks, servers, virtual machines, and cloud services. As a distributed monitoring system, it continuously collects and displays metrics from servers and other network equipment in real time, and sends notifications based on fully customizable rules.

  • Data Pipeline

    • Kafka – Used to build real-time streaming data pipelines and applications that adapt to the data streams. Refer to the Kafka deployment guide for installation & configuration.

    • Repository Database – Serves as the Pipeline metadata store. It can be installed on bare metal or on Kubernetes using a Helm chart. Refer to the MongoDB deployment guide for installation & configuration.

    • Spark Operator – Used to submit Spark applications to Kubernetes. It is deployed on Kubernetes using a Helm chart. Refer to the Spark Operator deployment guide for installation & configuration.

  • Data Story

    • Elasticsearch / MongoDB / ClickHouse – Any one of these can be configured for data caching.
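The Fluentd item in the monitoring list states that BDB pushes platform logs to Fluentd; the exact transport BDB uses is not specified in this guide. As a minimal sketch, assuming Fluentd is configured with its `in_http` input plugin (which accepts JSON records POSTed to `http://<host>:<port>/<tag>`, port 9880 by default), a single log record could be sent as shown below. The host name, tag, and record fields are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed Fluentd configuration (in_http input plugin):
#   <source>
#     @type http
#     port 9880
#   </source>
# The host below is a hypothetical placeholder.
FLUENTD_URL = "http://fluentd.example.internal:9880"


def build_log_request(tag: str, record: dict) -> urllib.request.Request:
    """Build a POST request that sends one JSON log record to Fluentd under `tag`."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=f"{FLUENTD_URL}/{tag}",  # in_http uses the URL path as the event tag
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_log(tag: str, record: dict, timeout: float = 5.0) -> int:
    """Send the record to Fluentd and return the HTTP status code of the response."""
    with urllib.request.urlopen(build_log_request(tag, record), timeout=timeout) as resp:
        return resp.status
```

Usage, against a reachable Fluentd instance: `send_log("bdb.platform", {"level": "INFO", "message": "pipeline started"})`. Fluentd output plugins then route events by tag to whichever backend (Datadog, ELK, and so on) the monitoring responsibility matrix assigns.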
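The Spark Operator item in the Data Pipeline list says the operator is used to submit Spark applications. With the Kubeflow Spark Operator, a job is submitted by creating a `SparkApplication` custom resource (API group `sparkoperator.k8s.io/v1beta2`). The sketch below builds such a manifest as a Python dict; the application name, namespace, image, file path, resource sizes, and the `spark` service account are hypothetical values, and whether BDB submits jobs this way is an assumption.

```python
import json


def build_spark_application(
    name: str,
    namespace: str,
    image: str,
    main_file: str,
    spark_version: str,
    executor_instances: int = 2,
) -> dict:
    """Return a SparkApplication manifest (sparkoperator.k8s.io/v1beta2) as a dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",          # PySpark job; "Scala" or "Java" for JVM jobs
            "mode": "cluster",         # driver runs as a pod inside the cluster
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": spark_version,
            "restartPolicy": {"type": "Never"},
            # Hypothetical sizing; the service account must be allowed to create pods.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"instances": executor_instances, "cores": 1, "memory": "512m"},
        },
    }


if __name__ == "__main__":
    manifest = build_spark_application(
        name="example-pipeline-job",                 # hypothetical
        namespace="spark-jobs",                      # hypothetical
        image="example.registry/spark-py:3.5.0",     # hypothetical
        main_file="local:///opt/app/job.py",         # hypothetical
        spark_version="3.5.0",
    )
    # kubectl accepts JSON manifests: save this output and run `kubectl apply -f <file>`.
    print(json.dumps(manifest, indent=2))
```

Generating the manifest in code makes it easy to parameterize per pipeline run; the operator then creates the driver and executor pods and tracks the application's status on the custom resource.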
