# Part II — BDB Platform Architecture

## BDB Platform Tiered Architecture

The BDB platform is engineered as a unified, four-tier architecture governed by a single, centralized compliance model. It integrates the full data lifecycle, from heterogeneous source ingestion to secure user consumption, into a cohesive operational stack.

<table data-header-hidden><thead><tr><th width="195.4000244140625"></th><th width="239.20001220703125"></th><th></th></tr></thead><tbody><tr><td><strong>Architectural Tier</strong></td><td><strong>Integrated Modules</strong></td><td><strong>Core Functional Objective</strong></td></tr><tr><td>1. Sources &#x26; Ingestion</td><td>Data Center, Pipeline, Jobs, 100+ Native Connectors</td><td>Establishes high-throughput connectivity across cloud, on-premises, and SaaS environments. Supports batch, micro-batch, real-time streaming, and Change Data Capture (CDC) within a unified Pipeline interface. Features native connectivity for Snowflake, Microsoft Fabric (all seven workloads), Databricks, Google BigQuery, PostgreSQL, Oracle, SAP, standard SaaS protocols (REST, SOAP, OData for Salesforce, D365 F&#x26;O, Planhat, Gradual), IoT protocols (MQTT, OPC-UA), and distributed event streams (Apache Kafka, AWS Kinesis).</td></tr><tr><td>2. Engineering &#x26; Data Science</td><td>Apache Hudi Lakehouse, DS Lab, Spark Jobs, Ray, Notebooks</td><td>Implements an open transactional data lakehouse utilizing native Apache Hudi, enabling ACID compliance, automated schema evolution, and historical time-travel queries. The Data Science (DS) Lab provides managed notebook environments equipped with Python 3.12, PySpark, Ray distributed computing, and machine learning frameworks (CatBoost, XGBoost) alongside model explainability dashboards. Universal data preparation workflows are executed natively within this single tier.</td></tr><tr><td>3. Kinetic Semantic Layer</td><td>Business Objects, Vocabulary Taxonomies, Data Catalog, Data Quality, Multi-Step Actions, Stewardship Workflows</td><td>Functions as the operational governance and query execution core of the platform. Business Objects programmatically map physical database columns to unified canonical definitions. Vocabulary Taxonomies align controlled business terms with disparate source values. 
Data Quality (DQ) validation policies and corresponding Trust Scores are enforced directly at the Business Object level. Multi-Step Actions provide secure, governed write-back primitives, while Stewardship Workflows orchestrate multi-party approval routing across all semantic assets.</td></tr><tr><td>4. Consumption</td><td>Self-Service BI, Governed Dashboards, Data Agents, Satellite Applications, Pixel-Perfect Reports, REST/OData APIs</td><td>Standardizes data delivery by forcing all consumer surfaces to query the Kinetic Semantic Layer rather than raw database tables. Delivers ad-hoc analytical discovery via Self-Service BI, executive reporting through Governed Dashboards, conversational AI interfaces via autonomous Data Agents, and role-specific operational workflows through modular Satellite Applications. Extends data accessibility to external systems through secure REST and OData API endpoints.</td></tr></tbody></table>

## Hybrid AI Architecture and Cognitive Execution Model

The BDB platform utilizes a decoupled, hybrid artificial intelligence architecture designed to isolate cognitive reasoning from data execution. At the core of this framework is a specialized Planning Agent that leverages Large Language Models (LLMs) exclusively for natural language processing, intent interpretation, and execution path generation.

Crucially, all actual data manipulation, retrieval, and transactional write-back operations are offloaded to deterministic execution engines such as compiled SQL, Apache Spark jobs, or native BDB platform APIs. Raw enterprise data is never sent to, or processed by, the LLM.

```
   [ User Query ] ---> ( Natural Language Intent ) ---> [ BDB PLANNING AGENT (LLM) ]
                                                                   |
                                                                   v (Generates Plan)
   [ Operational Output ] <--- ( Deterministic Results ) <--- [ EXECUTION ENGINES ]
                                                            (SQL / Spark / BDB APIs)
                                                                   |
                                                      (Kinetic Semantic Layer Enforced)
```
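The separation in the diagram above can be illustrated with a minimal sketch. Everything in it is hypothetical and is not the BDB API: `Plan`, `plan_from_intent`, and `execute` are invented names, the planner is a hard-coded stub standing in for an LLM call, and SQLite stands in for the execution engine. The point is only that the planner receives intent and schema metadata, while the data connection is visible to the executor alone.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """An execution plan: a parameterized SQL statement, never data rows."""
    sql: str
    params: tuple


def plan_from_intent(intent: str, schema: dict[str, list[str]]) -> Plan:
    """Hypothetical stand-in for the LLM planner.

    It sees only the natural-language intent and schema metadata.
    A real planner would call an LLM here; this stub handles one intent.
    """
    if "revenue" in intent.lower() and "amount" in schema.get("orders", []):
        # The month value is an illustrative constant, not derived from a clock.
        return Plan("SELECT SUM(amount) FROM orders WHERE month = ?", ("2024-05",))
    raise ValueError("intent not understood")


def execute(plan: Plan, conn: sqlite3.Connection) -> float:
    """Deterministic engine: only this function touches the data."""
    return conn.execute(plan.sql, plan.params).fetchone()[0]


# Demo data plane (in-memory SQLite as a stand-in for a real engine).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-05", 100.0), ("2024-05", 50.0), ("2024-04", 70.0)],
)

schema = {"orders": ["month", "amount"]}  # metadata only, no rows
plan = plan_from_intent("What is our revenue this month?", schema)
result = execute(plan, conn)
```

Because `plan_from_intent` has no access to `conn`, it cannot read or return row values; the figure in `result` comes solely from the SQL engine.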

### Provider-Agnostic LLM Orchestration

Unlike legacy data catalogs and governance platforms that restrict organizations to a single, locked-in LLM vendor (e.g., Microsoft Purview’s dependency on Azure OpenAI, or Atlan and Collibra's rigid OpenAI integrations), BDB delivers complete infrastructure sovereignty.

Organizations can configure, swap, and segment LLM engines on a per-deployment or per-function basis to meet strict data residency, compliance, and sovereignty constraints:

* Self-Hosted & Private Compute: Deploy open-weights models (such as Llama or Mistral) entirely within private cloud environments or on-premises infrastructure.
* Enterprise Managed APIs: Integrate with premium third-party model providers, including Anthropic (Claude), OpenAI, Azure OpenAI, and AWS Bedrock.
* Hybrid Topology: Assign distinct LLM providers to different platform functions based on cost, latency, or regulatory requirements.
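As a rough illustration of per-function provider assignment, the sketch below routes platform functions to different LLM providers and rejects external providers when a function requires private compute. The provider names, function names, and the `provider_for` helper are assumptions made for this example; they do not reflect BDB's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMProvider:
    name: str       # illustrative label, e.g. "self-hosted-llama"
    external: bool  # True if requests leave the private environment


# Hypothetical routing table: platform function -> provider.
ROUTING = {
    "planning": LLMProvider("self-hosted-llama", external=False),
    "summarization": LLMProvider("managed-api", external=True),
}
DEFAULT = LLMProvider("self-hosted-mistral", external=False)


def provider_for(function: str, require_private: bool = False) -> LLMProvider:
    """Return the provider assigned to a function, enforcing residency."""
    provider = ROUTING.get(function, DEFAULT)
    if require_private and provider.external:
        raise PermissionError(
            f"{function!r} is routed to external provider {provider.name!r}"
        )
    return provider
```

A caller subject to residency rules would pass `require_private=True`, so a misconfigured route fails loudly instead of silently sending a request to an external endpoint.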

{% hint style="info" icon="arrow-right-to-bracket" %}
**Enterprise Privacy Guardrail:** Under no circumstances is customer metadata or transactional context transmitted externally for model training or vendor optimization.
{% endhint %}

### Core Architectural Guarantees

This hybrid design enforces three systemic security and operational guardrails across the platform:

* **Zero Algorithmic Hallucination:** Because data retrieval is executed deterministically by underlying query engines rather than generated probabilistically by a language model, the system is structurally incapable of fabricating data values or altering analytical outputs.
* **Zero Raw Data Egress:** Only high-level schema metadata and the user's natural language intent are passed to the cognitive planning layer. Raw data rows, cell values, and compiled query result sets remain locked within the customer’s secure data plane and never egress via the LLM API.
* **Immutable Transaction Logging:** Every cognitive interaction—from the initial natural language request to the generated execution plan and subsequent data action—is captured, timestamped, and stored natively within the centralized BDB Catalog audit log for continuous compliance inspection.
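One common way to make an audit trail tamper-evident is to chain each entry to the hash of the previous one. The sketch below shows that technique under stated assumptions: the source does not describe how the BDB Catalog audit log is implemented, and `AuditLog`, its fields, and the hash-chain design are all illustrative choices.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, request: str, plan: str, action: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "request": request,  # the natural-language request
            "plan": plan,        # the generated execution plan
            "action": action,    # the resulting data action
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check the chain links."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

Editing any stored field after the fact changes its recomputed hash, so `verify()` returns `False` for the altered log.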

## Functional Deep-Dive: The Kinetic Semantic Layer

The Kinetic Semantic Layer is the foundational governance and execution engine of the BDB platform. Within this architecture, the Business Object serves as the primary, atomic unit of configuration. Unlike traditional systems that isolate metadata documentation from technical deployment, every Business Object natively consolidates six distinct structural concerns into a single, cohesive artifact.

This consolidation ensures that business logic, physical infrastructure mapping, and data compliance policies are tightly coupled and enforced programmatically at runtime.

### Structural Composition of a Business Object

| **Architectural Concern**     | **Encapsulated Metadata & Logic**                                                                                                                                                                                                                                          | **Operational Mechanics**                                                                                                                                                                        |
| ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 1. **Business Definition**    | The authoritative, natural-language description defining a specific corporate concept (e.g., *“An active subscriber is a customer with a minimum of one active subscription state within the preceding 30-day period.”*).                                                  | Authored directly by domain Subject Matter Experts (SMEs) and managed through formal, multi-party Stewardship approval workflows.                                                                |
| 2. **Physical Bindings**      | The declarative mapping that connects abstract business concepts to concrete physical columns across underlying source databases and storage layers.                                                                                                                       | A single Business Object can bind simultaneously to multiple semantic structures across heterogeneous data connectors, serving as the technical foundation for cross-system data reconciliation. |
| 3. **Measures & Dimensions**  | Compiled numeric Key Performance Indicators (KPIs) and categorical dimensions derived from source attributes using the native SQL dialect of the underlying target data source.                                                                                            | Expressions are programmatically validated and shadow-executed against live data streams prior to commits to guarantee syntax and computation validity.                                          |
| 4. **Structural Lineage**     | The automated, column-level dependency mapping that traces data flow and transformations from original source assets through to the final calculated metric.                                                                                                               | Derived continuously during query compilation, visualized natively within the centralized Catalog user interface, and exposed programmatically via system APIs.                                  |
| 5. **Quality & Trust Scores** | Embedded Data Quality (DQ) validation policies spanning completeness, uniqueness, value range boundary constraints, referential integrity, ingestion freshness, statistical drift, and pattern formatting.                                                                 | Computes a dynamic Trust Score that aggregates telemetric signals across source data lineage, metadata freshness, execution accuracy, pipeline completeness, and end-user feedback.              |
| 6. **Pre-Authorized Actions** | Multi-Step Actions that serve as secure, governed write-back primitives. These include programmatic pre-conditions (trigger checks), runtime input validation routines, parameterized mapping, Role-Based Access Control (RBAC) constraints, and immutable audit trailing. | Executed either as native database operations (direct SQL DML) or encapsulated within modular, private Micro-functions (Python environments integrated with authenticated external APIs).        |
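To make the six concerns concrete, here is a minimal sketch of a Business Object as a single data structure. The class, its field names, and the naive substring-based lineage derivation are assumptions for illustration only; the actual BDB object model and lineage compiler are not described in this document.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessObject:
    name: str
    definition: str            # 1. business definition (natural language)
    bindings: dict[str, str]   # 2. logical attribute -> physical column
    measures: dict[str, str]   # 3. measure name -> SQL expression
    dq_rules: list[str] = field(default_factory=list)  # 5. quality policies
    actions: list[str] = field(default_factory=list)   # 6. pre-authorized actions

    def derive_lineage(self) -> list[tuple[str, str]]:
        """4. Column-level lineage as (physical column, measure) pairs.

        Naive approach: a column feeds a measure if its name appears in
        the measure's SQL text. A real system would parse the SQL.
        """
        return [
            (column, measure)
            for measure, expression in self.measures.items()
            for column in self.bindings.values()
            if column in expression
        ]


revenue = BusinessObject(
    name="Revenue",
    definition="Total recognized order value.",
    bindings={"amount": "orders.amount", "status": "orders.status"},
    measures={"revenue": "SUM(orders.amount)"},
    dq_rules=["orders.amount IS NOT NULL"],
    actions=["flag_order_for_review"],
)
lineage = revenue.derive_lineage()
```

Keeping all six concerns on one object means a change to a binding or measure is visible to lineage, quality rules, and actions in the same place.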

### Execution-Time Enforcement Mechanics

This unified structure is precisely what allows data governance to be enforced programmatically at the query execution layer, rather than relying on manual policy adherence.

```
                                  [ USER / APPLICATION QUERY ]
                                               |
                                               v
                        +-----------------------------------------------+
                        |            REVENUE BUSINESS OBJECT            |
                        |  - Canonical SQL Expression                   |
                        |  - Embedded Data Quality (DQ) Rules           |
                        |  - Integrated Role-Based Access (RBAC)        |
                        +----------------------+------------------------+
                                               |
                      +------------------------+------------------------+
                      |                                                 |
                      v                                                 v
         [ AUTONOMOUS DATA AGENT ]                           [ SELF-SERVICE BI ]
  Executes an identical SQL expression via            Renders identical metric with embedded
  natural-language planning orchestration.           Trust Scores and active lineage paths.
```

By **binding these six concerns inside the execution path, BDB guarantees absolute consistency** across disparate organizational touchpoints:

* When an autonomous Data Agent processes the conversational request *“What is our Revenue this month?”*, it programmatically evaluates the *Revenue* Business Object and extracts its single, canonical SQL expression.
* When an analyst leverages Self-Service BI to construct an executive dashboard tracking revenue metrics, the visualization engine consumes that exact same Business Object—inheriting identical SQL structures, access controls, quality rules, and lineage trails.

Because the technical calculation and the governance metadata are housed within the same execution artifact, there is no separate semantic layer, metrics catalog, or BI repository to drift out of synchronization. Performance, logic, and compliance remain perfectly aligned across the enterprise ecosystem.
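The consistency argument above can be sketched as two consumer surfaces calling one shared resolver. The registry layout, role names, and function names below are hypothetical; the sketch only shows why a single lookup path yields identical SQL and identical access checks for both surfaces.

```python
# Hypothetical registry holding one canonical definition per Business Object.
REGISTRY = {
    "Revenue": {
        "sql": "SELECT SUM(amount) FROM orders",
        "allowed_roles": {"analyst", "finance"},
    }
}


def resolve(object_name: str, role: str) -> str:
    """Single enforcement point: access check, then canonical SQL."""
    obj = REGISTRY[object_name]
    if role not in obj["allowed_roles"]:
        raise PermissionError(f"role {role!r} may not query {object_name!r}")
    return obj["sql"]


def data_agent_sql(question: str, role: str) -> str:
    """Stand-in for natural-language planning: pick the object named in the question."""
    for name in REGISTRY:
        if name.lower() in question.lower():
            return resolve(name, role)
    raise LookupError("no matching Business Object")


def bi_dashboard_sql(metric: str, role: str) -> str:
    """Self-service BI path: the metric name maps directly to the object."""
    return resolve(metric, role)


agent_sql = data_agent_sql("What is our Revenue this month?", "analyst")
bi_sql = bi_dashboard_sql("Revenue", "analyst")
```

Since neither surface stores its own copy of the expression, there is nothing that can diverge between them.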

### Platform Standardization: The Multi-Step Action Primitive

The BDB platform unifies all operational, governance, and data lifecycle management processes under a single architectural framework: the Multi-Step Action. Rather than stitching together separate, disparate workflow engines for different tasks, BDB uses this universal workflow primitive across the entire platform. This design allows implementers to learn a single configuration logic and apply it uniformly across all organizational data tracks.

#### Operational Implementations of the Primitive

<table data-header-hidden><thead><tr><th width="300.39996337890625"></th><th></th></tr></thead><tbody><tr><td><strong>Target Workflow Process</strong></td><td><strong>Standardized Execution Path</strong></td></tr><tr><td>Data Product Certification</td><td>Data Steward proposes asset certification -> Assigned Data Owner reviews -> System executes automated validation of Data Quality (DQ) thresholds -> Multi-party approval gate resolves -> Platform applies official certification badge -> Immutable audit log entry generated.</td></tr><tr><td>Data Quality Exception Management</td><td>Data Steward proposes a localized DQ exception -> Core platform captures mandatory business justification -> System triggers specific approval gate -> Exception applied with a defined expiration timestamp -> Ingestion engine performs automated exclusion during metric re-evaluation -> Immutable audit log entry generated.</td></tr><tr><td>Schema Change Impact Gating</td><td>Ingestion pipeline detects an upstream schema modification -> System runs an automated blast-radius impact analysis across downstream assets -> Affected Data Owners automatically notified -> Technical approval gate triggers -> New pipeline version published to production -> Immutable audit log entry generated.</td></tr><tr><td>Semantic Model Modification</td><td>Data Steward proposes a semantic definition update -> Platform maps affected dashboards, reporting layouts, and AI agents -> Compliance approval gate triggers -> Centralized Business Object definition updates -> Downstream consumer surfaces dynamically refresh -> Immutable audit log entry generated.</td></tr><tr><td>Medallion Architecture Promotion</td><td>System initiates Bronze-to-Silver layer promotion -> Automated DQ validation gate executes -> Data Steward reviews and signs off -> Platform dynamically recalculates the asset Trust Score -> Silver data layer published to consumption plane -> Immutable audit log entry generated.</td></tr><tr><td>Empirical Data Owner Nomination</td><td>Operational usage telemetry identifies a high-frequency data consumer -> Machine Learning engine computes suitability scoring -> Active Data Steward reviews the candidate profile -> Nomination approved via system gate -> Centralized Data Catalog ownership mapping updates -> Immutable audit log entry generated.</td></tr></tbody></table>
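The shared shape of the workflows in this table (ordered steps, gates that can halt the run, and a final audit entry) can be sketched as a small runner. The step names, the two-approval rule, and the threshold values are invented for the example and are not BDB configuration; the sketch models the certification row only.

```python
from typing import Callable

# A step inspects or mutates the context and returns True to continue.
Step = Callable[[dict], bool]


def run_action(name: str, steps: list[tuple[str, Step]],
               context: dict, audit: list[dict]) -> bool:
    """Run steps in order, halt at the first failing gate, always write an audit entry."""
    completed: list[str] = []
    halted_at = None
    for label, step in steps:
        if not step(context):
            halted_at = label
            break
        completed.append(label)
    audit.append({"action": name, "completed": completed, "halted_at": halted_at})
    return halted_at is None


def dq_gate(ctx: dict) -> bool:
    return ctx["dq_score"] >= ctx["dq_threshold"]


def approval_gate(ctx: dict) -> bool:
    return ctx["approvals"] >= 2  # illustrative multi-party rule


def apply_badge(ctx: dict) -> bool:
    ctx["certified"] = True
    return True


CERTIFICATION = [
    ("dq_validation", dq_gate),
    ("approval_gate", approval_gate),
    ("apply_badge", apply_badge),
]

audit_log: list[dict] = []
good = {"dq_score": 0.97, "dq_threshold": 0.95, "approvals": 2}
bad = {"dq_score": 0.80, "dq_threshold": 0.95, "approvals": 2}
good_ok = run_action("certify", CERTIFICATION, good, audit_log)
bad_ok = run_action("certify", CERTIFICATION, bad, audit_log)
```

Other rows in the table would differ only in the list of steps passed to `run_action`, which is the sense in which one primitive covers every workflow.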

{% hint style="info" icon="sparkle" %}
**Architectural Takeaway:** By consolidating the platform's orchestration layer into a single workflow engine, BDB minimizes operational complexity, accelerates implementation timelines, and guarantees that every state change (whether technical, administrative, or AI-driven) adheres to an identical compliance and auditing standard.
{% endhint %}

