Data Object Architecture and Dependencies

This page provides an overview of the Data Center module's features and functions so that you can use it to its full potential.

The BDB Data Center empowers data engineers to connect to diverse data sources and build a robust, governed data ecosystem. It supports the creation and reuse of Data Connectors, and provides additional capabilities including Datasets, Data Stores, Data as API, Micro-Functions, Feature Stores, Data Sandbox, Data Preparation, and Widgets—all designed to accelerate analytics, operationalize data products, and enable AI/ML at scale.

Key Capabilities

  • Bring Your Own Data (BYOD): Create Data Connectors for each source—no disruptive migrations.

  • Virtualization & Reuse: Author Datasets once; reuse across Widgets, Dashboards, DS Lab, Data as API, and more.

  • Operational Data Products: Expose curated Data as API (REST) with governance and DQ controls.

  • Programmable Automations: Use Micro-Functions for CRUD, external API calls, and workflow triggers.

  • File Ingestion & Collaboration: Upload/manage files via Data Sandbox; track through Online Data Sheet.

  • Visualization-Ready Stores: Materialize report-ready tables in Data Store for self-service BI.

  • ML Feature Governance: Centralize engineered features in the Feature Store for consistent training/serving.

  • Transform & Validate: Profile, cleanse, transform, and monitor quality with Data Preparation.

  • Composable UI: Bind Widgets to Datasets; configure interactions, exports, tooltips, indicators, and scripts.

Components

Data Center encapsulates a variety of data objects, each serving a specific purpose:

Data Connector

Seamlessly integrates with existing infrastructure, embracing a bring-your-own-data (BYOD) approach and avoiding bulk migrations. Supports 40+ native connectors for structured, unstructured, and streaming data.

Supported data sources include:

  • Databases: MySQL, MSSQL, Oracle, Hive, Cassandra, MongoDB, PostgreSQL, Snowflake

  • APIs: Salesforce, Mailchimp, LinkedIn, Twitter, Google Analytics, Jira

  • Files: Flat files, FTP/SFTP servers, CSV, Excel, JSON

Highlights

  • Secure credentials & connection pooling

  • Incremental & full loads; streaming where supported

  • Centralized cataloging and lineage at the connector level

  • The My Connectors section of the Data Center module provides a centralized view of all created Data Connectors and their associated objects.
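
The sketch below is illustrative only: it shows, outside the platform, the kind of plumbing a Data Connector abstracts away, namely credentials kept out of code and a pooled connection to a PostgreSQL source. The library choice (SQLAlchemy), environment variable names, and sample table are assumptions.

```python
# Illustration of what a connector abstracts: secure credentials and pooling.
# PG_USER / PG_PASSWORD / PG_HOST, the 'analytics' database, and the
# 'sales_orders' table are assumptions for this sketch.
import os
from sqlalchemy import create_engine, text

# Credentials come from the environment (or a vault), never from the script itself
pg_url = (
    f"postgresql+psycopg2://{os.environ['PG_USER']}:{os.environ['PG_PASSWORD']}"
    f"@{os.environ.get('PG_HOST', 'localhost')}:5432/analytics"
)

# pool_size / max_overflow mirror the connection pooling a connector manages for you
engine = create_engine(pg_url, pool_size=5, max_overflow=10, pool_pre_ping=True)

with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM sales_orders")).scalar()
    print(f"sales_orders rows: {row_count}")
```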

Datasets

A virtualization layer that lets data engineers:

  • Write database-native queries (push-down when possible)

  • Publish and reuse queries across Widgets, Dashboards, Data Science Lab (DS Lab), Data as API, and more

  • Parameterize queries for dynamic filtering and row-level access

Typical uses: Consistent, governed query logic; semantic views for BI; DS Lab feature sourcing.
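
The sketch below illustrates the kind of parameterized, push-down-friendly query a Dataset captures. Table, column, and parameter names are assumptions; inside the platform the query is authored in the Dataset itself rather than in Python.

```python
# Sketch of a parameterized Dataset query (table and parameter names are assumptions).
# Filters are bound as parameters so the database can push the predicate down
# and the same logic can drive row-level filtering per user or region.
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/analytics")

dataset_query = text("""
    SELECT region, product_line, SUM(net_amount) AS revenue
    FROM sales_orders
    WHERE order_date >= :start_date
      AND region = :region              -- row-level filter supplied at run time
    GROUP BY region, product_line
""")

with engine.connect() as conn:
    rows = conn.execute(
        dataset_query, {"start_date": "2024-01-01", "region": "EMEA"}
    ).fetchall()
```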

Data as API

Expose datasets as RESTful APIs for external apps and authorized users.

Key capabilities

  • Build, test, and deploy custom API services

  • Enable data monetization strategies

  • Centralize data quality management (cleanse & enrich before exposure)

  • Provide secure, scalable access via authenticated REST endpoints
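
Once a dataset is published as a Data as API service, external applications call it over authenticated REST. The snippet below is a generic consumption sketch; the endpoint URL, token handling, query parameters, and response shape are assumptions.

```python
# Consuming a dataset published via Data as API (endpoint URL, auth header,
# parameter names, and response shape are assumptions for illustration).
import os
import requests

BASE_URL = "https://bdb.example.com/data-api/v1"   # hypothetical service URL
TOKEN = os.environ["DATA_API_TOKEN"]               # issued by the platform, never hard-coded

resp = requests.get(
    f"{BASE_URL}/services/sales-summary",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"region": "EMEA", "start_date": "2024-01-01"},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json()["data"]:                 # response shape assumed
    print(record)
```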

Micro-Functions

Lightweight, Python-based operations used to perform:

  • CRUD (Create, Read, Update, Delete)

  • External API calls

  • Workflow triggers & automation

Micro-Functions are essential components of the BDB Data Agent, enabling event-driven workflows and real-time actions.
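
The sketch below shows the shape such a function might take: a small Python handler that performs an Update (CRUD) and then calls an external API to trigger a downstream workflow. The handler signature, payload fields, connection string, and webhook URL are assumptions.

```python
# Hypothetical Micro-Function sketch: the handler signature and payload shape are
# assumptions; the body shows a typical pattern of a CRUD update followed by an
# external API call that triggers a downstream workflow.
import requests
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/ops")

def handle_event(payload: dict) -> dict:
    """Mark an order as shipped (Update) and notify an external system."""
    with engine.begin() as conn:  # transactional: commit on success, rollback on error
        conn.execute(
            text("UPDATE orders SET status = 'SHIPPED' WHERE order_id = :oid"),
            {"oid": payload["order_id"]},
        )

    # External API call / workflow trigger (URL is an assumption)
    requests.post(
        "https://hooks.example.com/notify-shipment",
        json={"order_id": payload["order_id"]},
        timeout=10,
    )
    return {"status": "ok", "order_id": payload["order_id"]}
```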

Data Sandbox

A secure, network-accessible space to upload and store files. Files are accessible by:

  • DS Lab

  • Data Preparation tools

  • Data Pipelines

Also supports the Online Data Sheet for tracking and managing data snapshots.
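
As a simple illustration, a file uploaded to the Data Sandbox can be read from a DS Lab notebook with pandas. The mount path and file name below are assumptions.

```python
# Reading a file uploaded to the Data Sandbox from a DS Lab notebook.
# The mount path and file name are assumptions for illustration.
import pandas as pd

sandbox_path = "/sandbox/uploads/customers_2024.csv"   # hypothetical sandbox location

customers = pd.read_csv(sandbox_path)
print(customers.shape)      # quick profile of the uploaded file
print(customers.dtypes)
```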

Data Store

Primarily used by the Report Module (self-service visualization).

Functions

  • Define metadata stores for report-ready data in configured databases

  • Create a Data Store Table using SQL for quick loads (ideal for small/ad-hoc datasets)

  • For large volumes and deltas, prefer dedicated Data Engineering pipelines
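
The quick-load path boils down to a CREATE TABLE ... AS SELECT statement over already-connected data. The sketch below is generic (table and column names are assumptions) and runs the SQL through SQLAlchemy only so it is executable outside the platform UI.

```python
# Sketch of the quick-load path: create and fill a small, report-ready table with SQL.
# Table and column names are assumptions; in the platform this SQL is supplied
# through the Data Store rather than run from a script.
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/analytics")

with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS ds_monthly_revenue AS
        SELECT date_trunc('month', order_date) AS month,
               region,
               SUM(net_amount) AS revenue
        FROM sales_orders
        GROUP BY 1, 2
    """))
```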

Feature Store

A centralized registry for AI/ML features supporting:

  • Reusability & consistency of features across teams

  • Improved governance & standardization for training and deployment

  • Versioning and serving semantics for online/offline parity
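
The sketch below illustrates the training/serving-parity idea behind a Feature Store: define a feature once, keyed by entity and tagged with a version, and reuse the same definition for both offline training sets and online lookups. Column names and the versioning scheme are assumptions.

```python
# Sketch of a reusable feature definition (column names and versioning are assumptions):
# the same function feeds both offline training sets and online serving, which is
# the consistency a Feature Store is meant to guarantee.
import pandas as pd

def customer_order_features(orders: pd.DataFrame) -> pd.DataFrame:
    """Per-customer features: 90-day order count and average order value."""
    orders = orders.assign(order_date=pd.to_datetime(orders["order_date"]))
    cutoff = orders["order_date"].max() - pd.Timedelta(days=90)
    recent = orders[orders["order_date"] >= cutoff]
    feats = recent.groupby("customer_id").agg(
        order_count_90d=("order_id", "count"),
        avg_order_value_90d=("net_amount", "mean"),
    ).reset_index()
    feats["feature_version"] = "v1"   # simple versioning tag
    return feats

# Offline: join the output onto a training set.
# Online: look up a single customer_id row at inference time.
```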

Data Preparation

Design, automate, and monitor transformations that make raw data analytics-ready.

Capabilities

  • Profiling & Quality: Null checks, uniqueness, ranges, referential integrity

  • Cleansing & Standardization: Type casting, date normalization, trimming, deduplication

  • Transformations: Joins, aggregations, pivots/unpivots, conditional rules, calculated fields

  • Enrichment: Lookups, reference data merges, geocoding (where available)

  • Scheduling & Orchestration: Run one-off or scheduled jobs; integrate with Pipelines

  • Lineage & Audit: Track source→target mappings; store run metadata and quality outcomes

  • Outputs: Persist to Datasets, Data Store, or feed Feature Store for ML
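
For orientation, the same profiling, cleansing, and transformation steps can be sketched in plain pandas, as below. Column names, rules, file paths, and thresholds are assumptions; in the platform these steps are configured in Data Preparation rather than written by hand.

```python
# Sketch of typical Data Preparation steps with pandas (column names, rules,
# and file paths are assumptions; in the platform these are configured visually).
import pandas as pd

raw = pd.read_csv("/sandbox/uploads/orders_raw.csv")   # hypothetical input

# Profiling & quality: basic null and uniqueness checks
null_report = raw.isna().mean()
assert raw["order_id"].is_unique, "order_id must be unique"

# Cleansing & standardization: type casting, date normalization, trimming, dedup
clean = (
    raw.assign(
        order_date=pd.to_datetime(raw["order_date"], errors="coerce"),
        net_amount=pd.to_numeric(raw["net_amount"], errors="coerce"),
        region=raw["region"].str.strip().str.upper(),
    )
    .dropna(subset=["order_date", "net_amount"])
    .drop_duplicates(subset=["order_id"])
)

# Transformation: aggregate into an analytics-ready output
monthly = (
    clean.groupby([clean["order_date"].dt.to_period("M"), "region"])["net_amount"]
    .sum()
    .reset_index(name="revenue")
)

monthly.to_csv("/sandbox/outputs/monthly_revenue.csv", index=False)  # hypothetical output
```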

Best practices

  • Keep transformations idempotent and parameterized

  • Apply data quality gates before publication

  • Use versioned recipes; promote Dev → UAT → Prod

Widgets

Reusable visual components (charts, KPIs, grids, filters, maps, legends, etc.) that bind to Datasets and support advanced configuration.

Key concepts

  • Binding: Map Dataset fields to Category (dimensions) and Series (measures)

  • Formatting: Units, precision, currency, number formatter, position

  • Interactivity: Filters, drill-through, drill highlighter, indicators, tooltips (default/custom), data labels

  • Export: Context menu for PDF/Excel/CSV; plugin scripts for programmatic exports

  • Scripting: Component-level and connection-level scripting for UI logic and back-end actions

  • Version Control: Push/Pull to VCS; publish to Portal; share with users/groups

Recommended workflow

  1. Create Dataset → parameterize as needed

  2. Bind Widget fields and configure properties (legend, axes, labels)

  3. Add indicators/alerts/tooltips; set export options

  4. Use scripting for dynamic behaviors (show/hide, cross-filter, reload connections)

  5. Preview, Save, Publish (Portal), and/or Share

Security & Governance (Cross-cutting)

  • RBAC/ABAC on connectors, datasets, APIs, and dashboards

  • Data Quality rules and approval workflows before publication

  • Audit & Lineage across connectors → datasets → stores/APIs/widgets

  • Secrets Management: No credentials in scripts; use platform vault/connection objects.

Performance & Operations

  • Prefer predicate push-down in Datasets; avoid broad SELECT * queries.

  • Use incremental loads and partitioning in the Data Store for scale.

  • Cache read-heavy Datasets where appropriate; schedule Data Preparation jobs during off-peak.

  • Monitor API latency/throughput; scale instances horizontally if needed.
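
The incremental-load pattern mentioned above can be sketched as a watermark-based fetch: push the filter predicate down to the source and read only rows newer than the last successful load. Table, column, and watermark handling below are assumptions.

```python
# Sketch of an incremental (watermark-based) load with predicate push-down
# (table, column, and watermark storage are assumptions).
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/analytics")

def load_increment(last_watermark: str) -> tuple[list, str]:
    """Fetch only rows updated after the last watermark; return rows and the new watermark."""
    query = text("""
        SELECT order_id, region, net_amount, updated_at
        FROM sales_orders
        WHERE updated_at > :wm           -- predicate pushed down to the source
        ORDER BY updated_at
    """)
    with engine.connect() as conn:
        rows = conn.execute(query, {"wm": last_watermark}).fetchall()
    new_wm = rows[-1].updated_at.isoformat() if rows else last_watermark
    return rows, new_wm
```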

Quick Start

  1. Connect: Create Data Connector(s) (e.g., PostgreSQL + CSV).

  2. Model: Build Datasets (with parameters & DQ checks).

  3. Prepare: Use Data Preparation to clean/transform and publish outputs.

  4. Store/Serve: Materialize in Data Store and/or expose via Data as API.

  5. Visualize: Bind Widgets to Datasets; configure display and export options to build customized insights.

  6. Automate: Wire Micro-Functions and Data Agent triggers for event-driven flows.

  7. Govern: Share, version (VCS), publish to Portal; monitor quality and usage.
