Data Object Architecture and Dependencies

This page provides an overview of the Data Center module's features and functions so that you can use it to its full potential.

The BDB Data Center empowers data engineers to connect to diverse data sources and build a robust, governed data ecosystem. It supports the creation and reuse of Data Connectors, and provides additional capabilities including Datasets, Data Stores, Data as API, Micro-Functions, Feature Stores, Data Sandbox, Data Preparation, and Widgets—all designed to accelerate analytics, operationalize data products, and enable AI/ML at scale.

Key Capabilities

  • Bring Your Own Data (BYOD): Create Data Connectors for each source—no disruptive migrations.

  • Virtualization & Reuse: Author Datasets once; reuse across Widgets, Dashboards, DS Lab, Data as API, and more.

  • Operational Data Products: Expose curated Data as API (REST) with governance and DQ controls.

  • Programmable Automations: Use Micro-Functions for CRUD, external API calls, and workflow triggers.

  • File Ingestion & Collaboration: Upload/manage files via Data Sandbox; track through Online Data Sheet.

  • Visualization-Ready Stores: Materialize report-ready tables in Data Store for self-service BI.

  • ML Feature Governance: Centralize engineered features in the Feature Store for consistent training/serving.

  • Transform & Validate: Profile, cleanse, transform, and monitor quality with Data Preparation.

  • Composable UI: Bind Widgets to Datasets; configure interactions, exports, tooltips, indicators, and scripts.

Components

Data Center encapsulates a variety of data objects, each serving a specific purpose:

Data Connector

Seamlessly integrates with existing infrastructure, embracing a bring-your-own-data (BYOD) approach and avoiding bulk migrations. Supports 40+ native connectors for structured, unstructured, and streaming data.

Supported data sources include:

  • Databases: MySQL, MSSQL, Oracle, Hive, Cassandra, MongoDB, PostgreSQL, Snowflake

  • APIs: Salesforce, Mailchimp, LinkedIn, Twitter, Google Analytics, Jira

  • Files: Flat files, FTP/SFTP servers, CSV, Excel, JSON

Highlights

  • Secure credentials & connection pooling

  • Incremental & full loads; streaming where supported

  • Centralized cataloging and lineage at the connector level

  • The My Connectors section of the Data Center module provides a centralized view of all created Data Connectors and their associated objects.
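
The sketch below is illustrative only: it shows, outside the platform, the kind of plumbing a Data Connector abstracts away, namely credentials kept out of code and a pooled connection to a PostgreSQL source. The library choice (SQLAlchemy), environment variable names, and sample table are assumptions.

```python
# Illustration of what a connector abstracts: secure credentials and pooling.
# PG_USER / PG_PASSWORD / PG_HOST, the 'analytics' database, and the
# 'sales_orders' table are assumptions for this sketch.
import os
from sqlalchemy import create_engine, text

# Credentials come from the environment (or a vault), never from the script itself
pg_url = (
    f"postgresql+psycopg2://{os.environ['PG_USER']}:{os.environ['PG_PASSWORD']}"
    f"@{os.environ.get('PG_HOST', 'localhost')}:5432/analytics"
)

# pool_size / max_overflow mirror the connection pooling a connector manages for you
engine = create_engine(pg_url, pool_size=5, max_overflow=10, pool_pre_ping=True)

with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM sales_orders")).scalar()
    print(f"sales_orders rows: {row_count}")
```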

Datasets

A virtualization layer that lets data engineers:

  • Write database-native queries (push-down when possible)

  • Publish and reuse queries across Widgets, Dashboards, Data Science Lab (DS Lab), Data as API, and more

  • Parameterize queries for dynamic filtering and row-level access

Typical uses: Consistent, governed query logic; semantic views for BI; DS Lab feature sourcing.
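
The sketch below illustrates the kind of parameterized, push-down-friendly query a Dataset captures. Table, column, and parameter names are assumptions; inside the platform the query is authored in the Dataset itself rather than in Python.

```python
# Sketch of a parameterized Dataset query (table and parameter names are assumptions).
# Filters are bound as parameters so the database can push the predicate down
# and the same logic can drive row-level filtering per user or region.
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/analytics")

dataset_query = text("""
    SELECT region, product_line, SUM(net_amount) AS revenue
    FROM sales_orders
    WHERE order_date >= :start_date
      AND region = :region              -- row-level filter supplied at run time
    GROUP BY region, product_line
""")

with engine.connect() as conn:
    rows = conn.execute(
        dataset_query, {"start_date": "2024-01-01", "region": "EMEA"}
    ).fetchall()
```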

Data as API

Expose datasets as RESTful APIs for external apps and authorized users.

Key capabilities

  • Build, test, and deploy custom API services

  • Enable data monetization strategies

  • Centralize data quality management (cleanse & enrich before exposure)

  • Provide secure, scalable access via authenticated REST endpoints
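
Once a dataset is published as a Data as API service, external applications call it over authenticated REST. The snippet below is a generic consumption sketch; the endpoint URL, token handling, query parameters, and response shape are assumptions.

```python
# Consuming a dataset published via Data as API (endpoint URL, auth header,
# parameter names, and response shape are assumptions for illustration).
import os
import requests

BASE_URL = "https://bdb.example.com/data-api/v1"   # hypothetical service URL
TOKEN = os.environ["DATA_API_TOKEN"]               # issued by the platform, never hard-coded

resp = requests.get(
    f"{BASE_URL}/services/sales-summary",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"region": "EMEA", "start_date": "2024-01-01"},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json()["data"]:                 # response shape assumed
    print(record)
```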

Micro-Functions

Lightweight, Python-based operations used to perform:

  • CRUD (Create, Read, Update, Delete)

  • External API calls

  • Workflow triggers & automation

Micro-Functions are essential components of the BDB Data Agent, enabling event-driven workflows and real-time actions.
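
The sketch below shows the shape such a function might take: a small Python handler that performs an Update (CRUD) and then calls an external API to trigger a downstream workflow. The handler signature, payload fields, connection string, and webhook URL are assumptions.

```python
# Hypothetical Micro-Function sketch: the handler signature and payload shape are
# assumptions; the body shows a typical pattern of a CRUD update followed by an
# external API call that triggers a downstream workflow.
import requests
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/ops")

def handle_event(payload: dict) -> dict:
    """Mark an order as shipped (Update) and notify an external system."""
    with engine.begin() as conn:  # transactional: commit on success, rollback on error
        conn.execute(
            text("UPDATE orders SET status = 'SHIPPED' WHERE order_id = :oid"),
            {"oid": payload["order_id"]},
        )

    # External API call / workflow trigger (URL is an assumption)
    requests.post(
        "https://hooks.example.com/notify-shipment",
        json={"order_id": payload["order_id"]},
        timeout=10,
    )
    return {"status": "ok", "order_id": payload["order_id"]}
```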

Data Sandbox

A secure, network-accessible space to upload and store files. Files are accessible by:

  • DS Lab

  • Data Preparation tools

  • Data Pipelines

Also supports the Online Data Sheet for tracking and managing data snapshots.
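
As a simple illustration, a file uploaded to the Data Sandbox can be read from a DS Lab notebook with pandas. The mount path and file name below are assumptions.

```python
# Reading a file uploaded to the Data Sandbox from a DS Lab notebook.
# The mount path and file name are assumptions for illustration.
import pandas as pd

sandbox_path = "/sandbox/uploads/customers_2024.csv"   # hypothetical sandbox location

customers = pd.read_csv(sandbox_path)
print(customers.shape)      # quick profile of the uploaded file
print(customers.dtypes)
```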

Data Store

Primarily used by the Report Module (self-service visualization).

Functions

  • Define metadata stores for report-ready data in configured databases

  • Create a Data Store Table using SQL for quick loads (ideal for small/ad-hoc datasets)

  • For large volumes and deltas, prefer dedicated Data Engineering pipelines
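
The quick-load path boils down to a CREATE TABLE ... AS SELECT statement over already-connected data. The sketch below is generic (table and column names are assumptions) and runs the SQL through SQLAlchemy only so it is executable outside the platform UI.

```python
# Sketch of the quick-load path: create and fill a small, report-ready table with SQL.
# Table and column names are assumptions; in the platform this SQL is supplied
# through the Data Store rather than run from a script.
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/analytics")

with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS ds_monthly_revenue AS
        SELECT date_trunc('month', order_date) AS month,
               region,
               SUM(net_amount) AS revenue
        FROM sales_orders
        GROUP BY 1, 2
    """))
```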

Feature Store

A centralized registry for AI/ML features supporting:

  • Reusability & consistency of features across teams

  • Improved governance & standardization for training and deployment

  • Versioning and serving semantics for online/offline parity
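
The sketch below illustrates the training/serving-parity idea behind a Feature Store: define a feature once, keyed by entity and tagged with a version, and reuse the same definition for both offline training sets and online lookups. Column names and the versioning scheme are assumptions.

```python
# Sketch of a reusable feature definition (column names and versioning are assumptions):
# the same function feeds both offline training sets and online serving, which is
# the consistency a Feature Store is meant to guarantee.
import pandas as pd

def customer_order_features(orders: pd.DataFrame) -> pd.DataFrame:
    """Per-customer features: 90-day order count and average order value."""
    orders = orders.assign(order_date=pd.to_datetime(orders["order_date"]))
    cutoff = orders["order_date"].max() - pd.Timedelta(days=90)
    recent = orders[orders["order_date"] >= cutoff]
    feats = recent.groupby("customer_id").agg(
        order_count_90d=("order_id", "count"),
        avg_order_value_90d=("net_amount", "mean"),
    ).reset_index()
    feats["feature_version"] = "v1"   # simple versioning tag
    return feats

# Offline: join the output onto a training set.
# Online: look up a single customer_id row at inference time.
```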

Data Preparation

Design, automate, and monitor transformations that make raw data analytics-ready.

Capabilities

  • Profiling & Quality: Null checks, uniqueness, ranges, referential integrity

  • Cleansing & Standardization: Type casting, date normalization, trimming, deduplication

  • Transformations: Joins, aggregations, pivots/unpivots, conditional rules, calculated fields

  • Enrichment: Lookups, reference data merges, geocoding (where available)

  • Scheduling & Orchestration: Run one-off or scheduled jobs; integrate with Pipelines

  • Lineage & Audit: Track source→target mappings; store run metadata and quality outcomes

  • Outputs: Persist to Datasets, Data Store, or feed Feature Store for ML
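
For orientation, the same profiling, cleansing, and transformation steps can be sketched in plain pandas, as below. Column names, rules, file paths, and thresholds are assumptions; in the platform these steps are configured in Data Preparation rather than written by hand.

```python
# Sketch of typical Data Preparation steps with pandas (column names, rules,
# and file paths are assumptions; in the platform these are configured visually).
import pandas as pd

raw = pd.read_csv("/sandbox/uploads/orders_raw.csv")   # hypothetical input

# Profiling & quality: basic null and uniqueness checks
null_report = raw.isna().mean()
assert raw["order_id"].is_unique, "order_id must be unique"

# Cleansing & standardization: type casting, date normalization, trimming, dedup
clean = (
    raw.assign(
        order_date=pd.to_datetime(raw["order_date"], errors="coerce"),
        net_amount=pd.to_numeric(raw["net_amount"], errors="coerce"),
        region=raw["region"].str.strip().str.upper(),
    )
    .dropna(subset=["order_date", "net_amount"])
    .drop_duplicates(subset=["order_id"])
)

# Transformation: aggregate into an analytics-ready output
monthly = (
    clean.groupby([clean["order_date"].dt.to_period("M"), "region"])["net_amount"]
    .sum()
    .reset_index(name="revenue")
)

monthly.to_csv("/sandbox/outputs/monthly_revenue.csv", index=False)  # hypothetical output
```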

Best practices

  • Keep transformations idempotent and parameterized

  • Apply data quality gates before publication

  • Use versioned recipes; promote Dev → UAT → Prod

Widgets

Reusable visual components (charts, KPIs, grids, filters, maps, legends, etc.) that bind to Datasets and support advanced configuration.

Key concepts

  • Binding: Map Dataset fields to Category (dimensions) and Series (measures)

  • Formatting: Units, precision, currency, number formatter, position

  • Interactivity: Filters, drill-through, drill highlighter, indicators, tooltips (default/custom), data labels

  • Export: Context menu for PDF/Excel/CSV; plugin scripts for programmatic exports

  • Scripting: Component-level and connection-level scripting for UI logic and back-end actions

  • Version Control: Push/Pull to VCS; publish to Portal; share with users/groups

Recommended workflow

  1. Create Dataset → parameterize as needed

  2. Bind Widget fields and configure properties (legend, axes, labels)

  3. Add indicators/alerts/tooltips; set export options

  4. Use scripting for dynamic behaviors (show/hide, cross-filter, reload connections)

  5. Preview, Save, Publish (Portal), and/or Share

Security & Governance (Cross-cutting)

  • RBAC/ABAC on connectors, datasets, APIs, and dashboards

  • Data Quality rules and approval workflows before publication

  • Audit & Lineage across connectors → datasets → stores/APIs/widgets

  • Secrets Management: No credentials in scripts; use platform vault/connection objects.

Performance & Operations

  • Prefer predicate push-down in Datasets; avoid broad SELECT * queries.

  • Use incremental loads and partitioning in the Data Store for scale.

  • Cache read-heavy Datasets where appropriate; schedule Data Preparation jobs during off-peak.

  • Monitor API latency/throughput; scale instances horizontally if needed.
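
The incremental-load pattern mentioned above can be sketched as a watermark-based fetch: push the filter predicate down to the source and read only rows newer than the last successful load. Table, column, and watermark handling below are assumptions.

```python
# Sketch of an incremental (watermark-based) load with predicate push-down
# (table, column, and watermark storage are assumptions).
from sqlalchemy import create_engine, text

# Placeholder DSN; in practice credentials come from the connector / platform vault
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/analytics")

def load_increment(last_watermark: str) -> tuple[list, str]:
    """Fetch only rows updated after the last watermark; return rows and the new watermark."""
    query = text("""
        SELECT order_id, region, net_amount, updated_at
        FROM sales_orders
        WHERE updated_at > :wm           -- predicate pushed down to the source
        ORDER BY updated_at
    """)
    with engine.connect() as conn:
        rows = conn.execute(query, {"wm": last_watermark}).fetchall()
    new_wm = rows[-1].updated_at.isoformat() if rows else last_watermark
    return rows, new_wm
```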

Quick Start

  1. Connect: Create Data Connector(s) (e.g., PostgreSQL + CSV).

  2. Model: Build Datasets (with parameters & DQ checks).

  3. Prepare: Use Data Preparation to clean/transform and publish outputs.

  4. Store/Serve: Materialize in Data Store and/or expose via Data as API.

  5. Visualize: Bind Widgets to Datasets; configure display and export options to build customized insights.

  6. Automate: Wire Micro-Functions and Data Agent triggers for event-driven flows.

  7. Govern: Share, version (VCS), publish to Portal; monitor quality and usage.
