Big Query Reader

The Big Query Reader Component is designed for efficient data access and retrieval from Google Big Query, a robust data warehousing solution on Google Cloud. It enables applications to execute complex SQL queries and process large datasets seamlessly. This component simplifies data retrieval and processing, making it ideal for data analysis, reporting, and ETL workflows.

The component configuration is classified broadly into the sections described below.

Steps to Configure the Big Query Reader Component

  • Navigate to the Data Pipeline Editor.

  • Expand the Reader section provided under the Component Palette.

  • Drag and drop the Big Query Reader component to the Workflow Editor.

  • Click the dragged Big Query Reader component to open its properties tabs.

Basic Information

This tab opens by default when you configure the component.

  • Invocation Type: Select an invocation mode, either ‘Real-Time’ or ‘Batch’, from the drop-down menu.

  • Deployment Type: It displays the deployment type for the reader component. This field comes pre-selected.

  • Batch Size (min 1): Provide the maximum number of records to be processed in one execution cycle. The minimum value for this field is 1.

  • Failover Event: Select a failover Event from the drop-down menu.

  • Container Image Version: It displays the image version for the docker container. This field comes pre-selected.

Meta Information

  • Open the Meta Information tab and fill in all the connection-specific details for the Big Query Reader.

  • Read using: Select the 'Service Account' option available under this field.

Follow these steps to download the service account JSON key from Big Query:

  • Open the Big Query Console.

  • Click APIs & Services in the navigation pane.

  • Click Credentials.

  • Click Create Credentials and select Service Account.

  • After creating the credentials, open the newly created service account.

  • Go to Keys in the header, click the Create New option, and download the key as JSON.
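Before uploading, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch, not part of the product: the function name `check_service_account_key` is hypothetical, and the list of required fields is an assumption based on the fields a Google Cloud service account key file normally carries.

```python
import json
from typing import List

# Assumption: fields a Google Cloud service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> List[str]:
    """Return a list of problems found in the key file text (empty if none)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A key of another credential type is probably the wrong download.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']!r}")
    return problems
```

To use it, read the downloaded file's text and pass it in, for example `check_service_account_key(open("key.json").read())`, where `key.json` is a placeholder path. An empty list means the basic structure looks right; it does not verify that the key is valid or has access to the dataset.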

  • Upload JSON(*): Upload the credential file downloaded from Google Big Query using the Upload icon. If you do not have this file yet, download the JSON from Big Query first.

  • Dataset Id: Enter the ID of the Big Query dataset to be read.

  • Table Id: Enter the ID of the Big Query table to be read.

  • Location (*): Enter the location that applies to your project.

  • Limit: Set a limit for the number of records to be read.

  • Query: Enter an SQL query.

Sample Spark SQL query for Big Query Reader:

SELECT * FROM project_id.dataset_id.table_id LIMIT 10

Saving the Component Configuration

  • Click the Save Component in Storage icon after completing all the configurations to save the reader component.

  • A notification message appears to confirm that the component configuration was saved successfully.
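The sample query in the Meta Information section addresses a table as project_id.dataset_id.table_id. The sketch below shows one way to assemble such a query string before pasting it into the Query field. The helper `build_reader_query` is hypothetical, and the identifier pattern and the minimum limit of 1 are assumptions for illustration. How the component combines the Limit and Query fields is not documented here.

```python
import re

# Assumption: a deliberately conservative identifier pattern. Names with other
# characters (for example hyphens in project IDs) may need quoting, which this
# sketch does not attempt.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def build_reader_query(project_id: str, dataset_id: str, table_id: str, limit: int = 10) -> str:
    """Build a query in the shape of the sample query for the Query field."""
    parts = (("project_id", project_id), ("dataset_id", dataset_id), ("table_id", table_id))
    for label, value in parts:
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"invalid {label}: {value!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return f"SELECT * FROM {project_id}.{dataset_id}.{table_id} LIMIT {limit}"
```

Rejecting unexpected characters up front keeps malformed or injected text out of the query string, at the cost of refusing some names that are valid but need quoting.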