BigQuery Reader

The BigQuery Reader component allows you to efficiently retrieve and process data from Google BigQuery, a fully managed data warehouse on Google Cloud Platform (GCP). It enables applications to run SQL queries against large datasets for data analysis, reporting, and ETL workflows.

Configuration Sections

The BigQuery Reader component's configuration is organized into the following sections:

  • Basic Information

  • Meta Information

Steps to Configure the BigQuery Reader

  1. Navigate to the Data Pipeline Editor.

  2. Expand the Readers section in the component palette.

  3. Drag and drop the BigQuery Reader component into the workflow editor.

  4. Click the component to open its configuration tabs.

Basic Information Tab

| Parameter               | Description                                                              | Example     | Required |
| ----------------------- | ------------------------------------------------------------------------ | ----------- | -------- |
| Invocation Type         | Select the execution mode: Real-Time or Batch.                           | Batch       | Yes      |
| Deployment Type         | Displays the deployment type (pre-selected).                             | Kubernetes  | No       |
| Batch Size              | Maximum number of records processed in one execution cycle (minimum: 1). | 1000        | Yes      |
| Failover Event          | Event to trigger for error handling if the component fails.              | retry_event | No       |
| Container Image Version | Docker container version for the component (pre-selected).               | v1.2.3      | No       |
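The constraints in the Basic Information tab can be sketched as a small validation check. This is an illustrative sketch only: the field names (`invocation_type`, `batch_size`) and the `validate_basic_info` helper are hypothetical, not part of the platform.

```python
def validate_basic_info(config: dict) -> list:
    """Return a list of validation errors (empty list means the config is valid).

    Hypothetical helper mirroring the Basic Information tab:
    Invocation Type must be Real-Time or Batch, Batch Size must be >= 1.
    """
    errors = []
    if config.get("invocation_type") not in ("Real-Time", "Batch"):
        errors.append("invocation_type must be 'Real-Time' or 'Batch'")
    batch_size = config.get("batch_size", 0)
    if not isinstance(batch_size, int) or batch_size < 1:
        errors.append("batch_size must be an integer >= 1")
    return errors

config = {"invocation_type": "Batch", "batch_size": 1000, "failover_event": "retry_event"}
print(validate_basic_info(config))  # []
```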

Meta Information Tab

| Parameter       | Description                                                          | Example                                      |
| --------------- | -------------------------------------------------------------------- | -------------------------------------------- |
| Read Using      | Authentication method. Only Service Account is supported.            | Service Account                              |
| Upload JSON (*) | Upload the service account credentials JSON file downloaded from GCP. | credentials.json                             |
| Dataset ID      | ID of the BigQuery dataset to read from.                             | sales_dataset                                |
| Table ID        | ID of the table within the dataset.                                  | transactions                                 |
| Location (*)    | Location (region or multi-region) of the BigQuery dataset.           | US                                           |
| Limit           | Maximum number of records to retrieve.                               | 500                                          |
| Query           | SQL query to execute in BigQuery.                                    | SELECT * FROM project.dataset.table LIMIT 10 |

Note: Fields marked with (*) are mandatory.
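One plausible way the Meta Information fields combine into the query the component runs is sketched below. The `build_query` helper and the rule that an explicit Query overrides Dataset ID, Table ID, and Limit are assumptions for illustration, not the platform's documented behavior.

```python
def build_query(meta: dict, project_id: str) -> str:
    """Hypothetical sketch: if an explicit Query is configured, use it;
    otherwise generate a SELECT over Dataset ID / Table ID, applying Limit."""
    query = (meta.get("query") or "").strip()
    if query:
        return query
    # Backtick-quote the fully qualified table name, as BigQuery SQL expects
    # when the project ID contains characters such as dashes.
    table_ref = f"`{project_id}.{meta['dataset_id']}.{meta['table_id']}`"
    query = f"SELECT * FROM {table_ref}"
    if meta.get("limit"):
        query += f" LIMIT {int(meta['limit'])}"
    return query

meta = {"dataset_id": "sales_dataset", "table_id": "transactions", "limit": 500}
print(build_query(meta, "my-project"))
# SELECT * FROM `my-project.sales_dataset.transactions` LIMIT 500
```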

Downloading Service Account Credentials

To obtain the JSON credentials file for BigQuery access:

  1. Go to the Google Cloud Console.

  2. Navigate to IAM & Admin > Service Accounts.

  3. Select or create a service account.

  4. Under Keys, click Add Key > Create New Key.

  5. Select JSON and download the credentials file.

  6. Upload the JSON file in the Meta Information tab.
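Before uploading, you can sanity-check that the downloaded file looks like a valid service account key. A minimal sketch using only the standard library; the `check_credentials` helper is hypothetical, but the JSON keys shown are standard fields of a GCP service account key file.

```python
import json

# Standard top-level fields present in a GCP service account key JSON.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}

def check_credentials(raw: str) -> str:
    """Parse a service account JSON string and return its project_id,
    raising ValueError if it does not look like a service account key."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["type"] != "service_account":
        raise ValueError("not a service account key")
    return data["project_id"]

sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\\n...",
    "client_email": "reader@my-project.iam.gserviceaccount.com",
})
print(check_credentials(sample))  # my-project
```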

Example Queries

1. Select All Records

SELECT *
FROM `project_id.dataset_id.table_id`
LIMIT 10;

2. Aggregate Query

SELECT region, SUM(sales) AS total_sales
FROM `project_id.dataset_id.transactions`
GROUP BY region;

3. Join Query

SELECT c.customer_id, c.name, o.order_id, o.amount
FROM `project_id.dataset_id.customers` c
JOIN `project_id.dataset_id.orders` o
  ON c.customer_id = o.customer_id
LIMIT 100;

Saving the Component Configuration

  • After completing the configuration, click the Save Component in Storage icon.

  • A confirmation message appears once the configuration is successfully saved.

Notes

  • Ensure that the service account used has the required BigQuery Data Viewer and BigQuery Job User roles.

  • Use Limit or filtered queries to optimize performance and reduce costs.

  • Queries must comply with BigQuery SQL syntax.