Big Query Reader
The Big Query Reader component allows you to efficiently retrieve and process data from Google BigQuery, a fully managed data warehouse on Google Cloud Platform (GCP). It enables applications to run SQL queries on large datasets for data analysis, reporting, and ETL workflows.
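Conceptually, the component does what the short sketch below does: authenticate with a service account key, submit a SQL query, and iterate over the result rows. This is a minimal illustration using Google's google-cloud-bigquery client library, not the component's actual implementation; the file name credentials.json and the query string are placeholders.

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service account key downloaded from GCP (placeholder file name).
credentials = service_account.Credentials.from_service_account_file("credentials.json")

# Create a BigQuery client bound to the service account's project.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# Submit a query job and wait for it to finish.
query_job = client.query("SELECT * FROM `project.dataset.table` LIMIT 10")

# Each result row behaves like a mapping of column name to value.
for row in query_job.result():
    print(dict(row))
```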
Configuration Sections
The Big Query Reader component's configuration is organized into the following sections:
- Basic Information
- Meta Information
Steps to Configure Big Query Reader
1. Navigate to the Data Pipeline Editor.
2. Expand the Readers section in the component palette.
3. Drag and drop the Big Query Reader component into the workflow editor.
4. Click the component to open its configuration tabs.
Basic Information Tab
| Field | Description | Example | Mandatory |
|---|---|---|---|
| Invocation Type | Select the execution mode: Real-Time or Batch. | Batch | Yes |
| Deployment Type | Displays the deployment type (pre-selected). | Kubernetes | No |
| Batch Size | Maximum number of records processed in one execution cycle (minimum 1); see the paging sketch after this table. | 1000 | Yes |
| Failover Event | Failover event used for error handling. | retry_event | Optional |
| Container Image Version | Docker container version for the component (pre-selected). | v1.2.3 | No |
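How the component applies Batch Size internally is not documented here; one plausible mechanism, shown below as a hedged sketch, is the BigQuery client library's result paging, which returns rows in fixed-size pages. The query string is a placeholder.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials are already configured

query_job = client.query("SELECT * FROM `project.dataset.table`")

# Fetch results in pages of at most 1000 rows, mirroring a Batch Size of 1000.
rows = query_job.result(page_size=1000)

for page in rows.pages:      # each page corresponds to one batch
    batch = list(page)       # materialize the rows in this batch
    print(f"processing {len(batch)} records")
```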
Meta Information Tab
| Field | Description | Example |
|---|---|---|
| Read Using | Authentication method; only Service Account is supported. | Service Account |
| Upload JSON (*) | Upload the service account credentials JSON downloaded from GCP. | credentials.json |
| Dataset ID | BigQuery dataset ID. | sales_dataset |
| Table ID | Table ID within the dataset. | transactions |
| Location (*) | BigQuery dataset location. | US |
| Limit | Maximum number of records to retrieve. | 500 |
| Query | SQL query to execute in BigQuery. | SELECT * FROM project.dataset.table LIMIT 10 |
Note: Fields marked with (*) are mandatory.
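The section does not spell out how these fields interact; a natural reading is that Dataset ID, Table ID, and Limit describe a simple table read, while Query replaces them with arbitrary SQL. The sketch below illustrates both patterns with the client library, using the example values from the table above; the query-building logic is our assumption, not the component's documented behavior.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes service account credentials are configured

dataset_id, table_id, limit = "sales_dataset", "transactions", 500  # example values
query = None  # set to a SQL string to override the simple table read

if query is None:
    # Assumed behavior: build a plain read from Dataset ID, Table ID, and Limit.
    query = f"SELECT * FROM `{client.project}.{dataset_id}.{table_id}` LIMIT {limit}"

for row in client.query(query).result():
    print(dict(row))
```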
Downloading Service Account Credentials
To obtain the JSON credentials file for BigQuery access:
1. Go to the Google Cloud Console.
2. Navigate to IAM & Admin > Service Accounts.
3. Select or create a service account.
4. Under Keys, click Add Key > Create New Key.
5. Select JSON and download the credentials file.
6. Upload the JSON file in the Meta Information tab.
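Before uploading, you can sanity-check that the downloaded file parses as valid service account credentials; this quick check is a suggestion, not a required step:

```python
from google.oauth2 import service_account

# Confirm the key file loads and belongs to the expected account and project.
creds = service_account.Credentials.from_service_account_file("credentials.json")
print("service account:", creds.service_account_email)
print("project:", creds.project_id)
```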
Example Queries
1. Select All Records

```sql
SELECT *
FROM `project_id.dataset_id.table_id`
LIMIT 10;
```

2. Aggregate Query

```sql
SELECT region, SUM(sales) AS total_sales
FROM `project_id.dataset_id.transactions`
GROUP BY region;
```

3. Join Query

```sql
SELECT c.customer_id, c.name, o.order_id, o.amount
FROM `project_id.dataset_id.customers` c
JOIN `project_id.dataset_id.orders` o
  ON c.customer_id = o.customer_id
LIMIT 100;
```
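When a query needs runtime values, for example a region filter on the aggregate query above, BigQuery supports query parameters, which are safer than string concatenation. A sketch using the client library; the table path and parameter value are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Bind @region to a concrete value instead of splicing it into the SQL string.
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("region", "STRING", "EMEA")]
)

sql = """
    SELECT region, SUM(sales) AS total_sales
    FROM `project_id.dataset_id.transactions`
    WHERE region = @region
    GROUP BY region
"""

for row in client.query(sql, job_config=job_config).result():
    print(row["region"], row["total_sales"])
```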
Saving the Component Configuration
After completing the configuration, click the Save Component in Storage icon.
A confirmation message appears once the configuration is successfully saved.
Notes
- Ensure that the service account used has the BigQuery Data Viewer and BigQuery Job User roles.
- Use Limit or filtered queries to optimize performance and reduce cost; the dry-run sketch below shows one way to estimate cost in advance.
- Queries must follow BigQuery SQL syntax.
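One way to estimate cost before running a query is BigQuery's dry-run mode, which validates the query and reports how many bytes it would scan without executing it. A minimal sketch; the table path is a placeholder:

```python
from google.cloud import bigquery

client = bigquery.Client()

# A dry run validates the query and estimates scanned bytes without running it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT * FROM `project_id.dataset_id.transactions`",
    job_config=job_config,
)

print(f"This query would process {job.total_bytes_processed:,} bytes.")
```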