Athena Query Executer

The Athena Query Executer component allows you to run SQL queries directly on external tables created in Amazon Athena. Amazon Athena is an interactive query service that enables analysis of data stored in Amazon S3 using standard SQL.

With this component, users can connect to Athena, execute queries against defined databases and tables, and retrieve results for downstream processing in the pipeline.

Configuration Sections

The Athena Query Executer configurations are organized into the following sections:

Basic Information
Meta Information
Resource Configuration
Connection Validation

Meta Information Tab

Parameter

Description

Example

Region

AWS region where the S3 bucket is located.

us-east-1

Access Key

AWS Access Key for the account.

AKIA…

Secret Key

AWS Secret Key for the account.

********

Database Name

Name of the Athena database containing the table.

analytics_db

Table Name

Name of the external table in Athena.

employee

Data Source

Data source name configured in Athena (points to S3).

AwsDataCatalog

Workgroup

Athena workgroup name. Segregates query execution and history.

primary

Query Location

Path where Athena query results are stored in S3.

s3://my-query-results/

Limit

Maximum number of records to retrieve.

1000

Query

Spark SQL query to execute on Athena.

SELECT * FROM employee LIMIT 10;

Example Queries

1. Select Data

-- Select data from employee table
SELECT * 
FROM employee 
LIMIT 10;

2. Insert Data

-- Insert data into table_2 by selecting from table_1
INSERT INTO table_2
SELECT * 
FROM table_1;

3. Create Table Using CTAS

-- Create a table in Athena from another database
CREATE TABLE database_2.table_2
WITH (
    format = 'PARQUET',
    external_location = 's3://target-bucket/output/'
) AS
SELECT *
FROM database_1.table_1;

4. Using Common Table Expressions (CTEs)

WITH age_data AS (
    SELECT department,
           CAST(ROUND(AVG(age), 2) AS INT) AS avg_age,
           MIN(age) AS min_age,
           MAX(age) AS max_age
    FROM pipeline.employee_avro1
    GROUP BY department
    ORDER BY department
),
salary_data AS (
    SELECT e.department,
           CAST(ROUND(AVG(e.salary), 2) AS INT) AS average_salary,
           a.avg_age
    FROM pipeline.employee_avro1 e
    JOIN age_data a ON e.department = a.department
    GROUP BY e.department, a.avg_age
),
result_data AS (
    SELECT s.department,
           s.average_salary,
           s.avg_age,
           COUNT(emp_id) AS Total_employees
    FROM pipeline.employee_avro1 e
    JOIN salary_data s ON e.department = s.department
    GROUP BY s.department, s.average_salary, s.avg_age
)
SELECT *
FROM result_data;

Notes

Ensure your IAM user or role has the required permissions for Athena and S3.
The Query Location path must exist in S3 and be configured in the Athena console.
Use Workgroups to isolate query history and enforce cost or resource limits across teams.
For large datasets, set a Limit or apply filters to reduce execution costs.

PreviousAzure Blob Reader (Docker)NextBig Query Reader