Athena Query Executer
The Athena Query Executer component allows you to run SQL queries directly on external tables created in Amazon Athena. Amazon Athena is an interactive query service that enables analysis of data stored in Amazon S3 using standard SQL.
With this component, users can connect to Athena, execute queries against defined databases and tables, and retrieve results for downstream processing in the pipeline.
Configuration Sections
The Athena Query Executer configurations are organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Meta Information Tab
Region
AWS region where the S3 bucket is located.
us-east-1
Access Key
AWS Access Key for the account.
AKIA…
Secret Key
AWS Secret Key for the account.
********
Database Name
Name of the Athena database containing the table.
analytics_db
Table Name
Name of the external table in Athena.
employee
Data Source
Data source name configured in Athena (points to S3).
AwsDataCatalog
Workgroup
Athena workgroup name. Segregates query execution and history.
primary
Query Location
Path where Athena query results are stored in S3.
s3://my-query-results/
Limit
Maximum number of records to retrieve.
1000
Query
Spark SQL query to execute on Athena.
SELECT * FROM employee LIMIT 10;
Example Queries
1. Select Data
-- Select data from employee table
SELECT *
FROM employee
LIMIT 10;
2. Insert Data
-- Insert data into table_2 by selecting from table_1
INSERT INTO table_2
SELECT *
FROM table_1;
3. Create Table Using CTAS
-- Create a table in Athena from another database
CREATE TABLE database_2.table_2
WITH (
format = 'PARQUET',
external_location = 's3://target-bucket/output/'
) AS
SELECT *
FROM database_1.table_1;
4. Using Common Table Expressions (CTEs)
WITH age_data AS (
SELECT department,
CAST(ROUND(AVG(age), 2) AS INT) AS avg_age,
MIN(age) AS min_age,
MAX(age) AS max_age
FROM pipeline.employee_avro1
GROUP BY department
ORDER BY department
),
salary_data AS (
SELECT e.department,
CAST(ROUND(AVG(e.salary), 2) AS INT) AS average_salary,
a.avg_age
FROM pipeline.employee_avro1 e
JOIN age_data a ON e.department = a.department
GROUP BY e.department, a.avg_age
),
result_data AS (
SELECT s.department,
s.average_salary,
s.avg_age,
COUNT(emp_id) AS Total_employees
FROM pipeline.employee_avro1 e
JOIN salary_data s ON e.department = s.department
GROUP BY s.department, s.average_salary, s.avg_age
)
SELECT *
FROM result_data;
Notes
Ensure your IAM user or role has the required permissions for Athena and S3.
The Query Location path must exist in S3 and be configured in the Athena console.
Use Workgroups to isolate query history and enforce cost or resource limits across teams.
For large datasets, set a Limit or apply filters to reduce execution costs.