# Athena Query Executer

**Amazon Athena** is an interactive query service that easily analyzes data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

**Athena Query Executer** task enables users to read data directly from the external table created in AWS Athena.

{% hint style="success" %}
*<mark style="color:green;">**Please Note:**</mark>**&#x20;\*\*\*\*Please go through the below given demonstration to configure Athena Query Executer in Jobs.***
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6ZP8JhQPMmuTgMyLMcCU%2Fuploads%2FMdLOZUvs6iGi3aJ54zza%2F2024-06-10-13-12-56%20(online-video-cutter.com).mp4?alt=media&token=671ae1b7-e6a0-470f-b366-7b87849fbb6a>" %}
***Athena Query Executer***
{% endembed %}

### Configuring the Meta Information tab fields

* **Region:** Enter the region name where the bucket is located.
* **Access Key:** Enter the AWS Access Key of the account that must be used.
* **Secret Key:** Enter the AWS Secret Key of the account that must be used.
* **Table Name:** Enter the name of the external table created in Athena.
* **Database Name:** Name of the database in Athena in which the table has been created.
* **Limit:** Enter the number of records to be read from the table.
* **Data Source:** Enter the Data Source name configured in Athena. Data Source in Athena refers to your data's location, typically an S3 bucket.
* **Workgroup:** Enter the Workgroup name configured in Athena. The Workgroup in Athena is a resource type to separate query execution and query history between Users, Teams, or Applications running under the same AWS account.
* **Query location:** Enter the path where the results of the queries done in the Athena query editor are saved in the CSV format. Users can find this path under the ***Settings*** tab in the Athena query editor as ***Query Result Location***.
* **Query:** Enter the Spark SQL query.

<figure><img src="https://2386645923-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6ZP8JhQPMmuTgMyLMcCU%2Fuploads%2FolPDYFe7Dl4ImRMCndUZ%2Fimage.png?alt=media&#x26;token=8835e0ff-1063-4715-b0f1-cf81993105a4" alt=""><figcaption><p><em><strong>Meta information tab of Athena Reader</strong></em></p></figcaption></figure>

**Sample Spark SQL query that can be used in Athena Reader:**

```sql
// Selecting data from employee table:
select * from employee limit 10;



// insert data into table_2 by selecting data from table_1 in Athena database:
insert into table_2
select * from table_1;



//Creating a table in Athena database by selecting data from another table in a different database:
CREATE TABLE database_2.table_2
WITH (
    format = 'PARQUET',   --file format(Parquet, AVRO, CSV, JSON etc..)
    external_location = 's3_path' -- Where data will be stored
) AS
SELECT *
FROM database_1.table_1;



//Using CTE queries to get the results:
WITH age_data AS (
    SELECT department,
           CAST(ROUND(AVG(age), 2) AS INT) AS avg_age,
           MIN(age) AS min_age,
           MAX(age) AS max_age
    FROM pipeline.employee_avro1
    GROUP BY department
    ORDER BY department
),
salary_data AS (
    SELECT e.department,
           CAST(ROUND(AVG(e.salary), 2) AS INT) AS average_salary,
           a.avg_age
    FROM pipeline.employee_avro1 e
    JOIN age_data a ON e.department = a.department
    GROUP BY e.department, a.avg_age
),
result_data AS (
    SELECT s.department,
           s.average_salary,
           s.avg_age,
           COUNT(emp_id) AS Total_employees
    FROM pipeline.employee_avro1 e
    JOIN salary_data s ON e.department = s.department
    GROUP BY s.department, s.average_salary, s.avg_age
)
SELECT *
FROM result_data;
```
