# Athena Query Executer

**Amazon Athena** is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

The **Athena Query Executer** component enables users to read data directly from an external table created in AWS Athena.

{% hint style="info" %}
<mark style="color:green;">**Please Note:**</mark> *Please go through the demonstration below to configure the Athena Query Executer component in the pipeline.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6ZP8JhQPMmuTgMyLMcCU%2Fuploads%2FC379cbUNcErgwlRAsh9K%2F2024-06-10-12-56-04%20(online-video-cutter.com).mp4?alt=media&token=977c4067-ecca-4883-a926-93c546b5931f>" %}
***Athena Query Executer***
{% endembed %}

### Configuring the meta information of the Athena Query Executer component:

* **Region:** Enter the region name where the bucket is located.
* **Access Key:** Enter the AWS Access Key of the AWS account to be used.
* **Secret Key:** Enter the AWS Secret Key of the AWS account to be used.
* **Table Name:** Enter the name of the external table created in Athena.
* **Database Name:** Enter the name of the Athena database in which the table has been created.
* **Limit:** Enter the number of records to be read from the table.
* **Data Source:** Enter the Data Source name configured in Athena. A Data Source in Athena refers to the location where your data resides, typically an S3 bucket.
* **Workgroup:** Enter the Workgroup name configured in Athena. A Workgroup in Athena is a resource type used to separate query execution and query history between users, teams, or applications running under the same AWS account.
* **Query location:** Enter the path where the results of queries run in the Athena query editor are saved in CSV format. You can find this path under the "***Settings***" tab of the Athena query editor in the AWS console, labeled as "***Query Result Location***".
* **Query:** Enter the SQL query to be executed in Athena.

<figure><img src="https://2386645923-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6ZP8JhQPMmuTgMyLMcCU%2Fuploads%2F7MntmtjWiQEUQiRH3cBm%2Fimage.png?alt=media&#x26;token=d693a999-1d96-4b71-86ce-fe970e0830ab" alt=""><figcaption><p><em><strong>Meta information tab of Athena Reader</strong></em></p></figcaption></figure>
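The meta-information fields above map naturally onto the parameters of Athena's `StartQueryExecution` API. The sketch below shows this mapping in Python; the `build_athena_request` helper and its sample values are illustrative assumptions, not the component's actual implementation:

```python
def build_athena_request(database, query, query_location,
                         workgroup, data_source="AwsDataCatalog"):
    """Assemble keyword arguments for Athena's StartQueryExecution API
    from the component's meta-information fields (illustrative only)."""
    return {
        "QueryString": query,              # "Query" field
        "QueryExecutionContext": {
            "Database": database,          # "Database Name" field
            "Catalog": data_source,        # "Data Source" field
        },
        "ResultConfiguration": {
            # "Query location" field: where Athena writes the CSV results
            "OutputLocation": query_location,
        },
        "WorkGroup": workgroup,            # "Workgroup" field
    }

# With the boto3 SDK, the request could then be submitted as:
#   client = boto3.client("athena", region_name=region,
#                         aws_access_key_id=access_key,
#                         aws_secret_access_key=secret_key)
#   client.start_query_execution(**build_athena_request(...))

request = build_athena_request(
    database="pipeline",
    query="SELECT * FROM employee LIMIT 10;",
    query_location="s3://my-athena-results/",  # hypothetical bucket
    workgroup="primary",
)
```

The Region, Access Key, and Secret Key fields are not part of the request itself; they configure the client connection, as shown in the commented `boto3.client` call.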

**Sample SQL queries that can be used in the Athena Query Executer:**

<pre class="language-sql"><code class="lang-sql">-- Selecting data from the employee table:
SELECT * FROM employee LIMIT 10;


-- Inserting data into table_2 by selecting data from table_1 in the Athena database:
INSERT INTO table_2
SELECT * FROM table_1;


-- Creating a table in one Athena database by selecting data from a table in a different database:
CREATE TABLE database_2.table_2
WITH (
    format = 'PARQUET',           -- file format (PARQUET, AVRO, CSV, JSON, etc.)
    external_location = 's3_path' -- S3 location where the data will be stored
) AS
SELECT *
FROM database_1.table_1;


-- Using CTE queries to get the results:
WITH age_data AS (
    SELECT department,
           CAST(ROUND(AVG(age), 2) AS INT) AS avg_age,
           MIN(age) AS min_age,
           MAX(age) AS max_age
    FROM pipeline.employee_avro1
    GROUP BY department
),
salary_data AS (
    SELECT e.department,
           CAST(ROUND(AVG(e.salary), 2) AS INT) AS average_salary,
           a.avg_age
    FROM pipeline.employee_avro1 e
    JOIN age_data a ON e.department = a.department
    GROUP BY e.department, a.avg_age
),
result_data AS (
    SELECT s.department,
           s.average_salary,
           s.avg_age,
           COUNT(e.emp_id) AS total_employees
    FROM pipeline.employee_avro1 e
    JOIN salary_data s ON e.department = s.department
    GROUP BY s.department, s.average_salary, s.avg_age
)
SELECT *
FROM result_data
ORDER BY department;
</code></pre>
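As noted under **Query location** above, Athena writes each query's output to the configured result location, naming the CSV file after the query execution ID. A minimal helper to compute where a given execution's results land (the function name is hypothetical; it assumes Athena's standard `<result location>/<QueryExecutionId>.csv` layout):

```python
def result_csv_path(query_location: str, query_execution_id: str) -> str:
    """Return the S3 path of the CSV file Athena writes for a query,
    i.e. <Query Result Location>/<QueryExecutionId>.csv."""
    return query_location.rstrip("/") + "/" + query_execution_id + ".csv"

# Example (hypothetical bucket and execution ID):
path = result_csv_path("s3://my-athena-results/", "abcd-1234-efgh")
# path == "s3://my-athena-results/abcd-1234-efgh.csv"
```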
