AWS Athena

Overview

Companies collect and analyze massive amounts of data to make better business decisions. Analyzing a large dataset quickly requires a high-performance query engine. Amazon Athena is a serverless, interactive query service that queries data directly in Amazon S3, which offers virtually unlimited storage capacity.

What is Amazon Athena?

Amazon Athena is a serverless interactive query service that AWS launched in 2016.

Amazon S3 is the object storage service provided by AWS. We upload data to S3 and then query it with Amazon Athena.

Amazon Athena uses standard SQL to query the data from S3.

SQL (Structured Query Language):
Athena makes it easy to run both simple and complex queries without setting up or managing infrastructure such as a data warehouse, data lake, or database.
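
As a rough illustration of running a SQL query from code, the sketch below submits a query and waits for it to finish. It assumes `client` is an Athena client such as the one returned by `boto3.client("athena")`; the function name `run_athena_query` and its polling parameters are hypothetical and chosen only for this example.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit a SQL query to Athena and wait for a terminal state.

    Assumption: `client` behaves like boto3.client("athena").
    `output_location` is an S3 URI where Athena writes the results.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query succeeds, fails, or is cancelled.
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not finish in time")
```

Passing the client in as an argument keeps the function free of credentials and region settings, which would normally come from the environment in which the client is created.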

Features of Athena

  • Built-in query editor.
  • We can connect to Athena through any of the following:
    • API access through the AWS SDKs, available in most languages.
    • JDBC and ODBC drivers for Athena.
    • AWS Management Console.
    • AWS CLI.
  • Supports querying CloudTrail logs.
  • Supports Avro and geospatial data, as well as parsing text with the Grok SerDe.
  • Integration with AWS Glue, which is used for ETL use cases.
  • Integration with S3 Inventory and EC2 Systems Manager Inventory.
  • Amazon QuickSight integration, which supports business insight use cases.

Benefits of Athena

  • Athena is a serverless and scalable query engine.
  • Athena supports ACID transactions on Apache Iceberg tables.
  • Athena supports nearly all widely used formats. CSV, TSV, JSON, Parquet, ORC, Avro, Logstash, CloudTrail, and Apache web server logs are a few examples.
  • Athena can query the logs and files in our S3 buckets, such as AWS service logs, ALB logs, server logs, and so on.
  • Query output is saved to an S3 bucket of our choice.
  • Using IAM policies together with S3 bucket policies and ACLs, we can secure access to Athena and the data it queries.

Limitations of Athena

  • If a dataset is large, we should partition it so that each query scans only the relevant files. Scanning more data results in slower queries and higher cost.
  • The maximum query string length is 262144 bytes. A query longer than that must be split into smaller queries before it can run.
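
The query size limit above can be checked before a query is submitted. The following is a minimal sketch; the helper names are hypothetical, and it assumes the limit is measured on the UTF-8 encoding of the query string.

```python
MAX_QUERY_BYTES = 262144  # limit quoted in the list above


def query_size_bytes(sql: str) -> int:
    """Size of the query string in bytes (assumption: UTF-8 encoding)."""
    return len(sql.encode("utf-8"))


def fits_athena_limit(sql: str) -> bool:
    """True if the query string is within the maximum query size."""
    return query_size_bytes(sql) <= MAX_QUERY_BYTES
```

Measuring bytes rather than characters matters when a query contains non-ASCII literals, because such characters occupy more than one byte each.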

Amazon Athena Use Cases

Amazon Athena supports a wide variety of data formats irrespective of vendors.

The Athena architecture provides flexibility so that users can run queries without worrying about infrastructure management.

Log analysis:

  • Many AWS services write logs that can be stored in S3, for example Application Load Balancer logs, VPC flow logs, and WAF logs. Using Athena, we can run queries across these logs in S3.
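
To make the log analysis case concrete, here is a small sketch that builds a SQL statement counting requests per HTTP status code. The table name `alb_logs` and the column `elb_status_code` are assumptions for illustration; the real names depend on how the log table was defined in Athena.

```python
def status_code_counts_sql(table: str = "alb_logs", limit: int = 10) -> str:
    """Build a query that counts load balancer requests per status code.

    Assumptions: `table` is an existing Athena table over ALB logs, and it
    has a column named `elb_status_code`. Both names are hypothetical here.
    """
    # Allow only letters, digits, and underscores in the table name.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"Unsafe table name: {table!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")

    return (
        "SELECT elb_status_code, COUNT(*) AS requests\n"
        f"FROM {table}\n"
        "GROUP BY elb_status_code\n"
        "ORDER BY requests DESC\n"
        f"LIMIT {limit}"
    )
```

The resulting string can be pasted into the Athena query editor or submitted through any of the connection methods listed earlier.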

Ad hoc analysis and querying:

  • If we need to run a one-off query against our data on demand, we can do so immediately with Athena, with no infrastructure to provision first.

Other use cases:
Data analytics, operational reporting, and serverless ETL.

AWS Athena vs Other Services

  • AWS Athena vs Microsoft SQL Server
  • AWS Athena vs AWS Redshift
  • AWS Athena vs AWS Elastic MapReduce (EMR)
  • AWS Athena vs AWS Glue

| Services | Athena | Redshift | Microsoft SQL Server | EMR | Glue |
| --- | --- | --- | --- | --- | --- |
| Description | Serverless interactive query service | Managed data warehouse service | Industry-standard RDBMS | Orchestration tool for data analytics | Managed ETL service |
| Language | SQL | SQL | SQL | Java, Hive, Pig, etc. | Agnostic |
| Open source | Presto and Hive | PostgreSQL | Microsoft SQL Server | HDFS | Jupyter |
| Integrated with | AWS S3, Glue, QuickSight | AWS S3, Glue, Tableau, etc. | Azure cloud | EC2 | Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum |
| Pricing | $5 per TB scanned | Pay as you go | Free and paid versions available | Pay as you go | Pay as you go |
| Use cases | Serverless ETL | Data warehouse with native analytics | RDBMS for applications | Big data platforms | Orchestrating ETL |

Pricing

Pricing is based on the amount of data Athena scans per query. The number of bytes scanned is rounded up to the nearest megabyte, with a 10 MB minimum per query. Query results are stored in S3, and S3 storage charges apply according to the storage class we choose.

$5.00 per TB of data scanned
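
The pricing rule can be worked through in a few lines of code. This is a sketch only: it assumes binary units (1 MB = 1024² bytes, 1 TB = 1024⁴ bytes) and the $5.00 per TB rate quoted here, and the function names are hypothetical.

```python
MB = 1024 ** 2            # assumption: binary megabyte
TB = 1024 ** 4            # assumption: binary terabyte
PRICE_PER_TB_USD = 5.00   # rate quoted in this article
MIN_BILLED_MB = 10        # 10 MB minimum per query


def billed_megabytes(bytes_scanned: int) -> int:
    """Round bytes scanned up to whole megabytes and apply the minimum."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    rounded_up = -(-bytes_scanned // MB)  # integer ceiling division
    return max(rounded_up, MIN_BILLED_MB)


def estimate_query_cost_usd(bytes_scanned: int) -> float:
    """Estimated charge in USD for one query, excluding S3 storage costs."""
    return billed_megabytes(bytes_scanned) * MB / TB * PRICE_PER_TB_USD
```

Under these assumptions, a query that scans exactly 1 TB is estimated at $5.00, while a query that scans only a few kilobytes is still billed for 10 MB.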


Conclusion

  • Amazon Athena provides the flexibility to run queries directly against data stored in S3.
  • It integrates with many AWS services to support data engineering work.
  • Because it charges only for the data each query scans, it can be a cost-effective choice for ad hoc and infrequent workloads.
  • Amazon Athena is designed for fast interactive queries; actual performance depends on data format, partitioning, and compression.