Join Tables in AWS

0

I want to join two tables.I have the tables in CSV format stored in S3 bucket 1.Is Amazon Glue studio,the right option? 2.What is the correct procedure? 3.What are the IAM permissions required? 4.Where to see the joined table output? Please throw some light

  • I believe you are looking for a solution to join 2 tables where the data files are in csv. You can follow the below solutions.

    1. Using Athena, you can create tables directly pointing to csv files. Athena supported CSV files. Then you can do the join and create a final table and point to S3 as well.
    2. Using Glue, you can create crawlers and then run the crawler and it will populate the meta data for those tables and you can see the tables in Athena as well and do the join in Athena.
    3. Using Glue crawler, create the meta data, using Glue studio, create a job which will do join and complex transformation ( if you have) and the final data set can be put it in S3 and crawled and see the data in Athena as well. Glue studio is having GUI based development environment and it is easy to use. Hope the above information helped you.
asked a year ago256 views
2 Answers
0

Hello, You can use Amazon Athena to query the csv data from S3 buckets by creating external tables. Keep the csv files belongs to two tables in two separate folders in S3. Then you can create a database / reuse exiting in Athena console and create tables from these csv files.

Alternatively, you can combine AWS Glue and Athena together to achieve this. You can create a crawler in Glue and add data source pointing to S3 buckets and run the crawler which will create the tables automatically for you. These tables will then appear in the Athena console and you can query them using SQLs and join.

Please refer to below where similar use case explained. https://stackoverflow.com/questions/73864748/query-to-multiple-csv-fles-at-s3-through-athena

https://stackoverflow.com/questions/74009011/access-s3-csv-file-in-amazon-athena

Please refer here for IAM for policies needed for accessing data catalogs. Also, you can use the AmazonAthenaFullAccess AWS managed policy assigned.

Additional resources for your reference : https://docs.aws.amazon.com/athena/latest/ug/data-sources-glue.html

https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html

AWS
answered a year ago
0

Athena will store the query result on the temporary S3 path, you can also ask Athena to create a table with the result of the query.
https://docs.aws.amazon.com/athena/latest/ug/ctas.html

If you use Glue Studio you would need to output the result of the join to some sink such an s3 path or table, but if you just want to do the join without needing extra transformations of connections, Athena is probably more practical if you are familiar with SQL

profile pictureAWS
EXPERT
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions