Extracting one SQL Server table to the Data Catalog in a Glue job works fine; with two tables, madness?


I used the Glue job editor to create a simple job that reads from a SQL Server database, filters on a column (SQL Query transform), and writes the result to an S3 bucket so I can query it with Athena. It works perfectly.

Now I want the same job to do the same for a number of other tables, so I edited the script and duplicated the block that starts at the "job = Job(glueContext)" line. No matter how I do it, the two tables are created and loaded weirdly: there should be 3 records in one and 2 in the other, but they both end up with about 20 records each, with blank values in most of the rows.

What am I doing wrong? How else can I achieve this? I thought of using crawlers to get the schema and add it to the Data Catalog first, but when I created one simple crawler it just spun and spun and then failed with "Internal Service Exception". I'm not sure how else to approach this. Thanks for any insights.

asked 2 years ago · 532 views
1 Answer
Accepted Answer

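I discovered something that is probably obvious to everyone but wasn't to me: Athena treats all the files under a table's S3 location as part of the same table, so the output for each table has to go into its own folder. With both outputs written to the same folder, each Athena table was reading the other table's files as well, which explains the extra rows and blank values. Duh.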
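
In case it helps anyone else, here is a minimal sketch of the one-folder-per-table idea. It assumes you already have one DynamicFrame per source table; the helper names (`table_output_path`, `write_tables`), the bucket name, the "exports" prefix, and the choice of Parquet are my own placeholders, not anything the job editor generates.

```python
def table_output_path(bucket: str, base_prefix: str, table_name: str) -> str:
    """Build a dedicated S3 folder for one table, e.g. s3://bucket/exports/orders/."""
    # Drop empty segments so an empty base_prefix does not produce "//".
    parts = [p.strip("/") for p in (base_prefix, table_name.lower()) if p.strip("/")]
    return f"s3://{bucket}/" + "/".join(parts) + "/"


def write_tables(glue_context, frames_by_table, bucket, base_prefix):
    """Write each DynamicFrame to its own folder so Athena sees one table per folder.

    glue_context is the job's GlueContext; frames_by_table maps a table name
    to the DynamicFrame read (and filtered) from that table.
    """
    for table_name, frame in frames_by_table.items():
        glue_context.write_dynamic_frame.from_options(
            frame=frame,
            connection_type="s3",
            connection_options={
                "path": table_output_path(bucket, base_prefix, table_name)
            },
            format="parquet",  # assumption: use whatever format your job already writes
        )


if __name__ == "__main__":
    # Hypothetical bucket and table names, just to show the resulting layout.
    for name in ("Customers", "Orders"):
        print(table_output_path("my-example-bucket", "exports", name))
```

Each Athena table (or crawler target) then points at its own folder rather than at the shared parent, so the files for one table are never read as part of another.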

answered 2 years ago
