
How do I use glue crawler to create a table from a csv uploaded to s3?

0

I am attempting to use a Glue crawler to create a table from a CSV file I have uploaded to S3, but it isn't creating the table. I have also tried to query the CSV file with S3 Select, and that doesn't work either. How can I fix this?

asked 5 months ago · 180 views
2 Answers
1
Accepted Answer

Hello.

Does the IAM role configured for the Glue crawler have the necessary IAM policies to read files from S3?
A common reason why tables cannot be created with the Glue crawler is due to incorrect IAM role configuration.
https://repost.aws/questions/QU5A9IETLQQ52BgrjBJZJFrQ/aws-glue-crawler-not-creating-table
https://repost.aws/knowledge-center/glue-crawler-not-adding-new-partitions
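
As a minimal sketch of what that role needs, the policy below grants read access to the CSV's bucket and prefix. The bucket and prefix names are placeholders, and in practice the role also needs the AWSGlueServiceRole managed policy (or equivalent) for Glue and CloudWatch Logs access:

```python
import json

# Sketch of the S3 read permissions the crawler's IAM role needs.
# "my-data-bucket" and "csv/" are hypothetical placeholders.
def s3_read_policy(bucket: str, prefix: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the CSV objects themselves
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                # List the bucket so the crawler can discover objects
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }

print(json.dumps(s3_read_policy("my-data-bucket", "csv/"), indent=2))
```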

Also, when the crawler runs, it writes logs to CloudWatch Logs, so any errors are likely to be recorded there.
Please check the crawler's log output in CloudWatch Logs.
https://docs.aws.amazon.com/glue/latest/dg/logging-and-monitoring.html
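
For example, crawler logs land in the /aws-glue/crawlers log group, and with AWS access you would fetch events via the boto3 logs client (e.g. filter_log_events). The sketch below just filters ERROR lines out of log events; the sample events are hypothetical so it runs without credentials:

```python
# Sketch: pull ERROR lines out of Glue crawler log events.
# Real events would come from CloudWatch Logs (/aws-glue/crawlers);
# the sample below is made up for illustration.
def error_messages(events: list[dict]) -> list[str]:
    return [e["message"] for e in events if "ERROR" in e.get("message", "")]

sample_events = [
    {"message": "INFO : Crawl started"},
    {"message": "ERROR : Not all read permissions granted to the crawler role"},
]
print(error_messages(sample_events))
```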

EXPERT
answered 5 months ago
  • It was not creating the table because the role did not have the correct IAM policies. Updating the IAM permissions fixed it. Thank you very much!

0

To create a table from a CSV file in S3 using AWS Glue Crawler, follow these steps:

  1. Upload your CSV file to S3:
  • Ensure your CSV file is properly formatted with consistent data types.
  • If you want column names to be preserved, make sure your CSV file has a header row.
  2. Create and configure a Glue Crawler:
  • Go to the AWS Glue Console and create a new crawler.
  • Set the crawler source type to "Data Stores" and specify the S3 bucket path where the CSV file is located.
  • Configure the crawler output to create a new database or use an existing one in the Glue Data Catalog.
  • Ensure the IAM role for the crawler has the necessary permissions to access your S3 bucket.
  3. Run the crawler:
  • Start the crawler to scan your CSV file.
  • The crawler will infer the schema from your data and create a table in the AWS Glue Data Catalog.
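
The steps above can also be scripted. create_crawler and start_crawler are the real boto3 Glue client calls; the sketch below only builds the request dictionary (with placeholder names and paths) so it runs without AWS credentials:

```python
# Sketch of a Glue crawler definition; all names/ARNs/paths are placeholders.
def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    return {
        "Name": name,
        "Role": role_arn,              # IAM role with S3 read + Glue permissions
        "DatabaseName": database,      # Glue Data Catalog database for the table
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

request = crawler_request(
    "csv-crawler",
    "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "csv_db",
    "s3://my-data-bucket/csv/",
)
# With credentials configured, you would then run:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**request)
#   glue.start_crawler(Name=request["Name"])
print(request["Targets"])
```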

If your crawler isn't creating the table, check these common issues:

  • Permissions: Ensure the IAM role used by the crawler has proper permissions to access your S3 bucket and create tables in the Glue Data Catalog.
  • File format: Make sure your CSV file is properly formatted. If the header row is missing, the crawler will create generic column names like col0, col1, etc.
  • File location: Verify the S3 path provided to the crawler is correct and contains your CSV file.
  • CSV format: If your CSV uses non-standard delimiters or has other formatting peculiarities, the crawler might have trouble inferring the schema.
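
For the non-standard-delimiter case, one option is to attach a custom CSV classifier to the crawler. create_classifier is the real boto3 Glue call that accepts this request; the sketch only builds the request dictionary, and the classifier name is a placeholder:

```python
# Sketch of a custom CSV classifier request for a non-standard delimiter
# (e.g. pipe-delimited files). "pipe-csv" is a hypothetical name.
def csv_classifier_request(name: str, delimiter: str, has_header: bool) -> dict:
    return {
        "CsvClassifier": {
            "Name": name,
            "Delimiter": delimiter,
            # Tells the crawler whether the first row holds column names
            "ContainsHeader": "PRESENT" if has_header else "ABSENT",
        }
    }

print(csv_classifier_request("pipe-csv", "|", True))
```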

Once the table is created, you can query it using Amazon Athena with standard SQL statements. If you're still having issues, you can also manually create a table definition in Athena using a CREATE TABLE statement that specifies the column names, data types, and location of your CSV file in S3.
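
A sketch of that manual Athena DDL is below; the table name, columns, and S3 location are placeholders for your own data. OpenCSVSerde handles quoted fields, and skip.header.line.count skips the header row:

```python
# Hedged sketch of a manual Athena table definition for a CSV in S3.
# my_csv_table, the columns, and the LOCATION are all placeholders.
ddl = """
CREATE EXTERNAL TABLE my_csv_table (
    id INT,
    name STRING,
    created_at STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"')
LOCATION 's3://my-data-bucket/csv/'
TBLPROPERTIES ('skip.header.line.count' = '1');
"""
print(ddl)
```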
Sources
Reading archived S3 objects with Standard storage class - AWS Prescriptive Guidance
Query data from multiple sources in S3 on Athena? | AWS re:Post
Community | What happens when you run a query in Amazon Athena?

answered 5 months ago
EXPERT
reviewed 5 months ago
