Yes, you can enforce a specific column type during the crawling process using AWS Glue. You can achieve this by creating a custom schema for your table in the Glue Data Catalog. Here's how you can do it:
1. Create a Custom Schema:
- Go to the AWS Glue console and navigate to the Tables section.
- Find the table corresponding to your Parquet files and click on it to view its details.
- In the table details page, click on the "Edit schema" button.
- Edit the schema to enforce the desired data types for the "x" and "y" columns. You can specify "double" as the data type for these columns.
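The console edit above can also be scripted. A minimal sketch with boto3, assuming hypothetical database/table names (`my_db`, `my_table`) — the helper strips the read-only fields that `get_table` returns but `update_table` rejects, then forces the "x" and "y" column types to "double":

```python
# Columns whose types we want to force, per the steps above
# (the names are placeholders for your actual columns).
FORCED_TYPES = {"x": "double", "y": "double"}

def coerce_column_types(table, forced=None):
    """Return a TableInput dict with the forced column types applied.

    Glue's update_table rejects read-only fields returned by get_table
    (CreateTime, DatabaseName, ...), so copy only the writable ones.
    """
    forced = forced or FORCED_TYPES
    writable = {k: v for k, v in table.items()
                if k in ("Name", "Description", "Owner", "Retention",
                         "StorageDescriptor", "PartitionKeys",
                         "TableType", "Parameters")}
    for col in writable.get("StorageDescriptor", {}).get("Columns", []):
        if col["Name"] in forced:
            col["Type"] = forced[col["Name"]]
    return writable

if __name__ == "__main__":
    import boto3  # only needed when actually calling AWS
    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName="my_db", Name="my_table")["Table"]
    glue.update_table(DatabaseName="my_db",
                      TableInput=coerce_column_types(table))
```

This is a sketch, not a drop-in script: check the result in the console afterwards, since any field the comprehension drops is reset to its catalog default.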
2. Re-run the Glue Crawler:
- After updating the schema, configure the crawler so it doesn't overwrite your manual edits on the next run (in the crawler's output settings, choose to ignore schema changes or to only add new columns), then re-run it to refresh the rest of the metadata in the Glue Data Catalog.
- With that setting in place, the "x" and "y" columns keep the data types you specified even as new files are crawled.
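The crawler setting mentioned above can be applied programmatically via the crawler's `Configuration` JSON. A hedged sketch — the crawler name `my_crawler` is a placeholder — using the documented `AddOrUpdateBehavior: MergeNewColumns` option so re-crawls merge into, rather than replace, the manually edited table definition:

```python
import json

def merge_config():
    """Crawler Configuration JSON telling Glue to merge new columns
    into the existing table instead of replacing its schema."""
    return json.dumps({
        "Version": 1.0,
        "CrawlerOutput": {
            "Tables": {"AddOrUpdateBehavior": "MergeNewColumns"}
        },
    })

if __name__ == "__main__":
    import boto3  # only needed when actually calling AWS
    glue = boto3.client("glue")
    glue.update_crawler(Name="my_crawler", Configuration=merge_config())
    glue.start_crawler(Name="my_crawler")
```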
3. Query in Athena:
- Once the crawling process is complete, you can query the data in Athena. The columns "x" and "y" will now have the data type "double" enforced, ensuring consistency across all files.
By enforcing the column types in the Glue Data Catalog, you ensure consistency in the data types when querying the data in Athena, even if individual Parquet files have variations in their schema. This approach allows you to handle variations in the data seamlessly without needing to manually inspect each file.
I did that, but I still get the error: "HIVE_BAD_DATA: Field location_id's type INT64 in parquet file s3://xxxxxxxxxxxx.parquet is incompatible with type double defined in table schema"
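For the HIVE_BAD_DATA error above, changing only the catalog type is usually not enough: Athena checks the declared type against each Parquet file's physical type at scan time, and it won't silently read an INT64 column as double. One common workaround is to declare the column with the type the files actually store (bigint for INT64) and cast at query time. A sketch, assuming the same placeholder names as before (`my_db`, `my_table`, and an S3 results bucket you'd substitute):

```python
def build_query(table="my_table"):
    """Athena SQL that casts location_id (stored as INT64 / bigint
    in the Parquet files) to double at read time."""
    return (
        f"SELECT CAST(location_id AS DOUBLE) AS location_id "
        f"FROM {table}"
    )

if __name__ == "__main__":
    import boto3  # only needed when actually calling AWS
    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString=build_query(),
        QueryExecutionContext={"Database": "my_db"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/results/"},
    )
```

A view defined with this CAST would give downstream queries a consistent double column without touching the files.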
Submitted an answer, I hope you accept it!