Hello, thank you very much for your question. Based on your description and code, the issue seems to be related to how the Glue crawler interprets the schema of the new CSV files added to the S3 bucket. To solve the problem, you could try the following options:
- Check the crawler configuration: ensure the crawler is configured to handle the schema changes in the new CSV files. By default, the crawler tries to infer the schema from the first few files it encounters, so if the new files have a different schema, the crawler may not pick it up correctly. You can adjust the crawler's settings, for example by increasing the number of files it samples for schema inference or by specifying a custom classifier that describes the new format (see the first sketch after this list).
- Set "Subsequent crawler runs" to "Crawl new sub-folders only": this makes each run an incremental crawl that only processes folders added since the previous run, so existing data is not re-crawled and re-interpreted (see the second sketch after this list).
- Use a Lambda function to fix the schema: as you suggested, you could create an AWS Lambda function that triggers whenever a new file is uploaded to the S3 bucket. The function could read the file, fix the schema if necessary, and write the file back to the same location with the correct schema (see the last sketch after this list).
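
For the first option, here is a minimal boto3 sketch of registering a custom CSV classifier and attaching it to an existing crawler. The classifier name, crawler name, and delimiter settings below are placeholders for your own resources:

```python
# Minimal sketch, assuming boto3 credentials are already configured and
# "my-csv-classifier" / "my-crawler" are placeholders for your resources.
import boto3

glue = boto3.client("glue")

# A custom CSV classifier tells the crawler how to parse the files instead
# of leaving everything to schema inference.
glue.create_classifier(
    CsvClassifier={
        "Name": "my-csv-classifier",
        "Delimiter": ",",
        "QuoteSymbol": '"',
        "ContainsHeader": "PRESENT",  # the first row holds the column names
    }
)

# Attach the classifier to the existing crawler.
glue.update_crawler(
    Name="my-crawler",
    Classifiers=["my-csv-classifier"],
)
```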
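
For the second option, the console's "Subsequent crawler runs" setting corresponds to the crawler's RecrawlPolicy in the API. A sketch, assuming an existing crawler named "my-crawler"; incremental crawls are typically paired with LOG schema-change behaviors so that table definitions inferred on earlier runs are left untouched:

```python
# Minimal sketch, assuming an existing crawler named "my-crawler".
import boto3

glue = boto3.client("glue")

# The console's "Crawl new sub-folders only" choice maps to RecrawlPolicy.
# LOG behaviors keep the crawler from modifying existing table definitions.
glue.update_crawler(
    Name="my-crawler",
    SchemaChangePolicy={
        "UpdateBehavior": "LOG",
        "DeleteBehavior": "LOG",
    },
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
)
```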
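
For the Lambda option, here is a minimal sketch, assuming the files are small enough to read into memory and that "fixing the schema" means forcing a known header; EXPECTED_COLUMNS and the padding/trimming logic below are illustrative assumptions, not part of your original setup:

```python
# Minimal sketch of the Lambda approach. EXPECTED_COLUMNS is a placeholder
# for your real column list.
import csv
import io

import boto3

s3 = boto3.client("s3")

EXPECTED_COLUMNS = ["id", "name", "created_at"]  # hypothetical target schema


def handler(event, context):
    # Invoked by an s3:ObjectCreated:* notification on the bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = list(csv.reader(io.StringIO(body)))

        # If the header already matches, leave the file alone. This also
        # stops the function from re-processing its own output, since
        # writing back to the same key fires the trigger again.
        if not rows or rows[0] == EXPECTED_COLUMNS:
            continue

        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(EXPECTED_COLUMNS)
        for row in rows[1:]:  # assume the first row is a header to replace
            # Pad or trim each row to the expected number of columns.
            writer.writerow(
                (row + [""] * len(EXPECTED_COLUMNS))[: len(EXPECTED_COLUMNS)]
            )

        s3.put_object(Bucket=bucket, Key=key, Body=out.getvalue().encode("utf-8"))
```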
Please access the following resources for further information: