Thanks for the answers. We found out that the problem was with the S3 files: they were uploaded with the wrong encoding setting. They should be encoded as UTF-8 so that Glue can read them properly.
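For anyone hitting the same issue, a minimal sketch of re-encoding a file to UTF-8 before uploading it to S3 (the file names and the Latin-1 source encoding here are assumptions for illustration; substitute whatever encoding your files were actually written in):

```python
# Re-encode a CSV to UTF-8 so Glue can read it correctly.
# File names and the source encoding (latin-1) are illustrative assumptions.
from pathlib import Path

def reencode_to_utf8(src: str, dst: str, src_encoding: str = "latin-1") -> None:
    """Read a file in its original encoding and rewrite it as UTF-8."""
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Create a sample Latin-1 file and convert it.
Path("data_latin1.csv").write_bytes("id,name\n1,Café\n".encode("latin-1"))
reencode_to_utf8("data_latin1.csv", "data_utf8.csv")
print(Path("data_utf8.csv").read_text(encoding="utf-8"))
```

The converted file can then be uploaded to S3 as usual (e.g. with the AWS CLI or an SDK) and the crawler re-run.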
Hi,
Glue crawlers do not have a column limit as such. During the first crawler run, the crawler reads either the first 1,000 records or the first 1 MB of each file to infer the schema, whichever comes first; the exact amount depends on the file format and the availability of a valid record. If the crawler can't infer the schema of a CSV file from the first 1 MB, it reads up to a maximum of 10 MB of the file, incrementing 1 MB at a time. The crawler then compares the schemas inferred from all the subfolders and files and creates one or more tables. When a crawler creates a table, it considers the following factors:
- Data compatibility: whether the data has the same format, compression type, and include path
- Schema similarity: how closely the schemas match in terms of the partition threshold and the number of different schemas
However, please feel free to raise a support case to further troubleshoot the issue and find its root cause.
Thank you.
I did a test with a CSV file having 63 columns, and the catalog table shows all of them, so I can say for sure that there is no 50-column limit.
Having said that, try using a smaller file with, say, 100 rows and verify manually that every column contains data. Thinking out loud here: if your files span multiple partitions and not all files have every column populated, you may have been unlucky and the crawler sampled only rows where fewer than 50 columns had data. Alternatively, try creating a completely new crawler that writes to a new database and table.
Glad you were able to sort this out. Please close out this question by accepting an answer.