To incorporate the ASCII 31 delimiter within a Glue Crawler, follow the steps below:
- Create a Custom Classifier - Because ASCII 31 is non-printable, you'll need to use its escape sequence. In the classifier's "Delimiter" field, enter "\u001F", which represents the unit separator.
- Update your Crawler Configuration - To use the custom classifier created above, select the ASCII 31 custom classifier in the Glue crawler's "CSV Classifier" settings.
- Modify Glue Job (Depending on Job Code) - If your job code contains delimiter-handling logic, update it to use the "\u001F" delimiter.
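If you prefer scripting over the console, the first two steps can be sketched with boto3 roughly as follows. This is only a sketch under assumptions: the function names and the classifier and crawler names you pass in are hypothetical, "ContainsHeader" is set to "UNKNOWN" as a placeholder, and I have not verified how the crawler behaves with this delimiter on your data.

```python
# Unit separator (ASCII 31), the same character as "\u001F" in the steps above.
UNIT_SEPARATOR = "\u001f"


def build_classifier_request(classifier_name):
    """Build the keyword arguments for Glue's create_classifier call."""
    return {
        "CsvClassifier": {
            "Name": classifier_name,
            "Delimiter": UNIT_SEPARATOR,
            # Placeholder: use "PRESENT" or "ABSENT" if you know your files.
            "ContainsHeader": "UNKNOWN",
        }
    }


def register_classifier(crawler_name, classifier_name):
    """Create the classifier and attach it to an existing crawler.

    Needs AWS credentials, so nothing here runs automatically.
    """
    # Imported here so the request builder above works without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    glue.create_classifier(**build_classifier_request(classifier_name))
    # Assumption: the crawler already exists. The list passed here is the
    # crawler's full classifier list, so include any others it should keep.
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])
```

Calling register_classifier("my-crawler", "ascii31-csv") with your own names would then apply both steps; check the result in the console before re-running the crawler.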
Below are links to the official AWS documentation on writing custom classifiers and adding them to a Glue crawler:
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html
If you have any further questions or encounter further issues, feel free to reach out with more information!
I had a similar issue with the crawler, so I read the file with Spark code instead, as below. It may help you.
# chr(31) is the ASCII unit separator, the same character as "\u001F"
delimiter_char31 = chr(31)
df = spark.read.option("header", "false").option("delimiter", delimiter_char31).csv("s3://abc/test.txt")
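Before running the Spark job, it can help to confirm locally that a sample of the file really splits on ASCII 31. Here is a minimal sketch using only Python's standard csv module. The sample string and the function name are made up for illustration, so substitute a few lines from your own file.

```python
import csv
import io

# chr(31) and "\u001f" are the same single character (unit separator).
UNIT_SEPARATOR = chr(31)


def parse_unit_separated(text):
    """Split unit-separator-delimited text into a list of rows."""
    reader = csv.reader(io.StringIO(text), delimiter=UNIT_SEPARATOR)
    return [row for row in reader]


# Hypothetical sample data, not taken from the real file.
sample = "id\x1fname\n1\x1falice\n2\x1fbob\n"
rows = parse_unit_separated(sample)
for row in rows:
    print(row)
```

If each row comes back as a single field, the file is probably using a different delimiter than ASCII 31.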