Syntax error with a Glue DynamicFrame


I tried to use Glue ETL to convert nested JSON data to Parquet. It works, but because Glue infers the schema from a sample, it couldn't determine the data type of some fields and used a struct holding every possible type instead, which changed the schema.

I tried to use ResolveChoice to force it to use one type instead of a struct, but I keep getting a syntax error. I followed the docs but still can't figure it out. Here is my code; could someone help? What's the right syntax? Does it support nested data?

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2")

Syntax Error:
  File "/tmp/g-7d4adc26f6e5bb15ba8d86e7b4fced4ba08ca29d-6847579850030189659/script_2019-04-30-06-11-30.py", line 30
    resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2")
SyntaxError: invalid syntax
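The SyntaxError is not caused by the nested paths. In the call above, the ")" right after the specs list's closing "]" ends ResolveChoice.apply( early, so ', transformation_ctx = "resolvechoice2")' is left with an unmatched ")". One way to confirm this without running a Glue job is to parse both versions with Python's standard ast module, which only checks syntax and executes nothing. The call is shortened to two specs here, and names such as applymapping1 inside the strings are never resolved:

```python
import ast

# The call from the question, shortened to two specs.
# The ")" right after "]" closes ResolveChoice.apply( too early.
broken = (
    'resolvechoice2 = ResolveChoice.apply(frame = applymapping1, '
    'specs = [("in_reply_to_user_id", "project:long"), ("user.id", "project:long")]), '
    'transformation_ctx = "resolvechoice2")'
)

# Same call with the stray ")" removed, so transformation_ctx is passed
# as a keyword argument to ResolveChoice.apply.
fixed = (
    'resolvechoice2 = ResolveChoice.apply(frame = applymapping1, '
    'specs = [("in_reply_to_user_id", "project:long"), ("user.id", "project:long")], '
    'transformation_ctx = "resolvechoice2")'
)


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (it is not executed)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses(broken))  # False
print(parses(fixed))   # True
```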

Thanks, Juan

Administrator
Asked 5 years ago · 653 views
1 answer
Accepted answer
dynamicframe0_with_cast = datasource0.resolveChoice(
    specs = [("in_reply_to_user_id", "cast:long"),
             ("user.id", "cast:long"),
             ("quoted_status.user.id", "cast:long"),
             ("entities.user_mentions[].id", "cast:long"),
             ("entities.media[].source_user_id", "cast:long"),
             ("retweeted_status.user.id", "cast:long"),
             ("extended_entities.media[].source_user_id", "cast:long")])

When casting fields inside embedded collections, mark each array level with "[]" in the path (for example "entities.user_mentions[].id"), not with ".element", even though ".element" is what the printSchema() output shows.
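Because the rewrite from printSchema() paths to resolveChoice paths is mechanical, the specs list can be generated from the paths as printed. A minimal sketch, assuming every array level appears as a literal ".element" segment and that no real field is named "element"; to_resolve_choice_path is a hypothetical helper, not part of the Glue API. It is plain Python, so it runs without Glue:

```python
def to_resolve_choice_path(schema_path: str) -> str:
    """Rewrite a printSchema()-style path into resolveChoice path syntax.

    Hypothetical helper: assumes each array level shows up as an
    ".element" segment and replaces it with "[]" on the array field's name.
    """
    out = []
    for part in schema_path.split("."):
        if part == "element" and out:
            # Mark the preceding field as an array instead of keeping ".element".
            out[-1] += "[]"
        else:
            out.append(part)
    return ".".join(out)


# Paths as written in the question (printSchema() style).
schema_paths = [
    "in_reply_to_user_id",
    "user.id",
    "quoted_status.user.id",
    "entities.user_mentions.element.id",
    "entities.media.element.source_user_id",
    "retweeted_status.user.id",
    "extended_entities.media.element.source_user_id",
]

# Build the (path, action) pairs that resolveChoice expects.
specs = [(to_resolve_choice_path(p), "cast:long") for p in schema_paths]
for spec in specs:
    print(spec)
```

The resulting list can then be passed as datasource0.resolveChoice(specs = specs).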

Tom_B (AWS)
Answered 5 years ago
